lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021
@ 2021-11-28 23:27 James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 01/19] lnet: fix delay rule crash James Simmons
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Backport patches from the OpenSFS tree to the native Linux Lustre
client. Also track down a bug unique to the Linux client that was
exposed by sanity-lnet test 212.

Alex Zhuravlev (2):
  lnet: libcfs: add timeout to cfs_race() to fix race
  lustre: ptlrpc: fix timeout after spurious wakeup

Andreas Dilger (1):
  lustre: ptlrpc: remove bogus LASSERT

Andriy Skulysh (1):
  lustre: llite: skip request slot for lmv_revalidate_slaves()

Bobi Jam (2):
  lustre: llite: tighten condition for fault not drop mmap_sem
  lustre: llite: mend the trunc_sem_up_write()

Chris Horn (4):
  lnet: o2iblnd: map_on_demand not needed for frag interop
  lnet: o2iblnd: Fix logic for unaligned transfer
  lnet: Reset ni_ping_count only on receive
  lnet: Fail peer add for existing gw peer

James Simmons (2):
  lnet: fix delay rule crash
  lnet: Netlink improvements

Mr NeilBrown (4):
  lnet: change tp_nid to 16byte in lnet_test_peer.
  lnet: extend preferred nids in struct lnet_peer_ni
  lnet: switch to large lnet_processid for matching
  lnet: libcfs: separate daemon_list from cfs_trace_data

Patrick Farrell (1):
  lustre: llite: Do not count tiny write twice

Sebastien Buisson (1):
  lustre: quota: optimize capability check for root squash

Serguei Smirnov (1):
  lnet: set eth routes needed for multi rail

 fs/lustre/include/lustre_dlm.h             |   7 +-
 fs/lustre/include/obd.h                    |   1 +
 fs/lustre/ldlm/ldlm_request.c              |  18 ++-
 fs/lustre/llite/file.c                     |   2 -
 fs/lustre/llite/llite_internal.h           |   2 +
 fs/lustre/llite/llite_mmap.c               |  13 +-
 fs/lustre/llite/statahead.c                |   1 +
 fs/lustre/lmv/lmv_intent.c                 |   2 +
 fs/lustre/mdc/mdc_dev.c                    |   3 +-
 fs/lustre/mdc/mdc_locks.c                  |   5 +-
 fs/lustre/osc/osc_cache.c                  |  23 ++--
 fs/lustre/osc/osc_request.c                |   2 +-
 fs/lustre/ptlrpc/niobuf.c                  |  32 +++--
 fs/lustre/ptlrpc/ptlrpcd.c                 |  11 +-
 include/linux/libcfs/libcfs_fail.h         |  10 +-
 include/linux/lnet/api.h                   |   2 +-
 include/linux/lnet/lib-lnet.h              |  29 ++--
 include/linux/lnet/lib-types.h             |  18 ++-
 include/uapi/linux/lnet/lnet-nl.h          |  29 +++-
 include/uapi/linux/lnet/nidstr.h           |   3 +-
 net/lnet/klnds/o2iblnd/o2iblnd.c           |   2 +-
 net/lnet/klnds/o2iblnd/o2iblnd.h           |   6 +-
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c        |  31 ++---
 net/lnet/klnds/socklnd/socklnd_modparams.c |   5 +
 net/lnet/libcfs/tracefile.c                | 213 +++++++++++++++--------------
 net/lnet/libcfs/tracefile.h                |  17 +--
 net/lnet/lnet/api-ni.c                     |  37 ++---
 net/lnet/lnet/lib-me.c                     |   4 +-
 net/lnet/lnet/lib-move.c                   |  86 ++++++------
 net/lnet/lnet/lib-msg.c                    |   6 +-
 net/lnet/lnet/lib-ptl.c                    |  45 +++---
 net/lnet/lnet/net_fault.c                  |  16 +--
 net/lnet/lnet/nidstrings.c                 |   9 +-
 net/lnet/lnet/peer.c                       | 111 +++++++++------
 net/lnet/lnet/udsp.c                       |  38 ++---
 net/lnet/selftest/rpc.c                    |  10 +-
 36 files changed, 473 insertions(+), 376 deletions(-)

-- 
1.8.3.1


* [lustre-devel] [PATCH 01/19] lnet: fix delay rule crash
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 02/19] lnet: change tp_nid to 16byte in lnet_test_peer James Simmons
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

The following crash was captured in testing:

LNetError: 25912:0:(net_fault.c:520:delay_rule_decref()) ASSERTION( list_empty(&rule->dl_sched_link) ) failed:
LNetError: 25912:0:(net_fault.c:520:delay_rule_decref()) LBUG
Pid: 25912, comm: lnet_dd 5.7.0-rc7+ #1 SMP PREEMPT Fri Aug 20 16:17:11 EDT 2021
Call Trace:
libcfs_call_trace+0x62/0x80 [libcfs]
lbug_with_loc+0x41/0xa0 [libcfs]
delay_rule_decref+0x6e/0xe0 [lnet]
lnet_delay_rule_check+0x65/0x110 [lnet]
lnet_delay_rule_daemon+0x76/0x120 [lnet]

The fix is to revert the list changes in lnet_delay_rule_check().

Fixes: da4bdd3701 ("lustre: use list_first_entry() in lnet/lnet subdirectory.")
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
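Not part of the commit message: a minimal userspace sketch of the list
idiom involved, using simplified stand-ins for the kernel list helpers
and struct lnet_delay_rule rather than the real LNet code. With the
check inverted, a rule that was actually found was never unlinked
before delay_rule_decref(), so ASSERTION(list_empty(&rule->dl_sched_link))
fired; unlinking the entry once one is found, as the hunk below
restores, keeps the assertion satisfied.

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct delay_rule { struct list_head dl_sched_link; };

static void delay_rule_decref(struct delay_rule *rule)
{
	/* mirrors the LASSERT that fired in delay_rule_decref() */
	assert(list_empty(&rule->dl_sched_link));
}

int main(void)
{
	struct list_head sched;
	struct delay_rule r;

	list_init(&sched);
	list_init(&r.dl_sched_link);
	list_add_tail(&r.dl_sched_link, &sched);

	/* The buggy form, "if (!rule) list_del_init(...)", never unlinked
	 * a rule it had actually found, so the assert below would fire.
	 * The restored form unlinks before dropping the reference:
	 */
	if (!list_empty(&sched)) {
		struct delay_rule *rule = container_of(sched.next,
						       struct delay_rule,
						       dl_sched_link);

		list_del_init(&rule->dl_sched_link);
		delay_rule_decref(rule);	/* assertion now holds */
	}
	printf("ok\n");
	return 0;
}
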
 net/lnet/lnet/net_fault.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c
index 06366df..02fc1ae 100644
--- a/net/lnet/lnet/net_fault.c
+++ b/net/lnet/lnet/net_fault.c
@@ -744,15 +744,15 @@ struct delay_daemon_data {
 			break;
 
 		spin_lock_bh(&delay_dd.dd_lock);
-		rule = list_first_entry_or_null(&delay_dd.dd_sched_rules,
-						struct lnet_delay_rule,
-						dl_sched_link);
-		if (!rule)
-			list_del_init(&rule->dl_sched_link);
-		spin_unlock_bh(&delay_dd.dd_lock);
-
-		if (!rule)
+		if (list_empty(&delay_dd.dd_sched_rules)) {
+			spin_unlock_bh(&delay_dd.dd_lock);
 			break;
+		}
+
+		rule = list_entry(delay_dd.dd_sched_rules.next,
+				  struct lnet_delay_rule, dl_sched_link);
+		list_del_init(&rule->dl_sched_link);
+		spin_unlock_bh(&delay_dd.dd_lock);
 
 		delayed_msg_check(rule, false, &msgs);
 		delay_rule_decref(rule); /* -1 for delay_dd.dd_sched_rules */
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/19] lnet: change tp_nid to 16byte in lnet_test_peer.
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 01/19] lnet: fix delay rule crash James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 03/19] lnet: extend preferred nids in struct lnet_peer_ni James Simmons
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

This updates 'struct lnet_test_peer' to store a large address nid.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 7e89b556ea7dc4b4c ("LU-10391 lnet: change tp_nid to 16byte in lnet_test_peer.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43595
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
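Not part of the commit message: a tiny userspace sketch of why the
comparisons change shape. The types and helpers are simplified
stand-ins for lnet_nid_t, struct lnet_nid, lnet_nid4_to_nid(),
nid_same() and LNET_NID_IS_ANY(), not the real LNet definitions; the
point is that a struct NID cannot be compared with "==", so call sites
convert the legacy 4-byte NID once at the function boundary and then
use helpers, which is the shape of the lnet_fail_nid() and fail_peer()
hunks below.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t nid4_t;		/* stand-in for lnet_nid_t */

struct big_nid {			/* stand-in for struct lnet_nid */
	uint8_t  size;
	uint8_t  type;
	uint16_t num;
	uint32_t addr[4];		/* room for a 16-byte address */
};

static void nid4_to_big(nid4_t n4, struct big_nid *out)
{
	/* toy decomposition for the sketch only */
	memset(out, 0, sizeof(*out));
	out->addr[0] = (uint32_t)n4;
	out->num = (uint16_t)(n4 >> 32);
	out->type = (uint8_t)(n4 >> 48);
}

static bool big_nid_same(const struct big_nid *a, const struct big_nid *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;	/* no "==" for structs */
}

/* was simply "return tp_nid == nid;" when both sides were nid4_t */
static bool test_peer_matches(const struct big_nid *tp_nid, nid4_t nid4)
{
	struct big_nid nid;

	nid4_to_big(nid4, &nid);	/* convert once at the boundary */
	return big_nid_same(tp_nid, &nid);
}

int main(void)
{
	struct big_nid tp;

	nid4_to_big(42, &tp);
	printf("match=%d\n", test_peer_matches(&tp, 42));
	return 0;
}
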
 include/linux/lnet/lib-types.h |  2 +-
 net/lnet/lnet/lib-move.c       | 18 +++++++++++-------
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 380a7b9..1901ad2 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -233,7 +233,7 @@ struct lnet_libmd {
 struct lnet_test_peer {
 	/* info about peers we are trying to fail */
 	struct list_head	tp_list;	/* ln_test_peers */
-	lnet_nid_t		tp_nid;		/* matching nid */
+	struct lnet_nid		tp_nid;		/* matching nid */
 	unsigned int		tp_threshold;	/* # failures to simulate */
 };
 
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 170d684..b9f5973 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -190,13 +190,15 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 }
 
 int
-lnet_fail_nid(lnet_nid_t nid, unsigned int threshold)
+lnet_fail_nid(lnet_nid_t nid4, unsigned int threshold)
 {
 	struct lnet_test_peer *tp;
 	struct list_head *el;
 	struct list_head *next;
+	struct lnet_nid nid;
 	LIST_HEAD(cull);
 
+	lnet_nid4_to_nid(nid4, &nid);
 	/* NB: use lnet_net_lock(0) to serialize operations on test peers */
 	if (threshold) {
 		/* Adding a new entry */
@@ -218,9 +220,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	list_for_each_safe(el, next, &the_lnet.ln_test_peers) {
 		tp = list_entry(el, struct lnet_test_peer, tp_list);
 
-		if (!tp->tp_threshold ||    /* needs culling anyway */
-		    nid == LNET_NID_ANY ||       /* removing all entries */
-		    tp->tp_nid == nid) {	  /* matched this one */
+		if (!tp->tp_threshold ||	/* needs culling anyway */
+		    LNET_NID_IS_ANY(&nid) ||	/* removing all entries */
+		    nid_same(&tp->tp_nid, &nid)) {	/* matched this one */
 			list_move(&tp->tp_list, &cull);
 		}
 	}
@@ -237,14 +239,16 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 }
 
 static int
-fail_peer(lnet_nid_t nid, int outgoing)
+fail_peer(lnet_nid_t nid4, int outgoing)
 {
 	struct lnet_test_peer *tp;
 	struct list_head *el;
 	struct list_head *next;
+	struct lnet_nid nid;
 	LIST_HEAD(cull);
 	int fail = 0;
 
+	lnet_nid4_to_nid(nid4, &nid);
 	/* NB: use lnet_net_lock(0) to serialize operations on test peers */
 	lnet_net_lock(0);
 
@@ -264,8 +268,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 			continue;
 		}
 
-		if (tp->tp_nid == LNET_NID_ANY || /* fail every peer */
-		    nid == tp->tp_nid) {	/* fail this peer */
+		if (LNET_NID_IS_ANY(&tp->tp_nid) ||	/* fail every peer */
+		    nid_same(&nid, &tp->tp_nid)) {	/* fail this peer */
 			fail = 1;
 
 			if (tp->tp_threshold != LNET_MD_THRESH_INF) {
-- 
1.8.3.1


* [lustre-devel] [PATCH 03/19] lnet: extend preferred nids in struct lnet_peer_ni
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 01/19] lnet: fix delay rule crash James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 02/19] lnet: change tp_nid to 16byte in lnet_test_peer James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 04/19] lnet: switch to large lnet_processid for matching James Simmons
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

union lpni_pref in struct lnet_peer_ni now has a 'struct lnet_nid'
rather than an lnet_nid_t.

Also, lnet_peer_ni_set_non_mr_pref_nid() now allows the pref nid to be
NULL and is a no-op in that case.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 47cc77462343533b4 ("LU-10391 lnet: extend prefered nids in struct lnet_peer_ni")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43596
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
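Not part of the commit message: the lpni_pref union keeps the common
single-NID case inline and only spills to a list once a second
preferred NID is added, with lpni_pref_nnids recording which union
member is live. Below is a compressed userspace sketch of that idiom,
with illustrative names and none of the real locking or error paths.

#include <stdio.h>
#include <stdlib.h>

struct nid { unsigned char raw[20]; };	/* stand-in for struct lnet_nid */

struct nid_entry {			/* stand-in for struct lnet_nid_list */
	struct nid_entry *next;
	struct nid val;
};

struct pref {
	unsigned int nnids;	/* 0: empty, 1: u.nid live, >1: u.list live */
	union {
		struct nid nid;			/* common single-NID case */
		struct nid_entry *list;		/* spill-over for >1 NIDs */
	} u;
};

static int pref_add(struct pref *p, const struct nid *n)
{
	if (p->nnids == 0) {
		p->u.nid = *n;			/* stay inline */
	} else {
		struct nid_entry *e = malloc(sizeof(*e));

		if (!e)
			return -1;
		if (p->nnids == 1) {
			/* migrate the inline NID before reusing the union */
			struct nid_entry *first = malloc(sizeof(*first));

			if (!first) {
				free(e);
				return -1;
			}
			first->val = p->u.nid;
			first->next = NULL;
			p->u.list = first;
		}
		e->val = *n;
		e->next = p->u.list;
		p->u.list = e;
	}
	p->nnids++;
	return 0;
}

int main(void)
{
	struct pref p = { 0 };
	struct nid a = { { 1 } }, b = { { 2 } };

	pref_add(&p, &a);
	pref_add(&p, &b);
	printf("nnids=%u\n", p.nnids);	/* list cleanup omitted in the sketch */
	return 0;
}
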
 include/linux/lnet/lib-lnet.h    | 25 +++++++----
 include/linux/lnet/lib-types.h   |  4 +-
 include/uapi/linux/lnet/nidstr.h |  3 +-
 net/lnet/lnet/api-ni.c           | 16 +++----
 net/lnet/lnet/lib-move.c         | 58 ++++++++++++------------
 net/lnet/lnet/nidstrings.c       |  9 ++--
 net/lnet/lnet/peer.c             | 95 +++++++++++++++++++++++-----------------
 net/lnet/lnet/udsp.c             | 38 ++++++++--------
 8 files changed, 138 insertions(+), 110 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 104c98d..fb2f42fcb 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -486,6 +486,7 @@ unsigned int lnet_nid_cpt_hash(struct lnet_nid *nid,
 int lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni);
 int lnet_nid2cpt(struct lnet_nid *nid, struct lnet_ni *ni);
 struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
+struct lnet_ni *lnet_nid_to_ni_locked(struct lnet_nid *nid, int cpt);
 struct lnet_ni *lnet_nid2ni_addref(lnet_nid_t nid);
 struct lnet_ni *lnet_net2ni_locked(u32 net, int cpt);
 struct lnet_ni *lnet_net2ni_addref(u32 net);
@@ -538,9 +539,11 @@ int lnet_get_peer_list(u32 *countp, u32 *sizep,
 extern void lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni,
 						 struct list_head *queue,
 						 time64_t now);
-extern int lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
+extern int lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni,
+				  struct lnet_nid *nid);
 extern void lnet_peer_clr_pref_nids(struct lnet_peer_ni *lpni);
-extern int lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
+extern int lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni,
+				  struct lnet_nid *nid);
 void lnet_peer_ni_set_selection_priority(struct lnet_peer_ni *lpni,
 					 u32 priority);
 extern void lnet_ni_add_to_recoveryq_locked(struct lnet_ni *ni,
@@ -565,7 +568,7 @@ void lnet_rtr_transfer_to_peer(struct lnet_peer *src,
 int lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason);
 struct lnet_net *lnet_get_net_locked(u32 net_id);
 void lnet_net_clr_pref_rtrs(struct lnet_net *net);
-int lnet_net_add_pref_rtr(struct lnet_net *net, lnet_nid_t gw_nid);
+int lnet_net_add_pref_rtr(struct lnet_net *net, struct lnet_nid *gw_nid);
 
 int lnet_islocalnid4(lnet_nid_t nid);
 int lnet_islocalnid(struct lnet_nid *nid);
@@ -838,6 +841,9 @@ struct lnet_peer_ni *lnet_get_next_peer_ni_locked(struct lnet_peer *peer,
 						  struct lnet_peer_ni *prev);
 struct lnet_peer_ni *lnet_nid2peerni_locked(lnet_nid_t nid, lnet_nid_t pref,
 					    int cpt);
+struct lnet_peer_ni *lnet_peerni_by_nid_locked(struct lnet_nid *nid,
+					       struct lnet_nid *pref,
+					       int cpt);
 struct lnet_peer_ni *lnet_nid2peerni_ex(struct lnet_nid *nid, int cpt);
 struct lnet_peer_ni *lnet_peer_get_ni_locked(struct lnet_peer *lp,
 					     lnet_nid_t nid);
@@ -859,13 +865,16 @@ struct lnet_peer_ni *lnet_peer_ni_get_locked(struct lnet_peer *lp,
 void lnet_debug_peer(lnet_nid_t nid);
 struct lnet_peer_net *lnet_peer_get_net_locked(struct lnet_peer *peer,
 					       u32 net_id);
-bool lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, lnet_nid_t nid);
-int lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
+bool lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni,
+				  struct lnet_nid *nid);
+int lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, struct lnet_nid *nid);
 void lnet_peer_clr_pref_nids(struct lnet_peer_ni *lpni);
-bool lnet_peer_is_pref_rtr_locked(struct lnet_peer_ni *lpni, lnet_nid_t gw_nid);
+bool lnet_peer_is_pref_rtr_locked(struct lnet_peer_ni *lpni,
+				  struct lnet_nid *gw_nid);
 void lnet_peer_clr_pref_rtrs(struct lnet_peer_ni *lpni);
-int lnet_peer_add_pref_rtr(struct lnet_peer_ni *lpni, lnet_nid_t nid);
-int lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
+int lnet_peer_add_pref_rtr(struct lnet_peer_ni *lpni, struct lnet_nid *nid);
+int lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni,
+				     struct lnet_nid *nid);
 int lnet_add_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid, bool mr, bool temp);
 int lnet_del_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid);
 int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk);
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 1901ad2..bde7249 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -568,7 +568,7 @@ struct lnet_ping_buffer {
 
 struct lnet_nid_list {
 	struct list_head nl_list;
-	lnet_nid_t nl_nid;
+	struct lnet_nid nl_nid;
 };
 
 struct lnet_peer_ni {
@@ -635,7 +635,7 @@ struct lnet_peer_ni {
 	time64_t		lpni_last_alive;
 	/* preferred local nids: if only one, use lpni_pref.nid */
 	union lpni_pref {
-		lnet_nid_t	 nid;
+		struct lnet_nid	nid;
 		struct list_head nids;
 	} lpni_pref;
 	/* list of router nids preferred for this peer NI */
diff --git a/include/uapi/linux/lnet/nidstr.h b/include/uapi/linux/lnet/nidstr.h
index bfc9644..482cfb2 100644
--- a/include/uapi/linux/lnet/nidstr.h
+++ b/include/uapi/linux/lnet/nidstr.h
@@ -108,7 +108,8 @@ static inline char *libcfs_nidstr(const struct lnet_nid *nid)
 int cfs_parse_nidlist(char *str, int len, struct list_head *list);
 int cfs_print_nidlist(char *buffer, int count, struct list_head *list);
 int cfs_match_nid(lnet_nid_t nid, struct list_head *list);
-int cfs_match_nid_net(lnet_nid_t nid, __u32 net, struct list_head *net_num_list,
+int cfs_match_nid_net(struct lnet_nid *nid, __u32 net,
+		       struct list_head *net_num_list,
 		      struct list_head *addr);
 int cfs_match_net(__u32 net_id, __u32 net_type,
 		  struct list_head *net_num_list);
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 0f4feda..340cc84e 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -1388,7 +1388,7 @@ struct lnet_net *
 
 int
 lnet_net_add_pref_rtr(struct lnet_net *net,
-		      lnet_nid_t gw_nid)
+		      struct lnet_nid *gw_nid)
 __must_hold(&the_lnet.ln_api_mutex)
 {
 	struct lnet_nid_list *ne;
@@ -1399,7 +1399,7 @@ struct lnet_net *
 	 * lock.
 	 */
 	list_for_each_entry(ne, &net->net_rtr_pref_nids, nl_list) {
-		if (ne->nl_nid == gw_nid)
+		if (nid_same(&ne->nl_nid, gw_nid))
 			return -EEXIST;
 	}
 
@@ -1407,7 +1407,7 @@ struct lnet_net *
 	if (!ne)
 		return -ENOMEM;
 
-	ne->nl_nid = gw_nid;
+	ne->nl_nid = *gw_nid;
 
 	/* Lock the cpt to protect against addition and checks in the
 	 * selection algorithm
@@ -1420,11 +1420,11 @@ struct lnet_net *
 }
 
 bool
-lnet_net_is_pref_rtr_locked(struct lnet_net *net, lnet_nid_t rtr_nid)
+lnet_net_is_pref_rtr_locked(struct lnet_net *net, struct lnet_nid *rtr_nid)
 {
 	struct lnet_nid_list *ne;
 
-	CDEBUG(D_NET, "%s: rtr pref emtpy: %d\n",
+	CDEBUG(D_NET, "%s: rtr pref empty: %d\n",
 	       libcfs_net2str(net->net_id),
 	       list_empty(&net->net_rtr_pref_nids));
 
@@ -1433,9 +1433,9 @@ struct lnet_net *
 
 	list_for_each_entry(ne, &net->net_rtr_pref_nids, nl_list) {
 		CDEBUG(D_NET, "Comparing pref %s with gw %s\n",
-		       libcfs_nid2str(ne->nl_nid),
-		       libcfs_nid2str(rtr_nid));
-		if (rtr_nid == ne->nl_nid)
+		       libcfs_nidstr(&ne->nl_nid),
+		       libcfs_nidstr(rtr_nid));
+		if (nid_same(rtr_nid, &ne->nl_nid))
 			return true;
 	}
 
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index b9f5973..2f7c37d 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1137,9 +1137,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		 * preferred, then let's use it
 		 */
 		if (best_ni) {
-			/* FIXME need to handle large-addr nid */
 			lpni_is_preferred = lnet_peer_is_pref_nid_locked(lpni,
-									 lnet_nid_to_nid4(&best_ni->ni_nid));
+									 &best_ni->ni_nid);
 			CDEBUG(D_NET, "%s lpni_is_preferred = %d\n",
 			       libcfs_nidstr(&best_ni->ni_nid),
 			       lpni_is_preferred);
@@ -1318,7 +1317,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	struct lnet_route *route;
 	int rc;
 	bool best_rte_is_preferred = false;
-	lnet_nid_t gw_pnid;
+	struct lnet_nid *gw_pnid;
 
 	CDEBUG(D_NET, "Looking up a route to %s, from %s\n",
 	       libcfs_net2str(rnet->lrn_net), libcfs_net2str(src_net));
@@ -1328,7 +1327,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	list_for_each_entry(route, &rnet->lrn_routes, lr_list) {
 		if (!lnet_is_route_alive(route))
 			continue;
-		gw_pnid = lnet_nid_to_nid4(&route->lr_gateway->lp_primary_nid);
+		gw_pnid = &route->lr_gateway->lp_primary_nid;
 
 		/* no protection on below fields, but it's harmless */
 		if (last_route && (last_route->lr_seq - route->lr_seq < 0))
@@ -1352,7 +1351,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 			if (!lpni) {
 				CDEBUG(D_NET,
 				       "Gateway %s does not have a peer NI on net %s\n",
-				       libcfs_nid2str(gw_pnid),
+				       libcfs_nidstr(gw_pnid),
 				       libcfs_net2str(src_net));
 				continue;
 			}
@@ -1368,7 +1367,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 			best_gw_ni = lpni;
 			best_rte_is_preferred = true;
 			CDEBUG(D_NET, "preferred gw = %s\n",
-			       libcfs_nid2str(gw_pnid));
+			       libcfs_nidstr(gw_pnid));
 			continue;
 		} else if ((!rc) && best_rte_is_preferred)
 			/* The best route we found so far is in the preferred
@@ -1397,7 +1396,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		if (!lpni) {
 			CDEBUG(D_NET,
 			       "Gateway %s does not have a peer NI on net %s\n",
-			       libcfs_nid2str(gw_pnid),
+			       libcfs_nidstr(gw_pnid),
 			       libcfs_net2str(src_net));
 			continue;
 		}
@@ -1789,8 +1788,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		CDEBUG(D_NET, "Setting preferred local NID %s on NMR peer %s\n",
 		       libcfs_nidstr(&lni->ni_nid),
 		       libcfs_nidstr(&lpni->lpni_nid));
-		lnet_peer_ni_set_non_mr_pref_nid(lpni,
-						 lnet_nid_to_nid4(&lni->ni_nid));
+		lnet_peer_ni_set_non_mr_pref_nid(lpni, &lni->ni_nid);
 	}
 }
 
@@ -2314,7 +2312,8 @@ struct lnet_ni *
 		if (lpni_entry->lpni_pref_nnids == 0)
 			continue;
 		LASSERT(lpni_entry->lpni_pref_nnids == 1);
-		best_ni = lnet_nid2ni_locked(lpni_entry->lpni_pref.nid, cpt);
+		best_ni = lnet_nid_to_ni_locked(&lpni_entry->lpni_pref.nid,
+						cpt);
 		break;
 	}
 
@@ -4208,7 +4207,7 @@ void lnet_monitor_thr_stop(void)
 }
 
 int
-lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid,
+lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid4,
 	   void *private, int rdma_req)
 {
 	struct lnet_peer_ni *lpni;
@@ -4217,6 +4216,7 @@ void lnet_monitor_thr_stop(void)
 	lnet_pid_t dest_pid;
 	lnet_nid_t dest_nid;
 	lnet_nid_t src_nid;
+	struct lnet_nid from_nid;
 	bool push = false;
 	int for_me;
 	u32 type;
@@ -4225,6 +4225,8 @@ void lnet_monitor_thr_stop(void)
 
 	LASSERT(!in_interrupt());
 
+	lnet_nid4_to_nid(from_nid4, &from_nid);
+
 	type = le32_to_cpu(hdr->type);
 	src_nid = le64_to_cpu(hdr->src_nid);
 	dest_nid = le64_to_cpu(hdr->dest_nid);
@@ -4233,7 +4235,7 @@ void lnet_monitor_thr_stop(void)
 
 	/* FIXME handle large-addr nids */
 	for_me = (lnet_nid_to_nid4(&ni->ni_nid) == dest_nid);
-	cpt = lnet_cpt_of_nid(from_nid, ni);
+	cpt = lnet_cpt_of_nid(from_nid4, ni);
 
 	CDEBUG(D_NET, "TRACE: %s(%s) <- %s : %s\n",
 	       libcfs_nid2str(dest_nid),
@@ -4246,7 +4248,7 @@ void lnet_monitor_thr_stop(void)
 	case LNET_MSG_GET:
 		if (payload_length > 0) {
 			CERROR("%s, src %s: bad %s payload %d (0 expected)\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid),
 			       lnet_msgtyp2str(type), payload_length);
 			return -EPROTO;
@@ -4258,7 +4260,7 @@ void lnet_monitor_thr_stop(void)
 		if (payload_length >
 		   (u32)(for_me ? LNET_MAX_PAYLOAD : LNET_MTU)) {
 			CERROR("%s, src %s: bad %s payload %d (%d max expected)\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid),
 			       lnet_msgtyp2str(type),
 			       payload_length,
@@ -4269,7 +4271,7 @@ void lnet_monitor_thr_stop(void)
 
 	default:
 		CERROR("%s, src %s: Bad message type 0x%x\n",
-		       libcfs_nid2str(from_nid),
+		       libcfs_nid2str(from_nid4),
 		       libcfs_nid2str(src_nid), type);
 		return -EPROTO;
 	}
@@ -4296,7 +4298,7 @@ void lnet_monitor_thr_stop(void)
 		if (LNET_NIDNET(dest_nid) == LNET_NID_NET(&ni->ni_nid)) {
 			/* should have gone direct */
 			CERROR("%s, src %s: Bad dest nid %s (should have been sent direct)\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid),
 			       libcfs_nid2str(dest_nid));
 			return -EPROTO;
@@ -4308,7 +4310,7 @@ void lnet_monitor_thr_stop(void)
 			 * this node's NID on its own network
 			 */
 			CERROR("%s, src %s: Bad dest nid %s (it's my nid but on a different network)\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid),
 			       libcfs_nid2str(dest_nid));
 			return -EPROTO;
@@ -4316,7 +4318,7 @@ void lnet_monitor_thr_stop(void)
 
 		if (rdma_req && type == LNET_MSG_GET) {
 			CERROR("%s, src %s: Bad optimized GET for %s (final destination must be me)\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid),
 			       libcfs_nid2str(dest_nid));
 			return -EPROTO;
@@ -4324,7 +4326,7 @@ void lnet_monitor_thr_stop(void)
 
 		if (!the_lnet.ln_routing) {
 			CERROR("%s, src %s: Dropping message for %s (routing not enabled)\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid),
 			       libcfs_nid2str(dest_nid));
 			goto drop;
@@ -4338,7 +4340,7 @@ void lnet_monitor_thr_stop(void)
 	if (!list_empty(&the_lnet.ln_test_peers) && /* normally we don't */
 	    fail_peer(src_nid, 0)) {		/* shall we now? */
 		CERROR("%s, src %s: Dropping %s to simulate failure\n",
-		       libcfs_nid2str(from_nid), libcfs_nid2str(src_nid),
+		       libcfs_nid2str(from_nid4), libcfs_nid2str(src_nid),
 		       lnet_msgtyp2str(type));
 		goto drop;
 	}
@@ -4347,7 +4349,7 @@ void lnet_monitor_thr_stop(void)
 	if (!list_empty(&the_lnet.ln_drop_rules) &&
 	    lnet_drop_rule_match(hdr, lnet_nid_to_nid4(&ni->ni_nid), NULL)) {
 		CDEBUG(D_NET, "%s, src %s, dst %s: Dropping %s to simulate silent message loss\n",
-		       libcfs_nid2str(from_nid), libcfs_nid2str(src_nid),
+		       libcfs_nid2str(from_nid4), libcfs_nid2str(src_nid),
 		       libcfs_nid2str(dest_nid), lnet_msgtyp2str(type));
 		goto drop;
 	}
@@ -4355,7 +4357,7 @@ void lnet_monitor_thr_stop(void)
 	msg = kmem_cache_zalloc(lnet_msg_cachep, GFP_NOFS);
 	if (!msg) {
 		CERROR("%s, src %s: Dropping %s (out of memory)\n",
-		       libcfs_nid2str(from_nid), libcfs_nid2str(src_nid),
+		       libcfs_nid2str(from_nid4), libcfs_nid2str(src_nid),
 		       lnet_msgtyp2str(type));
 		goto drop;
 	}
@@ -4372,7 +4374,7 @@ void lnet_monitor_thr_stop(void)
 	msg->msg_offset = 0;
 	msg->msg_hdr = *hdr;
 	/* for building message event */
-	msg->msg_from = from_nid;
+	msg->msg_from = from_nid4;
 	if (!for_me) {
 		msg->msg_target.pid = dest_pid;
 		msg->msg_target.nid = dest_nid;
@@ -4388,14 +4390,12 @@ void lnet_monitor_thr_stop(void)
 	}
 
 	lnet_net_lock(cpt);
-	/* FIXME support large-addr nid */
-	lpni = lnet_nid2peerni_locked(from_nid, lnet_nid_to_nid4(&ni->ni_nid),
-				      cpt);
+	lpni = lnet_peerni_by_nid_locked(&from_nid, &ni->ni_nid, cpt);
 	if (IS_ERR(lpni)) {
 		lnet_net_unlock(cpt);
 		rc = PTR_ERR(lpni);
 		CERROR("%s, src %s: Dropping %s (error %d looking up sender)\n",
-		       libcfs_nid2str(from_nid), libcfs_nid2str(src_nid),
+		       libcfs_nid2str(from_nid4), libcfs_nid2str(src_nid),
 		       lnet_msgtyp2str(type), rc);
 		kfree(msg);
 		if (rc == -ESHUTDOWN)
@@ -4410,7 +4410,7 @@ void lnet_monitor_thr_stop(void)
 	 */
 	if (((lnet_drop_asym_route && for_me) ||
 	     !lpni->lpni_peer_net->lpn_peer->lp_alive) &&
-	    LNET_NIDNET(src_nid) != LNET_NIDNET(from_nid)) {
+	    LNET_NIDNET(src_nid) != LNET_NIDNET(from_nid4)) {
 		u32 src_net_id = LNET_NIDNET(src_nid);
 		struct lnet_peer *gw = lpni->lpni_peer_net->lpn_peer;
 		struct lnet_route *route;
@@ -4445,7 +4445,7 @@ void lnet_monitor_thr_stop(void)
 			 * => asymmetric routing detected but forbidden
 			 */
 			CERROR("%s, src %s: Dropping asymmetrical route %s\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid), lnet_msgtyp2str(type));
 			kfree(msg);
 			goto drop;
diff --git a/net/lnet/lnet/nidstrings.c b/net/lnet/lnet/nidstrings.c
index d91815d..dfd6744 100644
--- a/net/lnet/lnet/nidstrings.c
+++ b/net/lnet/lnet/nidstrings.c
@@ -803,7 +803,7 @@ int cfs_print_nidlist(char *buffer, int count, struct list_head *nidlist)
 }
 
 int
-cfs_match_nid_net(lnet_nid_t nid, u32 net_type,
+cfs_match_nid_net(struct lnet_nid *nid, u32 net_type,
 		  struct list_head *net_num_list,
 		  struct list_head *addr)
 {
@@ -813,15 +813,16 @@ int cfs_print_nidlist(char *buffer, int count, struct list_head *nidlist)
 	if (!addr || !net_num_list)
 		return 0;
 
-	nf = type2net_info(LNET_NETTYP(LNET_NIDNET(nid)));
+	nf = type2net_info(LNET_NETTYP(LNET_NID_NET(nid)));
 	if (!nf || !net_num_list || !addr)
 		return 0;
 
-	address = LNET_NIDADDR(nid);
+	/* FIXME handle long-addr nid */
+	address = LNET_NIDADDR(lnet_nid_to_nid4(nid));
 
 	/* if either the address or net number don't match then no match */
 	if (!nf->nf_match_addr(address, addr) ||
-	    !cfs_match_net(LNET_NIDNET(nid), net_type, net_num_list))
+	    !cfs_match_net(LNET_NID_NET(nid), net_type, net_num_list))
 		return 0;
 
 	return 1;
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 4b6f339..1853388 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -990,7 +990,7 @@ struct lnet_peer_ni *
  */
 bool
 lnet_peer_is_pref_rtr_locked(struct lnet_peer_ni *lpni,
-			     lnet_nid_t gw_nid)
+			     struct lnet_nid *gw_nid)
 {
 	struct lnet_nid_list *ne;
 
@@ -1006,9 +1006,9 @@ struct lnet_peer_ni *
 	 */
 	list_for_each_entry(ne, &lpni->lpni_rtr_pref_nids, nl_list) {
 		CDEBUG(D_NET, "Comparing pref %s with gw %s\n",
-		       libcfs_nid2str(ne->nl_nid),
-		       libcfs_nid2str(gw_nid));
-		if (ne->nl_nid == gw_nid)
+		       libcfs_nidstr(&ne->nl_nid),
+		       libcfs_nidstr(gw_nid));
+		if (nid_same(&ne->nl_nid, gw_nid))
 			return true;
 	}
 
@@ -1037,7 +1037,7 @@ struct lnet_peer_ni *
 
 int
 lnet_peer_add_pref_rtr(struct lnet_peer_ni *lpni,
-		       lnet_nid_t gw_nid)
+		       struct lnet_nid *gw_nid)
 {
 	int cpt = lpni->lpni_cpt;
 	struct lnet_nid_list *ne = NULL;
@@ -1050,7 +1050,7 @@ struct lnet_peer_ni *
 	__must_hold(&the_lnet.ln_api_mutex);
 
 	list_for_each_entry(ne, &lpni->lpni_rtr_pref_nids, nl_list) {
-		if (ne->nl_nid == gw_nid)
+		if (nid_same(&ne->nl_nid, gw_nid))
 			return -EEXIST;
 	}
 
@@ -1058,7 +1058,7 @@ struct lnet_peer_ni *
 	if (!ne)
 		return -ENOMEM;
 
-	ne->nl_nid = gw_nid;
+	ne->nl_nid = *gw_nid;
 
 	/* Lock the cpt to protect against addition and checks in the
 	 * selection algorithm
@@ -1076,16 +1076,16 @@ struct lnet_peer_ni *
  * shared mmode.
  */
 bool
-lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, lnet_nid_t nid)
+lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, struct lnet_nid *nid)
 {
 	struct lnet_nid_list *ne;
 
 	if (lpni->lpni_pref_nnids == 0)
 		return false;
 	if (lpni->lpni_pref_nnids == 1)
-		return lpni->lpni_pref.nid == nid;
+		return nid_same(&lpni->lpni_pref.nid, nid);
 	list_for_each_entry(ne, &lpni->lpni_pref.nids, nl_list) {
-		if (ne->nl_nid == nid)
+		if (nid_same(&ne->nl_nid, nid))
 			return true;
 	}
 	return false;
@@ -1096,24 +1096,27 @@ struct lnet_peer_ni *
  * defined. Only to be used for non-multi-rail peer_ni.
  */
 int
-lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid)
+lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni,
+				 struct lnet_nid *nid)
 {
 	int rc = 0;
 
+	if (!nid)
+		return -EINVAL;
 	spin_lock(&lpni->lpni_lock);
-	if (nid == LNET_NID_ANY) {
+	if (LNET_NID_IS_ANY(nid)) {
 		rc = -EINVAL;
 	} else if (lpni->lpni_pref_nnids > 0) {
 		rc = -EPERM;
 	} else if (lpni->lpni_pref_nnids == 0) {
-		lpni->lpni_pref.nid = nid;
+		lpni->lpni_pref.nid = *nid;
 		lpni->lpni_pref_nnids = 1;
 		lpni->lpni_state |= LNET_PEER_NI_NON_MR_PREF;
 	}
 	spin_unlock(&lpni->lpni_lock);
 
 	CDEBUG(D_NET, "peer %s nid %s: %d\n",
-	       libcfs_nidstr(&lpni->lpni_nid), libcfs_nid2str(nid), rc);
+	       libcfs_nidstr(&lpni->lpni_nid), libcfs_nidstr(nid), rc);
 	return rc;
 }
 
@@ -1161,20 +1164,21 @@ struct lnet_peer_ni *
 }
 
 int
-lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid)
+lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, struct lnet_nid *nid)
 {
 	struct lnet_peer *lp = lpni->lpni_peer_net->lpn_peer;
 	struct lnet_nid_list *ne1 = NULL;
 	struct lnet_nid_list *ne2 = NULL;
-	lnet_nid_t tmp_nid = LNET_NID_ANY;
+	struct lnet_nid *tmp_nid = NULL;
 	int rc = 0;
 
-	if (nid == LNET_NID_ANY) {
+	if (LNET_NID_IS_ANY(nid)) {
 		rc = -EINVAL;
 		goto out;
 	}
 
-	if (lpni->lpni_pref_nnids == 1 && lpni->lpni_pref.nid == nid) {
+	if (lpni->lpni_pref_nnids == 1 &&
+	    nid_same(&lpni->lpni_pref.nid, nid)) {
 		rc = -EEXIST;
 		goto out;
 	}
@@ -1191,12 +1195,12 @@ struct lnet_peer_ni *
 		size_t alloc_size = sizeof(*ne1);
 
 		if (lpni->lpni_pref_nnids == 1) {
-			tmp_nid = lpni->lpni_pref.nid;
+			tmp_nid = &lpni->lpni_pref.nid;
 			INIT_LIST_HEAD(&lpni->lpni_pref.nids);
 		}
 
 		list_for_each_entry(ne1, &lpni->lpni_pref.nids, nl_list) {
-			if (ne1->nl_nid == nid) {
+			if (nid_same(&ne1->nl_nid, nid)) {
 				rc = -EEXIST;
 				goto out;
 			}
@@ -1217,15 +1221,15 @@ struct lnet_peer_ni *
 				goto out;
 			}
 			INIT_LIST_HEAD(&ne2->nl_list);
-			ne2->nl_nid = tmp_nid;
+			ne2->nl_nid = *tmp_nid;
 		}
-		ne1->nl_nid = nid;
+		ne1->nl_nid = *nid;
 	}
 
 	lnet_net_lock(LNET_LOCK_EX);
 	spin_lock(&lpni->lpni_lock);
 	if (lpni->lpni_pref_nnids == 0) {
-		lpni->lpni_pref.nid = nid;
+		lpni->lpni_pref.nid = *nid;
 	} else {
 		if (ne2)
 			list_add_tail(&ne2->nl_list, &lpni->lpni_pref.nids);
@@ -1243,12 +1247,12 @@ struct lnet_peer_ni *
 		spin_unlock(&lpni->lpni_lock);
 	}
 	CDEBUG(D_NET, "peer %s nid %s: %d\n",
-	       libcfs_nidstr(&lp->lp_primary_nid), libcfs_nid2str(nid), rc);
+	       libcfs_nidstr(&lp->lp_primary_nid), libcfs_nidstr(nid), rc);
 	return rc;
 }
 
 int
-lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid)
+lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni, struct lnet_nid *nid)
 {
 	struct lnet_peer *lp = lpni->lpni_peer_net->lpn_peer;
 	struct lnet_nid_list *ne = NULL;
@@ -1260,13 +1264,13 @@ struct lnet_peer_ni *
 	}
 
 	if (lpni->lpni_pref_nnids == 1) {
-		if (lpni->lpni_pref.nid != nid) {
+		if (!nid_same(&lpni->lpni_pref.nid, nid)) {
 			rc = -ENOENT;
 			goto out;
 		}
 	} else {
 		list_for_each_entry(ne, &lpni->lpni_pref.nids, nl_list) {
-			if (ne->nl_nid == nid)
+			if (nid_same(&ne->nl_nid, nid))
 				goto remove_nid_entry;
 		}
 		rc = -ENOENT;
@@ -1278,7 +1282,7 @@ struct lnet_peer_ni *
 	lnet_net_lock(LNET_LOCK_EX);
 	spin_lock(&lpni->lpni_lock);
 	if (lpni->lpni_pref_nnids == 1) {
-		lpni->lpni_pref.nid = LNET_NID_ANY;
+		lpni->lpni_pref.nid = LNET_ANY_NID;
 	} else {
 		list_del_init(&ne->nl_list);
 		if (lpni->lpni_pref_nnids == 2) {
@@ -1301,7 +1305,7 @@ struct lnet_peer_ni *
 	kfree(ne);
 out:
 	CDEBUG(D_NET, "peer %s nid %s: %d\n",
-	       libcfs_nidstr(&lp->lp_primary_nid), libcfs_nid2str(nid), rc);
+	       libcfs_nidstr(&lp->lp_primary_nid), libcfs_nidstr(nid), rc);
 	return rc;
 }
 
@@ -1316,7 +1320,7 @@ struct lnet_peer_ni *
 
 	lnet_net_lock(LNET_LOCK_EX);
 	if (lpni->lpni_pref_nnids == 1)
-		lpni->lpni_pref.nid = LNET_NID_ANY;
+		lpni->lpni_pref.nid = LNET_ANY_NID;
 	else if (lpni->lpni_pref_nnids > 1)
 		list_splice_init(&lpni->lpni_pref.nids, &zombies);
 	lpni->lpni_pref_nnids = 0;
@@ -1849,7 +1853,7 @@ struct lnet_peer_net *
  * lpni creation initiated due to traffic either sending or receiving.
  */
 static int
-lnet_peer_ni_traffic_add(struct lnet_nid *nid, lnet_nid_t pref)
+lnet_peer_ni_traffic_add(struct lnet_nid *nid, struct lnet_nid *pref)
 {
 	struct lnet_peer *lp;
 	struct lnet_peer_net *lpn;
@@ -1886,8 +1890,7 @@ struct lnet_peer_net *
 	lpni = lnet_peer_ni_alloc(nid);
 	if (!lpni)
 		goto out_free_lpn;
-	if (pref != LNET_NID_ANY)
-		lnet_peer_ni_set_non_mr_pref_nid(lpni, pref);
+	lnet_peer_ni_set_non_mr_pref_nid(lpni, pref);
 
 	return lnet_peer_attach_peer_ni(lp, lpn, lpni, flags);
 
@@ -2084,7 +2087,7 @@ struct lnet_peer_ni *
 
 	lnet_net_unlock(cpt);
 
-	rc = lnet_peer_ni_traffic_add(nid, LNET_NID_ANY);
+	rc = lnet_peer_ni_traffic_add(nid, NULL);
 	if (rc) {
 		lpni = ERR_PTR(rc);
 		goto out_net_relock;
@@ -2104,21 +2107,20 @@ struct lnet_peer_ni *
  * hold on the peer_ni.
  */
 struct lnet_peer_ni *
-lnet_nid2peerni_locked(lnet_nid_t nid4, lnet_nid_t pref, int cpt)
+lnet_peerni_by_nid_locked(struct lnet_nid *nid,
+			  struct lnet_nid *pref, int cpt)
 {
 	struct lnet_peer_ni *lpni = NULL;
-	struct lnet_nid nid;
 	int rc;
 
 	if (the_lnet.ln_state != LNET_STATE_RUNNING)
 		return ERR_PTR(-ESHUTDOWN);
 
-	lnet_nid4_to_nid(nid4, &nid);
 	/*
 	 * find if a peer_ni already exists.
 	 * If so then just return that.
 	 */
-	lpni = lnet_find_peer_ni_locked(nid4);
+	lpni = lnet_peer_ni_find_locked(nid);
 	if (lpni)
 		return lpni;
 
@@ -2145,13 +2147,13 @@ struct lnet_peer_ni *
 		goto out_mutex_unlock;
 	}
 
-	rc = lnet_peer_ni_traffic_add(&nid, pref);
+	rc = lnet_peer_ni_traffic_add(nid, pref);
 	if (rc) {
 		lpni = ERR_PTR(rc);
 		goto out_mutex_unlock;
 	}
 
-	lpni = lnet_find_peer_ni_locked(nid4);
+	lpni = lnet_peer_ni_find_locked(nid);
 	LASSERT(lpni);
 
 out_mutex_unlock:
@@ -2168,6 +2170,19 @@ struct lnet_peer_ni *
 	return lpni;
 }
 
+struct lnet_peer_ni *
+lnet_nid2peerni_locked(lnet_nid_t nid4, lnet_nid_t pref4, int cpt)
+{
+	struct lnet_nid nid, pref;
+
+	lnet_nid4_to_nid(nid4, &nid);
+	lnet_nid4_to_nid(pref4, &pref);
+	if (pref4 == LNET_NID_ANY)
+		return lnet_peerni_by_nid_locked(&nid, NULL, cpt);
+	else
+		return lnet_peerni_by_nid_locked(&nid, &pref, cpt);
+}
+
 bool
 lnet_peer_gw_discovery(struct lnet_peer *lp)
 {
diff --git a/net/lnet/lnet/udsp.c b/net/lnet/lnet/udsp.c
index 977a6a6..7fa4f88 100644
--- a/net/lnet/lnet/udsp.c
+++ b/net/lnet/lnet/udsp.c
@@ -213,7 +213,7 @@ enum udsp_apply {
 	struct lnet_ud_nid_descr *ni_match = udi->udi_match;
 	u32 priority = (udi->udi_revert) ? -1 : udi->udi_priority;
 
-	rc = cfs_match_nid_net(lnet_nid_to_nid4(&ni->ni_nid),
+	rc = cfs_match_nid_net(&ni->ni_nid,
 			       ni_match->ud_net_id.udn_net_type,
 			       &ni_match->ud_net_id.udn_net_num_range,
 			       &ni_match->ud_addr_range);
@@ -239,7 +239,7 @@ enum udsp_apply {
 	struct lnet_route *route;
 	struct lnet_peer_ni *lpni;
 	bool cleared = false;
-	lnet_nid_t gw_nid, gw_prim_nid;
+	struct lnet_nid *gw_nid, *gw_prim_nid;
 	int rc = 0;
 	int i;
 
@@ -248,14 +248,14 @@ enum udsp_apply {
 		list_for_each_entry(rnet, rn_list, lrn_list) {
 			list_for_each_entry(route, &rnet->lrn_routes, lr_list) {
 				/* look if gw nid on the same net matches */
-				gw_prim_nid = lnet_nid_to_nid4(&route->lr_gateway->lp_primary_nid);
+				gw_prim_nid = &route->lr_gateway->lp_primary_nid;
 				lpni = NULL;
 				while ((lpni = lnet_get_next_peer_ni_locked(route->lr_gateway,
 									    NULL,
 									    lpni)) != NULL) {
 					if (!lnet_get_net_locked(lpni->lpni_peer_net->lpn_net_id))
 						continue;
-					gw_nid = lnet_nid_to_nid4(&lpni->lpni_nid);
+					gw_nid = &lpni->lpni_nid;
 					rc = cfs_match_nid_net(gw_nid,
 							       rte_action->ud_net_id.udn_net_type,
 							       &rte_action->ud_net_id.udn_net_num_range,
@@ -285,13 +285,13 @@ enum udsp_apply {
 				/* match. Add to pref NIDs */
 				CDEBUG(D_NET, "udsp net->gw: %s->%s\n",
 				       libcfs_net2str(net->net_id),
-				       libcfs_nid2str(gw_prim_nid));
+				       libcfs_nidstr(gw_prim_nid));
 				rc = lnet_net_add_pref_rtr(net, gw_prim_nid);
 				lnet_net_lock(LNET_LOCK_EX);
 				/* success if EEXIST return */
 				if (rc && rc != -EEXIST) {
 					CERROR("Failed to add %s to %s pref rtr list\n",
-					       libcfs_nid2str(gw_prim_nid),
+					       libcfs_nidstr(gw_prim_nid),
 					       libcfs_net2str(net->net_id));
 					return rc;
 				}
@@ -417,7 +417,7 @@ enum udsp_apply {
 	struct list_head *rn_list;
 	struct lnet_route *route;
 	bool cleared = false;
-	lnet_nid_t gw_nid;
+	struct lnet_nid *gw_nid;
 	int rc = 0;
 	int i;
 
@@ -425,7 +425,7 @@ enum udsp_apply {
 		rn_list = &the_lnet.ln_remote_nets_hash[i];
 		list_for_each_entry(rnet, rn_list, lrn_list) {
 			list_for_each_entry(route, &rnet->lrn_routes, lr_list) {
-				gw_nid = lnet_nid_to_nid4(&route->lr_gateway->lp_primary_nid);
+				gw_nid = &route->lr_gateway->lp_primary_nid;
 				rc = cfs_match_nid_net(gw_nid,
 						       rte_action->ud_net_id.udn_net_type,
 						       &rte_action->ud_net_id.udn_net_num_range,
@@ -447,7 +447,7 @@ enum udsp_apply {
 				}
 				CDEBUG(D_NET,
 				       "add gw nid %s as preferred for peer %s\n",
-				       libcfs_nid2str(gw_nid),
+				       libcfs_nidstr(gw_nid),
 				       libcfs_nidstr(&lpni->lpni_nid));
 				/* match. Add to pref NIDs */
 				rc = lnet_peer_add_pref_rtr(lpni, gw_nid);
@@ -455,7 +455,7 @@ enum udsp_apply {
 				/* success if EEXIST return */
 				if (rc && rc != -EEXIST) {
 					CERROR("Failed to add %s to %s pref rtr list\n",
-					       libcfs_nid2str(gw_nid),
+					       libcfs_nidstr(gw_nid),
 					       libcfs_nidstr(&lpni->lpni_nid));
 					return rc;
 				}
@@ -481,7 +481,7 @@ enum udsp_apply {
 		    ni_action->ud_net_id.udn_net_type)
 			continue;
 		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
-			rc = cfs_match_nid_net(lnet_nid_to_nid4(&ni->ni_nid),
+			rc = cfs_match_nid_net(&ni->ni_nid,
 					       ni_action->ud_net_id.udn_net_type,
 					       &ni_action->ud_net_id.udn_net_num_range,
 					       &ni_action->ud_addr_range);
@@ -503,8 +503,7 @@ enum udsp_apply {
 			       libcfs_nidstr(&ni->ni_nid),
 			       libcfs_nidstr(&lpni->lpni_nid));
 			/* match. Add to pref NIDs */
-			rc = lnet_peer_add_pref_nid(lpni,
-						    lnet_nid_to_nid4(&ni->ni_nid));
+			rc = lnet_peer_add_pref_nid(lpni, &ni->ni_nid);
 			lnet_net_lock(LNET_LOCK_EX);
 			/* success if EEXIST return */
 			if (rc && rc != -EEXIST) {
@@ -530,7 +529,7 @@ enum udsp_apply {
 	bool local = udi->udi_local;
 	enum lnet_udsp_action_type type = udi->udi_type;
 
-	rc = cfs_match_nid_net(lnet_nid_to_nid4(&lpni->lpni_nid),
+	rc = cfs_match_nid_net(&lpni->lpni_nid,
 			       lp_match->ud_net_id.udn_net_type,
 			       &lp_match->ud_net_id.udn_net_num_range,
 			       &lp_match->ud_addr_range);
@@ -996,7 +995,8 @@ struct lnet_udsp *
 		info->cud_net_priority = ni->ni_net->net_sel_priority;
 		list_for_each_entry(ne, &net->net_rtr_pref_nids, nl_list) {
 			if (i < LNET_MAX_SHOW_NUM_NID)
-				info->cud_pref_rtr_nid[i] = ne->nl_nid;
+				info->cud_pref_rtr_nid[i] =
+					lnet_nid_to_nid4(&ne->nl_nid);
 			else
 				break;
 			i++;
@@ -1020,13 +1020,14 @@ struct lnet_udsp *
 	       libcfs_nidstr(&lpni->lpni_nid),
 	       lpni->lpni_pref_nnids);
 	if (lpni->lpni_pref_nnids == 1) {
-		info->cud_pref_nid[0] = lpni->lpni_pref.nid;
+		info->cud_pref_nid[0] = lnet_nid_to_nid4(&lpni->lpni_pref.nid);
 	} else if (lpni->lpni_pref_nnids > 1) {
 		struct list_head *list = &lpni->lpni_pref.nids;
 
 		list_for_each_entry(ne, list, nl_list) {
 			if (i < LNET_MAX_SHOW_NUM_NID)
-				info->cud_pref_nid[i] = ne->nl_nid;
+				info->cud_pref_nid[i] =
+					lnet_nid_to_nid4(&ne->nl_nid);
 			else
 				break;
 			i++;
@@ -1036,7 +1037,8 @@ struct lnet_udsp *
 	i = 0;
 	list_for_each_entry(ne, &lpni->lpni_rtr_pref_nids, nl_list) {
 		if (i < LNET_MAX_SHOW_NUM_NID)
-			info->cud_pref_rtr_nid[i] = ne->nl_nid;
+			info->cud_pref_rtr_nid[i] =
+				lnet_nid_to_nid4(&ne->nl_nid);
 		else
 			break;
 		i++;
-- 
1.8.3.1


* [lustre-devel] [PATCH 04/19] lnet: switch to large lnet_processid for matching
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (2 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 03/19] lnet: extend preferred nids in struct lnet_peer_ni James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 05/19] lnet: libcfs: add timeout to cfs_race() to fix race James Simmons
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Change lnet_me.me_match_id and lnet_match_info.mi_id to
struct lnet_processid, so they support large nids.

This requires changing
  LNetMEAttach(), lnet_mt_match_head(), lnet_mt_of_attach(),
  lnet_ptl_match_type(), lnet_match2mt()
to accept a pointer to lnet_processid rather than an
lnet_process_id.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: db49fbf00d24edc83 ("LU-10391 lnet: switch to large lnet_processid for matching")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43597
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
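Not part of the commit message: the call-site pattern throughout this
patch is to pass the large ID by pointer and have callers that still
hold the legacy 4-byte form build a struct lnet_processid on the stack
first, as ptlrpc_register_bulk() and ptl_send_rpc() do with
connection->c_peer below. A small userspace sketch of that shape, with
illustrative names rather than the real prototypes:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t nid4_t;			/* legacy scalar NID */

struct big_nid { uint8_t raw[20]; };		/* stand-in for struct lnet_nid */

struct big_id {					/* stand-in for lnet_processid */
	struct big_nid nid;
	uint32_t pid;
};

static void nid4_to_big(nid4_t n4, struct big_nid *out)
{
	memset(out, 0, sizeof(*out));
	memcpy(out->raw, &n4, sizeof(n4));	/* toy conversion for the sketch */
}

/* new-style interface: the large ID is passed by pointer */
static int me_attach(unsigned int portal, const struct big_id *match_id)
{
	printf("portal %u pid %u\n", portal, match_id->pid);
	return 0;
}

/* a caller that still holds the legacy scalar form converts locally,
 * much like ptlrpc_register_bulk() builds "peer" from c_peer */
static int register_bulk(unsigned int portal, nid4_t peer_nid4,
			 uint32_t peer_pid)
{
	struct big_id peer = { .pid = peer_pid };

	nid4_to_big(peer_nid4, &peer.nid);
	return me_attach(portal, &peer);
}

int main(void)
{
	return register_bulk(63, (nid4_t)-1, 12345);
}
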
 fs/lustre/ptlrpc/niobuf.c      | 21 ++++++++++++--------
 include/linux/lnet/api.h       |  2 +-
 include/linux/lnet/lib-lnet.h  |  4 ++--
 include/linux/lnet/lib-types.h |  4 ++--
 net/lnet/lnet/api-ni.c         | 12 +++++------
 net/lnet/lnet/lib-me.c         |  4 ++--
 net/lnet/lnet/lib-move.c       | 10 +++++-----
 net/lnet/lnet/lib-ptl.c        | 45 ++++++++++++++++++++++--------------------
 net/lnet/selftest/rpc.c        | 10 +++++++---
 9 files changed, 62 insertions(+), 50 deletions(-)

diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c
index 614bb63..c5bbf5a 100644
--- a/fs/lustre/ptlrpc/niobuf.c
+++ b/fs/lustre/ptlrpc/niobuf.c
@@ -120,7 +120,7 @@ static void __mdunlink_iterate_helper(struct lnet_handle_md *bd_mds,
 static int ptlrpc_register_bulk(struct ptlrpc_request *req)
 {
 	struct ptlrpc_bulk_desc *desc = req->rq_bulk;
-	struct lnet_process_id peer;
+	struct lnet_processid peer;
 	int rc = 0;
 	int posted_md;
 	int total_md;
@@ -150,7 +150,9 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req)
 
 	desc->bd_failure = 0;
 
-	peer = desc->bd_import->imp_connection->c_peer;
+	peer.pid = desc->bd_import->imp_connection->c_peer.pid;
+	lnet_nid4_to_nid(desc->bd_import->imp_connection->c_peer.nid,
+		      &peer.nid);
 
 	LASSERT(desc->bd_cbid.cbid_fn == client_bulk_callback);
 	LASSERT(desc->bd_cbid.cbid_arg == desc);
@@ -186,7 +188,7 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req)
 		    OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_BULK_ATTACH)) {
 			rc = -ENOMEM;
 		} else {
-			me = LNetMEAttach(desc->bd_portal, peer, mbits, 0,
+			me = LNetMEAttach(desc->bd_portal, &peer, mbits, 0,
 					  LNET_UNLINK, LNET_INS_AFTER);
 			rc = PTR_ERR_OR_ZERO(me);
 		}
@@ -225,7 +227,7 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req)
 	/* Holler if peer manages to touch buffers before he knows the mbits */
 	if (desc->bd_refs != total_md)
 		CWARN("%s: Peer %s touched %d buffers while I registered\n",
-		      desc->bd_import->imp_obd->obd_name, libcfs_id2str(peer),
+		      desc->bd_import->imp_obd->obd_name, libcfs_idstr(&peer),
 		      total_md - desc->bd_refs);
 	spin_unlock(&desc->bd_lock);
 
@@ -492,6 +494,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 	unsigned int mpflag = 0;
 	bool rep_mbits = false;
 	struct lnet_handle_md bulk_cookie;
+	struct lnet_processid peer;
 	struct ptlrpc_connection *connection;
 	struct lnet_me *reply_me;
 	struct lnet_md reply_md;
@@ -627,12 +630,14 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 			request->rq_repmsg = NULL;
 		}
 
+		peer.pid = connection->c_peer.pid;
+		lnet_nid4_to_nid(connection->c_peer.nid, &peer.nid);
 		if (request->rq_bulk &&
 		    OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_BULK_REPLY_ATTACH)) {
 			reply_me = ERR_PTR(-ENOMEM);
 		} else {
 			reply_me = LNetMEAttach(request->rq_reply_portal,
-						connection->c_peer,
+						&peer,
 						rep_mbits ? request->rq_mbits :
 						request->rq_xid,
 						0, LNET_UNLINK, LNET_INS_AFTER);
@@ -761,8 +766,8 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd)
 {
 	struct ptlrpc_service *service = rqbd->rqbd_svcpt->scp_service;
-	static struct lnet_process_id match_id = {
-		.nid = LNET_NID_ANY,
+	static struct lnet_processid match_id = {
+		.nid = LNET_ANY_NID,
 		.pid = LNET_PID_ANY
 	};
 	int rc;
@@ -780,7 +785,7 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd)
 	 * threads can find it by grabbing a local lock
 	 */
 	me = LNetMEAttach(service->srv_req_portal,
-			  match_id, 0, ~0, LNET_UNLINK,
+			  &match_id, 0, ~0, LNET_UNLINK,
 			  rqbd->rqbd_svcpt->scp_cpt >= 0 ?
 			  LNET_INS_LOCAL : LNET_INS_AFTER);
 	if (IS_ERR(me)) {
diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h
index d32c7c1..040bf18 100644
--- a/include/linux/lnet/api.h
+++ b/include/linux/lnet/api.h
@@ -96,7 +96,7 @@
  */
 struct lnet_me *
 LNetMEAttach(unsigned int portal,
-	     struct lnet_process_id match_id_in,
+	     struct lnet_processid *match_id_in,
 	     u64 match_bits_in,
 	     u64 ignore_bits_in,
 	     enum lnet_unlink unlink_in,
diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index fb2f42fcb..02eae6b 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -629,9 +629,9 @@ int lnet_send_ping(lnet_nid_t dest_nid, struct lnet_handle_md *mdh, int nnis,
 
 /* match-table functions */
 struct list_head *lnet_mt_match_head(struct lnet_match_table *mtable,
-				     struct lnet_process_id id, u64 mbits);
+				     struct lnet_processid *id, u64 mbits);
 struct lnet_match_table *lnet_mt_of_attach(unsigned int index,
-					   struct lnet_process_id id,
+					   struct lnet_processid *id,
 					   u64 mbits, u64 ignore_bits,
 					   enum lnet_ins_pos pos);
 int lnet_mt_match_md(struct lnet_match_table *mtable,
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index bde7249..628d133 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -187,7 +187,7 @@ struct lnet_libhandle {
 struct lnet_me {
 	struct list_head	 me_list;
 	int			 me_cpt;
-	struct lnet_process_id	 me_match_id;
+	struct lnet_processid	 me_match_id;
 	unsigned int		 me_portal;
 	unsigned int		 me_pos;	/* hash offset in mt_hash */
 	u64			 me_match_bits;
@@ -994,7 +994,7 @@ enum lnet_match_flags {
 /* parameter for matching operations (GET, PUT) */
 struct lnet_match_info {
 	u64			mi_mbits;
-	struct lnet_process_id	mi_id;
+	struct lnet_processid	mi_id;
 	unsigned int		mi_cpt;
 	unsigned int		mi_opc;
 	unsigned int		mi_portal;
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 340cc84e..9d9d0e6 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -1840,8 +1840,8 @@ struct lnet_ping_buffer *
 		       struct lnet_handle_md *ping_mdh,
 		       int ni_count, bool set_eq)
 {
-	struct lnet_process_id id = {
-		.nid = LNET_NID_ANY,
+	struct lnet_processid id = {
+		.nid = LNET_ANY_NID,
 		.pid = LNET_PID_ANY
 	};
 	struct lnet_me *me;
@@ -1859,7 +1859,7 @@ struct lnet_ping_buffer *
 	}
 
 	/* Ping target ME/MD */
-	me = LNetMEAttach(LNET_RESERVED_PORTAL, id,
+	me = LNetMEAttach(LNET_RESERVED_PORTAL, &id,
 			  LNET_PROTO_PING_MATCHBITS, 0,
 			  LNET_UNLINK, LNET_INS_AFTER);
 	if (IS_ERR(me)) {
@@ -2056,15 +2056,15 @@ int lnet_push_target_resize(void)
 int lnet_push_target_post(struct lnet_ping_buffer *pbuf,
 			  struct lnet_handle_md *mdhp)
 {
-	struct lnet_process_id id = {
-		.nid	= LNET_NID_ANY,
+	struct lnet_processid id = {
+		.nid	= LNET_ANY_NID,
 		.pid	= LNET_PID_ANY
 	};
 	struct lnet_md md = { NULL };
 	struct lnet_me *me;
 	int rc;
 
-	me = LNetMEAttach(LNET_RESERVED_PORTAL, id,
+	me = LNetMEAttach(LNET_RESERVED_PORTAL, &id,
 			  LNET_PROTO_PING_MATCHBITS, 0,
 			  LNET_UNLINK, LNET_INS_AFTER);
 	if (IS_ERR(me)) {
diff --git a/net/lnet/lnet/lib-me.c b/net/lnet/lnet/lib-me.c
index 66a79e2..7868165 100644
--- a/net/lnet/lnet/lib-me.c
+++ b/net/lnet/lnet/lib-me.c
@@ -68,7 +68,7 @@
  */
 struct lnet_me *
 LNetMEAttach(unsigned int portal,
-	     struct lnet_process_id match_id,
+	     struct lnet_processid *match_id,
 	     u64 match_bits, u64 ignore_bits,
 	     enum lnet_unlink unlink, enum lnet_ins_pos pos)
 {
@@ -93,7 +93,7 @@ struct lnet_me *
 	lnet_res_lock(mtable->mt_cpt);
 
 	me->me_portal = portal;
-	me->me_match_id = match_id;
+	me->me_match_id = *match_id;
 	me->me_match_bits = match_bits;
 	me->me_ignore_bits = ignore_bits;
 	me->me_unlink = unlink;
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 2f7c37d..088a754 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -3900,7 +3900,7 @@ void lnet_monitor_thr_stop(void)
 	le32_to_cpus(&hdr->msg.put.offset);
 
 	/* Primary peer NID. */
-	info.mi_id.nid = msg->msg_initiator;
+	lnet_nid4_to_nid(msg->msg_initiator, &info.mi_id.nid);
 	info.mi_id.pid = hdr->src_pid;
 	info.mi_opc = LNET_MD_OP_PUT;
 	info.mi_portal = hdr->msg.put.ptl_index;
@@ -3939,7 +3939,7 @@ void lnet_monitor_thr_stop(void)
 
 	case LNET_MATCHMD_DROP:
 		CNETERR("Dropping PUT from %s portal %d match %llu offset %d length %d: %d\n",
-			libcfs_id2str(info.mi_id), info.mi_portal,
+			libcfs_idstr(&info.mi_id), info.mi_portal,
 			info.mi_mbits, info.mi_roffset, info.mi_rlength, rc);
 
 		return -ENOENT;	/* -ve: OK but no match */
@@ -3964,7 +3964,7 @@ void lnet_monitor_thr_stop(void)
 	source_id.nid = hdr->src_nid;
 	source_id.pid = hdr->src_pid;
 	/* Primary peer NID */
-	info.mi_id.nid = msg->msg_initiator;
+	lnet_nid4_to_nid(msg->msg_initiator, &info.mi_id.nid);
 	info.mi_id.pid = hdr->src_pid;
 	info.mi_opc = LNET_MD_OP_GET;
 	info.mi_portal = hdr->msg.get.ptl_index;
@@ -3976,7 +3976,7 @@ void lnet_monitor_thr_stop(void)
 	rc = lnet_ptl_match_md(&info, msg);
 	if (rc == LNET_MATCHMD_DROP) {
 		CNETERR("Dropping GET from %s portal %d match %llu offset %d length %d\n",
-			libcfs_id2str(info.mi_id), info.mi_portal,
+			libcfs_idstr(&info.mi_id), info.mi_portal,
 			info.mi_mbits, info.mi_roffset, info.mi_rlength);
 		return -ENOENT;	/* -ve: OK but no match */
 	}
@@ -4008,7 +4008,7 @@ void lnet_monitor_thr_stop(void)
 		/* didn't get as far as lnet_ni_send() */
 		CERROR("%s: Unable to send REPLY for GET from %s: %d\n",
 		       libcfs_nidstr(&ni->ni_nid),
-		       libcfs_id2str(info.mi_id), rc);
+		       libcfs_idstr(&info.mi_id), rc);
 
 		lnet_finalize(msg, rc);
 	}
diff --git a/net/lnet/lnet/lib-ptl.c b/net/lnet/lnet/lib-ptl.c
index 095b190..d367c00 100644
--- a/net/lnet/lnet/lib-ptl.c
+++ b/net/lnet/lnet/lib-ptl.c
@@ -39,15 +39,15 @@
 MODULE_PARM_DESC(portal_rotor, "redirect PUTs to different cpu-partitions");
 
 static int
-lnet_ptl_match_type(unsigned int index, struct lnet_process_id match_id,
+lnet_ptl_match_type(unsigned int index, struct lnet_processid *match_id,
 		    u64 mbits, u64 ignore_bits)
 {
 	struct lnet_portal *ptl = the_lnet.ln_portals[index];
 	int unique;
 
-	unique = !ignore_bits &&
-		 match_id.nid != LNET_NID_ANY &&
-		 match_id.pid != LNET_PID_ANY;
+	unique = (!ignore_bits &&
+		  !LNET_NID_IS_ANY(&match_id->nid) &&
+		  match_id->pid != LNET_PID_ANY);
 
 	LASSERT(!lnet_ptl_is_unique(ptl) || !lnet_ptl_is_wildcard(ptl));
 
@@ -151,8 +151,8 @@
 		return LNET_MATCHMD_NONE;
 
 	/* mismatched ME nid/pid? */
-	if (me->me_match_id.nid != LNET_NID_ANY &&
-	    me->me_match_id.nid != info->mi_id.nid)
+	if (!LNET_NID_IS_ANY(&me->me_match_id.nid) &&
+	    !nid_same(&me->me_match_id.nid, &info->mi_id.nid))
 		return LNET_MATCHMD_NONE;
 
 	if (me->me_match_id.pid != LNET_PID_ANY &&
@@ -182,7 +182,7 @@
 	} else if (!(md->md_options & LNET_MD_TRUNCATE)) {
 		/* this packet _really_ is too big */
 		CERROR("Matching packet from %s, match %llu length %d too big: %d left, %d allowed\n",
-		       libcfs_id2str(info->mi_id), info->mi_mbits,
+		       libcfs_idstr(&info->mi_id), info->mi_mbits,
 		       info->mi_rlength, md->md_length - offset, mlength);
 
 		return LNET_MATCHMD_DROP;
@@ -191,7 +191,7 @@
 	/* Commit to this ME/MD */
 	CDEBUG(D_NET, "Incoming %s index %x from %s of length %d/%d into md %#llx [%d] + %d\n",
 	       (info->mi_opc == LNET_MD_OP_PUT) ? "put" : "get",
-	       info->mi_portal, libcfs_id2str(info->mi_id), mlength,
+	       info->mi_portal, libcfs_idstr(&info->mi_id), mlength,
 	       info->mi_rlength, md->md_lh.lh_cookie, md->md_niov, offset);
 
 	lnet_msg_attach_md(msg, md, offset, mlength);
@@ -212,18 +212,18 @@
 }
 
 static struct lnet_match_table *
-lnet_match2mt(struct lnet_portal *ptl, struct lnet_process_id id, u64 mbits)
+lnet_match2mt(struct lnet_portal *ptl, struct lnet_processid *id, u64 mbits)
 {
 	if (LNET_CPT_NUMBER == 1)
 		return ptl->ptl_mtables[0]; /* the only one */
 
 	/* if it's a unique portal, return match-table hashed by NID */
 	return lnet_ptl_is_unique(ptl) ?
-	       ptl->ptl_mtables[lnet_cpt_of_nid(id.nid, NULL)] : NULL;
+	       ptl->ptl_mtables[lnet_nid2cpt(&id->nid, NULL)] : NULL;
 }
 
 struct lnet_match_table *
-lnet_mt_of_attach(unsigned int index, struct lnet_process_id id,
+lnet_mt_of_attach(unsigned int index, struct lnet_processid *id,
 		  u64 mbits, u64 ignore_bits, enum lnet_ins_pos pos)
 {
 	struct lnet_portal *ptl;
@@ -274,7 +274,7 @@ struct lnet_match_table *
 
 	LASSERT(lnet_ptl_is_wildcard(ptl) || lnet_ptl_is_unique(ptl));
 
-	mtable = lnet_match2mt(ptl, info->mi_id, info->mi_mbits);
+	mtable = lnet_match2mt(ptl, &info->mi_id, info->mi_mbits);
 	if (mtable)
 		return mtable;
 
@@ -357,13 +357,13 @@ struct lnet_match_table *
 
 struct list_head *
 lnet_mt_match_head(struct lnet_match_table *mtable,
-		   struct lnet_process_id id, u64 mbits)
+		   struct lnet_processid *id, u64 mbits)
 {
 	struct lnet_portal *ptl = the_lnet.ln_portals[mtable->mt_portal];
 	unsigned long hash = mbits;
 
 	if (!lnet_ptl_is_wildcard(ptl)) {
-		hash += id.nid + id.pid;
+		hash += nidhash(&id->nid) + id->pid;
 
 		LASSERT(lnet_ptl_is_unique(ptl));
 		hash = hash_long(hash, LNET_MT_HASH_BITS);
@@ -385,7 +385,8 @@ struct list_head *
 	if (!list_empty(&mtable->mt_mhash[LNET_MT_HASH_IGNORE]))
 		head = &mtable->mt_mhash[LNET_MT_HASH_IGNORE];
 	else
-		head = lnet_mt_match_head(mtable, info->mi_id, info->mi_mbits);
+		head = lnet_mt_match_head(mtable, &info->mi_id,
+					  info->mi_mbits);
 again:
 	/* NB: only wildcard portal needs to return LNET_MATCHMD_EXHAUSTED */
 	if (lnet_ptl_is_wildcard(the_lnet.ln_portals[mtable->mt_portal]))
@@ -418,7 +419,8 @@ struct list_head *
 	}
 
 	if (!exhausted && head == &mtable->mt_mhash[LNET_MT_HASH_IGNORE]) {
-		head = lnet_mt_match_head(mtable, info->mi_id, info->mi_mbits);
+		head = lnet_mt_match_head(mtable, &info->mi_id,
+					  info->mi_mbits);
 		goto again; /* re-check MEs w/o ignore-bits */
 	}
 
@@ -570,8 +572,9 @@ struct list_head *
 	struct lnet_portal *ptl;
 	int rc;
 
-	CDEBUG(D_NET, "Request from %s of length %d into portal %d MB=%#llx\n",
-	       libcfs_id2str(info->mi_id), info->mi_rlength, info->mi_portal,
+	CDEBUG(D_NET,
+	       "Request from %s of length %d into portal %d MB=%#llx\n",
+	       libcfs_idstr(&info->mi_id), info->mi_rlength, info->mi_portal,
 	       info->mi_mbits);
 
 	if (info->mi_portal >= the_lnet.ln_nportals) {
@@ -629,7 +632,7 @@ struct list_head *
 		CDEBUG(D_NET,
 		       "Delaying %s from %s ptl %d MB %#llx off %d len %d\n",
 		       info->mi_opc == LNET_MD_OP_PUT ? "PUT" : "GET",
-		       libcfs_id2str(info->mi_id), info->mi_portal,
+		       libcfs_idstr(&info->mi_id), info->mi_portal,
 		       info->mi_mbits, info->mi_roffset, info->mi_rlength);
 	}
 	goto out0;
@@ -687,7 +690,7 @@ struct list_head *
 
 		hdr = &msg->msg_hdr;
 		/* Multi-Rail: Primary peer NID */
-		info.mi_id.nid = msg->msg_initiator;
+		lnet_nid4_to_nid(msg->msg_initiator, &info.mi_id.nid);
 		info.mi_id.pid = hdr->src_pid;
 		info.mi_opc = LNET_MD_OP_PUT;
 		info.mi_portal = hdr->msg.put.ptl_index;
@@ -719,7 +722,7 @@ struct list_head *
 			list_add_tail(&msg->msg_list, matches);
 
 			CDEBUG(D_NET, "Resuming delayed PUT from %s portal %d match %llu offset %d length %d.\n",
-			       libcfs_id2str(info.mi_id),
+			       libcfs_idstr(&info.mi_id),
 			       info.mi_portal, info.mi_mbits,
 			       info.mi_roffset, info.mi_rlength);
 		} else {
diff --git a/net/lnet/selftest/rpc.c b/net/lnet/selftest/rpc.c
index 7141da4..bd95e88 100644
--- a/net/lnet/selftest/rpc.c
+++ b/net/lnet/selftest/rpc.c
@@ -354,14 +354,18 @@ struct srpc_bulk *
 
 static int
 srpc_post_passive_rdma(int portal, int local, u64 matchbits, void *buf,
-		       int len, int options, struct lnet_process_id peer,
+		       int len, int options, struct lnet_process_id peer4,
 		       struct lnet_handle_md *mdh, struct srpc_event *ev)
 {
 	int rc;
 	struct lnet_md md;
 	struct lnet_me *me;
+	struct lnet_processid peer;
 
-	me = LNetMEAttach(portal, peer, matchbits, 0, LNET_UNLINK,
+	peer.pid = peer4.pid;
+	lnet_nid4_to_nid(peer4.nid, &peer.nid);
+
+	me = LNetMEAttach(portal, &peer, matchbits, 0, LNET_UNLINK,
 			  local ? LNET_INS_LOCAL : LNET_INS_AFTER);
 	if (IS_ERR(me)) {
 		rc = PTR_ERR(me);
@@ -387,7 +391,7 @@ struct srpc_bulk *
 
 	CDEBUG(D_NET,
 	       "Posted passive RDMA: peer %s, portal %d, matchbits %#llx\n",
-	       libcfs_id2str(peer), portal, matchbits);
+	       libcfs_id2str(peer4), portal, matchbits);
 	return 0;
 }
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 05/19] lnet: libcfs: add timeout to cfs_race() to fix race
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (3 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 04/19] lnet: switch to large lnet_processid for matching James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 06/19] lustre: llite: tighten condition for fault not drop mmap_sem James Simmons
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Alex Zhuravlev <bzzz@whamcloud.com>

There is no guarantee that the branches in cfs_race() execute in
strict order, so it is possible for the second branch (the one that
sets cfs_race_state = 1) to run before the first branch, leaving the
thread executing the first branch stuck.

This construct is used for testing only, so bounding the wait with a
timeout is an adequate workaround.
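
As a rough, hedged sketch of the rendezvous shape this describes
(race_state, race_waitq and the two functions below are made-up
stand-ins for the cfs_race() machinery, not code from this patch):

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/wait.h>

static int race_state;
static DECLARE_WAIT_QUEUE_HEAD(race_waitq);

/* thread A: parks until thread B flips race_state */
static void race_first_branch(void)
{
        race_state = 0;
        /*
         * If thread B already ran, no wake-up is coming, so cap the
         * wait instead of sleeping forever.
         */
        if (!wait_event_interruptible_timeout(race_waitq, race_state != 0,
                                              5 * HZ))
                pr_warn("race partner never arrived, continuing\n");
}

/* thread B: releases thread A */
static void race_second_branch(void)
{
        race_state = 1;
        wake_up(&race_waitq);
}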

WC-bug-id: https://jira.whamcloud.com/browse/LU-13358
Lustre-commit: 2d2d381f35ee00431 ("LU-13358 libcfs: add timeout to cfs_race() to fix race")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43161
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/libcfs/libcfs_fail.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/libcfs/libcfs_fail.h b/include/linux/libcfs/libcfs_fail.h
index 45166c5..731401b 100644
--- a/include/linux/libcfs/libcfs_fail.h
+++ b/include/linux/libcfs/libcfs_fail.h
@@ -213,8 +213,14 @@ static inline void cfs_race_wait(u32 id)
 
 			cfs_race_state = 0;
 			CERROR("cfs_race id %x sleeping\n", id);
-			rc = wait_event_interruptible(cfs_race_waitq,
-						      cfs_race_state != 0);
+			/*
+			 * XXX: don't wait forever as there is no guarantee
+			 * that this branch is executed first. for testing
+			 * purposes this construction works good enough
+			 */
+			rc = wait_event_interruptible_timeout(cfs_race_waitq,
+							      cfs_race_state != 0,
+							      5 * HZ);
 			CERROR("cfs_fail_race id %x awake: rc=%d\n", id, rc);
 		}
 	}
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 06/19] lustre: llite: tighten condition for fault not drop mmap_sem
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (4 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 05/19] lnet: libcfs: add timeout to cfs_race() to fix race James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 07/19] lnet: o2iblnd: map_on_demand not needed for frag interop James Simmons
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Bobi Jam <bobijam@whamcloud.com>

As __lock_page_or_retry() indicates, filemap_fault() will return
VM_FAULT_RETRY without releasing mmap_sem iff flags contains
FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT.

So ll_fault0() should pass FAULT_FLAG_ALLOW_RETRY |
FAULT_FLAG_RETRY_NOWAIT to ll_filemap_fault(), so that when it
returns VM_FAULT_RETRY we can go on to try the normal fault under
the DLM lock, since mmap_sem is still held.

Since Linux 5.1 (commit 6b4c9f4469819), FAULT_FLAG_RETRY_NOWAIT alone
is enough to keep mmap_sem held when the page cannot be locked.
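
For illustration, the kernel-side rule this relies on can be sketched
as the predicate below (an assumption paraphrasing pre-5.1
__lock_page_or_retry() behaviour; fault_keeps_mmap_sem() is a made-up
name, not part of this patch):

#include <linux/mm.h>

/*
 * Before 5.1, mmap_sem survives a VM_FAULT_RETRY from filemap_fault()
 * only when both flags are present.
 */
static bool fault_keeps_mmap_sem(unsigned int fault_flags)
{
        return (fault_flags & FAULT_FLAG_ALLOW_RETRY) &&
               (fault_flags & FAULT_FLAG_RETRY_NOWAIT);
}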

Fixes: 22cceab961 ("lustre: llite: Avoid eternel retry loops with MAP_POPULATE")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14713
Lustre-commit: 81aec05103558f57a ("LU-14713 llite: tighten condition for fault not drop mmap_sem")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44715
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_mmap.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c
index 8047786..0009c5f 100644
--- a/fs/lustre/llite/llite_mmap.c
+++ b/fs/lustre/llite/llite_mmap.c
@@ -285,18 +285,25 @@ static vm_fault_t __ll_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 	if (ll_sbi_has_fast_read(ll_i2sbi(inode))) {
 		/* do fast fault */
+		bool allow_retry = vmf->flags & FAULT_FLAG_ALLOW_RETRY;
 		bool has_retry = vmf->flags & FAULT_FLAG_RETRY_NOWAIT;
 
 		/* To avoid loops, instruct downstream to not drop mmap_sem */
-		vmf->flags |= FAULT_FLAG_RETRY_NOWAIT;
+		/**
+		 * only need FAULT_FLAG_ALLOW_RETRY prior to Linux 5.1
+		 * (6b4c9f4469819), where FAULT_FLAG_RETRY_NOWAIT is enough
+		 * to not drop mmap_sem when failed to lock the page.
+		 */
+		vmf->flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
 		ll_cl_add(inode, env, NULL, LCC_MMAP);
 		fault_ret = filemap_fault(vmf);
 		ll_cl_remove(inode, env);
 		if (has_retry)
 			vmf->flags &= ~FAULT_FLAG_RETRY_NOWAIT;
+		if (!allow_retry)
+			vmf->flags &= ~FAULT_FLAG_ALLOW_RETRY;
 
-		/*
-		 * - If there is no error, then the page was found in cache and
+		/* - If there is no error, then the page was found in cache and
 		 *   uptodate;
 		 * - If VM_FAULT_RETRY is set, the page existed but failed to
 		 *   lock. We will try slow path to avoid loops.
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 07/19] lnet: o2iblnd: map_on_demand not needed for frag interop
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (5 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 06/19] lustre: llite: tighten condition for fault not drop mmap_sem James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 08/19] lnet: o2iblnd: Fix logic for unaligned transfer James Simmons
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

The map_on_demand tunable is not used for setting max frags, so don't
require that it be set in order to negotiate max frags.

HPE-bug-id: LUS-10488
WC-bug-id: https://jira.whamcloud.com/browse/LU-15094
Lustre-commit: 4e61a4aacdbc237606 ("LU-15094 o2iblnd: map_on_demand not needed for frag interop")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/45215
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 380374e..a053e7d 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -2658,22 +2658,12 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		reason = "Unknown";
 		break;
 
-	case IBLND_REJECT_RDMA_FRAGS: {
-		struct lnet_ioctl_config_o2iblnd_tunables *tunables;
-
+	case IBLND_REJECT_RDMA_FRAGS:
 		if (!cp) {
 			reason = "can't negotiate max frags";
 			goto out;
 		}
-		tunables = &peer_ni->ibp_ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
-		/*
-		 * This check only makes sense if the kernel supports global
-		 * memory registration. Otherwise, map_on_demand will never == 0
-		 */
-		if (!tunables->lnd_map_on_demand) {
-			reason = "map_on_demand must be enabled";
-			goto out;
-		}
+
 		if (conn->ibc_max_frags <= frag_num) {
 			reason = "unsupported max frags";
 			goto out;
@@ -2682,7 +2672,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		peer_ni->ibp_max_frags = frag_num;
 		reason = "rdma fragments";
 		break;
-	}
+
 	case IBLND_REJECT_MSG_QUEUE_SIZE:
 		if (!cp) {
 			reason = "can't negotiate queue depth";
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 08/19] lnet: o2iblnd: Fix logic for unaligned transfer
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (6 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 07/19] lnet: o2iblnd: map_on_demand not needed for frag interop James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 09/19] lnet: Reset ni_ping_count only on receive James Simmons
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

It's possible for the first page of a transfer to start at a non-zero
offset. However, there are two bugs in how o2iblnd handles this case.

The first is that this use-case will require LNET_MAX_IOV + 1 local
RDMA fragments, but we do not specify the correct corresponding values
for the max page list to ib_alloc_fast_reg_page_list(),
ib_alloc_fast_reg_mr(), etc.

The second issue is that the logic in kiblnd_setup_rd_kiov() attempts
to obtain one more scatterlist entry than is actually needed. This
causes the transfer to fail with -EFAULT.
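
A small worked example of why one extra fragment is needed
(pages_spanned() is an illustrative helper, not part of the patch;
the numbers assume 4 KiB pages):

#include <linux/mm.h>           /* PAGE_SIZE */

/*
 * Pages spanned by a buffer that starts 'offset' bytes into its first
 * page.  With 4 KiB pages, a 1 MiB transfer (256 aligned pages, i.e.
 * LNET_MAX_IOV) at offset 512 spans 257 pages, so the MR must be sized
 * for LNET_MAX_IOV + 1 fragments.
 */
static unsigned int pages_spanned(unsigned int offset, unsigned int nob)
{
        return (offset + nob + PAGE_SIZE - 1) / PAGE_SIZE;
}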

HPE-bug-id: LUS-10407
WC-bug-id: https://jira.whamcloud.com/browse/LU-15092
Lustre-commit: 23a2c92f203ff2f39 ("LU-15092 o2iblnd: Fix logic for unaligned transfer")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/45216
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c    |  2 +-
 net/lnet/klnds/o2iblnd/o2iblnd.h    |  6 ++++--
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 15 +++++++++------
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 36d26b2..9cdc12a 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1392,7 +1392,7 @@ static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps,
 		frd->frd_mr = ib_alloc_mr(fpo->fpo_hdev->ibh_pd,
 					  fastreg_gaps ? IB_MR_TYPE_SG_GAPS :
 							 IB_MR_TYPE_MEM_REG,
-					  LNET_MAX_IOV);
+					  IBLND_MAX_RDMA_FRAGS);
 		if (IS_ERR(frd->frd_mr)) {
 			rc = PTR_ERR(frd->frd_mr);
 			CERROR("Failed to allocate ib_alloc_mr: %d\n", rc);
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 5066f7b..21f8981 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -112,8 +112,10 @@ struct kib_tunables {
 #define IBLND_OOB_CAPABLE(v)	((v) != IBLND_MSG_VERSION_1)
 #define IBLND_OOB_MSGS(v)	(IBLND_OOB_CAPABLE(v) ? 2 : 0)
 
-#define IBLND_MSG_SIZE		(4 << 10)	/* max size of queued messages (inc hdr) */
-#define IBLND_MAX_RDMA_FRAGS	LNET_MAX_IOV	/* max # of fragments supported */
+/* max size of queued messages (inc hdr) */
+#define IBLND_MSG_SIZE		(4 << 10)
+/* max # of fragments supported. + 1 for unaligned case */
+#define IBLND_MAX_RDMA_FRAGS	(LNET_MAX_IOV + 1)
 
 /************************/
 /* derived constants... */
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index a053e7d..db13f41 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -662,6 +662,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	struct scatterlist *sg;
 	int fragnob;
 	int max_nkiov;
+	int sg_count = 0;
 
 	CDEBUG(D_NET, "niov %d offset %d nob %d\n", nkiov, offset, nob);
 
@@ -682,6 +683,12 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	do {
 		LASSERT(nkiov > 0);
 
+		if (!sg) {
+			CERROR("lacking enough sg entries to map tx\n");
+			return -EFAULT;
+		}
+		sg_count++;
+
 		fragnob = min((int)(kiov->bv_len - offset), nob);
 
 		/* We're allowed to start at a non-aligned page offset in
@@ -700,10 +707,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		sg_set_page(sg, kiov->bv_page, fragnob,
 			    kiov->bv_offset + offset);
 		sg = sg_next(sg);
-		if (!sg) {
-			CERROR("lacking enough sg entries to map tx\n");
-			return -EFAULT;
-		}
 
 		offset = 0;
 		kiov++;
@@ -711,7 +714,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		nob -= fragnob;
 	} while (nob > 0);
 
-	return kiblnd_map_tx(ni, tx, rd, sg - tx->tx_frags);
+	return kiblnd_map_tx(ni, tx, rd, sg_count);
 }
 
 static int
@@ -1008,7 +1011,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	int nob = offsetof(struct kib_msg, ibm_u) + body_nob;
 
 	LASSERT(tx->tx_nwrq >= 0);
-	LASSERT(tx->tx_nwrq < IBLND_MAX_RDMA_FRAGS + 1);
+	LASSERT(tx->tx_nwrq <= IBLND_MAX_RDMA_FRAGS);
 	LASSERT(nob <= IBLND_MSG_SIZE);
 
 	kiblnd_init_msg(tx->tx_msg, type, body_nob);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 09/19] lnet: Reset ni_ping_count only on receive
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (7 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 08/19] lnet: o2iblnd: Fix logic for unaligned transfer James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 10/19] lustre: ptlrpc: fix timeout after spurious wakeup James Simmons
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

The lnet_ni:ni_ping_count is currently reset on every (healthy) tx.
We should only reset it when receiving a message over the NI. Taking
net_lock 0 on every tx results in a performance loss for certain
workloads.
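
Compressed from the diff below (health/recovery handling elided), the
receive path ends up with a single lock round-trip covering both
resets, while the send path takes no lock at all:

        /* receive path only: one lock round-trip covers both resets */
        if (lpni && msg->msg_rx_committed) {
                lnet_net_lock(0);
                lpni->lpni_ping_count = 0;
                ni->ni_ping_count = 0;
                /* ... existing health/recovery handling ... */
                lnet_net_unlock(0);
        }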

Fixes: 885dab4e09 ("lnet: Recover local NI w/exponential backoff interval")
HPE-bug-id: LUS-10427
WC-bug-id: https://jira.whamcloud.com/browse/LU-15102
Lustre-commit: 9cc0a5ff5fc8f45aa ("LU-15102 lnet: Reset ni_ping_count only on receive")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/45235
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-msg.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 3c8b7c3..12768b2 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -888,8 +888,6 @@
 		 * faster recovery.
 		 */
 		lnet_inc_healthv(&ni->ni_healthv, lnet_health_sensitivity);
-		lnet_net_lock(0);
-		ni->ni_ping_count = 0;
 		/* It's possible msg_txpeer is NULL in the LOLND
 		 * case. Only increment the peer's health if we're
 		 * receiving a message from it. It's the only sure way to
@@ -898,7 +896,9 @@
 		 * as indication that the router is fully healthy.
 		 */
 		if (lpni && msg->msg_rx_committed) {
+			lnet_net_lock(0);
 			lpni->lpni_ping_count = 0;
+			ni->ni_ping_count = 0;
 			/* If we're receiving a message from the router or
 			 * I'm a router, then set that lpni's health to
 			 * maximum so we can commence communication
@@ -925,8 +925,8 @@
 								     &the_lnet.ln_mt_peerNIRecovq,
 								     ktime_get_seconds());
 			}
+			lnet_net_unlock(0);
 		}
-		lnet_net_unlock(0);
 
 		/* we can finalize this message */
 		return -1;
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 10/19] lustre: ptlrpc: fix timeout after spurious wakeup
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (8 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 09/19] lnet: Reset ni_ping_count only on receive James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 11/19] lnet: Fail peer add for existing gw peer James Simmons
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Alex Zhuravlev <bzzz@whamcloud.com>

so that the final timeout does not exceed the requested one after a
spurious wakeup
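
The property relied on here, sketched under the assumption of a
generic waitqueue (wait_with_budget() and work_pending() are made-up
names, not the patch itself): wait_woken() returns the unexpired
jiffies when woken early, so that remainder can be carried into the
next wait instead of restarting from the full timeout:

#include <linux/sched.h>
#include <linux/wait.h>

static void wait_with_budget(wait_queue_head_t *wq,
                             bool (*work_pending)(void), long budget)
{
        DEFINE_WAIT_FUNC(wait, woken_wake_function);

        add_wait_queue(wq, &wait);
        while (!work_pending()) {
                /* returns the unexpired jiffies if woken early */
                long ret = wait_woken(&wait, TASK_IDLE, budget);

                if (ret > 0) {
                        budget = ret;   /* spurious wakeup: carry remainder */
                        continue;
                }
                break;                  /* ret == 0: budget exhausted */
        }
        remove_wait_queue(wq, &wait);
}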

WC-bug-id: https://jira.whamcloud.com/browse/LU-15086
Lustre-commit: b8383035406a4b7be ("LU-15086 ptlrpc: fix timeout after spurious wakeup")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45308
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/ptlrpcd.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/ptlrpc/ptlrpcd.c b/fs/lustre/ptlrpc/ptlrpcd.c
index 9cd9d39..23fb52d 100644
--- a/fs/lustre/ptlrpc/ptlrpcd.c
+++ b/fs/lustre/ptlrpc/ptlrpcd.c
@@ -438,7 +438,7 @@ static int ptlrpcd(void *arg)
 		DEFINE_WAIT_FUNC(wait, woken_wake_function);
 		time64_t timeout;
 
-		timeout = ptlrpc_set_next_timeout(set);
+		timeout = ptlrpc_set_next_timeout(set) * HZ;
 
 		lu_context_enter(&env.le_ctx);
 		lu_context_enter(env.le_ses);
@@ -447,12 +447,15 @@ static int ptlrpcd(void *arg)
 		while (!ptlrpcd_check(&env, pc)) {
 			int ret;
 
-			if (timeout == 0)
+			if (timeout == 0) {
 				ret = wait_woken(&wait, TASK_IDLE,
 						 MAX_SCHEDULE_TIMEOUT);
-			else
+			} else {
 				ret = wait_woken(&wait, TASK_IDLE,
-						 HZ * timeout);
+						 timeout);
+				if (ret > 0)
+					timeout = ret;
+			}
 			if (ret != 0)
 				continue;
 			/* Timed out */
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 11/19] lnet: Fail peer add for existing gw peer
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (9 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 10/19] lustre: ptlrpc: fix timeout after spurious wakeup James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 12/19] lustre: ptlrpc: remove bogus LASSERT James Simmons
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

If there's an existing peer entry for a peer that is being added
via CLI, and that existing peer was not created via the CLI, then
DLC will attempt to delete the existing peer before creating a new
one. The exit status of the peer deletion was not being checked.
This results in the ability to add duplicate peers for gateways,
because gateways cannot be deleted via lnetctl unless the routes
for that gateway have been removed first.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15138
Lustre-commit: 79a4b69adb1e365b1 ("LU-15138 lnet: Fail peer add for existing gw peer")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/45337
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 1853388..a9f33c0 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -499,12 +499,14 @@ static void lnet_peer_cancel_discovery(struct lnet_peer *lp)
 static int
 lnet_peer_del(struct lnet_peer *peer)
 {
+	int rc;
+
 	lnet_peer_cancel_discovery(peer);
 	lnet_net_lock(LNET_LOCK_EX);
-	lnet_peer_del_locked(peer);
+	rc = lnet_peer_del_locked(peer);
 	lnet_net_unlock(LNET_LOCK_EX);
 
-	return 0;
+	return rc;
 }
 
 /*
@@ -1648,7 +1650,9 @@ struct lnet_peer_net *
 			}
 		}
 		/* Delete and recreate as a configured peer. */
-		lnet_peer_del(lp);
+		rc = lnet_peer_del(lp);
+		if (rc)
+			goto out;
 	}
 
 	/* Create peer, peer_net, and peer_ni. */
@@ -3238,6 +3242,7 @@ static int lnet_peer_deletion(struct lnet_peer *lp)
 	struct list_head rlist;
 	struct lnet_route *route, *tmp;
 	int sensitivity = lp->lp_health_sensitivity;
+	int rc;
 
 	INIT_LIST_HEAD(&rlist);
 
@@ -3271,7 +3276,10 @@ static int lnet_peer_deletion(struct lnet_peer *lp)
 	lnet_net_unlock(LNET_LOCK_EX);
 
 	/* lnet_peer_del() deletes all the peer NIs owned by this peer */
-	lnet_peer_del(lp);
+	rc = lnet_peer_del(lp);
+	if (rc)
+		CNETERR("Internal error: Unable to delete peer %s rc %d\n",
+			libcfs_nidstr(&lp->lp_primary_nid), rc);
 
 	list_for_each_entry_safe(route, tmp,
 				 &rlist, lr_list) {
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 12/19] lustre: ptlrpc: remove bogus LASSERT
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (10 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 11/19] lnet: Fail peer add for existing gw peer James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 13/19] lustre: quota: optimize capability check for root squash James Simmons
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

In the error case, it isn't possible for rc to be both -ENOMEM and
0 at the same time, so remove the incorrect LASSERT(rc == 0) to
avoid crashing the system on an allocation failure.

Improve error messages to conform to code style.

Fixes: c8c95f49fd73 ("lnet: me: discard struct lnet_handle_me")
WC-bug-id: https://jira.whamcloud.com/browse/LU-12678
Lustre-commit: 49769c1eea52e067d ("LU-12678 ptlrpc: remove bogus LASSERT")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45421
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/niobuf.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c
index c5bbf5a..da04d4e 100644
--- a/fs/lustre/ptlrpc/niobuf.c
+++ b/fs/lustre/ptlrpc/niobuf.c
@@ -774,7 +774,7 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd)
 	struct lnet_md md;
 	struct lnet_me *me;
 
-	CDEBUG(D_NET, "LNetMEAttach: portal %d\n",
+	CDEBUG(D_NET, "%s: registering portal %d\n", service->srv_name,
 	       service->srv_req_portal);
 
 	if (OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_RQBD))
@@ -789,8 +789,9 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd)
 			  rqbd->rqbd_svcpt->scp_cpt >= 0 ?
 			  LNET_INS_LOCAL : LNET_INS_AFTER);
 	if (IS_ERR(me)) {
-		CERROR("LNetMEAttach failed: %ld\n", PTR_ERR(me));
-		return -ENOMEM;
+		CERROR("%s: LNetMEAttach failed: rc = %ld\n",
+		       service->srv_name, PTR_ERR(me));
+		return PTR_ERR(me);
 	}
 
 	LASSERT(rqbd->rqbd_refcount == 0);
@@ -810,9 +811,9 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd)
 		return 0;
 	}
 
-	CERROR("ptlrpc: LNetMDAttach failed: rc = %d\n", rc);
+	CERROR("%s: LNetMDAttach failed: rc = %d\n", service->srv_name, rc);
 	LASSERT(rc == -ENOMEM);
 	rqbd->rqbd_refcount = 0;
 
-	return -ENOMEM;
+	return rc;
 }
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 13/19] lustre: quota: optimize capability check for root squash
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (11 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 12/19] lustre: ptlrpc: remove bogus LASSERT James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 14/19] lustre: llite: skip request slot for lmv_revalidate_slaves() James Simmons
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

On the client side, the owner/group quota check can be bypassed
directly if the request is from root and root squash is not enabled.
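
The resulting condition can be summarized roughly as the predicate
below (need_quota_check() is an illustrative restatement of the diff
that follows, not additional logic):

#include <linux/types.h>

/* quota must be checked unless caller is root and root squash is off */
static bool need_quota_check(bool cap_sys_resource, bool root_squash,
                             bool noquota)
{
        return (!cap_sys_resource || root_squash) && !noquota;
}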

Fixes: cd633cfc96 ("lustre: quota: nodemap squashed root cannot bypass quota")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15141
Lustre-commit: f5fd5a363cc48e38c ("LU-15141 quota: optimize capability check for root squash")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/45322
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_cache.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index 1211438..7b7b49f 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -2385,7 +2385,12 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io,
 	}
 
 	/* check if the file's owner/group is over quota */
-	if (!io->ci_noquota) {
+	/* do not check for root without root squash, because in this case
+	 * we should bypass quota
+	 */
+	if ((!oio->oi_cap_sys_resource ||
+	     cli->cl_root_squash) &&
+	    !io->ci_noquota) {
 		struct cl_object *obj;
 		struct cl_attr *attr;
 		unsigned int qid[MAXQUOTAS];
@@ -2400,20 +2405,8 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io,
 		qid[USRQUOTA] = attr->cat_uid;
 		qid[GRPQUOTA] = attr->cat_gid;
 		qid[PRJQUOTA] = attr->cat_projid;
-		/*
-		 * if EDQUOT returned for root, we double check
-		 * if root squash enabled or not updated from server side.
-		 * without root squash, we should bypass quota for root.
-		 */
-		if (rc == 0 && osc_quota_chkdq(cli, qid) == -EDQUOT) {
-			if (oio->oi_cap_sys_resource &&
-			    !cli->cl_root_squash) {
-				io->ci_noquota = 1;
-				rc = 0;
-			} else {
-				rc = -EDQUOT;
-			}
-		}
+		if (rc == 0 && osc_quota_chkdq(cli, qid) == -EDQUOT)
+			rc = -EDQUOT;
 		if (rc)
 			return rc;
 	}
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 14/19] lustre: llite: skip request slot for lmv_revalidate_slaves()
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (12 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 13/19] lustre: quota: optimize capability check for root squash James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 15/19] lnet: set eth routes needed for multi rail James Simmons
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Andriy Skulysh, Lustre Development List

From: Andriy Skulysh <c17819@cray.com>

Some syscalls need lmv_revalidate_slaves(). It requires a second
lock enqueue, which can be blocked by a lack of RPC slots.

Don't acquire an RPC slot for the second lock enqueue.

HPE-bug-id: LUS-8416
WC-bug-id: https://jira.whamcloud.com/browse/LU-15121
Lustre-commit: 7e781c605c4189ea1 ("LU-15121 llite: skip request slot for lmv_revalidate_slaves()")
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/45275
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_dlm.h |  7 +++++--
 fs/lustre/include/obd.h        |  1 +
 fs/lustre/ldlm/ldlm_request.c  | 18 +++++++++++-------
 fs/lustre/llite/statahead.c    |  1 +
 fs/lustre/lmv/lmv_intent.c     |  2 ++
 fs/lustre/mdc/mdc_dev.c        |  3 ++-
 fs/lustre/mdc/mdc_locks.c      |  5 +++--
 fs/lustre/osc/osc_request.c    |  2 +-
 8 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index 1fc199b..a2fe9676 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -1018,7 +1018,9 @@ struct ldlm_enqueue_info {
 	/* whether enqueue slave stripes */
 	unsigned int		ei_enq_slave:1;
 	/* whether acquire rpc slot */
-	unsigned int		ei_enq_slot:1;
+	unsigned int		ei_req_slot:1;
+	/** whether acquire mod rpc slot */
+	unsigned int		ei_mod_slot:1;
 };
 
 extern struct obd_ops ldlm_obd_ops;
@@ -1343,7 +1345,8 @@ int ldlm_prep_elc_req(struct obd_export *exp,
 int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 			  struct ldlm_enqueue_info *einfo, u8 with_policy,
 			  u64 *flags, void *lvb, u32 lvb_len,
-			  const struct lustre_handle *lockh, int rc);
+			  const struct lustre_handle *lockh, int rc,
+			  bool request_slot);
 int ldlm_cli_convert_req(struct ldlm_lock *lock, u32 *flags, u64 new_bits);
 int ldlm_cli_convert(struct ldlm_lock *lock,
 		     enum ldlm_cancel_flags cancel_flags);
diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index 27acd33..58a5803 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -722,6 +722,7 @@ enum md_cli_flags {
 	CLI_API32		= BIT(3),
 	CLI_MIGRATE		= BIT(4),
 	CLI_DIRTY_DATA		= BIT(5),
+	CLI_NO_SLOT		= BIT(6),
 };
 
 enum md_op_code {
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index 746c45b..44e1ec2 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -359,7 +359,9 @@ static bool ldlm_request_slot_needed(struct ldlm_enqueue_info *einfo)
 	/* exclude EXTENT locks and DOM-only IBITS locks because they
 	 * are asynchronous and don't wait on server being blocked.
 	 */
-	return einfo->ei_type == LDLM_FLOCK || einfo->ei_type == LDLM_IBITS;
+	return einfo->ei_req_slot &&
+	       (einfo->ei_type == LDLM_FLOCK ||
+		einfo->ei_type == LDLM_IBITS);
 }
 
 /**
@@ -371,7 +373,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 			  struct ldlm_enqueue_info *einfo,
 			  u8 with_policy, u64 *ldlm_flags, void *lvb,
 			  u32 lvb_len, const struct lustre_handle *lockh,
-			  int rc)
+			  int rc, bool request_slot)
 {
 	struct ldlm_namespace *ns = exp->exp_obd->obd_namespace;
 	const struct lu_env *env = NULL;
@@ -380,7 +382,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req,
 	struct ldlm_reply *reply;
 	int cleanup_phase = 1;
 
-	if (ldlm_request_slot_needed(einfo))
+	if (request_slot)
 		obd_put_request_slot(&req->rq_import->imp_obd->u.cli);
 
 	ptlrpc_put_mod_rpc_slot(req);
@@ -726,6 +728,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp,
 	int is_replay = *flags & LDLM_FL_REPLAY;
 	int req_passed_in = 1;
 	int rc, err;
+	bool need_req_slot;
 	struct ptlrpc_request *req;
 
 	ns = exp->exp_obd->obd_namespace;
@@ -829,13 +832,14 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp,
 	 * that threads that are waiting for a modify RPC slot are not polluting
 	 * our rpcs in flight counter.
 	 */
-	if (einfo->ei_enq_slot)
+	if (einfo->ei_mod_slot)
 		ptlrpc_get_mod_rpc_slot(req);
 
-	if (ldlm_request_slot_needed(einfo)) {
+	need_req_slot = ldlm_request_slot_needed(einfo);
+	if (need_req_slot) {
 		rc = obd_get_request_slot(&req->rq_import->imp_obd->u.cli);
 		if (rc) {
-			if (einfo->ei_enq_slot)
+			if (einfo->ei_mod_slot)
 				ptlrpc_put_mod_rpc_slot(req);
 			failed_lock_cleanup(ns, lock, einfo->ei_mode);
 			LDLM_LOCK_RELEASE(lock);
@@ -855,7 +859,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp,
 	rc = ptlrpc_queue_wait(req);
 
 	err = ldlm_cli_enqueue_fini(exp, req, einfo, policy ? 1 : 0, flags,
-				    lvb, lvb_len, lockh, rc);
+				    lvb, lvb_len, lockh, rc, need_req_slot);
 
 	/*
 	 * If ldlm_cli_enqueue_fini did not find the lock, we need to free
diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c
index 4806e99..39ffb9d 100644
--- a/fs/lustre/llite/statahead.c
+++ b/fs/lustre/llite/statahead.c
@@ -380,6 +380,7 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
 	einfo->ei_cb_cp  = ldlm_completion_ast;
 	einfo->ei_cb_gl  = NULL;
 	einfo->ei_cbdata = NULL;
+	einfo->ei_req_slot = 1;
 
 	return minfo;
 }
diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c
index 93da2b3..906ca16 100644
--- a/fs/lustre/lmv/lmv_intent.c
+++ b/fs/lustre/lmv/lmv_intent.c
@@ -106,6 +106,7 @@ static int lmv_intent_remote(struct obd_export *exp, struct lookup_intent *it,
 	}
 
 	op_data->op_bias = MDS_CROSS_REF;
+	op_data->op_cli_flags = CLI_NO_SLOT;
 	CDEBUG(D_INODE, "REMOTE_INTENT with fid=" DFID " -> mds #%u\n",
 	       PFID(&body->mbo_fid1), tgt->ltd_index);
 
@@ -203,6 +204,7 @@ int lmv_revalidate_slaves(struct obd_export *exp,
 		 * it's remote object.
 		 */
 		op_data->op_bias = MDS_CROSS_REF;
+		op_data->op_cli_flags = CLI_NO_SLOT;
 
 		tgt = lmv_tgt(lmv, lsm->lsm_md_oinfo[i].lmo_mds);
 		if (!tgt) {
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index b2f60ea..0b1d257 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -66,6 +66,7 @@ static void mdc_lock_build_einfo(const struct lu_env *env,
 	einfo->ei_cb_cp = ldlm_completion_ast;
 	einfo->ei_cb_gl = mdc_ldlm_glimpse_ast;
 	einfo->ei_cbdata = osc; /* value to be put into ->l_ast_data */
+	einfo->ei_req_slot = 1;
 }
 
 static void mdc_lock_lvb_update(const struct lu_env *env,
@@ -664,7 +665,7 @@ int mdc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req,
 	/* Complete obtaining the lock procedure. */
 	rc = ldlm_cli_enqueue_fini(aa->oa_exp, req, &einfo, 1, aa->oa_flags,
 				   aa->oa_lvb, aa->oa_lvb ?
-				   sizeof(*aa->oa_lvb) : 0, lockh, rc);
+				   sizeof(*aa->oa_lvb) : 0, lockh, rc, true);
 	/* Complete mdc stuff. */
 	rc = mdc_enqueue_fini(aa->oa_exp, req, aa->oa_upcall, aa->oa_cookie,
 			      lockh, mode, aa->oa_flags, rc);
diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c
index 4135c3a..66f0039 100644
--- a/fs/lustre/mdc/mdc_locks.c
+++ b/fs/lustre/mdc/mdc_locks.c
@@ -984,7 +984,8 @@ int mdc_enqueue_base(struct obd_export *exp, struct ldlm_enqueue_info *einfo,
 		req->rq_sent = ktime_get_real_seconds() + resends;
 	}
 
-	einfo->ei_enq_slot = !mdc_skip_mod_rpc_slot(it);
+	einfo->ei_req_slot = !(op_data->op_cli_flags & CLI_NO_SLOT);
+	einfo->ei_mod_slot = !mdc_skip_mod_rpc_slot(it);
 
 	/* With Data-on-MDT the glimpse callback is needed too.
 	 * It is set here in advance but not in mdc_finish_enqueue()
@@ -1371,7 +1372,7 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env,
 		rc = -ETIMEDOUT;
 
 	rc = ldlm_cli_enqueue_fini(exp, req, einfo, 1, &flags, NULL, 0,
-				   lockh, rc);
+				   lockh, rc, true);
 	if (rc < 0) {
 		CERROR("%s: ldlm_cli_enqueue_fini() failed: rc = %d\n",
 		       exp->exp_obd->obd_name, rc);
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index cf79808..e065eab 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -2824,7 +2824,7 @@ int osc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req,
 
 	/* Complete obtaining the lock procedure. */
 	rc = ldlm_cli_enqueue_fini(aa->oa_exp, req, &einfo, 1, aa->oa_flags,
-				   lvb, lvb_len, lockh, rc);
+				   lvb, lvb_len, lockh, rc, false);
 	/* Complete osc stuff. */
 	rc = osc_enqueue_fini(req, aa->oa_upcall, aa->oa_cookie, lockh, mode,
 			      aa->oa_flags, aa->oa_speculative, rc);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 15/19] lnet: set eth routes needed for multi rail
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (13 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 14/19] lustre: llite: skip request slot for lmv_revalidate_slaves() James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 16/19] lustre: llite: Do not count tiny write twice James Simmons
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

When ksocklnd is initialized or new ethernet interfaces
are added via lnetctl, set the routing rules using a common
shell script ksocklnd-config. This ensures control over
source interface when sending traffic.

For example, for eth0 with ip 192.168.122.142/24:
   the output of "ip route show table eth0" should be
192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.142

This step can be omitted by specifying
   options ksocklnd skip_mr_route_setup=1
in the conf file, or by using the switch
   --skip-mr-route-setup
when adding an NI with lnetctl. Note that the module parameter
takes priority over the lnetctl switch: if skip-mr-route-setup
is not specified when adding NI with lnetctl, the route still
won't get created if the conf file has skip_mr_route_setup=1.

The route also won't be created if any route already exists
for the given interface, assuming advanced users who manage
routing on their own will want to continue doing so.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14662
Lustre-commit: c9bfe57bd2495671f ("LU-14662 lnet: set eth routes needed for multi rail")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44065
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/socklnd/socklnd_modparams.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/lnet/klnds/socklnd/socklnd_modparams.c b/net/lnet/klnds/socklnd/socklnd_modparams.c
index c00ea49..5eb58ca 100644
--- a/net/lnet/klnds/socklnd/socklnd_modparams.c
+++ b/net/lnet/klnds/socklnd/socklnd_modparams.c
@@ -147,6 +147,11 @@
 module_param(conns_per_peer, uint, 0644);
 MODULE_PARM_DESC(conns_per_peer, "number of connections per peer");
 
+/* By default skip_mr_route_setup is 0 (do not skip) */
+static unsigned int skip_mr_route_setup;
+module_param(skip_mr_route_setup, uint, 0444);
+MODULE_PARM_DESC(skip_mr_route_setup, "skip automatic setup of linux routes for MR");
+
 #if SOCKNAL_VERSION_DEBUG
 static int protocol = 3;
 module_param(protocol, int, 0644);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 16/19] lustre: llite: Do not count tiny write twice
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (14 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 15/19] lnet: set eth routes needed for multi rail James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 17/19] lustre: llite: mend the trunc_sem_up_write() James Simmons
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

We accidentally count bytes written with tiny write twice
in stats.  Remove the extra count.

This also has the positive effect of improving tiny write
performance by about 4% by removing an extra call to the
stats code (the main cost is ktime_get()).

Before, 8 byte dd:
13.9 MiB/s
After:
14.3 MiB/s

WC-bug-id: https://jira.whamcloud.com/browse/LU-15197
Lustre-commit: 5208135f432a320e9 ("LU-15197 llite: Do not count tiny write twice")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45476
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 6755671..d3374232 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -2032,8 +2032,6 @@ static ssize_t ll_do_tiny_write(struct kiocb *iocb, struct iov_iter *iter)
 
 	if (result > 0) {
 		ll_heat_add(inode, CIT_WRITE, result);
-		ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_WRITE_BYTES,
-				   result);
 		set_bit(LLIF_DATA_MODIFIED, &ll_i2info(inode)->lli_flags);
 	}
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 17/19] lustre: llite: mend the trunc_sem_up_write()
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (15 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 16/19] lustre: llite: Do not count tiny write twice James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 18/19] lnet: Netlink improvements James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 19/19] lnet: libcfs: separate daemon_list from cfs_trace_data James Simmons
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Bobi Jam <bobijam@whamcloud.com>

The original lli_trunc_sem replacement (commit ae9e437745) fixed the
following locking scenario:

t1 (page fault)          t2 (dio read)              t3 (truncate)
|- vm_mmap_pgoff()       |- vvp_io_read_start()     |- vvp_io_setattr_start()
|- down_write(mmap_sem)  |- down_read(trunc_sem)
|- do_map()              |- ll_direct_IO_impl()
|- vvp_io_fault_start    |- ll_get_user_pages()

                                                    |- down_write(trunc_sem)
                         |- down_read(mmap_sem)
|- down_read(trunc_sem)

t1 waits for the read semaphore of trunc_sem, which is blocked by t3;
t3 is waiting for the write semaphore while t2 holds its read
semaphore; and t2 is waiting for mmap_sem, which has been taken by t1,
so a deadlock ensues.

Commit ae9e437745 changed down_read(trunc_sem) to
trunc_sem_down_read_nowait() in the page fault path, which ignores any
waiting down_write(trunc_sem) and takes the read semaphore as long as
no writer currently holds it, breaking the deadlock.

But there is a subtlety in using wake_up_var():
wake_up_var()->__wake_up_bit()->waitqueue_active() tests locklessly
for waiters on the queue. If it is called without an explicit
smp_mb(), the waitqueue_active() check can be hoisted before the
condition store, so the waker sees an empty wait list while the
waiter never observes the condition, and the waiter is never woken
up.
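
The required pairing, in a hedged generic sketch (release_readers()
and wait_for_release() are made-up names, not the patch itself): the
waker must order its condition store before the lockless waiter
check, matching the barrier the waiter gets from prepare_to_wait():

#include <linux/atomic.h>
#include <linux/wait_bit.h>

/* waker: lets all readers proceed */
static void release_readers(atomic_t *ctr)
{
        atomic_set(ctr, 0);
        /*
         * Order the store above before the lockless waitqueue_active()
         * test inside wake_up_var(); pairs with the smp_mb() done by
         * prepare_to_wait() on the waiter side.
         */
        smp_mb();
        wake_up_var(ctr);
}

/* waiter: blocks until the waker has run */
static void wait_for_release(atomic_t *ctr)
{
        wait_var_event(ctr, atomic_read(ctr) == 0);
}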

Fixes: ae9e437745 ("lustre: llite: replace lli_trunc_sem")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14713
Lustre-commit: 39745c8b5493159bb ("LU-14713 llite: mend the trunc_sem_up_write()")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43844
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_internal.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 7768c99..ce7431f 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -365,6 +365,8 @@ static inline void trunc_sem_down_write(struct ll_trunc_sem *sem)
 static inline void trunc_sem_up_write(struct ll_trunc_sem *sem)
 {
 	atomic_set(&sem->ll_trunc_readers, 0);
+	/* match the smp_mb() in wait_var_event()->prepare_to_wait() */
+	smp_mb();
 	wake_up_var(&sem->ll_trunc_readers);
 }
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [lustre-devel] [PATCH 18/19] lnet: Netlink improvements
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (16 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 17/19] lustre: llite: mend the trunc_sem_up_write() James Simmons
@ 2021-11-28 23:27 ` James Simmons
  2021-11-28 23:27 ` [lustre-devel] [PATCH 19/19] lnet: libcfs: separate daemon_list from cfs_trace_data James Simmons
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

With the expansion of the use of Netlink several issues have been
encountered. This patch fixes many of the issues. The issues are:

1) Fix idx handling in the lnet_genl_parse_list() function; it always
   needs to be incremented. Apply renaming suggestions from Neil for
   enum lnet_nl_scalar_attrs. Add a new LN_SCALAR_ATTR_INT_VALUE to
   allow pushing integers as well as strings from userspace.

2) Create struct genl_filter_list, which will be used to build a
   list of items to pass back to userland. This will be a common
   pattern.

WC-bug-id: https://jira.whamcloud.com/browse/LU-9680
Lustre-commit: 82835a1952dcb37e8 ("LU-9680 net: Netlink improvements")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44358
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 include/linux/lnet/lib-types.h    |  8 +++++++-
 include/uapi/linux/lnet/lnet-nl.h | 29 +++++++++++++++++++++++++----
 net/lnet/lnet/api-ni.c            |  9 +++++----
 3 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 628d133..7631044 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -1320,7 +1320,13 @@ struct lnet {
 	struct list_head		ln_udsp_list;
 };
 
-static const struct nla_policy scalar_attr_policy[LN_SCALAR_CNT + 1] = {
+struct genl_filter_list {
+	struct list_head	 lp_list;
+	void			*lp_cursor;
+	bool			 lp_first;
+};
+
+static const struct nla_policy scalar_attr_policy[LN_SCALAR_MAX + 1] = {
 	[LN_SCALAR_ATTR_LIST]		= { .type = NLA_NESTED },
 	[LN_SCALAR_ATTR_LIST_SIZE]	= { .type = NLA_U16 },
 	[LN_SCALAR_ATTR_INDEX]		= { .type = NLA_U16 },
diff --git a/include/uapi/linux/lnet/lnet-nl.h b/include/uapi/linux/lnet/lnet-nl.h
index f5bb67c..83f6e27 100644
--- a/include/uapi/linux/lnet/lnet-nl.h
+++ b/include/uapi/linux/lnet/lnet-nl.h
@@ -38,23 +38,44 @@ enum lnet_nl_key_format {
 	LNKF_SEQUENCE		= 4,
 };
 
+/**
+ * enum lnet_nl_scalar_attrs		- scalar LNet netlink attributes used
+ *					  to compose messages for sending or
+ *					  receiving.
+ *
+ * @LN_SCALAR_ATTR_UNSPEC:		unspecified attribute to catch errors
+ * @LN_SCALAR_ATTR_PAD:			padding for 64-bit attributes, ignore
+ *
+ * @LN_SCALAR_ATTR_LIST:		List of scalar attributes (NLA_NESTED)
+ * @LN_SCALAR_ATTR_LIST_SIZE:		Number of items in scalar list (NLA_U16)
+ * @LN_SCALAR_ATTR_INDEX:		True Netlink attr value (NLA_U16)
+ * @LN_SCALAR_ATTR_NLA_TYPE:		Data format for value part of the pair
+ *					(NLA_U16)
+ * @LN_SCALAR_ATTR_VALUE:		String value of key part of the pair.
+ *					(NLA_NUL_STRING)
+ * @LN_SCALAR_ATTR_INT_VALUE:		Numeric value of key part of the pair.
+ *					(NLA_S64)
+ * @LN_SCALAR_ATTR_KEY_FORMAT:		LNKF_* format of the key value pair.
+ */
 enum lnet_nl_scalar_attrs {
 	LN_SCALAR_ATTR_UNSPEC = 0,
-	LN_SCALAR_ATTR_LIST,
+	LN_SCALAR_ATTR_PAD = LN_SCALAR_ATTR_UNSPEC,
 
+	LN_SCALAR_ATTR_LIST,
 	LN_SCALAR_ATTR_LIST_SIZE,
 	LN_SCALAR_ATTR_INDEX,
 	LN_SCALAR_ATTR_NLA_TYPE,
 	LN_SCALAR_ATTR_VALUE,
+	LN_SCALAR_ATTR_INT_VALUE,
 	LN_SCALAR_ATTR_KEY_FORMAT,
 
-	__LN_SCALAR_ATTR_LAST,
+	__LN_SCALAR_ATTR_MAX_PLUS_ONE,
 };
 
-#define LN_SCALAR_CNT (__LN_SCALAR_ATTR_LAST - 1)
+#define LN_SCALAR_MAX (__LN_SCALAR_ATTR_MAX_PLUS_ONE - 1)
 
 struct ln_key_props {
-	char			*lkp_values;
+	char			*lkp_value;
 	__u16			lkp_key_format;
 	__u16			lkp_data_type;
 };
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 9d9d0e6..3ed3f0b 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -2670,9 +2670,9 @@ static int lnet_genl_parse_list(struct sk_buff *msg,
 				    list->lkl_maxattr);
 
 		nla_put_u16(msg, LN_SCALAR_ATTR_INDEX, count);
-		if (props[count].lkp_values)
+		if (props[count].lkp_value)
 			nla_put_string(msg, LN_SCALAR_ATTR_VALUE,
-				       props[count].lkp_values);
+				       props[count].lkp_value);
 		if (props[count].lkp_key_format)
 			nla_put_u16(msg, LN_SCALAR_ATTR_KEY_FORMAT,
 				    props[count].lkp_key_format);
@@ -2684,13 +2684,14 @@ static int lnet_genl_parse_list(struct sk_buff *msg,
 			rc = lnet_genl_parse_list(msg, data, ++idx);
 			if (rc < 0)
 				return rc;
+			idx = rc;
 		}
 
 		nla_nest_end(msg, key);
 	}
 
 	nla_nest_end(msg, node);
-	return 0;
+	return idx;
 }
 
 int lnet_genl_send_scalar_list(struct sk_buff *msg, u32 portid, u32 seq,
@@ -2717,7 +2718,7 @@ int lnet_genl_send_scalar_list(struct sk_buff *msg, u32 portid, u32 seq,
 canceled:
 	if (rc < 0)
 		genlmsg_cancel(msg, hdr);
-	return rc;
+	return rc > 0 ? 0 : rc;
 }
 EXPORT_SYMBOL(lnet_genl_send_scalar_list);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 19/19] lnet: libcfs: separate daemon_list from cfs_trace_data
  2021-11-28 23:27 [lustre-devel] [PATCH 00/19] lustre: update to OpenSFS tree Nov 28, 2021 James Simmons
                   ` (17 preceding siblings ...)
  2021-11-28 23:27 ` [lustre-devel] [PATCH 18/19] lnet: Netlink improvements James Simmons
@ 2021-11-28 23:27 ` James Simmons
  18 siblings, 0 replies; 20+ messages in thread
From: James Simmons @ 2021-11-28 23:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

cfs_trace_data provides a fifo for trace messages.  To minimize
locking, there is a separate fifo for each CPU, and even for different
interrupt levels per-cpu.

When a page is removed from the fifo to be written to a file, the page
is added to a "daemon_list".  Trace messages on the daemon_list have
already been logged to a file, but can easily be dumped to the console
when a bug occurs.

The daemon_list is only ever accessed by a single thread at a time, so
the per-CPU facilities of cfs_trace_data are not needed.  However, the
daemon_list is currently managed per-CPU as part of cfs_trace_data.

This patch moves the daemon_list of pages out to a separate structure
- a simple linked list, protected by cfs_tracefile_sem.

Rather than using a 'cfs_trace_page' to hold linkage information and
content size, we use page->lru for linkage and page->private for
the size of the content in each page.

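As a rough sketch of the new arrangement (the names below are made up
and locking is omitted; the real code serializes access with
cfs_tracefile_sem): pages sit on a plain list linked through ->lru,
->private records how many bytes of each page are valid, and the list
is trimmed to a configured maximum.

#include <linux/list.h>
#include <linux/mm.h>

static LIST_HEAD(saved_pages);
static long saved_pages_count;
static long saved_pages_max = 256;	/* arbitrary cap for the sketch */

static void save_page(struct page *page, unsigned int used)
{
	page->private = used;			/* bytes of valid data */
	list_add_tail(&page->lru, &saved_pages);
	saved_pages_count++;

	/* Drop the oldest pages once the cap is exceeded. */
	while (saved_pages_count > saved_pages_max) {
		struct page *victim = list_first_entry(&saved_pages,
						       struct page, lru);

		list_del(&victim->lru);
		saved_pages_count--;
		put_page(victim);
	}
}
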
This is a step towards replacing cfs_trace_data with the Linux
ring_buffer which provides similar functionality with even less
locking.

In the current code, if the daemon which writes trace data to a file
cannot keep up with load, excess pages are moved to the daemon_list
temporarily before being discarded.  With the patch, these pages are
simply discarded immediately.
If the daemon thread cannot keep up, that is a configuration problem,
and temporarily preserving a few pages is unlikely to help much.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14428
Lustre-commit: 848738a85d82bb71c ("LU-14428 libcfs: separate daemon_list from cfs_trace_data")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/41493
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/libcfs/tracefile.c | 213 +++++++++++++++++++++++---------------------
 net/lnet/libcfs/tracefile.h |  17 +---
 2 files changed, 112 insertions(+), 118 deletions(-)

diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c
index e0ef234..b27732a 100644
--- a/net/lnet/libcfs/tracefile.c
+++ b/net/lnet/libcfs/tracefile.c
@@ -58,6 +58,13 @@ enum cfs_trace_buf_type {
 
 union cfs_trace_data_union (*cfs_trace_data[CFS_TCD_TYPE_CNT])[NR_CPUS] __cacheline_aligned;
 
+/* Pages containing records already processed by daemon.
+ * Link via ->lru, use size in ->private
+ */
+static LIST_HEAD(daemon_pages);
+static long daemon_pages_count;
+static long daemon_pages_max;
+
 char cfs_tracefile[TRACEFILE_NAME_SIZE];
 long long cfs_tracefile_size = CFS_TRACEFILE_SIZE;
 
@@ -68,12 +75,6 @@ enum cfs_trace_buf_type {
 
 struct page_collection {
 	struct list_head	pc_pages;
-	/*
-	 * if this flag is set, collect_pages() will spill both
-	 * ->tcd_daemon_pages and ->tcd_pages to the ->pc_pages. Otherwise,
-	 * only ->tcd_pages are spilled.
-	 */
-	int			pc_want_daemon_pages;
 };
 
 /*
@@ -103,9 +104,6 @@ struct cfs_trace_page {
 	unsigned short		type;
 };
 
-static void put_pages_on_tcd_daemon_list(struct page_collection *pc,
-					 struct cfs_trace_cpu_data *tcd);
-
 /* trace file lock routines */
 /*
  * The walking argument indicates the locking comes from all tcd types
@@ -296,10 +294,10 @@ static void cfs_tcd_shrink(struct cfs_trace_cpu_data *tcd)
 		if (!pgcount--)
 			break;
 
-		list_move_tail(&tage->linkage, &pc.pc_pages);
+		list_del(&tage->linkage);
+		cfs_tage_free(tage);
 		tcd->tcd_cur_pages--;
 	}
-	put_pages_on_tcd_daemon_list(&pc, tcd);
 }
 
 /* return a page that has 'len' bytes left at the end */
@@ -678,11 +676,6 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata,
 	cfs_tcd_for_each(tcd, i, j) {
 		list_splice_init(&tcd->tcd_pages, &pc->pc_pages);
 		tcd->tcd_cur_pages = 0;
-
-		if (pc->pc_want_daemon_pages) {
-			list_splice_init(&tcd->tcd_daemon_pages, &pc->pc_pages);
-			tcd->tcd_cur_daemon_pages = 0;
-		}
 	}
 }
 
@@ -695,11 +688,6 @@ static void collect_pages_on_all_cpus(struct page_collection *pc)
 		cfs_tcd_for_each_type_lock(tcd, i, cpu) {
 			list_splice_init(&tcd->tcd_pages, &pc->pc_pages);
 			tcd->tcd_cur_pages = 0;
-			if (pc->pc_want_daemon_pages) {
-				list_splice_init(&tcd->tcd_daemon_pages,
-						 &pc->pc_pages);
-				tcd->tcd_cur_daemon_pages = 0;
-			}
 		}
 	}
 }
@@ -746,64 +734,17 @@ static void put_pages_back(struct page_collection *pc)
 		put_pages_back_on_all_cpus(pc);
 }
 
-/* Add pages to a per-cpu debug daemon ringbuffer. This buffer makes sure that
- * we have a good amount of data at all times for dumping during an LBUG, even
- * if we have been steadily writing (and otherwise discarding) pages via the
- * debug daemon.
- */
-static void put_pages_on_tcd_daemon_list(struct page_collection *pc,
-					 struct cfs_trace_cpu_data *tcd)
-{
-	struct cfs_trace_page *tage;
-	struct cfs_trace_page *tmp;
-
-	list_for_each_entry_safe(tage, tmp, &pc->pc_pages, linkage) {
-		__LASSERT_TAGE_INVARIANT(tage);
-
-		if (tage->cpu != tcd->tcd_cpu || tage->type != tcd->tcd_type)
-			continue;
-
-		cfs_tage_to_tail(tage, &tcd->tcd_daemon_pages);
-		tcd->tcd_cur_daemon_pages++;
-
-		if (tcd->tcd_cur_daemon_pages > tcd->tcd_max_pages) {
-			struct cfs_trace_page *victim;
-
-			__LASSERT(!list_empty(&tcd->tcd_daemon_pages));
-			victim = cfs_tage_from_list(tcd->tcd_daemon_pages.next);
-
-			__LASSERT_TAGE_INVARIANT(victim);
-
-			list_del(&victim->linkage);
-			cfs_tage_free(victim);
-			tcd->tcd_cur_daemon_pages--;
-		}
-	}
-}
-
-static void put_pages_on_daemon_list(struct page_collection *pc)
-{
-	struct cfs_trace_cpu_data *tcd;
-	int i, cpu;
-
-	for_each_possible_cpu(cpu) {
-		cfs_tcd_for_each_type_lock(tcd, i, cpu)
-			put_pages_on_tcd_daemon_list(pc, tcd);
-	}
-}
-
 #ifdef CONFIG_LNET_DUMP_ON_PANIC
 void cfs_trace_debug_print(void)
 {
 	struct page_collection pc;
 	struct cfs_trace_page *tage;
 	struct cfs_trace_page *tmp;
+	struct page *page;
 
-	pc.pc_want_daemon_pages = 1;
 	collect_pages(&pc);
 	list_for_each_entry_safe(tage, tmp, &pc.pc_pages, linkage) {
 		char *p, *file, *fn;
-		struct page *page;
 
 		__LASSERT_TAGE_INVARIANT(tage);
 
@@ -830,6 +771,34 @@ void cfs_trace_debug_print(void)
 		list_del(&tage->linkage);
 		cfs_tage_free(tage);
 	}
+	down_write(&cfs_tracefile_sem);
+	while ((page = list_first_entry_or_null(&daemon_pages,
+						struct page, lru)) != NULL) {
+		char *p, *file, *fn;
+
+		p = page_address(page);
+		while (p < ((char *)page_address(page) + page->private)) {
+			struct ptldebug_header *hdr;
+			int len;
+
+			hdr = (void *)p;
+			p += sizeof(*hdr);
+			file = p;
+			p += strlen(file) + 1;
+			fn = p;
+			p += strlen(fn) + 1;
+			len = hdr->ph_len - (int)(p - (char *)hdr);
+
+			cfs_print_to_console(hdr, D_EMERG, file, fn,
+					     "%.*s", len, p);
+
+			p += len;
+		}
+		list_del_init(&page->lru);
+		daemon_pages_count -= 1;
+		put_page(page);
+	}
+	up_write(&cfs_tracefile_sem);
 }
 #endif /* CONFIG_LNET_DUMP_ON_PANIC */
 
@@ -840,6 +809,7 @@ int cfs_tracefile_dump_all_pages(char *filename)
 	struct cfs_trace_page *tage;
 	struct cfs_trace_page *tmp;
 	char *buf;
+	struct page *page;
 	int rc;
 
 	down_write(&cfs_tracefile_sem);
@@ -854,7 +824,6 @@ int cfs_tracefile_dump_all_pages(char *filename)
 		goto out;
 	}
 
-	pc.pc_want_daemon_pages = 1;
 	collect_pages(&pc);
 	if (list_empty(&pc.pc_pages)) {
 		rc = 0;
@@ -881,8 +850,20 @@ int cfs_tracefile_dump_all_pages(char *filename)
 		list_del(&tage->linkage);
 		cfs_tage_free(tage);
 	}
-
-	rc = vfs_fsync(filp, 1);
+	while ((page = list_first_entry_or_null(&daemon_pages,
+						struct page, lru)) != NULL) {
+		buf = page_address(page);
+		rc = kernel_write(filp, buf, page->private, &filp->f_pos);
+		if (rc != (int)page->private) {
+			pr_warn("Lustre: wanted to write %u but wrote %d\n",
+				(int)page->private, rc);
+			break;
+		}
+		list_del(&page->lru);
+		daemon_pages_count -= 1;
+		put_page(page);
+	}
+	rc = vfs_fsync_range(filp, 0, LLONG_MAX, 1);
 	if (rc)
 		pr_err("LustreError: sync returns: rc = %d\n", rc);
 close:
@@ -896,8 +877,8 @@ void cfs_trace_flush_pages(void)
 {
 	struct page_collection pc;
 	struct cfs_trace_page *tage;
+	struct page *page;
 
-	pc.pc_want_daemon_pages = 1;
 	collect_pages(&pc);
 	while (!list_empty(&pc.pc_pages)) {
 		tage = list_first_entry(&pc.pc_pages,
@@ -907,6 +888,15 @@ void cfs_trace_flush_pages(void)
 		list_del(&tage->linkage);
 		cfs_tage_free(tage);
 	}
+
+	down_write(&cfs_tracefile_sem);
+	while ((page = list_first_entry_or_null(&daemon_pages,
+						struct page, lru)) != NULL) {
+		list_del(&page->lru);
+		daemon_pages_count -= 1;
+		put_page(page);
+	}
+	up_write(&cfs_tracefile_sem);
 }
 
 int cfs_trace_copyout_string(char __user *usr_buffer, int usr_buffer_nob,
@@ -1039,6 +1029,7 @@ int cfs_trace_set_debug_mb(int mb)
 	cfs_tcd_for_each(tcd, i, j)
 		tcd->tcd_max_pages = (pages * tcd->tcd_pages_factor) / 100;
 
+	daemon_pages_max = pages;
 	up_write(&cfs_tracefile_sem);
 
 	return mb;
@@ -1071,9 +1062,10 @@ static int tracefiled(void *arg)
 	int last_loop = 0;
 	int rc;
 
-	pc.pc_want_daemon_pages = 0;
-
 	while (!last_loop) {
+		LIST_HEAD(for_daemon_pages);
+		int for_daemon_pages_count = 0;
+
 		schedule_timeout_interruptible(HZ);
 		if (kthread_should_stop())
 			last_loop = 1;
@@ -1095,38 +1087,55 @@ static int tracefiled(void *arg)
 			}
 		}
 		up_read(&cfs_tracefile_sem);
-		if (!filp) {
-			put_pages_on_daemon_list(&pc);
-			__LASSERT(list_empty(&pc.pc_pages));
-			continue;
-		}
 
 		list_for_each_entry_safe(tage, tmp, &pc.pc_pages, linkage) {
-			struct dentry *de = file_dentry(filp);
-			static loff_t f_pos;
-
 			__LASSERT_TAGE_INVARIANT(tage);
 
-			if (f_pos >= (off_t)cfs_tracefile_size)
-				f_pos = 0;
-			else if (f_pos > i_size_read(de->d_inode))
-				f_pos = i_size_read(de->d_inode);
-
-			buf = kmap(tage->page);
-			rc = kernel_write(filp, buf, tage->used, &f_pos);
-			kunmap(tage->page);
-
-			if (rc != (int)tage->used) {
-				pr_warn("Lustre: wanted to write %u but wrote %d\n",
-					tage->used, rc);
-				put_pages_back(&pc);
-				__LASSERT(list_empty(&pc.pc_pages));
-				break;
+			if (filp) {
+				struct dentry *de = file_dentry(filp);
+				static loff_t f_pos;
+
+				if (f_pos >= (off_t)cfs_tracefile_size)
+					f_pos = 0;
+				else if (f_pos > i_size_read(de->d_inode))
+					f_pos = i_size_read(de->d_inode);
+
+				buf = kmap(tage->page);
+				rc = kernel_write(filp, buf, tage->used,
+						  &f_pos);
+				kunmap(tage->page);
+				if (rc != (int)tage->used) {
+					pr_warn("Lustre: wanted to write %u but wrote %d\n",
+						tage->used, rc);
+					put_pages_back(&pc);
+					__LASSERT(list_empty(&pc.pc_pages));
+					break;
+				}
 			}
+			list_del_init(&tage->linkage);
+			list_add_tail(&tage->page->lru, &for_daemon_pages);
+			for_daemon_pages_count += 1;
+
+			tage->page->private = (int)tage->used;
+			kfree(tage);
+			atomic_dec(&cfs_tage_allocated);
 		}
 
-		filp_close(filp, NULL);
-		put_pages_on_daemon_list(&pc);
+		if (filp)
+			filp_close(filp, NULL);
+
+		down_write(&cfs_tracefile_sem);
+		list_splice_tail(&for_daemon_pages, &daemon_pages);
+		daemon_pages_count += for_daemon_pages_count;
+		while (daemon_pages_count > daemon_pages_max) {
+			struct page *p = list_first_entry(&daemon_pages,
+							  struct page, lru);
+			list_del(&p->lru);
+			put_page(p);
+			daemon_pages_count -= 1;
+		}
+		up_write(&cfs_tracefile_sem);
+
 		if (!list_empty(&pc.pc_pages)) {
 			int i;
 
@@ -1233,14 +1242,13 @@ int cfs_tracefile_init(int max_pages)
 		tcd->tcd_cpu = j;
 		INIT_LIST_HEAD(&tcd->tcd_pages);
 		INIT_LIST_HEAD(&tcd->tcd_stock_pages);
-		INIT_LIST_HEAD(&tcd->tcd_daemon_pages);
 		tcd->tcd_cur_pages = 0;
 		tcd->tcd_cur_stock_pages = 0;
-		tcd->tcd_cur_daemon_pages = 0;
 		tcd->tcd_max_pages = (max_pages * factor) / 100;
 		LASSERT(tcd->tcd_max_pages > 0);
 		tcd->tcd_shutting_down = 0;
 	}
+	daemon_pages_max = max_pages;
 
 	return 0;
 
@@ -1299,5 +1307,6 @@ static void cfs_trace_cleanup(void)
 void cfs_tracefile_exit(void)
 {
 	cfs_trace_stop_thread();
+	cfs_trace_flush_pages();
 	cfs_trace_cleanup();
 }
diff --git a/net/lnet/libcfs/tracefile.h b/net/lnet/libcfs/tracefile.h
index af21e4a..6293a9c 100644
--- a/net/lnet/libcfs/tracefile.h
+++ b/net/lnet/libcfs/tracefile.h
@@ -92,22 +92,7 @@ int cfs_trace_copyout_string(char __user *usr_buffer, int usr_buffer_nob,
 		unsigned long		tcd_cur_pages;
 
 		/*
-		 * pages with trace records already processed by
-		 * tracefiled. These pages are kept in memory, so that some
-		 * portion of log can be written in the event of LBUG. This
-		 * list is maintained in LRU order.
-		 *
-		 * Pages are moved to ->tcd_daemon_pages by tracefiled()
-		 * (put_pages_on_daemon_list()). LRU pages from this list are
-		 * discarded when list grows too large.
-		 */
-		struct list_head	tcd_daemon_pages;
-		/* number of pages on ->tcd_daemon_pages */
-		unsigned long		tcd_cur_daemon_pages;
-
-		/*
-		 * Maximal number of pages allowed on ->tcd_pages and
-		 * ->tcd_daemon_pages each.
+		 * Maximal number of pages allowed on ->tcd_pages
 		 * Always TCD_MAX_PAGES * tcd_pages_factor / 100 in current
 		 * implementation.
 		 */
-- 
1.8.3.1


