lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022
@ 2022-03-20 13:30 James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 01/50] lustre: type cleanups and remove debug statements James Simmons
                   ` (49 more replies)
  0 siblings, 50 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Backport the patches that have landed in the OpenSFS tree as of
March 20, 2022.

Alexey Lyashkov (2):
  lnet: o2iblnd: avoid static allocation for msg tx
  lnet: o2iblnd: avoid memory copy for short msg

Andriy Skulysh (2):
  lustre: fld: don't obtain a slot for fld request
  lustre: osc: osc_extent_wait() deadlock

Chris Horn (5):
  lnet: Avoid peer NI recovery for local interface
  lnet: Check LNET_NID_IS_ANY in LNET_NID_NET
  lnet: lnet_peer_data_present() memory leak
  lnet: Don't use pref NI for reserved portal
  lnet: Stop discovery on deleted peer NI

Hongchao Zhang (1):
  lustre: quota: delete unused quota ID

James Simmons (2):
  lustre: type cleanups and remove debug statements
  lustre: osc: Fix grant test for ARM

John L. Hammond (1):
  lustre: osc: add OBD_IOC_GETATTR support for osc

Lai Siyao (3):
  lustre: llite: deadlock in ll_new_node()
  lustre: llite: LL_IOC_LMV_GETSTRIPE 'default' shows inherit layout
  lustre: llite: set default LMV hash type with 2.12 MDS

Mr NeilBrown (19):
  lnet: extend nids in struct lnet_msg
  lnet: Change lnet_send() to take large-addr nids
  lnet: use large nids in struct lnet_event
  lnet: socklnd: prepare for new KSOCK_MSG type
  lnet: socklnd: don't deref lnet_hdr in LNDs
  lnet: separate lnet_hdr in msg from that in lnd.
  lnet: change lnet_hdr to store large nids.
  lnet: change lnet_prep_send to take net_processid
  lnet: convert to struct lnet_process_id in lib-move
  lnet: convert LNetGetID to return an large-addr pid
  lnet: alter lnd_notify_peer_down() to take lnet_nid
  lnet: socklnd: move lnet_hdr unpack into ->pro_unpack
  lnet: socklnd: Change ksock_hello_msg to struct lnet_nid
  lnet: socklnd: add hello message version 4
  lnet: Convert ping to support 16-bytes address
  lnet: convert nids in lnet_parse to lnet_nid
  lnet: change src_nid arg to lnet_parse() to 16byte
  lnet: Fix NULL-deref in lnet_nidstr_r()
  lnet: change lnet_del_route() to take lnet_nid

Oleg Drokin (2):
  lustre: update version to 2.14.57
  lustre: llite: Delay dput in ll_dirty_page_discard_warn

Patrick Farrell (4):
  lustre: llite: Move free user pages
  lustre: llite: Do not get/put DIO pages
  lustre: llite: Remove unnecessary page get/put
  lnet: libcfs: Use FAIL_CHECK_QUIET for fake i/o

Qian Yingjin (1):
  lustre: hsm: update size upon completion of data version

Sebastien Buisson (4):
  lustre: sec: make client encryption compatible with ext4
  lustre: sec: allow subdir mount of encrypted dir
  lustre: sec: present .fscrypt in subdir mount
  lustre: sec: fix DIO for encrypted files

Serguei Smirnov (1):
  lnet: improve hash distribution across CPTs

Shaun Tancheff (1):
  lustre: ptlrpc: Use after free of 'conn' in rhashtable retry

Vladimir Saveliev (2):
  lustre: fld: repeat rpc in fld_client_rpc after EAGAIN
  lustre: llite: clear async errors on write commit sync

 fs/lustre/fld/fld_request.c             |  22 +-
 fs/lustre/include/cl_object.h           |  21 +-
 fs/lustre/include/lustre_dlm.h          |   2 +-
 fs/lustre/include/lustre_net.h          |   6 +-
 fs/lustre/include/lustre_osc.h          |   2 +-
 fs/lustre/include/obd.h                 |   3 +
 fs/lustre/include/obd_class.h           |   4 +-
 fs/lustre/include/obd_support.h         |   1 +
 fs/lustre/ldlm/ldlm_lib.c               |   6 +-
 fs/lustre/ldlm/ldlm_request.c           |   2 +-
 fs/lustre/llite/crypto.c                |  51 ++-
 fs/lustre/llite/dir.c                   |  68 +++-
 fs/lustre/llite/file.c                  |   8 +-
 fs/lustre/llite/llite_internal.h        |  16 +-
 fs/lustre/llite/llite_lib.c             | 102 ++++--
 fs/lustre/llite/namei.c                 |  13 +-
 fs/lustre/llite/rw26.c                  |  52 +--
 fs/lustre/llite/vvp_io.c                |   5 +
 fs/lustre/llite/vvp_page.c              |  36 +-
 fs/lustre/llite/xattr.c                 |  65 ++--
 fs/lustre/lmv/lmv_obd.c                 |  12 +-
 fs/lustre/lov/lov_pack.c                |   5 +-
 fs/lustre/mdc/mdc_dev.c                 |   1 +
 fs/lustre/mdc/mdc_request.c             |  28 +-
 fs/lustre/obdclass/cl_io.c              |  25 +-
 fs/lustre/obdclass/cl_page.c            |   9 +-
 fs/lustre/obdclass/lprocfs_status.c     |   8 +-
 fs/lustre/obdclass/lu_object.c          |   8 +-
 fs/lustre/osc/osc_cache.c               |  39 ++-
 fs/lustre/osc/osc_io.c                  |  30 +-
 fs/lustre/osc/osc_page.c                |   1 -
 fs/lustre/osc/osc_request.c             | 122 ++++---
 fs/lustre/ptlrpc/client.c               |   6 +-
 fs/lustre/ptlrpc/connection.c           |  26 +-
 fs/lustre/ptlrpc/events.c               |  10 +-
 fs/lustre/ptlrpc/import.c               |   6 +-
 fs/lustre/ptlrpc/layout.c               |   5 +-
 fs/lustre/ptlrpc/niobuf.c               |  10 +-
 fs/lustre/ptlrpc/pack_generic.c         |  10 +-
 fs/lustre/ptlrpc/ptlrpc_internal.h      |   2 +-
 fs/lustre/ptlrpc/sec.c                  |   4 +-
 fs/lustre/ptlrpc/sec_config.c           |   9 +-
 fs/lustre/ptlrpc/service.c              |   2 +-
 include/linux/libcfs/libcfs_fail.h      |   6 +-
 include/linux/lnet/api.h                |   4 +-
 include/linux/lnet/lib-lnet.h           |  77 ++++-
 include/linux/lnet/lib-types.h          |  12 +-
 include/linux/lnet/socklnd.h            |  67 ++--
 include/uapi/linux/lnet/lnet-idl.h      |  40 ++-
 include/uapi/linux/lnet/lnet-types.h    |  17 +-
 include/uapi/linux/lustre/lustre_idl.h  |   5 +-
 include/uapi/linux/lustre/lustre_user.h |   6 +-
 include/uapi/linux/lustre/lustre_ver.h  |   4 +-
 net/lnet/klnds/o2iblnd/o2iblnd-idl.h    |   6 +-
 net/lnet/klnds/o2iblnd/o2iblnd.c        |   8 +-
 net/lnet/klnds/o2iblnd/o2iblnd.h        |   5 +-
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c     | 154 ++++++---
 net/lnet/klnds/socklnd/socklnd.c        |  47 ++-
 net/lnet/klnds/socklnd/socklnd.h        |  11 +-
 net/lnet/klnds/socklnd/socklnd_cb.c     | 134 ++++----
 net/lnet/klnds/socklnd/socklnd_proto.c  | 307 ++++++++++++++---
 net/lnet/lnet/api-ni.c                  | 168 +++++-----
 net/lnet/lnet/lib-move.c                | 564 ++++++++++++++++----------------
 net/lnet/lnet/lib-msg.c                 |  43 ++-
 net/lnet/lnet/lib-ptl.c                 |   6 +-
 net/lnet/lnet/lo.c                      |   3 +-
 net/lnet/lnet/net_fault.c               |  12 +-
 net/lnet/lnet/nidstrings.c              |   6 +-
 net/lnet/lnet/peer.c                    |  86 +++--
 net/lnet/lnet/router.c                  |  33 +-
 net/lnet/selftest/console.c             |   4 +-
 net/lnet/selftest/rpc.c                 |   6 +-
 72 files changed, 1632 insertions(+), 1072 deletions(-)

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 01/50] lustre: type cleanups and remove debug statements
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 02/50] lustre: osc: Fix grant test for ARM James Simmons
                   ` (48 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Remove pr_info() calls left over from earlier debugging.
Replace the __u32/__u64 types that slipped into core kernel code
with u32/u64.

Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_dlm.h  | 2 +-
 fs/lustre/lov/lov_pack.c        | 5 +----
 fs/lustre/obdclass/lu_object.c  | 8 ++++----
 fs/lustre/ptlrpc/layout.c       | 2 +-
 fs/lustre/ptlrpc/pack_generic.c | 2 +-
 5 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index a2fe9676..9286bec 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -1451,7 +1451,7 @@ static inline int ldlm_extent_contain(const struct ldlm_extent *ex1,
 	return ex1->start <= ex2->start && ex1->end >= ex2->end;
 }
 
-int ldlm_inodebits_drop(struct ldlm_lock *lock,  __u64 to_drop);
+int ldlm_inodebits_drop(struct ldlm_lock *lock, u64 to_drop);
 
 #endif
 /** @} LDLM */
diff --git a/fs/lustre/lov/lov_pack.c b/fs/lustre/lov/lov_pack.c
index 438bf36..2a080ff 100644
--- a/fs/lustre/lov/lov_pack.c
+++ b/fs/lustre/lov/lov_pack.c
@@ -195,10 +195,8 @@ ssize_t lov_lsm_pack(const struct lov_stripe_md *lsm, void *buf,
 	if (lsm->lsm_magic == LOV_MAGIC_V1 || lsm->lsm_magic == LOV_MAGIC_V3)
 		return lov_lsm_pack_v1v3(lsm, buf, buf_size);
 
-	if (lsm->lsm_magic == LOV_MAGIC_FOREIGN) {
-		pr_info("calling lov_lsm_pack_foreign\n");
+	if (lsm->lsm_magic == LOV_MAGIC_FOREIGN)
 		return lov_lsm_pack_foreign(lsm, buf, buf_size);
-	}
 
 	lmm_size = lov_comp_md_size(lsm);
 	if (buf_size == 0)
@@ -378,7 +376,6 @@ int lov_getstripe(const struct lu_env *env, struct lov_object *obj,
 
 	lmm_size = lov_lsm_pack(lsm, lmmk, lmmk_size);
 	if (lmm_size < 0) {
-		pr_info("lov_lsm_pack return rc = %zd\n", lmm_size);
 		rc = lmm_size;
 		goto out_free;
 	}
diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c
index b49e0cd..25b47d8 100644
--- a/fs/lustre/obdclass/lu_object.c
+++ b/fs/lustre/obdclass/lu_object.c
@@ -1810,7 +1810,7 @@ int lu_context_refill(struct lu_context *ctx)
 u32 lu_context_tags_default = LCT_CL_THREAD;
 u32 lu_session_tags_default = LCT_SESSION;
 
-void lu_context_tags_update(__u32 tags)
+void lu_context_tags_update(u32 tags)
 {
 	spin_lock(&lu_context_remembered_guard);
 	lu_context_tags_default |= tags;
@@ -1819,7 +1819,7 @@ void lu_context_tags_update(__u32 tags)
 }
 EXPORT_SYMBOL(lu_context_tags_update);
 
-void lu_context_tags_clear(__u32 tags)
+void lu_context_tags_clear(u32 tags)
 {
 	spin_lock(&lu_context_remembered_guard);
 	lu_context_tags_default &= ~tags;
@@ -1828,7 +1828,7 @@ void lu_context_tags_clear(__u32 tags)
 }
 EXPORT_SYMBOL(lu_context_tags_clear);
 
-void lu_session_tags_update(__u32 tags)
+void lu_session_tags_update(u32 tags)
 {
 	spin_lock(&lu_context_remembered_guard);
 	lu_session_tags_default |= tags;
@@ -1837,7 +1837,7 @@ void lu_session_tags_update(__u32 tags)
 }
 EXPORT_SYMBOL(lu_session_tags_update);
 
-void lu_session_tags_clear(__u32 tags)
+void lu_session_tags_clear(u32 tags)
 {
 	spin_lock(&lu_context_remembered_guard);
 	lu_session_tags_default &= ~tags;
diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c
index f31ab6e..8e3c97d 100644
--- a/fs/lustre/ptlrpc/layout.c
+++ b/fs/lustre/ptlrpc/layout.c
@@ -854,7 +854,7 @@ enum rmf_flags {
 	.rmf_name	= (name),				\
 	.rmf_flags	= (flags),				\
 	.rmf_size	= (size),				\
-	.rmf_swab_len	= (int (*)(void *, __u32))(swab_len),	\
+	.rmf_swab_len	= (int (*)(void *, u32))(swab_len),	\
 	.rmf_dumper	= (void (*)(void *))(dumper)		\
 }
 
diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c
index b41f51d..e06c421 100644
--- a/fs/lustre/ptlrpc/pack_generic.c
+++ b/fs/lustre/ptlrpc/pack_generic.c
@@ -1246,7 +1246,7 @@ u32 lustre_msg_calc_cksum(struct lustre_msg *msg, u32 buf)
 
 #if IS_ENABLED(CONFIG_CRC32)
 		/* about 10x faster than crypto_hash for small buffers */
-		crc = crc32_le(~(__u32)0, (unsigned char *)pb, len);
+		crc = crc32_le(~(u32)0, (unsigned char *)pb, len);
 #elif IS_ENABLED(CONFIG_CRYPTO_CRC32)
 		unsigned int hsize = 4;
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/50] lustre: osc: Fix grant test for ARM
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 01/50] lustre: type cleanups and remove debug statements James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 03/50] lnet: extend nids in struct lnet_msg James Simmons
                   ` (47 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

If both the OST and the OSC support OBD_CONNECT_GRANT_PARAM, the OST
will not change the grant claimed by the client (a.k.a. o_grant_used),
regardless of the client page size, so no grant is lost in this case.
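
As a rough illustration of the accounting described above, here is a
userspace sketch of the lost-grant decision. It is not the kernel code:
the function name and its parameters are hypothetical, PAGE_SIZE is
passed in as page_size, and the rounding of the tail to the end of its
OST block is an assumption, since the hunk below only shows the first
two lines of the calculation.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the lost-grant decision.
 * All names and parameters are illustrative, not the kernel's.
 */
static unsigned int sketch_lost_grant(bool sent, bool grant_param,
				      unsigned int blocksize,
				      unsigned int page_size,
				      unsigned long long last_off,
				      unsigned int last_count,
				      unsigned int ext_grants)
{
	unsigned int offset, count, end;

	/* Nothing was sent: the whole reservation counts as lost. */
	if (!sent)
		return ext_grants;

	/* GRANT_PARAM negotiated: the OST keeps the client's claim. */
	if (grant_param)
		return 0;

	/* Full last page, or OST blocks at least as large as a page. */
	if (blocksize >= page_size || last_count == page_size)
		return 0;

	/* Offset of the last write within its page (power-of-two size). */
	offset = (unsigned int)(last_off & (page_size - 1));
	/* Extend the write back to the start of its first OST block. */
	count = last_count + (offset & (blocksize - 1));
	/* Assumption: round the tail up to the end of its OST block. */
	end = (offset + last_count) & (blocksize - 1);
	if (end)
		count += blocksize - end;

	return page_size - count;
}
```

With a 64KiB client page and a 4KiB OST block size, a 100-byte write at
the start of the last page would give back 61440 bytes in this model
when GRANT_PARAM is not negotiated, and nothing when it is.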

Fixes: 4d8c38a8a882 ("lustre: grant: add support for OBD_CONNECT_GRANT_PARAM")
WC-bug-id: https://jira.whamcloud.com/browse/LU-11596
Lustre-commit: 7d3edce0650f0b66b ("LU-11596 osc: Fix and re-enable sanity grant test for ARM")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/40758
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 fs/lustre/osc/osc_cache.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index 7b7b49f..7bd28c5 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -905,11 +905,13 @@ int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext,
 
 	if (!sent) {
 		lost_grant = ext->oe_grants;
-	} else if (blocksize < PAGE_SIZE &&
+	} else if (cli->cl_ocd_grant_param == 0 &&
+		   blocksize < PAGE_SIZE &&
 		   last_count != PAGE_SIZE) {
-		/* For short writes we shouldn't count parts of pages that
-		 * span a whole chunk on the OST side, or our accounting goes
-		 * wrong.  Should match the code in filter_grant_check.
+		/* For short writes without OBD_CONNECT_GRANT support, we
+		 * shouldn't count parts of pages that span a whole chunk on
+		 * the OST side, or our accounting goes wrong. Should match
+		 * the code in tgt_grant_check.
 		 */
 		int offset = last_off & ~PAGE_MASK;
 		int count = last_count + (offset & (blocksize - 1));
@@ -1505,11 +1507,11 @@ static void osc_unreserve_grant(struct client_obd *cli,
  * used, we should return these grants to OST. There're two cases where grants
  * can be lost:
  * 1. truncate;
- * 2. blocksize at OST is less than PAGE_SIZE and a partial page was
- *    written. In this case OST may use less chunks to serve this partial
- *    write. OSTs don't actually know the page size on the client side. so
- *    clients have to calculate lost grant by the blocksize on the OST.
- *    See filter_grant_check() for details.
+ * 2. Without OBD_CONNECT_GRANT support and blocksize at OST is less than
+ *    PAGE_SIZE and a partial page was written. In this case OST may use less
+ *    chunks to serve this partial write. OSTs don't actually know the page
+ *    size on the client side. so clients have to calculate lost grant by the
+ *    blocksize on the OST. See tgt_grant_check() for details.
  */
 static void osc_free_grant(struct client_obd *cli, unsigned int nr_pages,
 			   unsigned int lost_grant, unsigned int dirty_grant)
-- 
1.8.3.1


* [lustre-devel] [PATCH 03/50] lnet: extend nids in struct lnet_msg
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 01/50] lustre: type cleanups and remove debug statements James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 02/50] lustre: osc: Fix grant test for ARM James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 04/50] lnet: Change lnet_send() to take large-addr nids James Simmons
                   ` (46 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

struct lnet_msg contains three nids and one process_id (which itself
contains a nid).  Replace each of these with the 'struct lnet_nid'
version.
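
To make the conversion pattern concrete, here is a simplified userspace
sketch of a 4-byte-address nid, a large-address nid, and the helpers
that convert between them. Everything prefixed with sketch_ is
hypothetical. The field layout, the host byte order, and the 8-bit
network type are assumptions for illustration rather than the LNet
definitions, and the wildcard nid is not handled.

```c
#include <stdint.h>
#include <string.h>

/* Legacy form: 32-bit network id in the high half, 32-bit address low. */
typedef uint64_t sketch_nid4_t;

/* Large-address form with room for a 16-byte address. */
struct sketch_nid {
	uint8_t  size;		/* 0 here means "only addr[0] is used" */
	uint8_t  type;		/* network type */
	uint16_t num;		/* network number */
	uint32_t addr[4];	/* address words, host order in this sketch */
};

/* Process id built on the large-address nid. */
struct sketch_processid {
	struct sketch_nid nid;
	uint32_t pid;
};

/* Widen a legacy nid into the large-address structure. */
static void sketch_nid4_to_nid(sketch_nid4_t nid4, struct sketch_nid *nid)
{
	uint32_t net = (uint32_t)(nid4 >> 32);

	memset(nid, 0, sizeof(*nid));
	nid->type = (uint8_t)((net >> 16) & 0xff);
	nid->num = (uint16_t)(net & 0xffff);
	nid->addr[0] = (uint32_t)(nid4 & 0xffffffffu);
}

/* Narrow a large-address nid that holds a 4-byte address. */
static sketch_nid4_t sketch_nid_to_nid4(const struct sketch_nid *nid)
{
	uint32_t net = ((uint32_t)nid->type << 16) | nid->num;

	return ((sketch_nid4_t)net << 32) | nid->addr[0];
}
```

The diff below follows the same shape: fields in struct lnet_msg store
the large form, and callers that still take lnet_nid_t convert at the
boundary with lnet_nid4_to_nid() or lnet_nid_to_nid4().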

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 57c03f3070753146a ("LU-10391 lnet: extend nids in struct lnet_msg")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43598
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h       |  2 +-
 include/linux/lnet/lib-types.h      | 10 +++---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 29 ++++++++--------
 net/lnet/klnds/socklnd/socklnd_cb.c |  8 ++---
 net/lnet/lnet/lib-move.c            | 68 +++++++++++++++++++------------------
 net/lnet/lnet/lib-msg.c             | 21 ++++++------
 net/lnet/lnet/lib-ptl.c             |  2 +-
 net/lnet/lnet/peer.c                | 24 ++++++-------
 8 files changed, 81 insertions(+), 83 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 02eae6b..29a6252 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -853,7 +853,7 @@ struct lnet_peer_ni *lnet_peer_ni_get_locked(struct lnet_peer *lp,
 struct lnet_peer_ni *lnet_peer_ni_find_locked(struct lnet_nid *nid);
 struct lnet_peer *lnet_find_peer(lnet_nid_t nid);
 void lnet_peer_net_added(struct lnet_net *net);
-lnet_nid_t lnet_peer_primary_nid_locked(lnet_nid_t nid);
+void lnet_peer_primary_nid_locked(lnet_nid_t nid, struct lnet_nid *result);
 int lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt, bool block);
 void lnet_peer_queue_message(struct lnet_peer *lp, struct lnet_msg *msg);
 int lnet_peer_discovery_start(void);
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index eb736a5..40767e6 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -114,19 +114,19 @@ struct lnet_msg {
 	struct list_head	msg_activelist;
 	struct list_head	msg_list;	/* Q for credits/MD */
 
-	struct lnet_process_id	msg_target;
+	struct lnet_processid	msg_target;
 	/* Primary NID of the source. */
-	lnet_nid_t		msg_initiator;
+	struct lnet_nid		msg_initiator;
 	/* where is it from, it's only for building event */
-	lnet_nid_t		msg_from;
+	struct lnet_nid		msg_from;
 	u32			msg_type;
 
 	/*
 	 * hold parameters in case message is with held due
 	 * to discovery
 	 */
-	lnet_nid_t		msg_src_nid_param;
-	lnet_nid_t		msg_rtr_nid_param;
+	struct lnet_nid		msg_src_nid_param;
+	struct lnet_nid		msg_rtr_nid_param;
 
 	/*
 	 * Deadline for the message after which it will be finalized if it
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 7560fe1..8168a26 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1512,7 +1512,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 {
 	struct lnet_hdr *hdr = &lntmsg->msg_hdr;
 	int type = lntmsg->msg_type;
-	struct lnet_process_id target = lntmsg->msg_target;
+	struct lnet_processid *target = &lntmsg->msg_target;
 	int target_is_router = lntmsg->msg_target_is_router;
 	int routing = lntmsg->msg_routing;
 	unsigned int payload_niov = lntmsg->msg_niov;
@@ -1527,9 +1527,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	int rc;
 
 	/* NB 'private' is different depending on what we're sending.... */
-
 	CDEBUG(D_NET, "sending %d bytes in %d frags to %s\n",
-	       payload_nob, payload_niov, libcfs_id2str(target));
+	       payload_nob, payload_niov, libcfs_idstr(target));
 
 	LASSERT(!payload_nob || payload_niov > 0);
 	LASSERT(payload_niov <= LNET_MAX_IOV);
@@ -1543,11 +1542,11 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 	iov_iter_advance(&from, payload_offset);
 
-	tx = kiblnd_get_idle_tx(ni, target.nid);
+	tx = kiblnd_get_idle_tx(ni, lnet_nid_to_nid4(&target->nid));
 	if (!tx) {
 		CERROR("Can't allocate %s txd for %s\n",
 		       lnet_msgtyp2str(type),
-		       libcfs_nid2str(target.nid));
+		       libcfs_nidstr(&target->nid));
 		return -ENOMEM;
 	}
 	ibmsg = tx->tx_msg;
@@ -1576,7 +1575,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 					  payload_offset, payload_nob);
 		if (rc) {
 			CERROR("Can't setup GET sink for %s: %d\n",
-			       libcfs_nid2str(target.nid), rc);
+			       libcfs_nidstr(&target->nid), rc);
 			tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR;
 			kiblnd_tx_done(tx);
 			return -EIO;
@@ -1591,7 +1590,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		tx->tx_lntmsg[1] = lnet_create_reply_msg(ni, lntmsg);
 		if (!tx->tx_lntmsg[1]) {
 			CERROR("Can't create reply for GET -> %s\n",
-			       libcfs_nid2str(target.nid));
+			       libcfs_nidstr(&target->nid));
 			kiblnd_tx_done(tx);
 			return -EIO;
 		}
@@ -1599,7 +1598,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		/* finalise lntmsg[0,1] on completion */
 		tx->tx_lntmsg[0] = lntmsg;
 		tx->tx_waiting = 1;		/* waiting for GET_DONE */
-		kiblnd_launch_tx(ni, tx, target.nid);
+		kiblnd_launch_tx(ni, tx, lnet_nid_to_nid4(&target->nid));
 		return 0;
 
 	case LNET_MSG_REPLY:
@@ -1614,7 +1613,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 					  payload_offset, payload_nob);
 		if (rc) {
 			CERROR("Can't setup PUT src for %s: %d\n",
-			       libcfs_nid2str(target.nid), rc);
+			       libcfs_nidstr(&target->nid), rc);
 			kiblnd_tx_done(tx);
 			return -EIO;
 		}
@@ -1626,7 +1625,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		/* finalise lntmsg on completion */
 		tx->tx_lntmsg[0] = lntmsg;
 		tx->tx_waiting = 1;		/* waiting for PUT_{ACK,NAK} */
-		kiblnd_launch_tx(ni, tx, target.nid);
+		kiblnd_launch_tx(ni, tx, lnet_nid_to_nid4(&target->nid));
 		return 0;
 	}
 
@@ -1651,14 +1650,14 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	/* finalise lntmsg on completion */
 	tx->tx_lntmsg[0] = lntmsg;
 
-	kiblnd_launch_tx(ni, tx, target.nid);
+	kiblnd_launch_tx(ni, tx, lnet_nid_to_nid4(&target->nid));
 	return 0;
 }
 
 static void
 kiblnd_reply(struct lnet_ni *ni, struct kib_rx *rx, struct lnet_msg *lntmsg)
 {
-	struct lnet_process_id target = lntmsg->msg_target;
+	struct lnet_processid *target = &lntmsg->msg_target;
 	unsigned int niov = lntmsg->msg_niov;
 	struct bio_vec *kiov = lntmsg->msg_kiov;
 	unsigned int offset = lntmsg->msg_offset;
@@ -1669,7 +1668,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	tx = kiblnd_get_idle_tx(ni, rx->rx_conn->ibc_peer->ibp_nid);
 	if (!tx) {
 		CERROR("Can't get tx for REPLY to %s\n",
-		       libcfs_nid2str(target.nid));
+		       libcfs_nidstr(&target->nid));
 		goto failed_0;
 	}
 
@@ -1681,7 +1680,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 	if (rc) {
 		CERROR("Can't setup GET src for %s: %d\n",
-		       libcfs_nid2str(target.nid), rc);
+		       libcfs_nidstr(&target->nid), rc);
 		goto failed_1;
 	}
 
@@ -1691,7 +1690,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			      rx->rx_msg->ibm_u.get.ibgm_cookie);
 	if (rc < 0) {
 		CERROR("Can't setup rdma for GET from %s: %d\n",
-		       libcfs_nid2str(target.nid), rc);
+		       libcfs_nidstr(&target->nid), rc);
 		goto failed_1;
 	}
 
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index b2a1267..d0c3628 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -911,7 +911,7 @@ struct ksock_conn_cb *
 {
 	unsigned int mpflag = 0;
 	int type = lntmsg->msg_type;
-	struct lnet_processid target;
+	struct lnet_processid *target = &lntmsg->msg_target;
 	unsigned int payload_niov = lntmsg->msg_niov;
 	struct bio_vec *payload_kiov = lntmsg->msg_kiov;
 	unsigned int payload_offset = lntmsg->msg_offset;
@@ -923,11 +923,9 @@ struct ksock_conn_cb *
 	/* NB 'private' is different depending on what we're sending.
 	 * Just ignore it...
 	 */
-	target.pid = lntmsg->msg_target.pid;
-	lnet_nid4_to_nid(lntmsg->msg_target.nid, &target.nid);
 
 	CDEBUG(D_NET, "sending %u bytes in %d frags to %s\n",
-	       payload_nob, payload_niov, libcfs_idstr(&target));
+	       payload_nob, payload_niov, libcfs_idstr(target));
 
 	LASSERT(!payload_nob || payload_niov > 0);
 	LASSERT(payload_niov <= LNET_MAX_IOV);
@@ -965,7 +963,7 @@ struct ksock_conn_cb *
 	tx->tx_msg.ksm_zc_cookies[1] = 0;
 
 	/* The first fragment will be set later in pro_pack */
-	rc = ksocknal_launch_packet(ni, tx, &target);
+	rc = ksocknal_launch_packet(ni, tx, target);
 	if (mpflag)
 		memalloc_noreclaim_restore(mpflag);
 
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 8d4fd4d..83c93ca 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -517,7 +517,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	       unsigned int offset, unsigned int len)
 {
 	msg->msg_type = type;
-	msg->msg_target = target;
+	msg->msg_target.pid = target.pid;
+	lnet_nid4_to_nid(target.nid, &msg->msg_target.nid);
 	msg->msg_len = len;
 	msg->msg_offset = offset;
 
@@ -567,7 +568,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	if (rc) {
 		CERROR("recv from %s / send to %s aborted: eager_recv failed %d\n",
 		       libcfs_nidstr(&msg->msg_rxpeer->lpni_nid),
-		       libcfs_id2str(msg->msg_target), rc);
+		       libcfs_idstr(&msg->msg_target), rc);
 		LASSERT(rc < 0); /* required by my callers */
 	}
 
@@ -671,7 +672,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 					LNET_STATS_TYPE_DROP);
 
 		CNETERR("Dropping message for %s: peer not alive\n",
-			libcfs_id2str(msg->msg_target));
+			libcfs_idstr(&msg->msg_target));
 		msg->msg_health_status = LNET_MSG_STATUS_REMOTE_DROPPED;
 		if (do_send)
 			lnet_finalize(msg, -EHOSTUNREACH);
@@ -685,12 +686,12 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		lnet_net_unlock(cpt);
 
 		CNETERR("Aborting message for %s: LNetM[DE]Unlink() already called on the MD/ME.\n",
-			libcfs_id2str(msg->msg_target));
+			libcfs_idstr(&msg->msg_target));
 		if (do_send) {
 			msg->msg_no_resend = true;
 			CDEBUG(D_NET,
 			       "msg %p to %s canceled and will not be resent\n",
-			       msg, libcfs_id2str(msg->msg_target));
+			       msg, libcfs_idstr(&msg->msg_target));
 			lnet_finalize(msg, -ECANCELED);
 		}
 
@@ -1629,7 +1630,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	if (!msg->msg_routing)
 		msg->msg_hdr.src_nid =
 			cpu_to_le64(lnet_nid_to_nid4(&the_lnet.ln_loni->ni_nid));
-	msg->msg_target.nid = lnet_nid_to_nid4(&the_lnet.ln_loni->ni_nid);
+	msg->msg_target.nid = the_lnet.ln_loni->ni_nid;
 	lnet_msg_commit(msg, cpt);
 	msg->msg_txni = the_lnet.ln_loni;
 
@@ -1711,8 +1712,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	 * what was originally set in the target or it will be the NID of
 	 * a router if this message should be routed
 	 */
-	/* FIXME handle large-addr nids */
-	msg->msg_target.nid = lnet_nid_to_nid4(&msg->msg_txpeer->lpni_nid);
+	msg->msg_target.nid = msg->msg_txpeer->lpni_nid;
 
 	/* lnet_msg_commit assigns the correct cpt to the message, which
 	 * is used to decrement the correct refcount on the ni when it's
@@ -2730,8 +2730,8 @@ struct lnet_ni *
 	 * continuing the same sequence of messages. Similarly, rtr_nid will
 	 * affect our choice of next hop.
 	 */
-	msg->msg_src_nid_param = src_nid;
-	msg->msg_rtr_nid_param = rtr_nid;
+	lnet_nid4_to_nid(src_nid, &msg->msg_src_nid_param);
+	lnet_nid4_to_nid(rtr_nid, &msg->msg_rtr_nid_param);
 
 	/* If necessary, perform discovery on the peer that owns this peer_ni.
 	 * Note, this can result in the ownership of this peer_ni changing
@@ -2844,7 +2844,7 @@ struct lnet_ni *
 int
 lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 {
-	lnet_nid_t dst_nid = msg->msg_target.nid;
+	lnet_nid_t dst_nid = lnet_nid_to_nid4(&msg->msg_target.nid);
 	int rc;
 
 	/*
@@ -3107,17 +3107,18 @@ struct lnet_mt_event_info {
 			lnet_net_unlock(cpt);
 			CDEBUG(D_NET,
 			       "resending %s->%s: %s recovery %d try# %d\n",
-			       libcfs_nid2str(msg->msg_src_nid_param),
-			       libcfs_id2str(msg->msg_target),
+			       libcfs_nidstr(&msg->msg_src_nid_param),
+			       libcfs_idstr(&msg->msg_target),
 			       lnet_msgtyp2str(msg->msg_type),
 			       msg->msg_recovery,
 			       msg->msg_retry_count);
-			rc = lnet_send(msg->msg_src_nid_param, msg,
-				       msg->msg_rtr_nid_param);
+			rc = lnet_send(lnet_nid_to_nid4(&msg->msg_src_nid_param),
+				       msg,
+				       lnet_nid_to_nid4(&msg->msg_rtr_nid_param));
 			if (rc) {
 				CERROR("Error sending %s to %s: %d\n",
 				       lnet_msgtyp2str(msg->msg_type),
-				       libcfs_id2str(msg->msg_target), rc);
+				       libcfs_idstr(&msg->msg_target), rc);
 				msg->msg_no_resend = true;
 				lnet_finalize(msg, rc);
 			}
@@ -3920,14 +3921,14 @@ void lnet_monitor_thr_stop(void)
 	le32_to_cpus(&hdr->msg.put.offset);
 
 	/* Primary peer NID. */
-	lnet_nid4_to_nid(msg->msg_initiator, &info.mi_id.nid);
+	info.mi_id.nid = msg->msg_initiator;
 	info.mi_id.pid = hdr->src_pid;
 	info.mi_opc = LNET_MD_OP_PUT;
 	info.mi_portal = hdr->msg.put.ptl_index;
 	info.mi_rlength	= hdr->payload_length;
 	info.mi_roffset	= hdr->msg.put.offset;
 	info.mi_mbits = hdr->msg.put.match_bits;
-	info.mi_cpt = lnet_cpt_of_nid(msg->msg_initiator, ni);
+	info.mi_cpt = lnet_nid2cpt(&msg->msg_initiator, ni);
 
 	msg->msg_rx_ready_delay = !ni->ni_net->net_lnd->lnd_eager_recv;
 	ready_delay = msg->msg_rx_ready_delay;
@@ -3984,14 +3985,14 @@ void lnet_monitor_thr_stop(void)
 	source_id.nid = hdr->src_nid;
 	source_id.pid = hdr->src_pid;
 	/* Primary peer NID */
-	lnet_nid4_to_nid(msg->msg_initiator, &info.mi_id.nid);
+	info.mi_id.nid = msg->msg_initiator;
 	info.mi_id.pid = hdr->src_pid;
 	info.mi_opc = LNET_MD_OP_GET;
 	info.mi_portal = hdr->msg.get.ptl_index;
 	info.mi_rlength = hdr->msg.get.sink_length;
 	info.mi_roffset = hdr->msg.get.src_offset;
 	info.mi_mbits = hdr->msg.get.match_bits;
-	info.mi_cpt = lnet_cpt_of_nid(msg->msg_initiator, ni);
+	info.mi_cpt = lnet_nid2cpt(&msg->msg_initiator, ni);
 
 	rc = lnet_ptl_match_md(&info, msg);
 	if (rc == LNET_MATCHMD_DROP) {
@@ -4023,7 +4024,8 @@ void lnet_monitor_thr_stop(void)
 	msg->msg_receiving = 0;
 
 	/* FIXME need to handle large-addr nid */
-	rc = lnet_send(lnet_nid_to_nid4(&ni->ni_nid), msg, msg->msg_from);
+	rc = lnet_send(lnet_nid_to_nid4(&ni->ni_nid), msg,
+		       lnet_nid_to_nid4(&msg->msg_from));
 	if (rc < 0) {
 		/* didn't get as far as lnet_ni_send() */
 		CERROR("%s: Unable to send REPLY for GET from %s: %d\n",
@@ -4395,10 +4397,10 @@ void lnet_monitor_thr_stop(void)
 	msg->msg_offset = 0;
 	msg->msg_hdr = *hdr;
 	/* for building message event */
-	msg->msg_from = from_nid4;
+	msg->msg_from = from_nid;
 	if (!for_me) {
 		msg->msg_target.pid = dest_pid;
-		msg->msg_target.nid = dest_nid;
+		lnet_nid4_to_nid(dest_nid, &msg->msg_target.nid);
 		msg->msg_routing = 1;
 	} else {
 		/* convert common msg->hdr fields to host byteorder */
@@ -4491,7 +4493,7 @@ void lnet_monitor_thr_stop(void)
 	msg->msg_rxni = ni;
 	lnet_ni_addref_locked(ni, cpt);
 	/* Multi-Rail: Primary NID of source. */
-	msg->msg_initiator = lnet_peer_primary_nid_locked(src_nid);
+	lnet_peer_primary_nid_locked(src_nid, &msg->msg_initiator);
 
 	/* mark the status of this lpni as UP since we received a message
 	 * from it. The ping response reports back the ns_status which is
@@ -4827,7 +4829,7 @@ struct lnet_msg *
 	 */
 	struct lnet_msg *msg;
 	struct lnet_libmd *getmd = getmsg->msg_md;
-	struct lnet_process_id peer_id = getmsg->msg_target;
+	struct lnet_processid *peer_id = &getmsg->msg_target;
 	int cpt;
 
 	LASSERT(!getmsg->msg_target_is_router);
@@ -4836,7 +4838,7 @@ struct lnet_msg *
 	msg = kmem_cache_zalloc(lnet_msg_cachep, GFP_NOFS);
 	if (!msg) {
 		CERROR("%s: Dropping REPLY from %s: can't allocate msg\n",
-		       libcfs_nidstr(&ni->ni_nid), libcfs_id2str(peer_id));
+		       libcfs_nidstr(&ni->ni_nid), libcfs_idstr(peer_id));
 		goto drop;
 	}
 
@@ -4847,7 +4849,7 @@ struct lnet_msg *
 
 	if (!getmd->md_threshold) {
 		CERROR("%s: Dropping REPLY from %s for inactive MD %p\n",
-		       libcfs_nidstr(&ni->ni_nid), libcfs_id2str(peer_id),
+		       libcfs_nidstr(&ni->ni_nid), libcfs_idstr(peer_id),
 		       getmd);
 		lnet_res_unlock(cpt);
 		goto drop;
@@ -4856,21 +4858,21 @@ struct lnet_msg *
 	LASSERT(!getmd->md_offset);
 
 	CDEBUG(D_NET, "%s: Reply from %s md %p\n",
-	       libcfs_nidstr(&ni->ni_nid), libcfs_id2str(peer_id), getmd);
+	       libcfs_nidstr(&ni->ni_nid), libcfs_idstr(peer_id), getmd);
 
 	/* setup information for lnet_build_msg_event */
 	msg->msg_initiator =
-		lnet_nid_to_nid4(&getmsg->msg_txpeer->lpni_peer_net->lpn_peer->lp_primary_nid);
-	msg->msg_from = peer_id.nid;
+		getmsg->msg_txpeer->lpni_peer_net->lpn_peer->lp_primary_nid;
+	msg->msg_from = peer_id->nid;
 	msg->msg_type = LNET_MSG_GET; /* flag this msg as an "optimized" GET */
-	msg->msg_hdr.src_nid = peer_id.nid;
+	msg->msg_hdr.src_nid = lnet_nid_to_nid4(&peer_id->nid);
 	msg->msg_hdr.payload_length = getmd->md_length;
 	msg->msg_receiving = 1; /* required by lnet_msg_attach_md */
 
 	lnet_msg_attach_md(msg, getmd, getmd->md_offset, getmd->md_length);
 	lnet_res_unlock(cpt);
 
-	cpt = lnet_cpt_of_nid(peer_id.nid, ni);
+	cpt = lnet_nid2cpt(&peer_id->nid, ni);
 
 	lnet_net_lock(cpt);
 	lnet_msg_commit(msg, cpt);
@@ -4881,7 +4883,7 @@ struct lnet_msg *
 	return msg;
 
 drop:
-	cpt = lnet_cpt_of_nid(peer_id.nid, ni);
+	cpt = lnet_nid2cpt(&peer_id->nid, ni);
 
 	lnet_net_lock(cpt);
 	lnet_incr_stats(&ni->ni_stats, LNET_MSG_GET, LNET_STATS_TYPE_DROP);
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 12768b2..980f93d 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -79,12 +79,12 @@
 		ev->target.nid = hdr->dest_nid;
 		ev->initiator.pid = hdr->src_pid;
 		/* Multi-Rail: resolve src_nid to "primary" peer NID */
-		ev->initiator.nid = msg->msg_initiator;
+		ev->initiator.nid = lnet_nid_to_nid4(&msg->msg_initiator);
 		/* Multi-Rail: track source NID. */
 		ev->source.pid = hdr->src_pid;
 		ev->source.nid = hdr->src_nid;
 		ev->rlength = hdr->payload_length;
-		ev->sender = msg->msg_from;
+		ev->sender = lnet_nid_to_nid4(&msg->msg_from);
 		ev->mlength = msg->msg_wanted;
 		ev->offset = msg->msg_offset;
 	}
@@ -396,7 +396,8 @@
 		msg->msg_hdr.msg.ack.match_bits = msg->msg_ev.match_bits;
 		msg->msg_hdr.msg.ack.mlength = cpu_to_le32(msg->msg_ev.mlength);
 
-		rc = lnet_send(msg->msg_ev.target.nid, msg, msg->msg_from);
+		rc = lnet_send(msg->msg_ev.target.nid, msg,
+			       lnet_nid_to_nid4(&msg->msg_from));
 
 		lnet_net_lock(cpt);
 		/*
@@ -636,7 +637,7 @@
 	 * this message consumed. The message will
 	 * consume another credit when it gets resent.
 	 */
-	msg->msg_target.nid = msg->msg_hdr.dest_nid;
+	lnet_nid4_to_nid(msg->msg_hdr.dest_nid, &msg->msg_target.nid);
 	lnet_msg_decommit_tx(msg, -EAGAIN);
 	msg->msg_sending = 0;
 	msg->msg_receiving = 0;
@@ -692,8 +693,8 @@
 	/* don't resend recovery messages */
 	if (msg->msg_recovery) {
 		CDEBUG(D_NET, "msg %s->%s is a recovery ping. retry# %d\n",
-		       libcfs_nid2str(msg->msg_from),
-		       libcfs_nid2str(msg->msg_target.nid),
+		       libcfs_nidstr(&msg->msg_from),
+		       libcfs_nidstr(&msg->msg_target.nid),
 		       msg->msg_retry_count);
 		return -ENOTRECOVERABLE;
 	}
@@ -703,8 +704,8 @@
 	 */
 	if (msg->msg_no_resend) {
 		CDEBUG(D_NET, "msg %s->%s requested no resend. retry# %d\n",
-		       libcfs_nid2str(msg->msg_from),
-		       libcfs_nid2str(msg->msg_target.nid),
+		       libcfs_nidstr(&msg->msg_from),
+		       libcfs_nidstr(&msg->msg_target.nid),
 		       msg->msg_retry_count);
 		return -ENOTRECOVERABLE;
 	}
@@ -712,8 +713,8 @@
 	/* check if the message has exceeded the number of retries */
 	if (msg->msg_retry_count >= lnet_retry_count) {
 		CNETERR("msg %s->%s exceeded retry count %d\n",
-			libcfs_nid2str(msg->msg_from),
-			libcfs_nid2str(msg->msg_target.nid),
+			libcfs_nidstr(&msg->msg_from),
+			libcfs_nidstr(&msg->msg_target.nid),
 			msg->msg_retry_count);
 		return -ENOTRECOVERABLE;
 	}
diff --git a/net/lnet/lnet/lib-ptl.c b/net/lnet/lnet/lib-ptl.c
index d367c00..30628e5 100644
--- a/net/lnet/lnet/lib-ptl.c
+++ b/net/lnet/lnet/lib-ptl.c
@@ -690,7 +690,7 @@ struct list_head *
 
 		hdr = &msg->msg_hdr;
 		/* Multi-Rail: Primary peer NID */
-		lnet_nid4_to_nid(msg->msg_initiator, &info.mi_id.nid);
+		info.mi_id.nid = msg->msg_initiator;
 		info.mi_id.pid = hdr->src_pid;
 		info.mi_opc = LNET_MD_OP_PUT;
 		info.mi_portal = hdr->msg.put.ptl_index;
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 057a1db..9cb06d2 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -1325,20 +1325,18 @@ struct lnet_peer_ni *
 	}
 }
 
-lnet_nid_t
-lnet_peer_primary_nid_locked(lnet_nid_t nid)
+void
+lnet_peer_primary_nid_locked(lnet_nid_t nid, struct lnet_nid *result)
 {
 	/* FIXME handle large-addr nid */
 	struct lnet_peer_ni *lpni;
-	lnet_nid_t primary_nid = nid;
 
+	lnet_nid4_to_nid(nid, result);
 	lpni = lnet_find_peer_ni_locked(nid);
 	if (lpni) {
-		primary_nid = lnet_nid_to_nid4(&lpni->lpni_peer_net->lpn_peer->lp_primary_nid);
+		*result = lpni->lpni_peer_net->lpn_peer->lp_primary_nid;
 		lnet_peer_ni_decref_locked(lpni);
 	}
-
-	return primary_nid;
 }
 
 bool
@@ -2297,13 +2295,13 @@ static void lnet_peer_discovery_complete(struct lnet_peer *lp, int dc_error)
 
 		CDEBUG(D_NET, "sending pending message %s to target %s\n",
 		       lnet_msgtyp2str(msg->msg_type),
-		       libcfs_id2str(msg->msg_target));
-		rc = lnet_send(msg->msg_src_nid_param, msg,
-			       msg->msg_rtr_nid_param);
+		       libcfs_idstr(&msg->msg_target));
+		rc = lnet_send(lnet_nid_to_nid4(&msg->msg_src_nid_param), msg,
+			       lnet_nid_to_nid4(&msg->msg_rtr_nid_param));
 		if (rc < 0) {
 			CNETERR("Error sending %s to %s: %d\n",
 				lnet_msgtyp2str(msg->msg_type),
-				libcfs_id2str(msg->msg_target), rc);
+				libcfs_idstr(&msg->msg_target), rc);
 			lnet_finalize(msg, rc);
 		}
 	}
@@ -3699,12 +3697,12 @@ static void lnet_resend_msgs(void)
 
 	list_for_each_entry_safe(msg, tmp, &resend, msg_list) {
 		list_del_init(&msg->msg_list);
-		rc = lnet_send(msg->msg_src_nid_param, msg,
-			       msg->msg_rtr_nid_param);
+		rc = lnet_send(lnet_nid_to_nid4(&msg->msg_src_nid_param), msg,
+			       lnet_nid_to_nid4(&msg->msg_rtr_nid_param));
 		if (rc < 0) {
 			CNETERR("Error sending %s to %s: %d\n",
 				lnet_msgtyp2str(msg->msg_type),
-				libcfs_id2str(msg->msg_target), rc);
+				libcfs_idstr(&msg->msg_target), rc);
 			lnet_finalize(msg, rc);
 		}
 	}
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 04/50] lnet: Change lnet_send() to take large-addr nids
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (2 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 03/50] lnet: extend nids in struct lnet_msg James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 05/50] lnet: use large nids in struct lnet_event James Simmons
                   ` (45 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The src and rtr nids passed to lnet_send() are now pointers to a
'struct lnet_nid'.  NULL can be passed for the rtr nid, which is
treated the same as ANY.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 005bd7075c8710fb8 ("LU-10391 lnet: Change lnet_send() to take large-addr nids")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43599
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
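Note for readers: the NULL-means-ANY convention described above can be
sketched in isolation. This is a minimal user-space model, not the LNet
code. Every name starting with toy_ or TOY_ is hypothetical, and the
struct layout is an assumption chosen only for the example. It shows an
optional pointer argument being normalised into a by-value copy, as
lnet_select_pathway() does for sd_src_nid and sd_rtr_nid in the diff
below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a large-address NID; NOT the kernel's struct lnet_nid. */
struct toy_nid {
	uint32_t net;
	uint32_t addr[4];
};

/* Toy wildcard value, playing the role that LNET_ANY_NID plays in the patch. */
static const struct toy_nid TOY_ANY_NID = {
	0xffffffffu,
	{ 0xffffffffu, 0xffffffffu, 0xffffffffu, 0xffffffffu }
};

/* The struct is five uint32_t fields with no padding, so memcmp is safe. */
static bool toy_nid_is_any(const struct toy_nid *nid)
{
	return memcmp(nid, &TOY_ANY_NID, sizeof(*nid)) == 0;
}

/* A NULL pointer and an explicit wildcard both mean "not specified". */
static bool toy_src_is_unspecified(const struct toy_nid *src)
{
	return !src || toy_nid_is_any(src);
}

/* Normalise an optional pointer into a value the callee can store. */
static struct toy_nid toy_nid_or_any(const struct toy_nid *nid)
{
	return nid ? *nid : TOY_ANY_NID;
}

struct toy_send_data {
	struct toy_nid src;
	struct toy_nid rtr;
};

/* Copy the caller's choices; later code never has to test for NULL. */
static void toy_prepare_send(struct toy_send_data *sd,
			     const struct toy_nid *src,
			     const struct toy_nid *rtr)
{
	assert(sd != NULL);
	sd->src = toy_nid_or_any(src);
	sd->rtr = toy_nid_or_any(rtr);
}
```

Storing copies rather than the caller's pointers means the send data
stays valid after the caller's stack frame is gone, which is presumably
why sd_src_nid and friends are embedded structs rather than pointers.
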
 include/linux/lnet/lib-lnet.h |   3 +-
 net/lnet/lnet/lib-move.c      | 181 +++++++++++++++++++++++-------------------
 net/lnet/lnet/lib-msg.c       |   7 +-
 net/lnet/lnet/peer.c          |   8 +-
 4 files changed, 108 insertions(+), 91 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 29a6252..8b4a29b 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -585,7 +585,8 @@ void lnet_msg_attach_md(struct lnet_msg *msg, struct lnet_libmd *md,
 void lnet_prep_send(struct lnet_msg *msg, int type,
 		    struct lnet_process_id target, unsigned int offset,
 		    unsigned int len);
-int lnet_send(lnet_nid_t nid, struct lnet_msg *msg, lnet_nid_t rtr_nid);
+int lnet_send(struct lnet_nid *nid, struct lnet_msg *msg,
+	      struct lnet_nid *rtr_nid);
 int lnet_send_ping(lnet_nid_t dest_nid, struct lnet_handle_md *mdh, int nnis,
 		   void *user_ptr, lnet_handler_t handler, bool recovery);
 void lnet_return_tx_credits_locked(struct lnet_msg *msg);
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 83c93ca..de3e0ac 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -53,9 +53,9 @@ struct lnet_send_data {
 	struct lnet_peer_ni	*sd_gw_lpni;
 	struct lnet_peer_net	*sd_peer_net;
 	struct lnet_msg		*sd_msg;
-	lnet_nid_t		sd_dst_nid;
-	lnet_nid_t		sd_src_nid;
-	lnet_nid_t		sd_rtr_nid;
+	struct lnet_nid		sd_dst_nid;
+	struct lnet_nid		sd_src_nid;
+	struct lnet_nid		sd_rtr_nid;
 	int			sd_cpt;
 	int			sd_md_cpt;
 	u32			sd_send_case;
@@ -1769,11 +1769,11 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) %s : %s try# %d\n",
 		       libcfs_nid2str(msg->msg_hdr.src_nid),
 		       libcfs_nidstr(&msg->msg_txni->ni_nid),
-		       libcfs_nid2str(sd->sd_src_nid),
+		       libcfs_nidstr(&sd->sd_src_nid),
 		       libcfs_nid2str(msg->msg_hdr.dest_nid),
-		       libcfs_nid2str(sd->sd_dst_nid),
+		       libcfs_nidstr(&sd->sd_dst_nid),
 		       libcfs_nidstr(&msg->msg_txpeer->lpni_nid),
-		       libcfs_nid2str(sd->sd_rtr_nid),
+		       libcfs_nidstr(&sd->sd_rtr_nid),
 		       lnet_msgtyp2str(msg->msg_type), msg->msg_retry_count);
 
 	return rc;
@@ -1804,11 +1804,11 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	/* the destination lpni is set before we get here. */
 
 	/* find local NI */
-	sd->sd_best_ni = lnet_nid2ni_locked(sd->sd_src_nid, sd->sd_cpt);
+	sd->sd_best_ni = lnet_nid_to_ni_locked(&sd->sd_src_nid, sd->sd_cpt);
 	if (!sd->sd_best_ni) {
 		CERROR("Can't send to %s: src %s is not a local nid\n",
-		       libcfs_nid2str(sd->sd_dst_nid),
-		       libcfs_nid2str(sd->sd_src_nid));
+		       libcfs_nidstr(&sd->sd_dst_nid),
+		       libcfs_nidstr(&sd->sd_src_nid));
 		return -EINVAL;
 	}
 
@@ -1830,11 +1830,11 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 static int
 lnet_handle_spec_local_mr_dst(struct lnet_send_data *sd)
 {
-	sd->sd_best_ni = lnet_nid2ni_locked(sd->sd_src_nid, sd->sd_cpt);
+	sd->sd_best_ni = lnet_nid_to_ni_locked(&sd->sd_src_nid, sd->sd_cpt);
 	if (!sd->sd_best_ni) {
 		CERROR("Can't send to %s: src %s is not a local nid\n",
-		       libcfs_nid2str(sd->sd_dst_nid),
-		       libcfs_nid2str(sd->sd_src_nid));
+		       libcfs_nidstr(&sd->sd_dst_nid),
+		       libcfs_nidstr(&sd->sd_src_nid));
 		return -EINVAL;
 	}
 
@@ -1846,7 +1846,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		return lnet_handle_send(sd);
 
 	CERROR("can't send to %s. no NI on %s\n",
-	       libcfs_nid2str(sd->sd_dst_nid),
+	       libcfs_nidstr(&sd->sd_dst_nid),
 	       libcfs_net2str(sd->sd_best_ni->ni_net->net_id));
 
 	return -EHOSTUNREACH;
@@ -1947,7 +1947,7 @@ struct lnet_ni *
 
 static int
 lnet_handle_find_routed_path(struct lnet_send_data *sd,
-			     lnet_nid_t dst_nid,
+			     struct lnet_nid *dst_nid,
 			     struct lnet_peer_ni **gw_lpni,
 			     struct lnet_peer **gw_peer)
 {
@@ -1962,21 +1962,21 @@ struct lnet_ni *
 	struct lnet_peer_ni *lpni = NULL;
 	struct lnet_peer_ni *gwni = NULL;
 	bool route_found = false;
-	lnet_nid_t src_nid = (sd->sd_src_nid != LNET_NID_ANY) ? sd->sd_src_nid :
-			      sd->sd_best_ni ? lnet_nid_to_nid4(&sd->sd_best_ni->ni_nid) :
-			      LNET_NID_ANY;
+	struct lnet_nid *src_nid =
+		!LNET_NID_IS_ANY(&sd->sd_src_nid) ||
+		!sd->sd_best_ni ? &sd->sd_src_nid : &sd->sd_best_ni->ni_nid;
 	int best_lpn_healthv = 0;
 	u32 best_lpn_sel_prio = LNET_MAX_SELECTION_PRIORITY;
 
 	CDEBUG(D_NET, "using src nid %s for route restriction\n",
-	       libcfs_nid2str(src_nid));
+	       src_nid ? libcfs_nidstr(src_nid) : "ANY");
 
 	/* If a router nid was specified then we are replying to a GET or
 	 * sending an ACK. In this case we use the gateway associated with the
 	 * specified router nid.
 	 */
-	if (sd->sd_rtr_nid != LNET_NID_ANY) {
-		gwni = lnet_find_peer_ni_locked(sd->sd_rtr_nid);
+	if (!LNET_NID_IS_ANY(&sd->sd_rtr_nid)) {
+		gwni = lnet_peer_ni_find_locked(&sd->sd_rtr_nid);
 		if (gwni) {
 			gw = gwni->lpni_peer_net->lpn_peer;
 			lnet_peer_ni_decref_locked(gwni);
@@ -1984,25 +1984,26 @@ struct lnet_ni *
 				route_found = true;
 		} else {
 			CWARN("No peer NI for gateway %s. Attempting to find an alternative route.\n",
-			       libcfs_nid2str(sd->sd_rtr_nid));
+			       libcfs_nidstr(&sd->sd_rtr_nid));
 		}
 	}
 
 	if (!route_found) {
-		if (sd->sd_msg->msg_routing || src_nid != LNET_NID_ANY) {
+		if (sd->sd_msg->msg_routing ||
+		    (src_nid && !LNET_NID_IS_ANY(src_nid))) {
 			/* If I'm routing this message then I need to find the
 			 * next hop based on the destination NID
 			 *
 			 * We also find next hop based on the destination NID
 			 * if the source NI was specified
 			 */
-			best_rnet = lnet_find_rnet_locked(LNET_NIDNET(sd->sd_dst_nid));
+			best_rnet = lnet_find_rnet_locked(LNET_NID_NET(&sd->sd_dst_nid));
 			if (!best_rnet) {
 				CERROR("Unable to send message from %s to %s - Route table may be misconfigured\n",
-				       src_nid != LNET_NID_ANY ?
-						libcfs_nid2str(src_nid) :
-						"any local NI",
-				       libcfs_nid2str(sd->sd_dst_nid));
+				       (src_nid && LNET_NID_IS_ANY(src_nid)) ?
+						"any local NI" :
+						libcfs_nidstr(src_nid),
+				       libcfs_nidstr(&sd->sd_dst_nid));
 				return -EHOSTUNREACH;
 			}
 		} else {
@@ -2049,17 +2050,17 @@ struct lnet_ni *
 
 			if (!best_lpn) {
 				CERROR("peer %s has no available nets\n",
-				       libcfs_nid2str(sd->sd_dst_nid));
+				       libcfs_nidstr(&sd->sd_dst_nid));
 				return -EHOSTUNREACH;
 			}
 
 			sd->sd_best_lpni = lnet_find_best_lpni(sd->sd_best_ni,
-							       sd->sd_dst_nid,
+							       lnet_nid_to_nid4(&sd->sd_dst_nid),
 							       lp,
 							       best_lpn->lpn_net_id);
 			if (!sd->sd_best_lpni) {
 				CERROR("peer %s is unreachable\n",
-				       libcfs_nid2str(sd->sd_dst_nid));
+				       libcfs_nidstr(&sd->sd_dst_nid));
 				return -EHOSTUNREACH;
 			}
 
@@ -2083,20 +2084,20 @@ struct lnet_ni *
 		 * have when adding a route.
 		 */
 		best_route = lnet_find_route_locked(best_rnet,
-						    LNET_NIDNET(src_nid),
+						    LNET_NID_NET(src_nid),
 						    sd->sd_best_lpni,
 						    &last_route, &gwni);
 		if (!best_route) {
 			CERROR("no route to %s from %s\n",
-			       libcfs_nid2str(dst_nid),
-			       libcfs_nid2str(src_nid));
+			       libcfs_nidstr(dst_nid),
+			       libcfs_nidstr(src_nid));
 			return -EHOSTUNREACH;
 		}
 
 		if (!gwni) {
 			CERROR("Internal Error. Route expected to %s from %s\n",
-			       libcfs_nid2str(dst_nid),
-			       libcfs_nid2str(src_nid));
+			       libcfs_nidstr(dst_nid),
+			       libcfs_nidstr(src_nid));
 			return -EFAULT;
 		}
 
@@ -2130,7 +2131,7 @@ struct lnet_ni *
 		if (!sd->sd_best_ni) {
 			CERROR("Internal Error. Expected local ni on %s but non found :%s\n",
 			       libcfs_net2str(lpn->lpn_net_id),
-			       libcfs_nid2str(sd->sd_src_nid));
+			       libcfs_nidstr(&sd->sd_src_nid));
 			return -EFAULT;
 		}
 	}
@@ -2141,7 +2142,7 @@ struct lnet_ni *
 	/* increment the sequence numbers since now we're sure we're
 	 * going to use this path
 	 */
-	if (sd->sd_rtr_nid == LNET_NID_ANY) {
+	if (LNET_NID_IS_ANY(&sd->sd_rtr_nid)) {
 		LASSERT(best_route && last_route);
 		best_route->lr_seq = last_route->lr_seq + 1;
 		if (best_lpn)
@@ -2174,15 +2175,15 @@ struct lnet_ni *
 	struct lnet_peer *gw_peer = NULL;
 
 	/* find local NI */
-	sd->sd_best_ni = lnet_nid2ni_locked(sd->sd_src_nid, sd->sd_cpt);
+	sd->sd_best_ni = lnet_nid_to_ni_locked(&sd->sd_src_nid, sd->sd_cpt);
 	if (!sd->sd_best_ni) {
 		CERROR("Can't send to %s: src %s is not a local nid\n",
-		       libcfs_nid2str(sd->sd_dst_nid),
-		       libcfs_nid2str(sd->sd_src_nid));
+		       libcfs_nidstr(&sd->sd_dst_nid),
+		       libcfs_nidstr(&sd->sd_src_nid));
 		return -EINVAL;
 	}
 
-	rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni,
+	rc = lnet_handle_find_routed_path(sd, &sd->sd_dst_nid, &gw_lpni,
 					  &gw_peer);
 	if (rc)
 		return rc;
@@ -2403,7 +2404,7 @@ struct lnet_ni *
 	if (!sd->sd_best_lpni) {
 		CERROR("Internal fault. Unable to send msg %s to %s. NID not known\n",
 		       lnet_msgtyp2str(sd->sd_msg->msg_type),
-		       libcfs_nid2str(sd->sd_dst_nid));
+		       libcfs_nidstr(&sd->sd_dst_nid));
 		return -EFAULT;
 	}
 
@@ -2418,7 +2419,7 @@ struct lnet_ni *
 							       sd->sd_md_cpt);
 		if (!sd->sd_best_ni) {
 			CERROR("Unable to forward message to %s. No local NI available\n",
-			       libcfs_nid2str(sd->sd_dst_nid));
+			       libcfs_nidstr(&sd->sd_dst_nid));
 			rc = -EHOSTUNREACH;
 		}
 	} else {
@@ -2455,7 +2456,7 @@ struct lnet_ni *
 			 * a response to the provided final destination
 			 */
 			CERROR("Can't send response to %s. No local NI available\n",
-			       libcfs_nid2str(sd->sd_dst_nid));
+			       libcfs_nidstr(&sd->sd_dst_nid));
 			return -EHOSTUNREACH;
 		}
 
@@ -2473,7 +2474,8 @@ struct lnet_ni *
 					lnet_msg_discovery(sd->sd_msg));
 	if (sd->sd_best_ni) {
 		sd->sd_best_lpni =
-		  lnet_find_best_lpni(sd->sd_best_ni, sd->sd_dst_nid,
+		  lnet_find_best_lpni(sd->sd_best_ni,
+				      lnet_nid_to_nid4(&sd->sd_dst_nid),
 				      sd->sd_peer,
 				      sd->sd_best_ni->ni_net->net_id);
 
@@ -2501,8 +2503,8 @@ struct lnet_ni *
 		}
 
 		CERROR("Internal Error. Expected to have a best_lpni: %s -> %s\n",
-		       libcfs_nid2str(sd->sd_src_nid),
-		       libcfs_nid2str(sd->sd_dst_nid));
+		       libcfs_nidstr(&sd->sd_src_nid),
+		       libcfs_nidstr(&sd->sd_dst_nid));
 
 		return -EFAULT;
 	}
@@ -2544,11 +2546,11 @@ struct lnet_ni *
 		struct lnet_peer_ni *gw;
 		struct lnet_peer *gw_peer;
 
-		rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw,
+		rc = lnet_handle_find_routed_path(sd, &sd->sd_dst_nid, &gw,
 						  &gw_peer);
 		if (rc < 0) {
 			CERROR("Can't send response to %s. No route available\n",
-			       libcfs_nid2str(sd->sd_dst_nid));
+			       libcfs_nidstr(&sd->sd_dst_nid));
 			return -EHOSTUNREACH;
 		} else if (rc > 0) {
 			return rc;
@@ -2577,8 +2579,8 @@ struct lnet_ni *
 	 * need to select the destination which we can route to and if
 	 * there are multiple, we need to round robin.
 	 */
-	rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni,
-					  &gw_peer);
+	rc = lnet_handle_find_routed_path(sd, &sd->sd_dst_nid,
+					  &gw_lpni, &gw_peer);
 	if (rc)
 		return rc;
 
@@ -2613,7 +2615,7 @@ struct lnet_ni *
 	/* find the router and that'll find the best NI if we didn't find
 	 * it already.
 	 */
-	rc = lnet_handle_find_routed_path(sd, sd->sd_dst_nid, &gw_lpni,
+	rc = lnet_handle_find_routed_path(sd, &sd->sd_dst_nid, &gw_lpni,
 					  &gw_peer);
 	if (rc)
 		return rc;
@@ -2640,9 +2642,9 @@ struct lnet_ni *
 
 	CDEBUG(D_NET, "Source %s%s to %s %s %s destination\n",
 	       (send_case & SRC_SPEC) ? "Specified: " : "ANY",
-	       (send_case & SRC_SPEC) ? libcfs_nid2str(sd->sd_src_nid) : "",
+	       (send_case & SRC_SPEC) ? libcfs_nidstr(&sd->sd_src_nid) : "",
 	       (send_case & MR_DST) ? "MR: " : "NMR: ",
-	       libcfs_nid2str(sd->sd_dst_nid),
+	       libcfs_nidstr(&sd->sd_dst_nid),
 	       (send_case & LOCAL_DST) ? "local" : "routed");
 
 	switch (send_case) {
@@ -2672,8 +2674,10 @@ struct lnet_ni *
 }
 
 static int
-lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid,
-		    struct lnet_msg *msg, lnet_nid_t rtr_nid)
+lnet_select_pathway(struct lnet_nid *src_nid,
+		    struct lnet_nid *dst_nid,
+		    struct lnet_msg *msg,
+		    struct lnet_nid *rtr_nid)
 {
 	struct lnet_peer_ni *lpni;
 	struct lnet_peer *peer;
@@ -2707,7 +2711,7 @@ struct lnet_ni *
 	 */
 	send_data.sd_msg = msg;
 	send_data.sd_cpt = cpt;
-	if (dst_nid == LNET_NID_LO_0) {
+	if (nid_is_lo0(dst_nid)) {
 		rc = lnet_handle_lo_send(&send_data);
 		lnet_net_unlock(cpt);
 		return rc;
@@ -2717,7 +2721,7 @@ struct lnet_ni *
 	 * created due to network traffic. This call will create the
 	 * peer->peer_net->peer_ni tree.
 	 */
-	lpni = lnet_nid2peerni_locked(dst_nid, LNET_NID_ANY, cpt);
+	lpni = lnet_peerni_by_nid_locked(dst_nid, NULL, cpt);
 	if (IS_ERR(lpni)) {
 		lnet_net_unlock(cpt);
 		return PTR_ERR(lpni);
@@ -2730,8 +2734,14 @@ struct lnet_ni *
 	 * continuing the same sequence of messages. Similarly, rtr_nid will
 	 * affect our choice of next hop.
 	 */
-	lnet_nid4_to_nid(src_nid, &msg->msg_src_nid_param);
-	lnet_nid4_to_nid(rtr_nid, &msg->msg_rtr_nid_param);
+	if (src_nid)
+		msg->msg_src_nid_param = *src_nid;
+	else
+		msg->msg_src_nid_param = LNET_ANY_NID;
+	if (rtr_nid)
+		msg->msg_rtr_nid_param = *rtr_nid;
+	else
+		msg->msg_rtr_nid_param = LNET_ANY_NID;
 
 	/* If necessary, perform discovery on the peer that owns this peer_ni.
 	 * Note, this can result in the ownership of this peer_ni changing
@@ -2749,15 +2759,15 @@ struct lnet_ni *
 
 	/* Identify the different send cases
 	 */
-	if (src_nid == LNET_NID_ANY) {
+	if (!src_nid || LNET_NID_IS_ANY(src_nid)) {
 		send_case |= SRC_ANY;
-		if (lnet_get_net_locked(LNET_NIDNET(dst_nid)))
+		if (lnet_get_net_locked(LNET_NID_NET(dst_nid)))
 			send_case |= LOCAL_DST;
 		else
 			send_case |= REMOTE_DST;
 	} else {
 		send_case |= SRC_SPEC;
-		if (LNET_NIDNET(src_nid) == LNET_NIDNET(dst_nid))
+		if (LNET_NID_NET(src_nid) == LNET_NID_NET(dst_nid))
 			send_case |= LOCAL_DST;
 		else
 			send_case |= REMOTE_DST;
@@ -2814,9 +2824,15 @@ struct lnet_ni *
 		send_case |= SND_RESP;
 
 	/* assign parameters to the send_data */
-	send_data.sd_rtr_nid = rtr_nid;
-	send_data.sd_src_nid = src_nid;
-	send_data.sd_dst_nid = dst_nid;
+	if (rtr_nid)
+		send_data.sd_rtr_nid = *rtr_nid;
+	else
+		send_data.sd_rtr_nid = LNET_ANY_NID;
+	if (src_nid)
+		send_data.sd_src_nid = *src_nid;
+	else
+		send_data.sd_src_nid = LNET_ANY_NID;
+	send_data.sd_dst_nid = *dst_nid;
 	send_data.sd_best_lpni = lpni;
 	/* keep a pointer to the final destination in case we're going to
 	 * route, so we'll need to access it later
@@ -2842,16 +2858,12 @@ struct lnet_ni *
 }
 
 int
-lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
+lnet_send(struct lnet_nid *src_nid, struct lnet_msg *msg,
+	  struct lnet_nid *rtr_nid)
 {
-	lnet_nid_t dst_nid = lnet_nid_to_nid4(&msg->msg_target.nid);
+	struct lnet_nid *dst_nid = &msg->msg_target.nid;
 	int rc;
 
-	/*
-	 * NB: rtr_nid is set to LNET_NID_ANY for all current use-cases,
-	 * but we might want to use pre-determined router for ACK/REPLY
-	 * in the future
-	 */
 	/* NB: !ni == interface pre-determined (ACK/REPLY) */
 	LASSERT(!msg->msg_txpeer);
 	LASSERT(!msg->msg_txni);
@@ -3112,9 +3124,8 @@ struct lnet_mt_event_info {
 			       lnet_msgtyp2str(msg->msg_type),
 			       msg->msg_recovery,
 			       msg->msg_retry_count);
-			rc = lnet_send(lnet_nid_to_nid4(&msg->msg_src_nid_param),
-				       msg,
-				       lnet_nid_to_nid4(&msg->msg_rtr_nid_param));
+			rc = lnet_send(&msg->msg_src_nid_param, msg,
+				       &msg->msg_rtr_nid_param);
 			if (rc) {
 				CERROR("Error sending %s to %s: %d\n",
 				       lnet_msgtyp2str(msg->msg_type),
@@ -4023,9 +4034,7 @@ void lnet_monitor_thr_stop(void)
 	lnet_ni_recv(ni, msg->msg_private, NULL, 0, 0, 0, 0);
 	msg->msg_receiving = 0;
 
-	/* FIXME need to handle large-addr nid */
-	rc = lnet_send(lnet_nid_to_nid4(&ni->ni_nid), msg,
-		       lnet_nid_to_nid4(&msg->msg_from));
+	rc = lnet_send(&ni->ni_nid, msg, &msg->msg_from);
 	if (rc < 0) {
 		/* didn't get as far as lnet_ni_send() */
 		CERROR("%s: Unable to send REPLY for GET from %s: %d\n",
@@ -4706,7 +4715,7 @@ void lnet_monitor_thr_stop(void)
  * \see lnet_event::hdr_data and lnet_event_kind.
  */
 int
-LNetPut(lnet_nid_t self, struct lnet_handle_md mdh, enum lnet_ack_req ack,
+LNetPut(lnet_nid_t self4, struct lnet_handle_md mdh, enum lnet_ack_req ack,
 	struct lnet_process_id target, unsigned int portal,
 	u64 match_bits, unsigned int offset,
 	u64 hdr_data)
@@ -4714,11 +4723,14 @@ void lnet_monitor_thr_stop(void)
 	struct lnet_rsp_tracker *rspt = NULL;
 	struct lnet_msg *msg;
 	struct lnet_libmd *md;
+	struct lnet_nid self;
 	int cpt;
 	int rc;
 
 	LASSERT(the_lnet.ln_refcount > 0);
 
+	lnet_nid4_to_nid(self4, &self);
+
 	if (!list_empty(&the_lnet.ln_test_peers) && /* normally we don't */
 	    fail_peer(target.nid, 1)) { /* shall we now? */
 		CERROR("Dropping PUT to %s: simulated failure\n",
@@ -4803,7 +4815,7 @@ void lnet_monitor_thr_stop(void)
 				 CFS_FAIL_ONCE))
 		rc = -EIO;
 	else
-		rc = lnet_send(self, msg, LNET_NID_ANY);
+		rc = lnet_send(&self, msg, NULL);
 	if (rc) {
 		CNETERR("Error sending PUT to %s: %d\n",
 			libcfs_id2str(target), rc);
@@ -4947,18 +4959,21 @@ struct lnet_msg *
  *		-ENOENT Invalid MD object.
  */
 int
-LNetGet(lnet_nid_t self, struct lnet_handle_md mdh,
+LNetGet(lnet_nid_t self4, struct lnet_handle_md mdh,
 	struct lnet_process_id target, unsigned int portal,
 	u64 match_bits, unsigned int offset, bool recovery)
 {
 	struct lnet_rsp_tracker *rspt;
 	struct lnet_msg *msg;
 	struct lnet_libmd *md;
+	struct lnet_nid self;
 	int cpt;
 	int rc;
 
 	LASSERT(the_lnet.ln_refcount > 0);
 
+	lnet_nid4_to_nid(self4, &self);
+
 	if (!list_empty(&the_lnet.ln_test_peers) && /* normally we don't */
 	    fail_peer(target.nid, 1)) {		/* shall we now? */
 		CERROR("Dropping GET to %s: simulated failure\n",
@@ -5029,7 +5044,7 @@ struct lnet_msg *
 	else
 		lnet_rspt_free(rspt, cpt);
 
-	rc = lnet_send(self, msg, LNET_NID_ANY);
+	rc = lnet_send(&self, msg, NULL);
 	if (rc < 0) {
 		CNETERR("Error sending GET to %s: %d\n",
 			libcfs_id2str(target), rc);
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 980f93d..f432488 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -379,6 +379,7 @@
 
 	if (!status && msg->msg_ack) {
 		/* Only send an ACK if the PUT completed successfully */
+		struct lnet_nid src;
 
 		lnet_msg_decommit(msg, cpt, 0);
 
@@ -396,8 +397,8 @@
 		msg->msg_hdr.msg.ack.match_bits = msg->msg_ev.match_bits;
 		msg->msg_hdr.msg.ack.mlength = cpu_to_le32(msg->msg_ev.mlength);
 
-		rc = lnet_send(msg->msg_ev.target.nid, msg,
-			       lnet_nid_to_nid4(&msg->msg_from));
+		lnet_nid4_to_nid(msg->msg_ev.target.nid, &src);
+		rc = lnet_send(&src, msg, &msg->msg_from);
 
 		lnet_net_lock(cpt);
 		/*
@@ -419,7 +420,7 @@
 		LASSERT(!msg->msg_receiving);	/* called back recv already */
 		lnet_net_unlock(cpt);
 
-		rc = lnet_send(LNET_NID_ANY, msg, LNET_NID_ANY);
+		rc = lnet_send(NULL, msg, NULL);
 
 		lnet_net_lock(cpt);
 		/*
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 9cb06d2..505065f 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2296,8 +2296,8 @@ static void lnet_peer_discovery_complete(struct lnet_peer *lp, int dc_error)
 		CDEBUG(D_NET, "sending pending message %s to target %s\n",
 		       lnet_msgtyp2str(msg->msg_type),
 		       libcfs_idstr(&msg->msg_target));
-		rc = lnet_send(lnet_nid_to_nid4(&msg->msg_src_nid_param), msg,
-			       lnet_nid_to_nid4(&msg->msg_rtr_nid_param));
+		rc = lnet_send(&msg->msg_src_nid_param, msg,
+			       &msg->msg_rtr_nid_param);
 		if (rc < 0) {
 			CNETERR("Error sending %s to %s: %d\n",
 				lnet_msgtyp2str(msg->msg_type),
@@ -3697,8 +3697,8 @@ static void lnet_resend_msgs(void)
 
 	list_for_each_entry_safe(msg, tmp, &resend, msg_list) {
 		list_del_init(&msg->msg_list);
-		rc = lnet_send(lnet_nid_to_nid4(&msg->msg_src_nid_param), msg,
-			       lnet_nid_to_nid4(&msg->msg_rtr_nid_param));
+		rc = lnet_send(&msg->msg_src_nid_param, msg,
+			       &msg->msg_rtr_nid_param);
 		if (rc < 0) {
 			CNETERR("Error sending %s to %s: %d\n",
 				lnet_msgtyp2str(msg->msg_type),
-- 
1.8.3.1

* [lustre-devel] [PATCH 05/50] lnet: use large nids in struct lnet_event
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (3 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 04/50] lnet: Change lnet_send() to take large-addr nids James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 06/50] lnet: socklnd: prepare for new KSOCK_MSG type James Simmons
                   ` (44 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

All nids, including those in process_id, are changed to
struct lnet_nid / struct lnet_processid.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: e1d0224fb4045571a ("LU-10391 lnet: use large nids in struct lnet_event")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43600
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
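Note for readers: the series converts between the legacy 64-bit nid and
the larger struct form at many call sites (lnet_nid4_to_nid() and
lnet_nid_to_nid4() in the diffs). The sketch below models that kind of
conversion in user space. It is a simplified illustration under stated
assumptions: all toy_ names are hypothetical, and the field layout and
bit packing are assumed for the example rather than copied from the
LNet headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed legacy form: network in the high 32 bits, address in the low 32. */
typedef uint64_t toy_nid4_t;

/* Toy large-address NID; NOT the kernel's struct lnet_nid. */
struct toy_nid {
	uint8_t  addr_bytes;	/* 4 for a legacy address, up to 16 */
	uint8_t  type;		/* network type */
	uint16_t num;		/* network number */
	uint32_t addr[4];
};

/* Toy counterpart of a process id that embeds the large NID. */
struct toy_processid {
	uint32_t       pid;
	struct toy_nid nid;
};

/* Widening always succeeds: a 4-byte address fits in the large form. */
static void toy_nid4_to_nid(toy_nid4_t nid4, struct toy_nid *nid)
{
	uint32_t net = (uint32_t)(nid4 >> 32);

	nid->addr_bytes = 4;
	nid->type = (uint8_t)(net >> 16);
	nid->num = (uint16_t)(net & 0xffff);
	nid->addr[0] = (uint32_t)nid4;
	nid->addr[1] = 0;
	nid->addr[2] = 0;
	nid->addr[3] = 0;
}

/* Narrowing can fail: an address longer than 4 bytes has no legacy form. */
static bool toy_nid_to_nid4(const struct toy_nid *nid, toy_nid4_t *out)
{
	uint32_t net;

	if (nid->addr_bytes > 4)
		return false;

	net = ((uint32_t)nid->type << 16) | nid->num;
	*out = ((uint64_t)net << 32) | nid->addr[0];
	return true;
}
```

The asymmetry is the point of the sketch: widening is lossless, while
narrowing is only meaningful for 4-byte addresses, so each remaining
narrowing call is a place where large addresses are not yet handled.
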
 fs/lustre/include/lustre_net.h       |  6 +++---
 fs/lustre/include/obd_class.h        |  4 ++--
 fs/lustre/ldlm/ldlm_lib.c            |  6 +++---
 fs/lustre/ldlm/ldlm_request.c        |  2 +-
 fs/lustre/obdclass/lprocfs_status.c  |  8 ++++----
 fs/lustre/osc/osc_request.c          | 14 ++++++-------
 fs/lustre/ptlrpc/client.c            |  6 +++---
 fs/lustre/ptlrpc/connection.c        | 20 +++++++++---------
 fs/lustre/ptlrpc/events.c            | 10 ++++-----
 fs/lustre/ptlrpc/import.c            |  6 +++---
 fs/lustre/ptlrpc/niobuf.c            | 10 ++++-----
 fs/lustre/ptlrpc/pack_generic.c      |  8 ++++----
 fs/lustre/ptlrpc/ptlrpc_internal.h   |  2 +-
 fs/lustre/ptlrpc/sec.c               |  4 ++--
 fs/lustre/ptlrpc/sec_config.c        |  9 +++++----
 fs/lustre/ptlrpc/service.c           |  2 +-
 include/linux/lnet/api.h             |  2 +-
 include/linux/lnet/lib-lnet.h        |  3 ++-
 include/uapi/linux/lnet/lnet-types.h |  8 ++++----
 net/lnet/lnet/api-ni.c               |  8 ++++----
 net/lnet/lnet/lib-msg.c              | 24 +++++++++++-----------
 net/lnet/lnet/peer.c                 | 39 +++++++++++++++++++++++++++---------
 net/lnet/selftest/rpc.c              |  6 +++---
 23 files changed, 114 insertions(+), 93 deletions(-)

diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index cf1bb7f..7d29542 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -364,9 +364,9 @@ struct ptlrpc_connection {
 	/** linkage for connections hash table */
 	struct rhash_head	c_hash;
 	/** Our own lnet nid for this connection */
-	lnet_nid_t		c_self;
+	struct lnet_nid		c_self;
 	/** Remote side nid for this connection */
-	struct lnet_process_id	c_peer;
+	struct lnet_processid	c_peer;
 	/** UUID of the other side */
 	struct obd_uuid		c_remote_uuid;
 	/** reference counter for this connection */
@@ -1749,7 +1749,7 @@ static inline void  ptlrpc_connection_put(struct ptlrpc_connection *conn)
 
 	CDEBUG(D_INFO, "PUT conn=%p refcount %d to %s\n",
 	       conn, atomic_read(&conn->c_refcount),
-	       libcfs_nid2str(conn->c_peer.nid));
+	       libcfs_nidstr(&conn->c_peer.nid));
 }
 
 struct ptlrpc_connection *ptlrpc_connection_addref(struct ptlrpc_connection *);
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index b69331d..3f444b0 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -98,13 +98,13 @@ int obd_connect_flags2str(char *page, int count, u64 flags, u64 flags2,
 static inline char *obd_export_nid2str(struct obd_export *exp)
 {
 	return exp->exp_connection ?
-		libcfs_nid2str(exp->exp_connection->c_peer.nid) : "<unknown>";
+		libcfs_nidstr(&exp->exp_connection->c_peer.nid) : "<unknown>";
 }
 
 static inline char *obd_import_nid2str(struct obd_import *imp)
 {
 	return imp->imp_connection ?
-		libcfs_nid2str(imp->imp_connection->c_peer.nid) : "<unknown>";
+		libcfs_nidstr(&imp->imp_connection->c_peer.nid) : "<unknown>";
 }
 
 int obd_zombie_impexp_init(void);
diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c
index 9aa87d1..02d1eea 100644
--- a/fs/lustre/ldlm/ldlm_lib.c
+++ b/fs/lustre/ldlm/ldlm_lib.c
@@ -67,7 +67,7 @@ static int import_set_conn(struct obd_import *imp, struct obd_uuid *uuid,
 	if (imp->imp_connection &&
 	    imp->imp_connection->c_remote_uuid.uuid[0] == 0)
 		/* nid4refnet is used to restrict network connections */
-		nid4refnet = imp->imp_connection->c_self;
+		nid4refnet = lnet_nid_to_nid4(&imp->imp_connection->c_self);
 
 	ptlrpc_conn = ptlrpc_uuid_to_connection(uuid, nid4refnet);
 	if (!ptlrpc_conn) {
@@ -297,7 +297,7 @@ int client_obd_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 	const char *name = obd->obd_type->typ_name;
 	enum ldlm_ns_type ns_type = LDLM_NS_TYPE_UNKNOWN;
 	struct ptlrpc_connection fake_conn = {
-		.c_self = 0,
+		.c_self = {},
 		.c_remote_uuid.uuid[0] = 0
 	};
 	int rc;
@@ -494,7 +494,7 @@ int client_obd_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 			       rc);
 			goto err_import;
 		}
-		fake_conn.c_self = LNET_MKNID(refnet, 0);
+		lnet_nid4_to_nid(LNET_MKNID(refnet, 0), &fake_conn.c_self);
 		imp->imp_connection = &fake_conn;
 	}
 
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index 44e1ec2..4ba64b1 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -1099,7 +1099,7 @@ static int ldlm_cli_cancel_req(struct obd_export *exp,
 		if (rc == LUSTRE_ESTALE) {
 			CDEBUG(D_DLMTRACE,
 			       "client/server (nid %s) out of sync -- not fatal\n",
-			       libcfs_nid2str(req->rq_import->imp_connection->c_peer.nid));
+			       libcfs_nidstr(&req->rq_import->imp_connection->c_peer.nid));
 			rc = 0;
 		} else if (rc == -ETIMEDOUT && /* check there was no reconnect*/
 			   req->rq_import_generation == imp->imp_generation) {
diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index 335fc34..335e748 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -854,14 +854,14 @@ static void ldebugfs_import_locked(struct seq_file *m,
 	spin_lock(&imp->imp_lock);
 	j = 0;
 	list_for_each_entry(conn, &imp->imp_conn_list, oic_item) {
-		libcfs_nid2str_r(conn->oic_conn->c_peer.nid,
-				 nidstr, sizeof(nidstr));
+		libcfs_nidstr_r(&conn->oic_conn->c_peer.nid,
+				  nidstr, sizeof(nidstr));
 		seq_printf(m, "%s%s", j ? ", " : "", nidstr);
 		j++;
 	}
 	if (imp->imp_connection)
-		libcfs_nid2str_r(imp->imp_connection->c_peer.nid,
-				 nidstr, sizeof(nidstr));
+		libcfs_nidstr_r(&imp->imp_connection->c_peer.nid,
+				  nidstr, sizeof(nidstr));
 	else
 		strncpy(nidstr, "<none>", sizeof(nidstr));
 	seq_printf(m,
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 14863dc..c442819 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1899,7 +1899,7 @@ static void dump_all_bulk_pages(struct obdo *oa, u32 page_count,
 }
 
 static int check_write_checksum(struct obdo *oa,
-				const struct lnet_process_id *peer,
+				const struct lnet_processid *peer,
 				u32 client_cksum, u32 server_cksum,
 				struct osc_brw_async_args *aa)
 {
@@ -1967,7 +1967,7 @@ static int check_write_checksum(struct obdo *oa,
 
 	LCONSOLE_ERROR_MSG(0x132,
 			   "%s: BAD WRITE CHECKSUM: %s: from %s inode " DFID " object " DOSTID " extent [%llu-%llu], original client csum %x (type %x), server csum %x (type %x), client csum now %x\n",
-			   obd_name, msg, libcfs_nid2str(peer->nid),
+			   obd_name, msg, libcfs_nidstr(&peer->nid),
 			   oa->o_valid & OBD_MD_FLFID ? oa->o_parent_seq : (u64)0,
 			   oa->o_valid & OBD_MD_FLFID ? oa->o_parent_oid : 0,
 			   oa->o_valid & OBD_MD_FLFID ? oa->o_parent_ver : 0,
@@ -1985,8 +1985,8 @@ static int check_write_checksum(struct obdo *oa,
 static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 {
 	struct osc_brw_async_args *aa = (void *)&req->rq_async_args;
-	const struct lnet_process_id *peer =
-			&req->rq_import->imp_connection->c_peer;
+	const struct lnet_processid *peer =
+		&req->rq_import->imp_connection->c_peer;
 	struct client_obd *cli = aa->aa_cli;
 	const char *obd_name = cli->cl_import->imp_obd->obd_name;
 	struct ost_body *body;
@@ -2129,7 +2129,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 			goto out;
 
 		if (req->rq_bulk &&
-		    peer->nid != req->rq_bulk->bd_sender) {
+		    lnet_nid_to_nid4(&peer->nid) != req->rq_bulk->bd_sender) {
 			via = " via ";
 			router = libcfs_nid2str(req->rq_bulk->bd_sender);
 		}
@@ -2152,7 +2152,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 			LCONSOLE_ERROR_MSG(0x133,
 					   "%s: BAD READ CHECKSUM: from %s%s%s inode "DFID" object "DOSTID" extent [%llu-%llu], client %x/%x, server %x, cksum_type %x\n",
 					   obd_name,
-					   libcfs_nid2str(peer->nid),
+					   libcfs_nidstr(&peer->nid),
 					   via, router,
 					   clbody->oa.o_valid & OBD_MD_FLFID ?
 					   clbody->oa.o_parent_seq : (u64)0,
@@ -2181,7 +2181,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 		if ((cksum_missed & (-cksum_missed)) == cksum_missed)
 			CERROR("%s: checksum %u requested from %s but not sent\n",
 			       obd_name, cksum_missed,
-			       libcfs_nid2str(peer->nid));
+			       libcfs_nidstr(&peer->nid));
 	} else {
 		rc = 0;
 	}
diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index dedb5db..ec0cd5f 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -1303,7 +1303,7 @@ static int ptlrpc_check_status(struct ptlrpc_request *req)
 	rc = lustre_msg_get_status(req->rq_repmsg);
 	if (lustre_msg_get_type(req->rq_repmsg) == PTL_RPC_MSG_ERR) {
 		struct obd_import *imp = req->rq_import;
-		lnet_nid_t nid = imp->imp_connection->c_peer.nid;
+		struct lnet_nid *nid = &imp->imp_connection->c_peer.nid;
 		u32 opc = lustre_msg_get_opc(req->rq_reqmsg);
 
 		/* -EAGAIN is normal when using POSIX flocks */
@@ -1313,7 +1313,7 @@ static int ptlrpc_check_status(struct ptlrpc_request *req)
 					   "%s: operation %s to node %s failed: rc = %d\n",
 					   imp->imp_obd->obd_name,
 					   ll_opcode2str(opc),
-					   libcfs_nid2str(nid), rc);
+					   libcfs_nidstr(nid), rc);
 		return rc < 0 ? rc : -EINVAL;
 	}
 
@@ -2199,7 +2199,7 @@ int ptlrpc_expire_one_request(struct ptlrpc_request *req, int async_unlink)
 		  req->rq_sent, req->rq_real_sent);
 
 	if (imp && obd_debug_peer_on_timeout)
-		LNetDebugPeer(imp->imp_connection->c_peer);
+		LNetDebugPeer(&imp->imp_connection->c_peer);
 
 	ptlrpc_unregister_reply(req, async_unlink);
 	ptlrpc_unregister_bulk(req, async_unlink);
diff --git a/fs/lustre/ptlrpc/connection.c b/fs/lustre/ptlrpc/connection.c
index 0415357..8dbaea40 100644
--- a/fs/lustre/ptlrpc/connection.c
+++ b/fs/lustre/ptlrpc/connection.c
@@ -48,20 +48,20 @@
 
 static u32 lnet_process_id_hash(const void *data, u32 len, u32 seed)
 {
-	const struct lnet_process_id *lpi = data;
+	const struct lnet_processid *lpi = data;
 
 	seed = hash_32(seed ^ lpi->pid, 32);
-	seed ^= hash_64(lpi->nid, 32);
+	seed = hash_32(nidhash(&lpi->nid) ^ seed, 32);
 	return seed;
 }
 
 static int lnet_process_id_cmp(struct rhashtable_compare_arg *arg,
 			       const void *obj)
 {
-	const struct lnet_process_id *lpi = arg->key;
+	const struct lnet_processid *lpi = arg->key;
 	const struct ptlrpc_connection *con = obj;
 
-	if (lpi->nid == con->c_peer.nid &&
+	if (nid_same(&lpi->nid, &con->c_peer.nid) &&
 	    lpi->pid == con->c_peer.pid)
 		return 0;
 	return -ESRCH;
@@ -76,12 +76,14 @@ static int lnet_process_id_cmp(struct rhashtable_compare_arg *arg,
 };
 
 struct ptlrpc_connection *
-ptlrpc_connection_get(struct lnet_process_id peer, lnet_nid_t self,
+ptlrpc_connection_get(struct lnet_process_id peer4, lnet_nid_t self,
 		      struct obd_uuid *uuid)
 {
 	struct ptlrpc_connection *conn, *conn2;
+	struct lnet_processid peer;
 
-	peer.nid = LNetPrimaryNID(peer.nid);
+	peer4.nid = LNetPrimaryNID(peer4.nid);
+	lnet_pid4_to_pid(peer4, &peer);
 	conn = rhashtable_lookup_fast(&conn_hash, &peer, conn_hash_params);
 	if (conn) {
 		ptlrpc_connection_addref(conn);
@@ -93,7 +95,7 @@ struct ptlrpc_connection *
 		return NULL;
 
 	conn->c_peer = peer;
-	conn->c_self = self;
+	lnet_nid4_to_nid(self, &conn->c_self);
 	atomic_set(&conn->c_refcount, 1);
 	if (uuid)
 		obd_str2uuid(&conn->c_remote_uuid, uuid->uuid);
@@ -125,7 +127,7 @@ struct ptlrpc_connection *
 out:
 	CDEBUG(D_INFO, "conn=%p refcount %d to %s\n",
 	       conn, atomic_read(&conn->c_refcount),
-	       libcfs_nid2str(conn->c_peer.nid));
+	       libcfs_nidstr(&conn->c_peer.nid));
 	return conn;
 }
 
@@ -135,7 +137,7 @@ struct ptlrpc_connection *
 	atomic_inc(&conn->c_refcount);
 	CDEBUG(D_INFO, "conn=%p refcount %d to %s\n",
 	       conn, atomic_read(&conn->c_refcount),
-	       libcfs_nid2str(conn->c_peer.nid));
+	       libcfs_nidstr(&conn->c_peer.nid));
 
 	return conn;
 }
diff --git a/fs/lustre/ptlrpc/events.c b/fs/lustre/ptlrpc/events.c
index dbf9f9d..385a6f2 100644
--- a/fs/lustre/ptlrpc/events.c
+++ b/fs/lustre/ptlrpc/events.c
@@ -213,7 +213,7 @@ void client_bulk_callback(struct lnet_event *ev)
 
 	if (ev->type != LNET_EVENT_UNLINK && ev->status == 0) {
 		desc->bd_nob_transferred += ev->mlength;
-		desc->bd_sender = ev->sender;
+		desc->bd_sender = lnet_nid_to_nid4(&ev->sender);
 	} else {
 		/* start reconnect and resend if network error hit */
 		spin_lock(&req->rq_lock);
@@ -330,7 +330,7 @@ void request_in_callback(struct lnet_event *ev)
 		if (!req) {
 			CERROR("Can't allocate incoming request descriptor: Dropping %s RPC from %s\n",
 			       service->srv_name,
-			       libcfs_id2str(ev->initiator));
+			       libcfs_idstr(&ev->initiator));
 			return;
 		}
 	}
@@ -346,9 +346,9 @@ void request_in_callback(struct lnet_event *ev)
 		req->rq_reqdata_len = ev->mlength;
 	ktime_get_real_ts64(&req->rq_arrival_time);
 	/* Multi-Rail: keep track of both initiator and source NID. */
-	req->rq_peer = ev->initiator;
-	req->rq_source = ev->source;
-	req->rq_self = ev->target.nid;
+	req->rq_peer = lnet_pid_to_pid4(&ev->initiator);
+	req->rq_source = lnet_pid_to_pid4(&ev->source);
+	req->rq_self = lnet_nid_to_nid4(&ev->target.nid);
 	req->rq_rqbd = rqbd;
 	req->rq_phase = RQ_PHASE_NEW;
 	if (ev->type == LNET_EVENT_PUT)
diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c
index f28fb68..3dc987cf 100644
--- a/fs/lustre/ptlrpc/import.c
+++ b/fs/lustre/ptlrpc/import.c
@@ -516,7 +516,7 @@ static int import_select_connection(struct obd_import *imp)
 	list_for_each_entry(conn, &imp->imp_conn_list, oic_item) {
 		CDEBUG(D_HA, "%s: connect to NID %s last attempt %lld\n",
 		       imp->imp_obd->obd_name,
-		       libcfs_nid2str(conn->oic_conn->c_peer.nid),
+		       libcfs_nidstr(&conn->oic_conn->c_peer.nid),
 		       conn->oic_last_attempt);
 
 		/* If we have not tried this connection since
@@ -591,7 +591,7 @@ static int import_select_connection(struct obd_import *imp)
 			       "%s: Connection changing to %.*s (at %s)\n",
 			       imp->imp_obd->obd_name,
 			       target_len, target_start,
-			       libcfs_nid2str(imp_conn->oic_conn->c_peer.nid));
+			       libcfs_nidstr(&imp_conn->oic_conn->c_peer.nid));
 		}
 
 		imp->imp_conn_current = imp_conn;
@@ -600,7 +600,7 @@ static int import_select_connection(struct obd_import *imp)
 	/* The below message is checked in conf-sanity.sh test_35[ab] */
 	CDEBUG(D_HA, "%s: import %p using connection %s/%s\n",
 	       imp->imp_obd->obd_name, imp, imp_conn->oic_uuid.uuid,
-	       libcfs_nid2str(imp_conn->oic_conn->c_peer.nid));
+	       libcfs_nidstr(&imp_conn->oic_conn->c_peer.nid));
 
 out_unlock:
 	spin_unlock(&imp->imp_lock);
diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c
index da04d4e..afe83ad 100644
--- a/fs/lustre/ptlrpc/niobuf.c
+++ b/fs/lustre/ptlrpc/niobuf.c
@@ -150,9 +150,7 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req)
 
 	desc->bd_failure = 0;
 
-	peer.pid = desc->bd_import->imp_connection->c_peer.pid;
-	lnet_nid4_to_nid(desc->bd_import->imp_connection->c_peer.nid,
-		      &peer.nid);
+	peer = desc->bd_import->imp_connection->c_peer;
 
 	LASSERT(desc->bd_cbid.cbid_fn == client_bulk_callback);
 	LASSERT(desc->bd_cbid.cbid_arg == desc);
@@ -630,8 +628,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 			request->rq_repmsg = NULL;
 		}
 
-		peer.pid = connection->c_peer.pid;
-		lnet_nid4_to_nid(connection->c_peer.nid, &peer.nid);
+		peer = connection->c_peer;
 		if (request->rq_bulk &&
 		    OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_BULK_REPLY_ATTACH)) {
 			reply_me = ERR_PTR(-ENOMEM);
@@ -723,7 +720,8 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 	rc = ptl_send_buf(&request->rq_req_md_h,
 			  request->rq_reqbuf, request->rq_reqdata_len,
 			  LNET_NOACK_REQ, &request->rq_req_cbid,
-			  LNET_NID_ANY, connection->c_peer,
+			  LNET_NID_ANY,
+			  lnet_pid_to_pid4(&connection->c_peer),
 			  request->rq_request_portal,
 			  request->rq_xid, 0, &bulk_cookie);
 	if (likely(rc == 0))
diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c
index e06c421..f075188e 100644
--- a/fs/lustre/ptlrpc/pack_generic.c
+++ b/fs/lustre/ptlrpc/pack_generic.c
@@ -2460,7 +2460,7 @@ void _debug_req(struct ptlrpc_request *req,
 {
 	bool req_ok = req->rq_reqmsg != NULL;
 	bool rep_ok = false;
-	lnet_nid_t nid = LNET_NID_ANY;
+	struct lnet_nid *nid = NULL;
 	int rep_flags = -1;
 	int rep_status = -1;
 	va_list args;
@@ -2482,9 +2482,9 @@ void _debug_req(struct ptlrpc_request *req,
 	spin_unlock(&req->rq_early_free_lock);
 
 	if (req->rq_import && req->rq_import->imp_connection)
-		nid = req->rq_import->imp_connection->c_peer.nid;
+		nid = &req->rq_import->imp_connection->c_peer.nid;
 	else if (req->rq_export && req->rq_export->exp_connection)
-		nid = req->rq_export->exp_connection->c_peer.nid;
+		nid = &req->rq_export->exp_connection->c_peer.nid;
 
 	va_start(args, fmt);
 	vaf.fmt = fmt;
@@ -2500,7 +2500,7 @@ void _debug_req(struct ptlrpc_request *req,
 			 req->rq_export ?
 			 req->rq_export->exp_client_uuid.uuid :
 			 "<?>",
-			 libcfs_nid2str(nid),
+			 nid ? libcfs_nidstr(nid) : "<unknown>",
 			 req->rq_request_portal, req->rq_reply_portal,
 			 req->rq_reqlen, req->rq_replen,
 			 req->rq_early_count, (s64)req->rq_timedout,
diff --git a/fs/lustre/ptlrpc/ptlrpc_internal.h b/fs/lustre/ptlrpc/ptlrpc_internal.h
index d902cfe..9eddb3b 100644
--- a/fs/lustre/ptlrpc/ptlrpc_internal.h
+++ b/fs/lustre/ptlrpc/ptlrpc_internal.h
@@ -278,7 +278,7 @@ struct ptlrpc_reply_state *
 void sptlrpc_conf_choose_flavor(enum lustre_sec_part from,
 				enum lustre_sec_part to,
 				struct obd_uuid *target,
-				lnet_nid_t nid,
+				struct lnet_nid *nid,
 				struct sptlrpc_flavor *sf);
 int sptlrpc_conf_init(void);
 void sptlrpc_conf_fini(void);
diff --git a/fs/lustre/ptlrpc/sec.c b/fs/lustre/ptlrpc/sec.c
index 7e6b681..f2d0340 100644
--- a/fs/lustre/ptlrpc/sec.c
+++ b/fs/lustre/ptlrpc/sec.c
@@ -1396,7 +1396,7 @@ int sptlrpc_import_sec_adapt(struct obd_import *imp,
 			sptlrpc_conf_choose_flavor(cliobd->cl_sp_me,
 						   cliobd->cl_sp_to,
 						   &cliobd->cl_target_uuid,
-						   conn->c_self, &sf);
+						   &conn->c_self, &sf);
 
 		sp = imp->imp_obd->u.cli.cl_sp_me;
 	} else {
@@ -1435,7 +1435,7 @@ int sptlrpc_import_sec_adapt(struct obd_import *imp,
 		CDEBUG(D_SEC, "import %s->%s netid %x: select flavor %s\n",
 		       imp->imp_obd->obd_name,
 		       obd_uuid2str(&conn->c_remote_uuid),
-		       LNET_NIDNET(conn->c_self),
+		       LNET_NID_NET(&conn->c_self),
 		       sptlrpc_flavor2name(&sf, str, sizeof(str)));
 	}
 
diff --git a/fs/lustre/ptlrpc/sec_config.c b/fs/lustre/ptlrpc/sec_config.c
index d44af0f..e0ddebd 100644
--- a/fs/lustre/ptlrpc/sec_config.c
+++ b/fs/lustre/ptlrpc/sec_config.c
@@ -786,7 +786,7 @@ static inline void flavor_set_flags(struct sptlrpc_flavor *sf,
 void sptlrpc_conf_choose_flavor(enum lustre_sec_part from,
 				enum lustre_sec_part to,
 				struct obd_uuid *target,
-				lnet_nid_t nid,
+				struct lnet_nid *nid,
 				struct sptlrpc_flavor *sf)
 {
 	struct sptlrpc_conf *conf;
@@ -810,13 +810,14 @@ void sptlrpc_conf_choose_flavor(enum lustre_sec_part from,
 
 	conf_tgt = sptlrpc_conf_get_tgt(conf, name, 0);
 	if (conf_tgt) {
-		rc = sptlrpc_rule_set_choose(&conf_tgt->sct_rset,
-					     from, to, nid, sf);
+		rc = sptlrpc_rule_set_choose(&conf_tgt->sct_rset, from, to,
+					     lnet_nid_to_nid4(nid), sf);
 		if (rc)
 			goto out;
 	}
 
-	rc = sptlrpc_rule_set_choose(&conf->sc_rset, from, to, nid, sf);
+	rc = sptlrpc_rule_set_choose(&conf->sc_rset, from, to,
+				     lnet_nid_to_nid4(nid), sf);
 out:
 	mutex_unlock(&sptlrpc_conf_lock);
 
diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c
index dbe2347..7db6f52 100644
--- a/fs/lustre/ptlrpc/service.c
+++ b/fs/lustre/ptlrpc/service.c
@@ -1937,7 +1937,7 @@ static int ptlrpc_handle_rs(struct ptlrpc_reply_state *rs)
 		CDEBUG(D_HA,
 		       "All locks stolen from rs %p x%lld.t%lld o%d NID %s\n",
 		       rs, rs->rs_xid, rs->rs_transno, rs->rs_opc,
-		       libcfs_nid2str(exp->exp_connection->c_peer.nid));
+		       libcfs_nidstr(&exp->exp_connection->c_peer.nid));
 	}
 
 	if ((rs->rs_sent && !rs->rs_unlinked) || nlocks > 0) {
diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h
index 040bf18..ee0a9a6 100644
--- a/include/linux/lnet/api.h
+++ b/include/linux/lnet/api.h
@@ -162,7 +162,7 @@ int LNetGet(lnet_nid_t self,
 int LNetSetLazyPortal(int portal);
 int LNetClearLazyPortal(int portal);
 int LNetCtl(unsigned int cmd, void *arg);
-void LNetDebugPeer(struct lnet_process_id id);
+void LNetDebugPeer(struct lnet_processid *id);
 int LNetGetPeerDiscoveryStatus(void);
 int LNetAddPeer(lnet_nid_t *nids, u32 num_nids);
 
diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 8b4a29b..9441265 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -852,7 +852,8 @@ struct lnet_peer_ni *lnet_peer_ni_get_locked(struct lnet_peer *lp,
 					     struct lnet_nid *nid);
 struct lnet_peer_ni *lnet_find_peer_ni_locked(lnet_nid_t nid);
 struct lnet_peer_ni *lnet_peer_ni_find_locked(struct lnet_nid *nid);
-struct lnet_peer *lnet_find_peer(lnet_nid_t nid);
+struct lnet_peer *lnet_find_peer4(lnet_nid_t nid);
+struct lnet_peer *lnet_find_peer(struct lnet_nid *nid);
 void lnet_peer_net_added(struct lnet_net *net);
 void lnet_peer_primary_nid_locked(lnet_nid_t nid, struct lnet_nid *result);
 int lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt, bool block);
diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h
index ec0c4ef..4818271 100644
--- a/include/uapi/linux/lnet/lnet-types.h
+++ b/include/uapi/linux/lnet/lnet-types.h
@@ -518,17 +518,17 @@ enum lnet_event_kind {
  */
 struct lnet_event {
 	/** The identifier (nid, pid) of the target. */
-	struct lnet_process_id	target;
+	struct lnet_processid	target;
 	/** The identifier (nid, pid) of the initiator. */
-	struct lnet_process_id	initiator;
+	struct lnet_processid	initiator;
 	/** The source NID on the initiator. */
-	struct lnet_process_id	source;
+	struct lnet_processid	source;
 	/**
 	 * The NID of the immediate sender. If the request has been forwarded
 	 * by routers, this is the NID of the last hop; otherwise it's the
 	 * same as the source.
 	 */
-	lnet_nid_t		sender;
+	struct lnet_nid		sender;
 	/** Indicates the type of the event. */
 	enum lnet_event_kind	type;
 	/** The portal table index specified in the request */
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 550f035..d61c03a 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -4310,7 +4310,7 @@ u32 lnet_get_dlc_seq_locked(void)
 			return rc;
 
 		mutex_lock(&the_lnet.ln_api_mutex);
-		lp = lnet_find_peer(ping->ping_id.nid);
+		lp = lnet_find_peer4(ping->ping_id.nid);
 		if (lp) {
 			ping->ping_id.nid =
 				lnet_nid_to_nid4(&lp->lp_primary_nid);
@@ -4334,7 +4334,7 @@ u32 lnet_get_dlc_seq_locked(void)
 			return rc;
 
 		mutex_lock(&the_lnet.ln_api_mutex);
-		lp = lnet_find_peer(discover->ping_id.nid);
+		lp = lnet_find_peer4(discover->ping_id.nid);
 		if (lp) {
 			discover->ping_id.nid =
 				lnet_nid_to_nid4(&lp->lp_primary_nid);
@@ -4464,9 +4464,9 @@ u32 lnet_get_dlc_seq_locked(void)
 }
 EXPORT_SYMBOL(LNetCtl);
 
-void LNetDebugPeer(struct lnet_process_id id)
+void LNetDebugPeer(struct lnet_processid *id)
 {
-	lnet_debug_peer(id.nid);
+	lnet_debug_peer(lnet_nid_to_nid4(&id->nid));
 }
 EXPORT_SYMBOL(LNetDebugPeer);
 
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index f432488..4102c7b 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -66,25 +66,25 @@
 
 	if (ev_type == LNET_EVENT_SEND) {
 		/* event for active message */
-		ev->target.nid = le64_to_cpu(hdr->dest_nid);
+		lnet_nid4_to_nid(le64_to_cpu(hdr->dest_nid), &ev->target.nid);
 		ev->target.pid = le32_to_cpu(hdr->dest_pid);
-		ev->initiator.nid = LNET_NID_ANY;
+		ev->initiator.nid = LNET_ANY_NID;
 		ev->initiator.pid = the_lnet.ln_pid;
-		ev->source.nid = LNET_NID_ANY;
+		ev->source.nid = LNET_ANY_NID;
 		ev->source.pid = the_lnet.ln_pid;
-		ev->sender = LNET_NID_ANY;
+		ev->sender = LNET_ANY_NID;
 	} else {
 		/* event for passive message */
 		ev->target.pid = hdr->dest_pid;
-		ev->target.nid = hdr->dest_nid;
+		lnet_nid4_to_nid(hdr->dest_nid, &ev->target.nid);
 		ev->initiator.pid = hdr->src_pid;
 		/* Multi-Rail: resolve src_nid to "primary" peer NID */
-		ev->initiator.nid = lnet_nid_to_nid4(&msg->msg_initiator);
+		ev->initiator.nid = msg->msg_initiator;
 		/* Multi-Rail: track source NID. */
 		ev->source.pid = hdr->src_pid;
-		ev->source.nid = hdr->src_nid;
+		lnet_nid4_to_nid(hdr->src_nid, &ev->source.nid);
 		ev->rlength = hdr->payload_length;
-		ev->sender = lnet_nid_to_nid4(&msg->msg_from);
+		ev->sender = msg->msg_from;
 		ev->mlength = msg->msg_wanted;
 		ev->offset = msg->msg_offset;
 	}
@@ -379,7 +379,6 @@
 
 	if (!status && msg->msg_ack) {
 		/* Only send an ACK if the PUT completed successfully */
-		struct lnet_nid src;
 
 		lnet_msg_decommit(msg, cpt, 0);
 
@@ -391,14 +390,15 @@
 
 		ack_wmd = msg->msg_hdr.msg.put.ack_wmd;
 
-		lnet_prep_send(msg, LNET_MSG_ACK, msg->msg_ev.source, 0, 0);
+		lnet_prep_send(msg, LNET_MSG_ACK,
+			       lnet_pid_to_pid4(&msg->msg_ev.source), 0, 0);
 
 		msg->msg_hdr.msg.ack.dst_wmd = ack_wmd;
 		msg->msg_hdr.msg.ack.match_bits = msg->msg_ev.match_bits;
 		msg->msg_hdr.msg.ack.mlength = cpu_to_le32(msg->msg_ev.mlength);
 
-		lnet_nid4_to_nid(msg->msg_ev.target.nid, &src);
-		rc = lnet_send(&src, msg, &msg->msg_from);
+		rc = lnet_send(&msg->msg_ev.target.nid, msg,
+			       &msg->msg_from);
 
 		lnet_net_lock(cpt);
 		/*
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 505065f..d0b7bc8 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -768,7 +768,7 @@ struct lnet_peer_ni *
 }
 
 struct lnet_peer *
-lnet_find_peer(lnet_nid_t nid)
+lnet_find_peer4(lnet_nid_t nid)
 {
 	struct lnet_peer_ni *lpni;
 	struct lnet_peer *lp = NULL;
@@ -786,6 +786,25 @@ struct lnet_peer *
 	return lp;
 }
 
+struct lnet_peer *
+lnet_find_peer(struct lnet_nid *nid)
+{
+	struct lnet_peer_ni *lpni;
+	struct lnet_peer *lp = NULL;
+	int cpt;
+
+	cpt = lnet_net_lock_current();
+	lpni = lnet_peer_ni_find_locked(nid);
+	if (lpni) {
+		lp = lpni->lpni_peer_net->lpn_peer;
+		lnet_peer_addref_locked(lp);
+		lnet_peer_ni_decref_locked(lpni);
+	}
+	lnet_net_unlock(cpt);
+
+	return lp;
+}
+
 struct lnet_peer_net *
 lnet_get_next_peer_net_locked(struct lnet_peer *lp, u32 prev_lpn_id)
 {
@@ -2321,12 +2340,12 @@ void lnet_peer_push_event(struct lnet_event *ev)
 	pbuf = LNET_PING_INFO_TO_BUFFER(ev->md_start + ev->offset);
 
 	/* lnet_find_peer() adds a refcount */
-	lp = lnet_find_peer(ev->source.nid);
+	lp = lnet_find_peer(&ev->source.nid);
 	if (!lp) {
 		CDEBUG(D_NET,
 		       "Push Put from unknown %s (source %s). Ignoring...\n",
-		       libcfs_nid2str(ev->initiator.nid),
-		       libcfs_nid2str(ev->source.nid));
+		       libcfs_nidstr(&ev->initiator.nid),
+		       libcfs_nidstr(&ev->source.nid));
 		pbuf->pb_needs_post = true;
 		return;
 	}
@@ -2345,7 +2364,7 @@ void lnet_peer_push_event(struct lnet_event *ev)
 		CDEBUG(D_NET, "Push Put error %d from %s (source %s)\n",
 		       ev->status,
 		       libcfs_nidstr(&lp->lp_primary_nid),
-		       libcfs_nid2str(ev->source.nid));
+		       libcfs_nidstr(&ev->source.nid));
 		goto out;
 	}
 
@@ -2643,8 +2662,8 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
 
 	spin_lock(&lp->lp_lock);
 
-	lnet_nid4_to_nid(ev->target.nid, &lp->lp_disc_src_nid);
-	lnet_nid4_to_nid(ev->source.nid, &lp->lp_disc_dst_nid);
+	lp->lp_disc_src_nid = ev->target.nid;
+	lp->lp_disc_dst_nid = ev->source.nid;
 
 	/*
 	 * If some kind of error happened the contents of message
@@ -2656,7 +2675,7 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
 		CDEBUG(D_NET, "Ping Reply error %d from %s (source %s)\n",
 		       ev->status,
 		       libcfs_nidstr(&lp->lp_primary_nid),
-		       libcfs_nid2str(ev->source.nid));
+		       libcfs_nidstr(&ev->source.nid));
 		goto out;
 	}
 
@@ -2843,7 +2862,7 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
 out:
 	CDEBUG(D_NET, "%s Send to %s: %d\n",
 	       (ev->msg_type == LNET_MSG_GET ? "Ping" : "Push"),
-	       libcfs_nid2str(ev->target.nid), rc);
+	       libcfs_nidstr(&ev->target.nid), rc);
 	return rc;
 }
 
@@ -4031,7 +4050,7 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 	u32 size;
 	int rc;
 
-	lp = lnet_find_peer(cfg->prcfg_prim_nid);
+	lp = lnet_find_peer4(cfg->prcfg_prim_nid);
 
 	if (!lp) {
 		rc = -ENOENT;
diff --git a/net/lnet/selftest/rpc.c b/net/lnet/selftest/rpc.c
index bd95e88..d1538be 100644
--- a/net/lnet/selftest/rpc.c
+++ b/net/lnet/selftest/rpc.c
@@ -1492,8 +1492,8 @@ struct srpc_client_rpc *
 			sv->sv_shuttingdown);
 
 		buffer = container_of(ev->md_start, struct srpc_buffer, buf_msg);
-		buffer->buf_peer = ev->source;
-		buffer->buf_self = ev->target.nid;
+		buffer->buf_peer = lnet_pid_to_pid4(&ev->source);
+		buffer->buf_self = lnet_nid_to_nid4(&ev->target.nid);
 
 		LASSERT(scd->scd_buf_nposted > 0);
 		scd->scd_buf_nposted--;
@@ -1532,7 +1532,7 @@ struct srpc_client_rpc *
 		    (msg->msg_magic != SRPC_MSG_MAGIC &&
 		     msg->msg_magic != __swab32(SRPC_MSG_MAGIC))) {
 			CERROR("Dropping RPC (%s) from %s: status %d mlength %d type %u magic %u.\n",
-			       sv->sv_name, libcfs_id2str(ev->initiator),
+			       sv->sv_name, libcfs_idstr(&ev->initiator),
 			       ev->status, ev->mlength,
 			       msg->msg_type, msg->msg_magic);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [lustre-devel] [PATCH 06/50] lnet: socklnd: prepare for new KSOCK_MSG type
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (4 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 05/50] lnet: use large nids in struct lnet_event James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 07/50] lnet: socklnd: don't deref lnet_hdr in LNDs James Simmons
                   ` (43 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Various places in socklnd assume there are only two message types:
KSOCK_MSG_NOOP and KSOCK_MSG_LNET.  We will soon add another type to
support a new lnet_hdr type with large addresses.
So do some cleanup first:

- get rid of ksock_lnet_msg - it doesn't add anything to lnet_hdr
- separate out 'struct ksock_msg_hdr'.  We often want the size of this
  header, but currently request the offset of a field in ksock_msg instead.
- introduce switch statements in a couple of places to handle the
  different types of ksock_msg.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 6940303aad5375bb2 ("LU-10391 socklnd: prepare for new KSOCK_MSG type")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43601
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
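Note on the header split: the cleanup listed above can be pictured with a
small userspace sketch. All demo_* names are hypothetical, the 72-byte
payload is an arbitrary placeholder for the lnet header, and
__attribute__((packed)) assumes GCC or Clang. Only the two message type
values (0xC0 and 0xC1) are taken from the patch. The point is that once the
common fields live in their own struct, sizeof() on that struct gives the
same number as the old offsetof() on the union member, and a switch on the
type is the natural place to add further message types later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MSG_NOOP	0xC0	/* empty message */
#define DEMO_MSG_LNET	0xC1	/* carries an lnet header */

/* Placeholder payload; the size is an arbitrary assumption. */
struct demo_lnet_hdr {
	uint8_t bytes[72];
} __attribute__((packed));

/* Fields common to every message, split out into their own struct. */
struct demo_msg_hdr {
	uint32_t type;		/* message type */
	uint32_t csum;		/* checksum if != 0 */
	uint64_t zc_cookies[2];	/* zero-copy request/ACK cookies */
} __attribute__((packed));

struct demo_msg {
	struct demo_msg_hdr kh;
	union {
		/* DEMO_MSG_NOOP: nothing */
		/* DEMO_MSG_LNET: */
		struct demo_lnet_hdr lnetmsg;
	} __attribute__((packed)) u;
} __attribute__((packed));

/*
 * Number of type-specific bytes that follow the common header,
 * or -1 for an unknown type.
 */
static long demo_body_len(const struct demo_msg_hdr *kh)
{
	assert(kh != NULL);
	switch (kh->type) {
	case DEMO_MSG_NOOP:
		return 0;
	case DEMO_MSG_LNET:
		return (long)sizeof(struct demo_lnet_hdr);
	default:
		return -1;
	}
}
```

A receive path built this way would first read sizeof(struct demo_msg_hdr)
bytes, then ask demo_body_len() how much more to read for that type.
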
 include/linux/lnet/socklnd.h           | 38 ++++++++++++--------------
 net/lnet/klnds/socklnd/socklnd_cb.c    | 50 +++++++++++++++++-----------------
 net/lnet/klnds/socklnd/socklnd_proto.c | 49 ++++++++++++++++++++-------------
 3 files changed, 73 insertions(+), 64 deletions(-)

diff --git a/include/linux/lnet/socklnd.h b/include/linux/lnet/socklnd.h
index 9f318b7..97ae0e2 100644
--- a/include/linux/lnet/socklnd.h
+++ b/include/linux/lnet/socklnd.h
@@ -52,33 +52,31 @@ struct ksock_hello_msg {
 	u32		kshm_ips[0];	/* IP addrs */
 } __packed;
 
-struct ksock_lnet_msg {
-	struct lnet_hdr	ksnm_hdr;	/* lnet hdr */
-
-	/*
-	 * ksnm_payload is removed because of winnt compiler's limitation:
-	 * zero-sized array can only be placed at the tail of [nested]
-	 * structure definitions. lnet payload will be stored just after
-	 * the body of structure ksock_lnet_msg_t
-	 */
+struct ksock_msg_hdr {
+	u32		ksh_type;		/* type of socklnd message */
+	u32		ksh_csum;		/* checksum if != 0 */
+	u64		ksh_zc_cookies[2];	/* Zero-Copy request/ACK
+						 * cookie
+						 */
 } __packed;
 
+#define KSOCK_MSG_NOOP	0xC0	/* empty */
+#define KSOCK_MSG_LNET	0xC1	/* lnet msg */
+
 struct ksock_msg {
-	u32		ksm_type;		/* type of socklnd message */
-	u32		ksm_csum;		/* checksum if != 0 */
-	u64		ksm_zc_cookies[2];	/* Zero-Copy request/ACK cookie */
+	struct ksock_msg_hdr	ksm_kh;
 	union {
-		struct ksock_lnet_msg lnetmsg; /* lnet message, it's empty if
-						* it's NOOP
-						*/
+		/* case ksm_kh.ksh_type == KSOCK_MSG_NOOP */
+		/* - nothing */
+		/* case ksm_kh.ksh_type == KSOCK_MSG_LNET */
+		struct lnet_hdr lnetmsg;
 	} __packed ksm_u;
 } __packed;
+#define ksm_type ksm_kh.ksh_type
+#define ksm_csum ksm_kh.ksh_csum
+#define ksm_zc_cookies ksm_kh.ksh_zc_cookies
 
-#define KSOCK_MSG_NOOP	0xC0	/* ksm_u empty */
-#define KSOCK_MSG_LNET	0xC1	/* lnet msg */
-
-/*
- * We need to know this number to parse hello msg from ksocklnd in
+/* We need to know this number to parse hello msg from ksocklnd in
  * other LND (usocklnd, for example)
  */
 #define KSOCK_PROTO_V2	2
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index d0c3628..feab2a07 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -1007,10 +1007,10 @@ struct ksock_conn_cb *
 		case  KSOCK_PROTO_V3:
 			conn->ksnc_rx_state = SOCKNAL_RX_KSM_HEADER;
 			kvec->iov_base = &conn->ksnc_msg;
-			kvec->iov_len = offsetof(struct ksock_msg, ksm_u);
-			conn->ksnc_rx_nob_left = offsetof(struct ksock_msg, ksm_u);
+			kvec->iov_len = sizeof(struct ksock_msg_hdr);
+			conn->ksnc_rx_nob_left = sizeof(struct ksock_msg_hdr);
 			iov_iter_kvec(&conn->ksnc_rx_to, READ, kvec, 1,
-				      offsetof(struct ksock_msg, ksm_u));
+				      sizeof(struct ksock_msg_hdr));
 			break;
 
 		case KSOCK_PROTO_V1:
@@ -1111,16 +1111,6 @@ struct ksock_conn_cb *
 			__swab64s(&conn->ksnc_msg.ksm_zc_cookies[1]);
 		}
 
-		if (conn->ksnc_msg.ksm_type != KSOCK_MSG_NOOP &&
-		    conn->ksnc_msg.ksm_type != KSOCK_MSG_LNET) {
-			CERROR("%s: Unknown message type: %x\n",
-			       libcfs_idstr(&conn->ksnc_peer->ksnp_id),
-			       conn->ksnc_msg.ksm_type);
-			ksocknal_new_packet(conn, 0);
-			ksocknal_close_conn_and_siblings(conn, -EPROTO);
-			return -EPROTO;
-		}
-
 		if (conn->ksnc_msg.ksm_type == KSOCK_MSG_NOOP &&
 		    conn->ksnc_msg.ksm_csum &&     /* has checksum */
 		    conn->ksnc_msg.ksm_csum != conn->ksnc_rx_csum) {
@@ -1154,21 +1144,31 @@ struct ksock_conn_cb *
 			}
 		}
 
-		if (conn->ksnc_msg.ksm_type == KSOCK_MSG_NOOP) {
+		switch (conn->ksnc_msg.ksm_type) {
+		case KSOCK_MSG_NOOP:
 			ksocknal_new_packet(conn, 0);
 			return 0;       /* NOOP is done and just return */
-		}
 
-		conn->ksnc_rx_state = SOCKNAL_RX_LNET_HEADER;
-		conn->ksnc_rx_nob_left = sizeof(struct ksock_lnet_msg);
+		case KSOCK_MSG_LNET:
+			conn->ksnc_rx_state = SOCKNAL_RX_LNET_HEADER;
+			conn->ksnc_rx_nob_left = sizeof(struct lnet_hdr);
+
+			kvec->iov_base = &conn->ksnc_msg.ksm_u.lnetmsg;
+			kvec->iov_len = sizeof(struct lnet_hdr);
 
-		kvec->iov_base = &conn->ksnc_msg.ksm_u.lnetmsg;
-		kvec->iov_len = sizeof(struct ksock_lnet_msg);
+			iov_iter_kvec(&conn->ksnc_rx_to, READ, kvec, 1,
+				      sizeof(struct lnet_hdr));
 
-		iov_iter_kvec(&conn->ksnc_rx_to, READ, kvec, 1,
-			      sizeof(struct ksock_lnet_msg));
+			goto again;     /* read lnet header now */
 
-		goto again;     /* read lnet header now */
+		default:
+			CERROR("%s: Unknown message type: %x\n",
+			       libcfs_idstr(&conn->ksnc_peer->ksnp_id),
+			       conn->ksnc_msg.ksm_type);
+			ksocknal_new_packet(conn, 0);
+			ksocknal_close_conn_and_siblings(conn, -EPROTO);
+			return -EPROTO;
+		}
 
 	case SOCKNAL_RX_LNET_HEADER:
 		/* unpack message header */
@@ -1176,7 +1176,7 @@ struct ksock_conn_cb *
 
 		if (conn->ksnc_peer->ksnp_id.pid & LNET_PID_USERFLAG) {
 			/* Userspace peer_ni */
-			lhdr = &conn->ksnc_msg.ksm_u.lnetmsg.ksnm_hdr;
+			lhdr = &conn->ksnc_msg.ksm_u.lnetmsg;
 			id = &conn->ksnc_peer->ksnp_id;
 
 			/* Substitute process ID assigned at connection time */
@@ -1188,7 +1188,7 @@ struct ksock_conn_cb *
 		ksocknal_conn_addref(conn);     /* ++ref while parsing */
 
 		rc = lnet_parse(conn->ksnc_peer->ksnp_ni,
-				&conn->ksnc_msg.ksm_u.lnetmsg.ksnm_hdr,
+				&conn->ksnc_msg.ksm_u.lnetmsg,
 				lnet_nid_to_nid4(&conn->ksnc_peer->ksnp_id.nid),
 				conn, 0);
 		if (rc < 0) {
@@ -1225,7 +1225,7 @@ struct ksock_conn_cb *
 		if (!rc && conn->ksnc_msg.ksm_zc_cookies[0]) {
 			LASSERT(conn->ksnc_proto != &ksocknal_protocol_v1x);
 
-			lhdr = &conn->ksnc_msg.ksm_u.lnetmsg.ksnm_hdr;
+			lhdr = &conn->ksnc_msg.ksm_u.lnetmsg;
 			id = &conn->ksnc_peer->ksnp_id;
 
 			rc = conn->ksnc_proto->pro_handle_zcreq(conn,
diff --git a/net/lnet/klnds/socklnd/socklnd_proto.c b/net/lnet/klnds/socklnd/socklnd_proto.c
index c3ba070..2ecffb1 100644
--- a/net/lnet/klnds/socklnd/socklnd_proto.c
+++ b/net/lnet/klnds/socklnd/socklnd_proto.c
@@ -287,11 +287,12 @@
 
 	if (!tx || !tx->tx_lnetmsg) {
 		/* noop packet */
-		nob = offsetof(struct ksock_msg, ksm_u);
+		nob = sizeof(struct ksock_msg_hdr);
 	} else {
 		nob = tx->tx_lnetmsg->msg_len +
 		      ((conn->ksnc_proto == &ksocknal_protocol_v1x) ?
-		       sizeof(struct lnet_hdr) : sizeof(struct ksock_msg));
+		       0 : sizeof(struct ksock_msg_hdr) +
+			   sizeof(struct lnet_hdr));
 	}
 
 	/* default checking for typed connection */
@@ -325,9 +326,10 @@
 	int nob;
 
 	if (!tx || !tx->tx_lnetmsg)
-		nob = offsetof(struct ksock_msg, ksm_u);
+		nob = sizeof(struct ksock_msg_hdr);
 	else
-		nob = tx->tx_lnetmsg->msg_len + sizeof(struct ksock_msg);
+		nob = sizeof(struct ksock_msg_hdr) + sizeof(struct lnet_hdr) +
+		      tx->tx_lnetmsg->msg_len;
 
 	switch (conn->ksnc_type) {
 	default:
@@ -721,24 +723,33 @@
 static void
 ksocknal_pack_msg_v2(struct ksock_tx *tx)
 {
-	tx->tx_hdr.iov_base = &tx->tx_msg;
-
-	if (tx->tx_lnetmsg) {
-		LASSERT(tx->tx_msg.ksm_type != KSOCK_MSG_NOOP);
+	int hdr_size;
 
-		tx->tx_msg.ksm_u.lnetmsg.ksnm_hdr = tx->tx_lnetmsg->msg_hdr;
-		tx->tx_hdr.iov_len = sizeof(struct ksock_msg);
-		tx->tx_nob = sizeof(struct ksock_msg) + tx->tx_lnetmsg->msg_len;
-		tx->tx_resid = sizeof(struct ksock_msg) + tx->tx_lnetmsg->msg_len;
-	} else {
-		LASSERT(tx->tx_msg.ksm_type == KSOCK_MSG_NOOP);
+	tx->tx_hdr.iov_base = &tx->tx_msg;
 
-		tx->tx_hdr.iov_len = offsetof(struct ksock_msg, ksm_u.lnetmsg.ksnm_hdr);
-		tx->tx_nob = offsetof(struct ksock_msg,  ksm_u.lnetmsg.ksnm_hdr);
-		tx->tx_resid = offsetof(struct ksock_msg,  ksm_u.lnetmsg.ksnm_hdr);
+	switch (tx->tx_msg.ksm_type) {
+	case KSOCK_MSG_LNET:
+		LASSERT(tx->tx_lnetmsg);
+		hdr_size = sizeof(struct ksock_msg_hdr) +
+			   sizeof(struct lnet_hdr);
+
+		tx->tx_msg.ksm_u.lnetmsg = tx->tx_lnetmsg->msg_hdr;
+		tx->tx_hdr.iov_len = hdr_size;
+		tx->tx_nob = hdr_size + tx->tx_lnetmsg->msg_len;
+		tx->tx_resid = hdr_size + tx->tx_lnetmsg->msg_len;
+		break;
+	case KSOCK_MSG_NOOP:
+		LASSERT(!tx->tx_lnetmsg);
+		hdr_size = sizeof(struct ksock_msg_hdr);
+
+		tx->tx_hdr.iov_len = hdr_size;
+		tx->tx_nob = hdr_size;
+		tx->tx_resid = hdr_size;
+		break;
+	default:
+		LASSERT(0);
 	}
-	/*
-	 * Don't checksum before start sending, because packet can be
+	/* Don't checksum before start sending, because packet can be
 	 * piggybacked with ACK
 	 */
 }
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [lustre-devel] [PATCH 07/50] lnet: socklnd: don't deref lnet_hdr in LNDs
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (5 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 06/50] lnet: socklnd: prepare for new KSOCK_MSG type James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 08/50] lustre: sec: make client encryption compatible with ext4 James Simmons
                   ` (42 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The lnet_hdr structure needs to be extended to support larger
addresses.  To assist this we need to minimize the number of places
that its contents are accessed.

Currently the internals of lnet_hdr are largely untouched inside the
various LNDs, but there are some exceptions in socklnd.
These exceptions are not necessary - the same data is available from
elsewhere in the lnet_msg.

So change those accesses to use the lnet_msg info instead.
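
The idea can be sketched in user space as follows. This is a
simplified illustration with hypothetical demo_* types, not the real
struct lnet_msg: the message keeps host-order copies of the fields next
to the little-endian wire header, so a logging path can read the copies
and never needs to know how the wire header is laid out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wire header: fields stored as little-endian bytes. */
struct demo_wire_hdr {
	uint8_t type_le[4];
	uint8_t payload_length_le[4];
};

/* Hypothetical message: host-order copies plus the wire header. */
struct demo_msg {
	uint32_t msg_type;
	uint32_t msg_len;
	struct demo_wire_hdr msg_hdr;
};

void demo_le32_encode(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}

uint32_t demo_le32_decode(const uint8_t in[4])
{
	return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}

/* Fill both the host-order copies and the wire header. */
void demo_msg_init(struct demo_msg *m, uint32_t type, uint32_t len)
{
	assert(m != NULL);
	m->msg_type = type;
	m->msg_len = len;
	demo_le32_encode(m->msg_hdr.type_le, type);
	demo_le32_encode(m->msg_hdr.payload_length_le, len);
}

/* Logging helper: uses only the host-order copies, never msg_hdr. */
int demo_describe(const struct demo_msg *m, char *buf, size_t n)
{
	return snprintf(buf, n, "type %u len %u",
			(unsigned int)m->msg_type, (unsigned int)m->msg_len);
}
```

If the wire header later grows to hold larger addresses, only
demo_msg_init() in this sketch would have to change.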

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 321fe9fa12d244bb7 ("LU-10391 socklnd: don't deref lnet_hdr in LNDs")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43602
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/socklnd/socklnd_cb.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index feab2a07..dede642 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -356,10 +356,10 @@ struct ksock_tx *
 
 		if (error && tx->tx_lnetmsg) {
 			CNETERR("Deleting packet type %d len %d %s->%s\n",
-				le32_to_cpu(tx->tx_lnetmsg->msg_hdr.type),
-				le32_to_cpu(tx->tx_lnetmsg->msg_hdr.payload_length),
-				libcfs_nid2str(le64_to_cpu(tx->tx_lnetmsg->msg_hdr.src_nid)),
-				libcfs_nid2str(le64_to_cpu(tx->tx_lnetmsg->msg_hdr.dest_nid)));
+				tx->tx_lnetmsg->msg_type,
+				tx->tx_lnetmsg->msg_len,
+				libcfs_nidstr(&tx->tx_lnetmsg->msg_initiator),
+				libcfs_nidstr(&tx->tx_lnetmsg->msg_target.nid));
 		} else if (error) {
 			CNETERR("Deleting noop packet\n");
 		}
@@ -697,8 +697,8 @@ struct ksock_conn *
 	LASSERT(tx->tx_resid == tx->tx_nob);
 
 	CDEBUG(D_NET, "Packet %p type %d, nob %d niov %d nkiov %d\n",
-	       tx, (tx->tx_lnetmsg) ? tx->tx_lnetmsg->msg_hdr.type :
-					      KSOCK_MSG_NOOP,
+	       tx, tx->tx_lnetmsg ? tx->tx_lnetmsg->msg_hdr.type :
+				    KSOCK_MSG_NOOP,
 	       tx->tx_nob, tx->tx_niov, tx->tx_nkiov);
 
 	bufnob = conn->ksnc_sock->sk->sk_wmem_queued;
-- 
1.8.3.1


* [lustre-devel] [PATCH 08/50] lustre: sec: make client encryption compatible with ext4
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (6 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 07/50] lnet: socklnd: don't deref lnet_hdr in LNDs James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 09/50] lustre: sec: allow subdir mount of encrypted dir James Simmons
                   ` (41 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

In order to benefit from encrypted file handling implemented in
e2fsprogs, we need to adjust the way Lustre deals with encryption
context of files.

First, the encryption context needs to be stored in an xattr named
"encryption.c" instead of "security.c". But neither llite nor ldiskfs
has an xattr handler for this "encryption." xattr type. So we need
to export ldiskfs_xattr_get and ldiskfs_xattr_set_handle symbols for
this to work.

Second, we set the LDISKFS_ENCRYPT_FL flag on files for which we set
the 'encryption.c' xattr. But we just keep this flag for on-disk
inodes, and make sure the flag is cleared for in-memory inodes.
The purpose is to help e2fsprogs with encrypted files handling, while
not disturbing Lustre server side with the encryption flag (servers
are not supposed to know about it for Lustre client-side encryption).

To maintain compatibility with 2.14, in which the encryption context is
stored in the "security.c" xattr, we try to fetch the encryption context
from that xattr if getting it from "encryption.c" fails. On the client
side, in all cases it looks as if the encryption context is stored in
"encryption.c".

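The compatibility fallback described above can be sketched as follows.
This is a user-space illustration only: the in-memory xattr table and
all demo_* names are hypothetical, and retrying only on -ENODATA is an
assumption of this sketch, since the message does not say which errors
trigger the fallback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ENC_CTX		"encryption.c"	/* new location */
#define DEMO_ENC_CTX_OLD	"security.c"	/* location used by 2.14 */

/* Hypothetical in-memory stand-in for an inode's xattrs. */
struct demo_xattr {
	const char *name;
	const void *value;
	size_t len;
};

/* Returns the value length, -ERANGE if buf is too small, or -ENODATA. */
int demo_xattr_get(const struct demo_xattr *tbl, size_t n,
		   const char *name, void *buf, size_t size)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(tbl[i].name, name) != 0)
			continue;
		if (tbl[i].len > size)
			return -ERANGE;
		memcpy(buf, tbl[i].value, tbl[i].len);
		return (int)tbl[i].len;
	}
	return -ENODATA;
}

/* Try the new xattr name first, then fall back to the old one. */
int demo_get_enc_ctx(const struct demo_xattr *tbl, size_t n,
		     void *buf, size_t size)
{
	int rc = demo_xattr_get(tbl, n, DEMO_ENC_CTX, buf, size);

	if (rc == -ENODATA)
		rc = demo_xattr_get(tbl, n, DEMO_ENC_CTX_OLD, buf, size);
	return rc;
}
```

A caller sees the same result whichever name the context was stored
under, which matches the statement that the client always sees
"encryption.c".
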
WC-bug-id: https://jira.whamcloud.com/browse/LU-13717
Lustre-commit: 4231fab66eab3e984 ("LU-13717 sec: make client encryption compatible with ext4")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/45211
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/crypto.c               | 42 ++++++++--------------
 fs/lustre/llite/llite_internal.h       | 14 +++-----
 fs/lustre/llite/xattr.c                | 65 +++++++++++++++-------------------
 include/uapi/linux/lustre/lustre_idl.h |  5 +--
 4 files changed, 50 insertions(+), 76 deletions(-)

diff --git a/fs/lustre/llite/crypto.c b/fs/lustre/llite/crypto.c
index 6a12b6c..f310d4c 100644
--- a/fs/lustre/llite/crypto.c
+++ b/fs/lustre/llite/crypto.c
@@ -31,29 +31,13 @@
 
 static int ll_get_context(struct inode *inode, void *ctx, size_t len)
 {
-	struct dentry *dentry = d_find_any_alias(inode);
-	struct lu_env *env;
-	u16 refcheck;
 	int rc;
 
-	env = cl_env_get(&refcheck);
-	if (IS_ERR(env))
-		return PTR_ERR(env);
-
-	/* Set lcc_getencctx=1 to allow this thread to read
-	 * LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr, as requested by fscrypt.
+	/* Get enc context xattr directly instead of going through the VFS,
+	 * as there is no xattr handler for "encryption.".
 	 */
-	ll_cl_add(inode, env, NULL, LCC_RW);
-	ll_env_info(env)->lti_io_ctx.lcc_getencctx = 1;
-
-	rc = __vfs_getxattr(dentry, inode, LL_XATTR_NAME_ENCRYPTION_CONTEXT,
-			    ctx, len);
-
-	ll_cl_remove(inode, env);
-	cl_env_put(env, &refcheck);
-
-	if (dentry)
-		dput(dentry);
+	rc = ll_xattr_list(inode, LL_XATTR_NAME_ENCRYPTION_CONTEXT,
+			   XATTR_ENCRYPTION_T, ctx, len, OBD_MD_FLXATTR);
 
 	/* used as encryption unit size */
 	if (S_ISREG(inode->i_mode))
@@ -90,15 +74,15 @@ int ll_set_encflags(struct inode *inode, void *encctx, u32 encctxlen,
  *   op_data, so that it will be sent along to the server with the request that
  *   the caller is preparing, thus saving a setxattr request.
  * - inode is not NULL:
- *   normal case in which passed fs_data is a struct dentry *, letting proceed
- *   with setxattr operation.
+ *   normal case, letting proceed with setxattr operation.
  *   This use case should only be used when explicitly setting a new encryption
  *   policy on an existing, empty directory.
  */
 static int ll_set_context(struct inode *inode, const void *ctx, size_t len,
 			  void *fs_data)
 {
-	struct dentry *dentry;
+	struct ptlrpc_request *req = NULL;
+	struct ll_sb_info *sbi;
 	int rc;
 
 	if (!inode) {
@@ -119,12 +103,16 @@ static int ll_set_context(struct inode *inode, const void *ctx, size_t len,
 	if (is_root_inode(inode))
 		return -EPERM;
 
-	dentry = (struct dentry *)fs_data;
-	set_bit(LLIF_SET_ENC_CTX, &ll_i2info(inode)->lli_flags);
-	rc = __vfs_setxattr(dentry, inode, LL_XATTR_NAME_ENCRYPTION_CONTEXT,
-			    ctx, len, XATTR_CREATE);
+	sbi = ll_i2sbi(inode);
+	/* Send setxattr request to lower layers directly instead of going
+	 * through the VFS, as there is no xattr handler for "encryption.".
+	 */
+	rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
+			 OBD_MD_FLXATTR, LL_XATTR_NAME_ENCRYPTION_CONTEXT,
+			 ctx, len, XATTR_CREATE, ll_i2suppgid(inode), &req);
 	if (rc)
 		return rc;
+	ptlrpc_req_finished(req);
 
 	return ll_set_encflags(inode, (void *)ctx, len, false);
 }
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index dd338f2..a8d43bd 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -109,8 +109,6 @@ enum ll_file_flags {
 	LLIF_UPDATE_ATIME	= 4,
 	/* foreign file/dir can be unlinked unconditionnaly */
 	LLIF_FOREIGN_REMOVABLE	= 5,
-	/* setting encryption context in progress */
-	LLIF_SET_ENC_CTX	= 6,
 	/* Xattr cache is filled */
 	LLIF_XATTR_CACHE_FILLED	= 7,
 };
@@ -1285,16 +1283,11 @@ enum lcc_type {
 
 struct ll_cl_context {
 	struct list_head	 lcc_list;
-	void	   *lcc_cookie;
+	void			*lcc_cookie;
 	const struct lu_env	*lcc_env;
-	struct cl_io   *lcc_io;
-	struct cl_page *lcc_page;
+	struct cl_io		*lcc_io;
+	struct cl_page		*lcc_page;
 	enum lcc_type		 lcc_type;
-	/**
-	 * Get encryption context operation in progress,
-	 * allow getxattr of LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr
-	 */
-	unsigned int		 lcc_getencctx:1;
 };
 
 struct ll_thread_info {
@@ -1405,6 +1398,7 @@ static inline loff_t ll_file_maxbytes(struct inode *inode)
 #define XATTR_ACL_DEFAULT_T	5
 #define XATTR_LUSTRE_T		6
 #define XATTR_OTHER_T		7
+#define XATTR_ENCRYPTION_T	9
 
 ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size);
 int ll_xattr_list(struct inode *inode, const char *name, int type,
diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c
index ce9585a..3a342ad 100644
--- a/fs/lustre/llite/xattr.c
+++ b/fs/lustre/llite/xattr.c
@@ -132,15 +132,14 @@ static int ll_xattr_set_common(const struct xattr_handler *handler,
 			return -EPERM;
 	}
 
-	/* Setting LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr is only allowed
-	 * when defining an encryption policy on a directory, ie when it
-	 * comes from ll_set_context().
-	 * When new files/dirs are created in an encrypted dir, the xattr
-	 * is set directly in the create request.
+	/* This check is required for compatibility with 2.14, in which
+	 * encryption context is stored in security.c xattr.
+	 * Setting the encryption context should only be possible by llcrypt
+	 * when defining an encryption policy on a directory.
+	 * When new files/dirs are created in an encrypted dir, the enc
+	 * context is set directly in the create request.
 	 */
-	if (handler->flags == XATTR_SECURITY_T &&
-	    !strcmp(name, "c") &&
-	    !test_and_clear_bit(LLIF_SET_ENC_CTX, &ll_i2info(inode)->lli_flags))
+	if (handler->flags == XATTR_SECURITY_T && strcmp(name, "c") == 0)
 		return -EPERM;
 
 	fullname = kasprintf(GFP_KERNEL, "%s%s", xattr_prefix(handler), name);
@@ -364,19 +363,13 @@ int ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer,
 	void *xdata;
 	int rc;
 
-	/* Getting LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr is only allowed
-	 * when it comes from ll_get_context(), ie when fscrypt needs to
-	 * know the encryption context.
-	 * Otherwise, any direct reading of this xattr returns -EPERM.
+	/* This check is required for compatibility with 2.14, in which
+	 * encryption context is stored in security.c xattr. Accessing the
+	 * encryption context should only be possible by llcrypt.
 	 */
-	if (type == XATTR_SECURITY_T &&
-	    !strcmp(name, LL_XATTR_NAME_ENCRYPTION_CONTEXT)) {
-		struct ll_cl_context *lcc = ll_cl_find(inode);
-
-		if (!lcc || !lcc->lcc_getencctx) {
-			rc = -EPERM;
-			goto out_xattr;
-		}
+	if (type == XATTR_SECURITY_T && strcmp(name, "security.c") == 0) {
+		rc = -EPERM;
+		goto out_xattr;
 	}
 
 	if (sbi->ll_xattr_cache_enabled && type != XATTR_ACL_ACCESS_T &&
@@ -613,7 +606,6 @@ static int ll_xattr_get(const struct xattr_handler *handler,
 
 ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size)
 {
-	struct inode *dir = d_inode(dentry->d_parent);
 	struct inode *inode = d_inode(dentry);
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	ktime_t kstart = ktime_get();
@@ -643,38 +635,37 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size)
 	rem = rc;
 
 	while (rem > 0) {
+		const struct xattr_handler *xh = get_xattr_type(xattr_name);
 		bool hide_xattr = false;
 
-		/* Listing xattrs should not expose
-		 * LL_XATTR_NAME_ENCRYPTION_CONTEXT xattr, unless it comes
-		 * from fscrypt.
-		 */
-		if (get_xattr_type(xattr_name)->flags == XATTR_SECURITY_T &&
-		    !strcmp(xattr_name, LL_XATTR_NAME_ENCRYPTION_CONTEXT)) {
-			struct ll_cl_context *lcc = ll_cl_find(inode);
-
-			if (!lcc || !lcc->lcc_getencctx)
-				hide_xattr = true;
-		}
-
 		/* Hide virtual project id xattr from the list when
 		 * parent has the inherit flag and the same project id,
 		 * so project id won't be messed up by copying the xattrs
 		 * when mv to a tree with different project id.
 		 */
-		if (get_xattr_type(xattr_name)->flags == XATTR_TRUSTED_T &&
+		if (xh && xh->flags == XATTR_TRUSTED_T &&
 		    strcmp(xattr_name, XATTR_NAME_PROJID) == 0) {
+			struct inode *dir = d_inode(dentry->d_parent);
+
 			if (ll_i2info(inode)->lli_projid ==
-					ll_i2info(dir)->lli_projid &&
+			    ll_i2info(dir)->lli_projid &&
 			    test_bit(LLIF_PROJECT_INHERIT,
 				     &ll_i2info(dir)->lli_flags))
 				hide_xattr = true;
+		} else if (xh && xh->flags == XATTR_SECURITY_T &&
+			   strcmp(xattr_name, "security.c") == 0) {
+			/* Listing xattrs should not expose encryption
+			 * context. There is no handler defined for
+			 * XATTR_ENCRYPTION_PREFIX, so this test is just
+			 * needed for compatibility with 2.14, in which
+			 * encryption context is stored in security.c xattr.
+			 */
+			hide_xattr = true;
 		}
 
 		len = strnlen(xattr_name, rem - 1) + 1;
 		rem -= len;
-		if (!xattr_type_filter(sbi, hide_xattr ? NULL :
-				       get_xattr_type(xattr_name))) {
+		if (!xattr_type_filter(sbi, hide_xattr ? NULL : xh)) {
 			/* Skip OK xattr type, leave it in buffer. */
 			xattr_name += len;
 			continue;
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index 78e20a7..753df16 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -1065,7 +1065,7 @@ struct lov_mds_md_v1 {		/* LOV EA mds/wire data (little-endian) */
 #define XATTR_USER_PREFIX	"user."
 #define XATTR_TRUSTED_PREFIX	"trusted."
 #define XATTR_SECURITY_PREFIX	"security."
-#define XATTR_LUSTRE_PREFIX	"lustre."
+#define XATTR_ENCRYPTION_PREFIX	"encryption."
 
 #define XATTR_NAME_SOM		"trusted.som"
 #define XATTR_NAME_LOV		"trusted.lov"
@@ -1080,7 +1080,8 @@ struct lov_mds_md_v1 {		/* LOV EA mds/wire data (little-endian) */
 #define XATTR_NAME_LFSCK_NAMESPACE "trusted.lfsck_namespace"
 #define XATTR_NAME_PROJID	"trusted.projid"
 
-#define LL_XATTR_NAME_ENCRYPTION_CONTEXT XATTR_SECURITY_PREFIX"c"
+#define LL_XATTR_NAME_ENCRYPTION_CONTEXT_OLD XATTR_SECURITY_PREFIX"c"
+#define LL_XATTR_NAME_ENCRYPTION_CONTEXT XATTR_ENCRYPTION_PREFIX"c"
 
 struct lov_mds_md_v3 {		/* LOV EA mds/wire data (little-endian) */
 	__u32 lmm_magic;	/* magic number = LOV_MAGIC_V3 */
-- 
1.8.3.1


* [lustre-devel] [PATCH 09/50] lustre: sec: allow subdir mount of encrypted dir
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (7 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 08/50] lustre: sec: make client encryption compatible with ext4 James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 10/50] lustre: fld: repeat rpc in fld_client_rpc after EAGAIN James Simmons
                   ` (40 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

In case of a sub-directory mount of an encrypted directory, we need to
retrieve the encryption context of the root inode of the filesystem.
This is done by having the MDT return it in the getattr reply.

Fixes: 71d77bbe7e ("lustre: sec: atomicity of encryption context getting/setting")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15176
Lustre-commit: faf057b46bc770a1a ("LU-15176 sec: allow subdir mount of encrypted dir")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/45407
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_lib.c | 35 +++++++++++++++++++++++++++++++----
 fs/lustre/mdc/mdc_dev.c     |  1 +
 fs/lustre/mdc/mdc_request.c | 28 +++++++++++++++++++++++++---
 fs/lustre/ptlrpc/layout.c   |  3 ++-
 4 files changed, 59 insertions(+), 8 deletions(-)

diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index e3e871d..1121652 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -251,6 +251,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	u64 valid;
 	int size, err, checksum;
 	bool api32;
+	void *encctx;
+	int encctxlen;
 
 	sbi->ll_md_obd  = class_name2obd(md);
 	if (!sbi->ll_md_obd) {
@@ -625,7 +627,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	/* make root inode
 	 * XXX: move this to after cbd setup?
 	 */
-	valid = OBD_MD_FLGETATTR | OBD_MD_FLBLOCKS | OBD_MD_FLMODEASIZE;
+	valid = OBD_MD_FLGETATTR | OBD_MD_FLBLOCKS | OBD_MD_FLMODEASIZE |
+		OBD_MD_ENCCTX;
 	if (test_bit(LL_SBI_ACL, sbi->ll_flags))
 		valid |= OBD_MD_FLACL;
 
@@ -640,6 +643,14 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	op_data->op_valid = valid;
 
 	err = md_getattr(sbi->ll_md_exp, op_data, &request);
+
+	/* We need enc ctx info, so reset it in op_data to
+	 * prevent it from being freed.
+	 */
+	encctx = op_data->op_file_encctx;
+	encctxlen = op_data->op_file_encctx_size;
+	op_data->op_file_encctx = NULL;
+	op_data->op_file_encctx_size = 0;
 	kfree(op_data);
 	if (err) {
 		CERROR("%s: md_getattr failed for root: rc = %d\n",
@@ -659,15 +670,29 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	api32 = test_bit(LL_SBI_32BIT_API, sbi->ll_flags);
 	root = ll_iget(sb, cl_fid_build_ino(&sbi->ll_root_fid, api32), &lmd);
 	md_free_lustre_md(sbi->ll_md_exp, &lmd);
-	ptlrpc_req_finished(request);
 
 	if (IS_ERR(root)) {
 		lmd_clear_acl(&lmd);
 		err = IS_ERR(root) ? PTR_ERR(root) : -EBADF;
-		CERROR("lustre_lite: bad iget4 for root\n");
+		root = NULL;
+		CERROR("%s: bad ll_iget() for root: rc = %d\n",
+		       sbi->ll_fsname, err);
+		ptlrpc_req_finished(request);
 		goto out_root;
 	}
 
+	if (encctxlen) {
+		CDEBUG(D_SEC,
+		       "server returned encryption ctx for root inode "DFID"\n",
+		       PFID(&sbi->ll_root_fid));
+		err = ll_set_encflags(root, encctx, encctxlen, true);
+		if (err)
+			CWARN("%s: cannot set enc ctx for "DFID": rc = %d\n",
+			      sbi->ll_fsname,
+			      PFID(&sbi->ll_root_fid), err);
+	}
+	ptlrpc_req_finished(request);
+
 	checksum = test_bit(LL_SBI_CHECKSUM, sbi->ll_flags);
 	if (sbi->ll_checksum_set) {
 		err = obd_set_info_async(NULL, sbi->ll_dt_exp,
@@ -3164,9 +3189,11 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 	if (ll_need_32bit_api(ll_i2sbi(i1)))
 		op_data->op_cli_flags |= CLI_API32;
 
-	if (opc == LUSTRE_OPC_LOOKUP || opc == LUSTRE_OPC_CREATE) {
+	if ((i2 && is_root_inode(i2)) ||
+	    opc == LUSTRE_OPC_LOOKUP || opc == LUSTRE_OPC_CREATE) {
 		/* In case of lookup, ll_setup_filename() has already been
 		 * called in ll_lookup_it(), so just take provided name.
+		 * Also take provided name if we are dealing with root inode.
 		 */
 		fname.disk_name.name = (unsigned char *)name;
 		fname.disk_name.len = namelen;
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index 0b1d257..de67720 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -1258,6 +1258,7 @@ static int mdc_io_data_version_start(const struct lu_env *env,
 
 	req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, 0);
 	req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER, 0);
+	req_capsule_set_size(&req->rq_pill, &RMF_FILE_ENCCTX, RCL_SERVER, 0);
 	ptlrpc_request_set_replen(req);
 
 	req->rq_interpret_reply = mdc_data_version_interpret;
diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index 1064d9f..f553d44 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -158,7 +158,8 @@ static int mdc_get_root(struct obd_export *exp, const char *fileset,
  * layouts.  --umka
  */
 static int mdc_getattr_common(struct obd_export *exp,
-			      struct ptlrpc_request *req)
+			      struct ptlrpc_request *req,
+			      struct md_op_data *op_data)
 {
 	struct req_capsule *pill = &req->rq_pill;
 	struct mdt_body *body;
@@ -185,6 +186,18 @@ static int mdc_getattr_common(struct obd_export *exp,
 			return -EPROTO;
 	}
 
+	/* If encryption context was returned by MDT, put it in op_data
+	 * so that caller can set it on inode and save an extra getxattr.
+	 */
+	if (op_data && op_data->op_valid & OBD_MD_ENCCTX &&
+	    body->mbo_valid & OBD_MD_ENCCTX) {
+		op_data->op_file_encctx =
+			req_capsule_server_get(pill, &RMF_FILE_ENCCTX);
+		op_data->op_file_encctx_size =
+			req_capsule_get_size(pill, &RMF_FILE_ENCCTX,
+					     RCL_SERVER);
+	}
+
 	return 0;
 }
 
@@ -203,6 +216,7 @@ static int mdc_getattr(struct obd_export *exp, struct md_op_data *op_data,
 		       struct ptlrpc_request **request)
 {
 	struct ptlrpc_request *req;
+	struct obd_device *obd = class_exp2obd(exp);
 	struct obd_import *imp = class_exp2cliimp(exp);
 	u32 acl_bufsize = LUSTRE_POSIX_ACL_MAX_SIZE_OLD;
 	int rc;
@@ -233,9 +247,16 @@ static int mdc_getattr(struct obd_export *exp, struct md_op_data *op_data,
 	req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, acl_bufsize);
 	req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER,
 			     op_data->op_mode);
+	if (exp_connect_encrypt(exp) && op_data->op_valid & OBD_MD_ENCCTX)
+		req_capsule_set_size(&req->rq_pill, &RMF_FILE_ENCCTX,
+				     RCL_SERVER,
+				     obd->u.cli.cl_max_mds_easize);
+	else
+		req_capsule_set_size(&req->rq_pill, &RMF_FILE_ENCCTX,
+				     RCL_SERVER, 0);
 	ptlrpc_request_set_replen(req);
 
-	rc = mdc_getattr_common(exp, req);
+	rc = mdc_getattr_common(exp, req, op_data);
 	if (rc) {
 		if (rc == -ERANGE) {
 			acl_bufsize = min_t(u32,
@@ -289,6 +310,7 @@ static int mdc_getattr_name(struct obd_export *exp, struct md_op_data *op_data,
 	req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER,
 			     op_data->op_mode);
 	req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, acl_bufsize);
+	req_capsule_set_size(&req->rq_pill, &RMF_FILE_ENCCTX, RCL_SERVER, 0);
 	ptlrpc_request_set_replen(req);
 	if (op_data->op_bias & MDS_FID_OP) {
 		struct mdt_body *b = req_capsule_client_get(&req->rq_pill,
@@ -300,7 +322,7 @@ static int mdc_getattr_name(struct obd_export *exp, struct md_op_data *op_data,
 		}
 	}
 
-	rc = mdc_getattr_common(exp, req);
+	rc = mdc_getattr_common(exp, req, NULL);
 	if (rc) {
 		if (rc == -ERANGE) {
 			acl_bufsize = min_t(u32,
diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c
index 8e3c97d..8725edd 100644
--- a/fs/lustre/ptlrpc/layout.c
+++ b/fs/lustre/ptlrpc/layout.c
@@ -547,7 +547,8 @@
 	&RMF_MDT_MD,
 	&RMF_ACL,
 	&RMF_CAPA1,
-	&RMF_CAPA2
+	&RMF_CAPA2,
+	&RMF_FILE_ENCCTX,
 };
 
 static const struct req_msg_field *mds_setattr_server[] = {
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [lustre-devel] [PATCH 10/50] lustre: fld: repeat rpc in fld_client_rpc after EAGAIN
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (8 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 09/50] lustre: sec: allow subdir mount of encrypted dir James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 11/50] lustre: fld: don't obtain a slot for fld request James Simmons
                   ` (39 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Vladimir Saveliev, Lustre Development List

From: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>

A timed-out RPC sent by fld_client_rpc() may cause the client operation
to fail.

Have fld_client_rpc() retry the RPC after a delay when it fails with
-EAGAIN.

Also fix a typo in the failure simulation in fld_client_rpc().
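
The retry behaviour described above can be sketched in user-space C.
This is only an illustration, not the kernel code: fake_send_rpc() and
fake_sleep_interruptible() are hypothetical stand-ins for the RPC send
and for msleep_interruptible(), and the loop replaces the goto used in
the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for sending one RPC: fails with -EAGAIN
 * while *failures_left is positive, then succeeds.
 */
static int fake_send_rpc(int *failures_left)
{
	if (*failures_left > 0) {
		(*failures_left)--;
		return -EAGAIN;
	}
	return 0;
}

/* Hypothetical stand-in for an interruptible sleep: returns nonzero
 * if the sleep was cut short by a signal.
 */
static int fake_sleep_interruptible(int interrupted)
{
	return interrupted;
}

/* Resend on -EAGAIN after a pause; give up with -EINTR if the pause
 * is interrupted. Any other result is returned to the caller.
 */
static int send_with_retry(int *failures_left, int interrupted, int *attempts)
{
	int rc;

	assert(failures_left && attempts);

	for (;;) {
		(*attempts)++;
		rc = fake_send_rpc(failures_left);
		if (rc != -EAGAIN)
			return rc;
		if (fake_sleep_interruptible(interrupted))
			return -EINTR;
	}
}
```

In this sketch the caller never sees -EAGAIN; it sees success, another
error, or -EINTR if the wait was interrupted.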

HPE-bug-id: LUS-8652
WC-bug-id: https://jira.whamcloud.com/browse/LU-13468
Lustre-commit: b1acf734f31c13d29 ("LU-13468 fld: repeat rpc in fld_client_rpc after EAGAIN")
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/38302
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/fld/fld_request.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/fld/fld_request.c b/fs/lustre/fld/fld_request.c
index 7260a14..4180bcf 100644
--- a/fs/lustre/fld/fld_request.c
+++ b/fs/lustre/fld/fld_request.c
@@ -39,7 +39,8 @@
 #define DEBUG_SUBSYSTEM S_FLD
 
 #include <linux/module.h>
-#include <asm/div64.h>
+#include <linux/math64.h>
+#include <linux/delay.h>
 
 #include <obd.h>
 #include <obd_class.h>
@@ -314,6 +315,7 @@ int fld_client_rpc(struct obd_export *exp,
 	LASSERT(exp);
 
 	imp = class_exp2cliimp(exp);
+again:
 	switch (fld_op) {
 	case FLD_QUERY:
 		req = ptlrpc_request_alloc_pack(imp, &RQF_FLD_QUERY,
@@ -361,7 +363,7 @@ int fld_client_rpc(struct obd_export *exp,
 	req->rq_reply_portal = MDC_REPLY_PORTAL;
 	ptlrpc_at_set_req_timeout(req);
 
-	if (OBD_FAIL_CHECK(OBD_FAIL_FLD_QUERY_REQ && req->rq_no_delay)) {
+	if (OBD_FAIL_CHECK(OBD_FAIL_FLD_QUERY_REQ) && req->rq_no_delay) {
 		/* the same error returned by ptlrpc_import_delay_req */
 		rc = -EAGAIN;
 		req->rq_status = rc;
@@ -373,12 +375,18 @@ int fld_client_rpc(struct obd_export *exp,
 
 	if (rc != 0) {
 		if (imp->imp_state != LUSTRE_IMP_CLOSED && !imp->imp_deactive) {
-			/*
-			 * Since LWP is not replayable, so notify the caller
-			 * to retry if needed after a while.
-			 */
+			/* LWP is not replayable, retry after a while. */
 			rc = -EAGAIN;
 		}
+		if (rc == -EAGAIN) {
+			ptlrpc_req_finished(req);
+			if (msleep_interruptible(2 * MSEC_PER_SEC)) {
+				rc = -EINTR;
+				goto out_req;
+			}
+			rc = 0;
+			goto again;
+		}
 		goto out_req;
 	}
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 11/50] lustre: fld: don't obtain a slot for fld request
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (9 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 10/50] lustre: fld: repeat rpc in fld_client_rpc after EAGAIN James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 12/50] lustre: update version to 2.14.57 James Simmons
                   ` (38 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Andriy Skulysh, Lustre Development List

From: Andriy Skulysh <c17819@cray.com>

fld_client_rpc() is called with an ldlm_lock held, so it can deadlock
while obtaining a request slot.

The request slot can be omitted for fld requests, as they are sent to
the separate FLD_REQUEST_PORTAL portal.

HPE-bug-id: LUS-10576
WC-bug-id: https://jira.whamcloud.com/browse/LU-15401
Lustre-commit: be5ed6b393e0268ff ("LU-15401 fld: don't obtain a slot for fld request")
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/45956
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/fld/fld_request.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/lustre/fld/fld_request.c b/fs/lustre/fld/fld_request.c
index 4180bcf..b365dc2 100644
--- a/fs/lustre/fld/fld_request.c
+++ b/fs/lustre/fld/fld_request.c
@@ -368,9 +368,7 @@ int fld_client_rpc(struct obd_export *exp,
 		rc = -EAGAIN;
 		req->rq_status = rc;
 	} else {
-		obd_get_request_slot(&exp->exp_obd->u.cli);
 		rc = ptlrpc_queue_wait(req);
-		obd_put_request_slot(&exp->exp_obd->u.cli);
 	}
 
 	if (rc != 0) {
-- 
1.8.3.1

* [lustre-devel] [PATCH 12/50] lustre: update version to 2.14.57
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (10 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 11/50] lustre: fld: don't obtain a slot for fld request James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 13/50] lustre: llite: deadlock in ll_new_node() James Simmons
                   ` (37 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

New tag 2.14.57

Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h
index 947a829..5cba95e 100644
--- a/include/uapi/linux/lustre/lustre_ver.h
+++ b/include/uapi/linux/lustre/lustre_ver.h
@@ -3,9 +3,9 @@
 
 #define LUSTRE_MAJOR 2
 #define LUSTRE_MINOR 14
-#define LUSTRE_PATCH 56
+#define LUSTRE_PATCH 57
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.14.56"
+#define LUSTRE_VERSION_STRING "2.14.57"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
1.8.3.1

* [lustre-devel] [PATCH 13/50] lustre: llite: deadlock in ll_new_node()
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (11 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 12/50] lustre: update version to 2.14.57 James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 14/50] lnet: o2iblnd: avoid static allocation for msg tx James Simmons
                   ` (36 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

ll_new_node() calls ll_dir_getstripe() to fetch the parent default
LMV if md_create() returns -EREMOTE. It should call
ll_finish_md_op_data() before calling ll_dir_getstripe(), because
the latter locks lli_lsm_sem again, which will deadlock.
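
The ordering problem can be shown with a toy model. This is a sketch
under stated assumptions, not the llite code: struct toy_lock is a
hypothetical non-recursive lock standing in for lli_lsm_sem,
fetch_default_layout() stands in for ll_dir_getstripe(), and the unlock
stands in for ll_finish_md_op_data(), assuming (as the message implies)
that op_data keeps the lock held until it is finished. A failed trylock
marks the point where a blocking lock would hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock: a second acquire without a release
 * in between is the self-deadlock case.
 */
struct toy_lock {
	bool held;
};

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	assert(l->held);	/* releasing an unheld lock is a bug */
	l->held = false;
}

/* Helper that takes the lock itself, like the callee in the message. */
static bool fetch_default_layout(struct toy_lock *l)
{
	if (!toy_trylock(l))
		return false;	/* a blocking lock would hang here */
	toy_unlock(l);
	return true;
}

/* Old order: call the helper while the lock is still held. */
static bool create_old_order(struct toy_lock *l)
{
	bool ok;

	toy_trylock(l);
	ok = fetch_default_layout(l);
	toy_unlock(l);
	return ok;
}

/* New order: release first, then call the helper. */
static bool create_new_order(struct toy_lock *l)
{
	toy_trylock(l);
	toy_unlock(l);
	return fetch_default_layout(l);
}
```

Only the position of the release differs between the two functions,
which mirrors the two-line move in the diff below.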

Fixes: 7578c2b576e9a80 ("lustre: ptlrpc: intent_getattr fetches default LMV")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15456
Lustre-commit: 1ce2fee3156858e13 ("LU-15456 llite: deadlock in ll_new_node()")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46157
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/namei.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 0683614..1e3a4fd 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -1643,11 +1643,11 @@ static int ll_new_node(struct inode *dir, struct dentry *dchild,
 
 		ptlrpc_req_finished(request);
 		request = NULL;
+		ll_finish_md_op_data(op_data);
+		op_data = NULL;
 
 		err2 = ll_dir_getstripe(dir, (void **)&lum, &lumsize, &request,
 					OBD_MD_DEFAULT_MEA);
-		ll_finish_md_op_data(op_data);
-		op_data = NULL;
 		if (!err2) {
 			struct lustre_md md = { NULL };
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 14/50] lnet: o2iblnd: avoid static allocation for msg tx
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (12 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 13/50] lustre: llite: deadlock in ll_new_node() James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 15/50] lnet: separate lnet_hdr in msg from that in lnd James Simmons
                   ` (35 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexey Lyashkov, Lustre Development List

From: Alexey Lyashkov <alexey.lyashkov@hpe.com>

Simplify tx msg handling: push the lnet header message into the
same list as the others.

Cray-bug-id: LUS-1796
WC-bug-id: https://jira.whamcloud.com/browse/LU-14008
Lustre-commit: 7d12b98d3f8294ca0 ("LU-14008 o2iblnd: avoid static allocation for msg tx")
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/40261
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c    |  5 +++--
 net/lnet/klnds/o2iblnd/o2iblnd.h    |  2 --
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 42 ++++++++++++++++++++++++-------------
 3 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 76f5e7f..9ce6082 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -2010,8 +2010,9 @@ static int kiblnd_create_tx_pool(struct kib_poolset *ps, int size,
 		if (!tx->tx_wrq)
 			break;
 
-		tx->tx_sge = kzalloc_cpt((1 + IBLND_MAX_RDMA_FRAGS) *
-					 wrq_sge * sizeof(*tx->tx_sge),
+		/* +1 is for the lnet header/message itself */
+		tx->tx_sge = kzalloc_cpt((1 + IBLND_MAX_RDMA_FRAGS * wrq_sge) *
+					 sizeof(*tx->tx_sge),
 					 GFP_KERNEL, ps->ps_cpt);
 		if (!tx->tx_sge)
 			break;
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 4fb651e..5a4b4f8 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -408,8 +408,6 @@ struct kib_tx {					/* transmit message */
 	struct kib_msg	       *tx_msg;		/* message buffer (host vaddr) */
 	u64			tx_msgaddr;	/* message buffer (I/O addr) */
 	DEFINE_DMA_UNMAP_ADDR(tx_msgunmap);	/* for dma_unmap_single() */
-	/** sge for tx_msgaddr */
-	struct ib_sge		tx_msgsge;
 	int			tx_nwrq;	/* # send work items */
 	/* # used scatter/gather elements */
 	int			tx_nsge;
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 8168a26..d657366 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1002,32 +1002,44 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 }
 
 static void
+kiblnd_init_tx_sge(struct kib_tx *tx, u64 addr, unsigned int len)
+{
+	struct ib_sge *sge = &tx->tx_sge[tx->tx_nsge];
+	struct kib_hca_dev *hdev = tx->tx_pool->tpo_hdev;
+
+	*sge = (struct ib_sge) {
+		.lkey = hdev->ibh_pd->local_dma_lkey,
+		.addr = addr,
+		.length = len,
+	};
+
+	tx->tx_nsge++;
+}
+
+static void
 kiblnd_init_tx_msg(struct lnet_ni *ni, struct kib_tx *tx, int type,
 		   int body_nob)
 {
-	struct kib_hca_dev *hdev = tx->tx_pool->tpo_hdev;
-	struct ib_sge *sge = &tx->tx_msgsge;
 	struct ib_rdma_wr *wrq = &tx->tx_wrq[tx->tx_nwrq];
 	int nob = offsetof(struct kib_msg, ibm_u) + body_nob;
 
 	LASSERT(tx->tx_nwrq >= 0);
-	LASSERT(tx->tx_nwrq <= IBLND_MAX_RDMA_FRAGS);
+	LASSERT(tx->tx_nwrq < IBLND_MAX_RDMA_FRAGS + 1);
 	LASSERT(nob <= IBLND_MSG_SIZE);
 
 	kiblnd_init_msg(tx->tx_msg, type, body_nob);
 
-	sge->lkey = hdev->ibh_pd->local_dma_lkey;
-	sge->addr = tx->tx_msgaddr;
-	sge->length = nob;
-
-	memset(wrq, 0, sizeof(*wrq));
-
-	wrq->wr.next = NULL;
-	wrq->wr.wr_id = kiblnd_ptr2wreqid(tx, IBLND_WID_TX);
-	wrq->wr.sg_list = sge;
-	wrq->wr.num_sge = 1;
-	wrq->wr.opcode = IB_WR_SEND;
-	wrq->wr.send_flags = IB_SEND_SIGNALED;
+	*wrq = (struct ib_rdma_wr) {
+		.wr = {
+			.wr_id	= kiblnd_ptr2wreqid(tx, IBLND_WID_TX),
+			.num_sge = 1,
+			.sg_list = &tx->tx_sge[tx->tx_nsge],
+			.opcode = IB_WR_SEND,
+			.send_flags = IB_SEND_SIGNALED,
+		},
+	};
+
+	kiblnd_init_tx_sge(tx, tx->tx_msgaddr, nob);
 
 	tx->tx_nwrq++;
 }
-- 
1.8.3.1

* [lustre-devel] [PATCH 15/50] lnet: separate lnet_hdr in msg from that in lnd.
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (13 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 14/50] lnet: o2iblnd: avoid static allocation for msg tx James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 16/50] lnet: change lnet_hdr to store large nids James Simmons
                   ` (34 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The lnet_hdr stored in an lnet_msg has fields which are sometimes in
le byte order and sometimes in host byte order.

The various lnds need all these fields to be in le byte order for
transmission or reception over the network.

To support larger (IPv6) NIDs, we will need the lnet_hdr in lnet_msg
to store these NIDs, but the lnd will need both 4byte-addr and 16-byte
lnds depending on protocol negotiation.

This patch separates out the two to make the conversion easier to
follow.

'struct lnet_hdr' is now used within common lnet code, and is not
stored in network buffers.

lnd_send will convert from 'struct lnet_hdr' to whatever is required
in the network buffer.  When lnet_parse() is called, the network
buffer will be converted to a 'struct lnet_hdr' first, and that will
be passed to lnet_parse().

The common fields of 'struct lnet_hdr' are always in host byte order.
The command specific fields (now in 'union lnet_cmd_hdr') have not
been changed and are sometimes host-byte-order and sometimes
le-byte-order.

The new 'struct lnet_hdr_nid4' is used in network buffers.  It is
opaque - there are no subfields to access.  Very few places in the lnd
code want to access fields anyway.

New functions lnet_hdr_to_nid4() and lnet_hdr_from_nid4() convert
between the lnet_hdr_nid4 and the internal lnet_hdr.

'struct _lnet_hdr_nid4' is provided to access fields inside 'struct
lnet_hdr_nid4' when that is really needed.  It is used by the to/from
functions and a couple of other places.
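
The split between a host-order header and an opaque wire header can be
sketched in portable user-space C. Everything here is illustrative:
struct toy_hdr, struct toy_hdr_wire and the helpers are hypothetical,
the field set is reduced, and explicit byte shifts are used in place of
the kernel's cpu_to_le64()/le64_to_cpu() family that the patch itself
uses.

```c
#include <assert.h>
#include <stdint.h>

/* Host-byte-order header as seen by common code (reduced field set). */
struct toy_hdr {
	uint64_t dest_nid;
	uint64_t src_nid;
	uint32_t dest_pid;
	uint32_t src_pid;
};

/* Opaque wire form: only raw bytes are visible, so fields cannot be
 * read without converting first.
 */
struct toy_hdr_wire {
	unsigned char bytes[24];
};

/* Store the low n bytes of v in little-endian order. */
static void put_le(unsigned char *p, uint64_t v, int n)
{
	int i;

	for (i = 0; i < n; i++)
		p[i] = (unsigned char)(v >> (8 * i));
}

/* Load n little-endian bytes into a host-order value. */
static uint64_t get_le(const unsigned char *p, int n)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

static void toy_hdr_to_wire(const struct toy_hdr *h, struct toy_hdr_wire *w)
{
	assert(h && w);
	put_le(w->bytes + 0, h->dest_nid, 8);
	put_le(w->bytes + 8, h->src_nid, 8);
	put_le(w->bytes + 16, h->dest_pid, 4);
	put_le(w->bytes + 20, h->src_pid, 4);
}

static void toy_hdr_from_wire(struct toy_hdr *h, const struct toy_hdr_wire *w)
{
	assert(h && w);
	h->dest_nid = get_le(w->bytes + 0, 8);
	h->src_nid = get_le(w->bytes + 8, 8);
	h->dest_pid = (uint32_t)get_le(w->bytes + 16, 4);
	h->src_pid = (uint32_t)get_le(w->bytes + 20, 4);
}
```

Keeping the wire type opaque means any code that wants a field has to
go through the conversion, so a byte-order mix-up shows up as a type
error rather than as a silent bug on big-endian hosts.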

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 4ea5cf154a4663a73 ("LU-10391 lnet: separate lnet_hdr in msg from that in lnd.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43603
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h          | 30 +++++++++++
 include/linux/lnet/socklnd.h           |  2 +-
 include/uapi/linux/lnet/lnet-idl.h     | 40 +++++++++++---
 net/lnet/klnds/o2iblnd/o2iblnd-idl.h   |  6 +--
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c    | 22 ++++----
 net/lnet/klnds/socklnd/socklnd.h       |  3 +-
 net/lnet/klnds/socklnd/socklnd_cb.c    | 42 ++++++++-------
 net/lnet/klnds/socklnd/socklnd_proto.c | 37 +++++++------
 net/lnet/lnet/api-ni.c                 | 96 +++++++++++++++++-----------------
 net/lnet/lnet/lib-move.c               | 43 +++++++--------
 net/lnet/lnet/lib-msg.c                |  4 +-
 net/lnet/lnet/net_fault.c              | 12 ++---
 12 files changed, 198 insertions(+), 139 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 9441265..3c3a9d2 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -477,6 +477,36 @@ struct lnet_ni *
 		((1U << the_lnet.ln_remote_nets_hbits) - 1)];
 }
 
+static inline void lnet_hdr_from_nid4(struct lnet_hdr *hdr,
+				      const struct lnet_hdr_nid4 *vhdr)
+{
+	const struct _lnet_hdr_nid4 *hdr_nid4 = (void *)vhdr;
+
+	hdr->dest_nid = le64_to_cpu(hdr_nid4->dest_nid);
+	hdr->src_nid = le64_to_cpu(hdr_nid4->src_nid);
+	hdr->dest_pid = le32_to_cpu(hdr_nid4->dest_pid);
+	hdr->src_pid = le32_to_cpu(hdr_nid4->src_pid);
+	hdr->type = le32_to_cpu(hdr_nid4->type);
+	hdr->payload_length = le32_to_cpu(hdr_nid4->payload_length);
+
+	hdr->msg = hdr_nid4->msg;
+}
+
+static inline void lnet_hdr_to_nid4(const struct lnet_hdr *hdr,
+				    struct lnet_hdr_nid4 *vhdr)
+{
+	struct _lnet_hdr_nid4 *hdr_nid4 = (void *)vhdr;
+
+	hdr_nid4->dest_nid = cpu_to_le64(hdr->dest_nid);
+	hdr_nid4->src_nid = cpu_to_le64(hdr->src_nid);
+	hdr_nid4->dest_pid = cpu_to_le32(hdr->dest_pid);
+	hdr_nid4->src_pid = cpu_to_le32(hdr->src_pid);
+	hdr_nid4->type = cpu_to_le32(hdr->type);
+	hdr_nid4->payload_length = cpu_to_le32(hdr->payload_length);
+
+	hdr_nid4->msg = hdr->msg;
+}
+
 extern struct lnet_lnd the_lolnd;
 extern int avoid_asym_router_failure;
 
diff --git a/include/linux/lnet/socklnd.h b/include/linux/lnet/socklnd.h
index 97ae0e2..025112b 100644
--- a/include/linux/lnet/socklnd.h
+++ b/include/linux/lnet/socklnd.h
@@ -69,7 +69,7 @@ struct ksock_msg {
 		/* case ksm_kh.ksh_type == KSOCK_MSG_NOOP */
 		/* - nothing */
 		/* case ksm_kh.ksh_type == KSOCK_MSG_LNET */
-		struct lnet_hdr lnetmsg;
+		struct lnet_hdr_nid4 lnetmsg_nid4;
 	} __packed ksm_u;
 } __packed;
 #define ksm_type ksm_kh.ksh_type
diff --git a/include/uapi/linux/lnet/lnet-idl.h b/include/uapi/linux/lnet/lnet-idl.h
index b14723e..a19da76 100644
--- a/include/uapi/linux/lnet/lnet-idl.h
+++ b/include/uapi/linux/lnet/lnet-idl.h
@@ -142,7 +142,32 @@ struct lnet_hello {
 	__u32			type;
 } __attribute__((packed));
 
+union lnet_cmd_hdr {
+	struct lnet_ack		ack;
+	struct lnet_put		put;
+	struct lnet_get		get;
+	struct lnet_reply	reply;
+	struct lnet_hello	hello;
+} __attribute__((packed));
+
+/* This is used for message headers that lnet code is manipulating.
+ *  All fields before the union are in host-byte-order.
+ */
 struct lnet_hdr {
+	lnet_nid_t		dest_nid;
+	lnet_nid_t		src_nid;
+	lnet_pid_t		dest_pid;
+	lnet_pid_t		src_pid;
+	__u32			type;		/* enum lnet_msg_type */
+	__u32			payload_length;	/* payload data to follow */
+	/*<------__u64 aligned------->*/
+	union lnet_cmd_hdr	msg;
+} __attribute__((packed));
+
+/* This is used to support conversion between an lnet_hdr and
+ * the content of a network message.
+ */
+struct _lnet_hdr_nid4 {
 	lnet_nid_t	dest_nid;
 	lnet_nid_t	src_nid;
 	lnet_pid_t	dest_pid;
@@ -150,13 +175,14 @@ struct lnet_hdr {
 	__u32		type;		/* enum lnet_msg_type */
 	__u32		payload_length;	/* payload data to follow */
 	/*<------__u64 aligned------->*/
-	union {
-		struct lnet_ack		ack;
-		struct lnet_put		put;
-		struct lnet_get		get;
-		struct lnet_reply	reply;
-		struct lnet_hello	hello;
-	} msg;
+	union lnet_cmd_hdr msg;
+} __attribute__((packed));
+
+/* This is stored in a network message buffer.  Content cannot be accessed
+ * without converting to an lnet_hdr.
+ */
+struct lnet_hdr_nid4 {
+	char	_bytes[sizeof(struct _lnet_hdr_nid4)];
 } __attribute__((packed));
 
 /* A HELLO message contains a magic number and protocol version
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd-idl.h b/net/lnet/klnds/o2iblnd/o2iblnd-idl.h
index f8972d16..7f6a5b61 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd-idl.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd-idl.h
@@ -50,7 +50,7 @@ struct kib_connparams {
 } __packed;
 
 struct kib_immediate_msg {
-	struct lnet_hdr		ibim_hdr;	/* portals header */
+	struct lnet_hdr_nid4	ibim_hdr;	/* portals header */
 	char			ibim_payload[0];/* piggy-backed payload */
 } __packed;
 
@@ -66,7 +66,7 @@ struct kib_rdma_desc {
 } __packed;
 
 struct kib_putreq_msg {
-	struct lnet_hdr		ibprm_hdr;	/* portals header */
+	struct lnet_hdr_nid4	ibprm_hdr;	/* portals header */
 	u64			ibprm_cookie;	/* opaque completion cookie */
 } __packed;
 
@@ -77,7 +77,7 @@ struct kib_putack_msg {
 } __packed;
 
 struct kib_get_msg {
-	struct lnet_hdr		ibgm_hdr;	/* portals header */
+	struct lnet_hdr_nid4	ibgm_hdr;	/* portals header */
 	u64			ibgm_cookie;	/* opaque completion cookie */
 	struct kib_rdma_desc	ibgm_rd;	/* rdma descriptor */
 } __packed;
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index d657366..8f24e26 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -322,6 +322,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 	int rc = 0;
 	int rc2;
 	int post_credit;
+	struct lnet_hdr hdr;
 
 	LASSERT(conn->ibc_state >= IBLND_CONN_ESTABLISHED);
 
@@ -380,16 +381,16 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 
 	case IBLND_MSG_IMMEDIATE:
 		post_credit = IBLND_POSTRX_DONT_POST;
-		rc = lnet_parse(ni, &msg->ibm_u.immediate.ibim_hdr,
-				msg->ibm_srcnid, rx, 0);
+		lnet_hdr_from_nid4(&hdr, &msg->ibm_u.immediate.ibim_hdr);
+		rc = lnet_parse(ni, &hdr, msg->ibm_srcnid, rx, 0);
 		if (rc < 0)		/* repost on error */
 			post_credit = IBLND_POSTRX_PEER_CREDIT;
 		break;
 
 	case IBLND_MSG_PUT_REQ:
 		post_credit = IBLND_POSTRX_DONT_POST;
-		rc = lnet_parse(ni, &msg->ibm_u.putreq.ibprm_hdr,
-				msg->ibm_srcnid, rx, 1);
+		lnet_hdr_from_nid4(&hdr, &msg->ibm_u.putreq.ibprm_hdr);
+		rc = lnet_parse(ni, &hdr, msg->ibm_srcnid, rx, 1);
 		if (rc < 0)		/* repost on error */
 			post_credit = IBLND_POSTRX_PEER_CREDIT;
 		break;
@@ -452,8 +453,8 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 
 	case IBLND_MSG_GET_REQ:
 		post_credit = IBLND_POSTRX_DONT_POST;
-		rc = lnet_parse(ni, &msg->ibm_u.get.ibgm_hdr,
-				msg->ibm_srcnid, rx, 1);
+		lnet_hdr_from_nid4(&hdr, &msg->ibm_u.get.ibgm_hdr);
+		rc = lnet_parse(ni, &hdr, msg->ibm_srcnid, rx, 1);
 		if (rc < 0)		/* repost on error */
 			post_credit = IBLND_POSTRX_PEER_CREDIT;
 		break;
@@ -1595,7 +1596,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 		nob = offsetof(struct kib_get_msg, ibgm_rd.rd_frags[rd->rd_nfrags]);
 		ibmsg->ibm_u.get.ibgm_cookie = tx->tx_cookie;
-		ibmsg->ibm_u.get.ibgm_hdr = *hdr;
+		lnet_hdr_to_nid4(hdr, &ibmsg->ibm_u.get.ibgm_hdr);
 
 		kiblnd_init_tx_msg(ni, tx, IBLND_MSG_GET_REQ, nob);
 
@@ -1630,7 +1631,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			return -EIO;
 		}
 
-		ibmsg->ibm_u.putreq.ibprm_hdr = *hdr;
+		lnet_hdr_to_nid4(hdr, &ibmsg->ibm_u.putreq.ibprm_hdr);
 		ibmsg->ibm_u.putreq.ibprm_cookie = tx->tx_cookie;
 		kiblnd_init_tx_msg(ni, tx, IBLND_MSG_PUT_REQ, sizeof(struct kib_putreq_msg));
 
@@ -1647,7 +1648,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		 <= IBLND_MSG_SIZE);
 
 	ibmsg = tx->tx_msg;
-	ibmsg->ibm_u.immediate.ibim_hdr = *hdr;
+	lnet_hdr_to_nid4(hdr, &ibmsg->ibm_u.immediate.ibim_hdr);
 
 	rc = copy_from_iter(&ibmsg->ibm_u.immediate.ibim_payload, payload_nob,
 			    &from);
@@ -1757,10 +1758,11 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		LBUG();
 
 	case IBLND_MSG_IMMEDIATE:
+		/* fallthrough */
 		nob = offsetof(struct kib_msg, ibm_u.immediate.ibim_payload[rlen]);
 		if (nob > rx->rx_nob) {
 			CERROR("Immediate message from %s too big: %d(%d)\n",
-			       libcfs_nid2str(rxmsg->ibm_u.immediate.ibim_hdr.src_nid),
+			       libcfs_nid2str(lntmsg->msg_hdr.src_nid),
 			       nob, rx->rx_nob);
 			rc = -EPROTO;
 			break;
diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h
index 4607ef7..5d0be68 100644
--- a/net/lnet/klnds/socklnd/socklnd.h
+++ b/net/lnet/klnds/socklnd/socklnd.h
@@ -355,7 +355,8 @@ struct ksock_conn {
 							 * V2.x message takes the
 							 * whole struct
 							 * V1.x message is a bare
-							 * struct lnet_hdr, it's stored in
+							 * struct lnet_hdr_nid4, it's
+							 * stored in
 							 * ksnc_msg.ksm_u.lnetmsg
 							 */
 	/* WRITER */
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index dede642..40f3e79 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -1006,21 +1006,23 @@ struct ksock_conn_cb *
 		case  KSOCK_PROTO_V2:
 		case  KSOCK_PROTO_V3:
 			conn->ksnc_rx_state = SOCKNAL_RX_KSM_HEADER;
-			kvec->iov_base = &conn->ksnc_msg;
-			kvec->iov_len = sizeof(struct ksock_msg_hdr);
 			conn->ksnc_rx_nob_left = sizeof(struct ksock_msg_hdr);
+
+			kvec->iov_base = &conn->ksnc_msg;
+			kvec->iov_len = conn->ksnc_rx_nob_left;
 			iov_iter_kvec(&conn->ksnc_rx_to, READ, kvec, 1,
-				      sizeof(struct ksock_msg_hdr));
+				      kvec->iov_len);
 			break;
 
 		case KSOCK_PROTO_V1:
-			/* Receiving bare struct lnet_hdr */
+			/* Receiving bare struct lnet_hdr_nid4 */
 			conn->ksnc_rx_state = SOCKNAL_RX_LNET_HEADER;
-			kvec->iov_base = &conn->ksnc_msg.ksm_u.lnetmsg;
-			kvec->iov_len = sizeof(struct lnet_hdr);
-			conn->ksnc_rx_nob_left = sizeof(struct lnet_hdr);
+			conn->ksnc_rx_nob_left = sizeof(struct lnet_hdr_nid4);
+
+			kvec->iov_base = &conn->ksnc_msg.ksm_u.lnetmsg_nid4;
+			kvec->iov_len = conn->ksnc_rx_nob_left;
 			iov_iter_kvec(&conn->ksnc_rx_to, READ, kvec, 1,
-				      sizeof(struct lnet_hdr));
+				      kvec->iov_len);
 			break;
 
 		default:
@@ -1059,8 +1061,9 @@ struct ksock_conn_cb *
 ksocknal_process_receive(struct ksock_conn *conn)
 {
 	struct kvec *kvec = conn->ksnc_rx_iov_space;
-	struct lnet_hdr *lhdr;
+	struct _lnet_hdr_nid4 *lhdr;
 	struct lnet_processid *id;
+	struct lnet_hdr hdr;
 	int rc;
 
 	LASSERT(refcount_read(&conn->ksnc_conn_refcount) > 0);
@@ -1151,13 +1154,12 @@ struct ksock_conn_cb *
 
 		case KSOCK_MSG_LNET:
 			conn->ksnc_rx_state = SOCKNAL_RX_LNET_HEADER;
-			conn->ksnc_rx_nob_left = sizeof(struct lnet_hdr);
-
-			kvec->iov_base = &conn->ksnc_msg.ksm_u.lnetmsg;
-			kvec->iov_len = sizeof(struct lnet_hdr);
+			conn->ksnc_rx_nob_left = sizeof(struct lnet_hdr_nid4);
 
+			kvec->iov_base = &conn->ksnc_msg.ksm_u.lnetmsg_nid4;
+			kvec->iov_len = conn->ksnc_rx_nob_left;
 			iov_iter_kvec(&conn->ksnc_rx_to, READ, kvec, 1,
-				      sizeof(struct lnet_hdr));
+				      kvec->iov_len);
 
 			goto again;     /* read lnet header now */
 
@@ -1174,21 +1176,21 @@ struct ksock_conn_cb *
 		/* unpack message header */
 		conn->ksnc_proto->pro_unpack(&conn->ksnc_msg);
 
+		lnet_hdr_from_nid4(&hdr, &conn->ksnc_msg.ksm_u.lnetmsg_nid4);
+
 		if (conn->ksnc_peer->ksnp_id.pid & LNET_PID_USERFLAG) {
 			/* Userspace peer_ni */
-			lhdr = &conn->ksnc_msg.ksm_u.lnetmsg;
 			id = &conn->ksnc_peer->ksnp_id;
 
 			/* Substitute process ID assigned at connection time */
-			lhdr->src_pid = cpu_to_le32(id->pid);
-			lhdr->src_nid = cpu_to_le64(lnet_nid_to_nid4(&id->nid));
+			hdr.src_pid = id->pid;
+			hdr.src_nid = lnet_nid_to_nid4(&id->nid);
 		}
 
 		conn->ksnc_rx_state = SOCKNAL_RX_PARSE;
 		ksocknal_conn_addref(conn);     /* ++ref while parsing */
 
-		rc = lnet_parse(conn->ksnc_peer->ksnp_ni,
-				&conn->ksnc_msg.ksm_u.lnetmsg,
+		rc = lnet_parse(conn->ksnc_peer->ksnp_ni, &hdr,
 				lnet_nid_to_nid4(&conn->ksnc_peer->ksnp_id.nid),
 				conn, 0);
 		if (rc < 0) {
@@ -1225,7 +1227,7 @@ struct ksock_conn_cb *
 		if (!rc && conn->ksnc_msg.ksm_zc_cookies[0]) {
 			LASSERT(conn->ksnc_proto != &ksocknal_protocol_v1x);
 
-			lhdr = &conn->ksnc_msg.ksm_u.lnetmsg;
+			lhdr = (void *)&conn->ksnc_msg.ksm_u.lnetmsg_nid4;
 			id = &conn->ksnc_peer->ksnp_id;
 
 			rc = conn->ksnc_proto->pro_handle_zcreq(conn,
diff --git a/net/lnet/klnds/socklnd/socklnd_proto.c b/net/lnet/klnds/socklnd/socklnd_proto.c
index 2ecffb1..20a582b 100644
--- a/net/lnet/klnds/socklnd/socklnd_proto.c
+++ b/net/lnet/klnds/socklnd/socklnd_proto.c
@@ -292,7 +292,7 @@
 		nob = tx->tx_lnetmsg->msg_len +
 		      ((conn->ksnc_proto == &ksocknal_protocol_v1x) ?
 		       0 : sizeof(struct ksock_msg_hdr) +
-			   sizeof(struct lnet_hdr));
+			   sizeof(struct lnet_hdr_nid4));
 	}
 
 	/* default checking for typed connection */
@@ -328,7 +328,8 @@
 	if (!tx || !tx->tx_lnetmsg)
 		nob = sizeof(struct ksock_msg_hdr);
 	else
-		nob = sizeof(struct ksock_msg_hdr) + sizeof(struct lnet_hdr) +
+		nob = sizeof(struct ksock_msg_hdr) +
+		      sizeof(struct lnet_hdr_nid4) +
 		      tx->tx_lnetmsg->msg_len;
 
 	switch (conn->ksnc_type) {
@@ -460,23 +461,23 @@
 ksocknal_send_hello_v1(struct ksock_conn *conn, struct ksock_hello_msg *hello)
 {
 	struct socket *sock = conn->ksnc_sock;
-	struct lnet_hdr *hdr;
+	struct _lnet_hdr_nid4 *hdr;
 	struct lnet_magicversion *hmv;
 	int rc;
 	int i;
 
-	BUILD_BUG_ON(sizeof(struct lnet_magicversion) != offsetof(struct lnet_hdr, src_nid));
+	BUILD_BUG_ON(sizeof(struct lnet_magicversion) !=
+		     offsetof(struct _lnet_hdr_nid4, src_nid));
 
 	hdr = kzalloc(sizeof(*hdr), GFP_NOFS);
 	if (!hdr) {
-		CERROR("Can't allocate struct lnet_hdr\n");
+		CERROR("Can't allocate struct lnet_hdr_nid4\n");
 		return -ENOMEM;
 	}
 
 	hmv = (struct lnet_magicversion *)&hdr->dest_nid;
 
-	/*
-	 * Re-organize V2.x message header to V1.x (struct lnet_hdr)
+	/* Re-organize V2.x message header to V1.x (struct lnet_hdr)
 	 * header and send out
 	 */
 	hmv->magic = cpu_to_le32(LNET_PROTO_TCP_MAGIC);
@@ -569,18 +570,19 @@
 		       int timeout)
 {
 	struct socket *sock = conn->ksnc_sock;
-	struct lnet_hdr *hdr;
+	struct _lnet_hdr_nid4 *hdr;
 	int rc;
 	int i;
 
 	hdr = kzalloc(sizeof(*hdr), GFP_NOFS);
 	if (!hdr) {
-		CERROR("Can't allocate struct lnet_hdr\n");
+		CERROR("Can't allocate struct lnet_hdr_nid4\n");
 		return -ENOMEM;
 	}
 
 	rc = lnet_sock_read(sock, &hdr->src_nid,
-			    sizeof(*hdr) - offsetof(struct lnet_hdr, src_nid),
+			    sizeof(*hdr) - offsetof(struct _lnet_hdr_nid4,
+						    src_nid),
 			    timeout);
 	if (rc) {
 		CERROR("Error %d reading rest of HELLO hdr from %pIS\n",
@@ -713,11 +715,13 @@
 	LASSERT(tx->tx_msg.ksm_type != KSOCK_MSG_NOOP);
 	LASSERT(tx->tx_lnetmsg);
 
-	tx->tx_hdr.iov_base = &tx->tx_lnetmsg->msg_hdr;
-	tx->tx_hdr.iov_len = sizeof(struct lnet_hdr);
+	lnet_hdr_to_nid4(&tx->tx_lnetmsg->msg_hdr,
+			 &tx->tx_msg.ksm_u.lnetmsg_nid4);
+	tx->tx_hdr.iov_base = (void *)&tx->tx_msg.ksm_u.lnetmsg_nid4;
+	tx->tx_hdr.iov_len = sizeof(struct lnet_hdr_nid4);
 
-	tx->tx_nob = tx->tx_lnetmsg->msg_len + sizeof(struct lnet_hdr);
-	tx->tx_resid = tx->tx_lnetmsg->msg_len + sizeof(struct lnet_hdr);
+	tx->tx_nob = tx->tx_lnetmsg->msg_len + sizeof(struct lnet_hdr_nid4);
+	tx->tx_resid = tx->tx_nob;
 }
 
 static void
@@ -731,9 +735,10 @@
 	case KSOCK_MSG_LNET:
 		LASSERT(tx->tx_lnetmsg);
 		hdr_size = sizeof(struct ksock_msg_hdr) +
-			   sizeof(struct lnet_hdr);
+			   sizeof(struct lnet_hdr_nid4);
 
-		tx->tx_msg.ksm_u.lnetmsg = tx->tx_lnetmsg->msg_hdr;
+		lnet_hdr_to_nid4(&tx->tx_lnetmsg->msg_hdr,
+				 &tx->tx_msg.ksm_u.lnetmsg_nid4);
 		tx->tx_hdr.iov_len = hdr_size;
 		tx->tx_nob = hdr_size + tx->tx_lnetmsg->msg_len;
 		tx->tx_resid = hdr_size + tx->tx_lnetmsg->msg_len;
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index d61c03a..2221b19 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -702,64 +702,64 @@ static void lnet_assert_wire_constants(void)
 	BUILD_BUG_ON((int)offsetof(struct lnet_magicversion, version_minor) != 6);
 	BUILD_BUG_ON((int)sizeof(((struct lnet_magicversion *)0)->version_minor) != 2);
 
-	/* Checks for struct struct lnet_hdr */
-	BUILD_BUG_ON((int)sizeof(struct lnet_hdr) != 72);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, dest_nid) != 0);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->dest_nid) != 8);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, src_nid) != 8);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->src_nid) != 8);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, dest_pid) != 16);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->dest_pid) != 4);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, src_pid) != 20);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->src_pid) != 4);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, type) != 24);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->type) != 4);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, payload_length) != 28);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->payload_length) != 4);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg) != 32);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg) != 40);
+	/* Checks for struct _lnet_hdr_nid4 */
+	BUILD_BUG_ON((int)sizeof(struct _lnet_hdr_nid4) != 72);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, dest_nid) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->dest_nid) != 8);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, src_nid) != 8);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->src_nid) != 8);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, dest_pid) != 16);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->dest_pid) != 4);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, src_pid) != 20);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->src_pid) != 4);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, type) != 24);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->type) != 4);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, payload_length) != 28);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->payload_length) != 4);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg) != 32);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg) != 40);
 
 	/* Ack */
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.ack.dst_wmd) != 32);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.ack.dst_wmd) != 16);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.ack.match_bits) != 48);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.ack.match_bits) != 8);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.ack.mlength) != 56);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.ack.mlength) != 4);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.ack.dst_wmd) != 32);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.ack.dst_wmd) != 16);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.ack.match_bits) != 48);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.ack.match_bits) != 8);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.ack.mlength) != 56);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.ack.mlength) != 4);
 
 	/* Put */
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.put.ack_wmd) != 32);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.put.ack_wmd) != 16);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.put.match_bits) != 48);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.put.match_bits) != 8);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.put.hdr_data) != 56);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.put.hdr_data) != 8);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.put.ptl_index) != 64);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.put.ptl_index) != 4);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.put.offset) != 68);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.put.offset) != 4);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.put.ack_wmd) != 32);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.put.ack_wmd) != 16);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.put.match_bits) != 48);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.put.match_bits) != 8);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.put.hdr_data) != 56);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.put.hdr_data) != 8);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.put.ptl_index) != 64);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.put.ptl_index) != 4);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.put.offset) != 68);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.put.offset) != 4);
 
 	/* Get */
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.get.return_wmd) != 32);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.get.return_wmd) != 16);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.get.match_bits) != 48);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.get.match_bits) != 8);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.get.ptl_index) != 56);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.get.ptl_index) != 4);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.get.src_offset) != 60);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.get.src_offset) != 4);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.get.sink_length) != 64);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.get.sink_length) != 4);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.get.return_wmd) != 32);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.get.return_wmd) != 16);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.get.match_bits) != 48);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.get.match_bits) != 8);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.get.ptl_index) != 56);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.get.ptl_index) != 4);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.get.src_offset) != 60);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.get.src_offset) != 4);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.get.sink_length) != 64);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.get.sink_length) != 4);
 
 	/* Reply */
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.reply.dst_wmd) != 32);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.reply.dst_wmd) != 16);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.reply.dst_wmd) != 32);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.reply.dst_wmd) != 16);
 
 	/* Hello */
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.hello.incarnation) != 32);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.hello.incarnation) != 8);
-	BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.hello.type) != 40);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.hello.type) != 4);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.hello.incarnation) != 32);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.hello.incarnation) != 8);
+	BUILD_BUG_ON((int)offsetof(struct _lnet_hdr_nid4, msg.hello.type) != 40);
+	BUILD_BUG_ON((int)sizeof(((struct _lnet_hdr_nid4 *)0)->msg.hello.type) != 4);
 
 	/* Checks for struct lnet_ni_status and related constants */
 	BUILD_BUG_ON(LNET_NI_STATUS_INVALID != 0x00000000);
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index de3e0ac..f55b525 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -526,13 +526,13 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		lnet_setpayloadbuffer(msg);
 
 	memset(&msg->msg_hdr, 0, sizeof(msg->msg_hdr));
-	msg->msg_hdr.type = cpu_to_le32(type);
+	msg->msg_hdr.type = type;
 	/* dest_nid will be overwritten by lnet_select_pathway() */
-	msg->msg_hdr.dest_nid = cpu_to_le64(target.nid);
-	msg->msg_hdr.dest_pid = cpu_to_le32(target.pid);
+	msg->msg_hdr.dest_nid = target.nid;
+	msg->msg_hdr.dest_pid = target.pid;
 	/* src_nid will be set later */
-	msg->msg_hdr.src_pid = cpu_to_le32(the_lnet.ln_pid);
-	msg->msg_hdr.payload_length = cpu_to_le32(len);
+	msg->msg_hdr.src_pid = the_lnet.ln_pid;
+	msg->msg_hdr.payload_length = len;
 }
 
 void
@@ -705,7 +705,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 			!list_empty(&lp->lpni_txq));
 
 		msg->msg_peertxcredit = 1;
-		lp->lpni_txqnob += msg->msg_len + sizeof(struct lnet_hdr);
+		lp->lpni_txqnob += msg->msg_len + sizeof(struct lnet_hdr_nid4);
 		lp->lpni_txcredits--;
 
 		if (lp->lpni_txcredits < lp->lpni_mintxcredits)
@@ -903,7 +903,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		LASSERT((txpeer->lpni_txcredits < 0) ==
 			!list_empty(&txpeer->lpni_txq));
 
-		txpeer->lpni_txqnob -= msg->msg_len + sizeof(struct lnet_hdr);
+		txpeer->lpni_txqnob -=	msg->msg_len +
+					sizeof(struct lnet_hdr_nid4);
 		LASSERT(txpeer->lpni_txqnob >= 0);
 
 		txpeer->lpni_txcredits++;
@@ -1626,10 +1627,10 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	/* No send credit hassles with LOLND */
 	lnet_ni_addref_locked(the_lnet.ln_loni, cpt);
 	msg->msg_hdr.dest_nid =
-		cpu_to_le64(lnet_nid_to_nid4(&the_lnet.ln_loni->ni_nid));
+		lnet_nid_to_nid4(&the_lnet.ln_loni->ni_nid);
 	if (!msg->msg_routing)
 		msg->msg_hdr.src_nid =
-			cpu_to_le64(lnet_nid_to_nid4(&the_lnet.ln_loni->ni_nid));
+			lnet_nid_to_nid4(&the_lnet.ln_loni->ni_nid);
 	msg->msg_target.nid = the_lnet.ln_loni->ni_nid;
 	lnet_msg_commit(msg, cpt);
 	msg->msg_txni = the_lnet.ln_loni;
@@ -1726,7 +1727,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	 */
 	if (!msg->msg_routing)
 		msg->msg_hdr.src_nid =
-			cpu_to_le64(lnet_nid_to_nid4(&msg->msg_txni->ni_nid));
+			lnet_nid_to_nid4(&msg->msg_txni->ni_nid);
 
 	if (routing) {
 		msg->msg_target_is_router = 1;
@@ -1742,13 +1743,13 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		 */
 		/* FIXME handle large-addr nid */
 		msg->msg_hdr.dest_nid =
-			cpu_to_le64(lnet_nid_to_nid4(&final_dst_lpni->lpni_nid));
+			lnet_nid_to_nid4(&final_dst_lpni->lpni_nid);
 	} else {
 		/* if we're not routing set the dest_nid to the best peer
 		 * ni NID that we picked earlier in the algorithm.
 		 */
 		msg->msg_hdr.dest_nid =
-			cpu_to_le64(lnet_nid_to_nid4(&msg->msg_txpeer->lpni_nid));
+			lnet_nid_to_nid4(&msg->msg_txpeer->lpni_nid);
 	}
 
 	/* if we have response tracker block update it with the next hop
@@ -4259,11 +4260,11 @@ void lnet_monitor_thr_stop(void)
 
 	lnet_nid4_to_nid(from_nid4, &from_nid);
 
-	type = le32_to_cpu(hdr->type);
-	src_nid = le64_to_cpu(hdr->src_nid);
-	dest_nid = le64_to_cpu(hdr->dest_nid);
-	dest_pid = le32_to_cpu(hdr->dest_pid);
-	payload_length = le32_to_cpu(hdr->payload_length);
+	type = hdr->type;
+	src_nid = hdr->src_nid;
+	dest_nid = hdr->dest_nid;
+	dest_pid = hdr->dest_pid;
+	payload_length = hdr->payload_length;
 
 	/* FIXME handle large-addr nids */
 	for_me = (lnet_nid_to_nid4(&ni->ni_nid) == dest_nid);
@@ -4411,14 +4412,6 @@ void lnet_monitor_thr_stop(void)
 		msg->msg_target.pid = dest_pid;
 		lnet_nid4_to_nid(dest_nid, &msg->msg_target.nid);
 		msg->msg_routing = 1;
-	} else {
-		/* convert common msg->hdr fields to host byteorder */
-		msg->msg_hdr.type = type;
-		msg->msg_hdr.src_nid = src_nid;
-		le32_to_cpus(&msg->msg_hdr.src_pid);
-		msg->msg_hdr.dest_nid = dest_nid;
-		msg->msg_hdr.dest_pid = dest_pid;
-		msg->msg_hdr.payload_length = payload_length;
 	}
 
 	lnet_net_lock(cpt);
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 4102c7b..62a02ac 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -66,8 +66,8 @@
 
 	if (ev_type == LNET_EVENT_SEND) {
 		/* event for active message */
-		lnet_nid4_to_nid(le64_to_cpu(hdr->dest_nid), &ev->target.nid);
-		ev->target.pid = le32_to_cpu(hdr->dest_pid);
+		lnet_nid4_to_nid(hdr->dest_nid, &ev->target.nid);
+		ev->target.pid = hdr->dest_pid;
 		ev->initiator.nid = LNET_ANY_NID;
 		ev->initiator.pid = the_lnet.ln_pid;
 		ev->source.nid = LNET_ANY_NID;
diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c
index 02fc1ae..ee45767 100644
--- a/net/lnet/lnet/net_fault.c
+++ b/net/lnet/lnet/net_fault.c
@@ -427,9 +427,9 @@ struct lnet_drop_rule {
 		     lnet_nid_t local_nid,
 		     enum lnet_msg_hstatus *hstatus)
 {
-	lnet_nid_t src = le64_to_cpu(hdr->src_nid);
-	lnet_nid_t dst = le64_to_cpu(hdr->dest_nid);
-	unsigned int typ = le32_to_cpu(hdr->type);
+	lnet_nid_t src = hdr->src_nid;
+	lnet_nid_t dst = hdr->dest_nid;
+	unsigned int typ = hdr->type;
 	struct lnet_drop_rule *rule;
 	unsigned int ptl = -1;
 	bool drop = false;
@@ -605,9 +605,9 @@ struct delay_daemon_data {
 lnet_delay_rule_match_locked(struct lnet_hdr *hdr, struct lnet_msg *msg)
 {
 	struct lnet_delay_rule *rule;
-	lnet_nid_t src = le64_to_cpu(hdr->src_nid);
-	lnet_nid_t dst = le64_to_cpu(hdr->dest_nid);
-	unsigned int typ = le32_to_cpu(hdr->type);
+	lnet_nid_t src = hdr->src_nid;
+	lnet_nid_t dst = hdr->dest_nid;
+	unsigned int typ = hdr->type;
 	unsigned int ptl = -1;
 
 	/* NB: called with hold of lnet_net_lock */
-- 
1.8.3.1


* [lustre-devel] [PATCH 16/50] lnet: change lnet_hdr to store large nids.
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (14 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 15/50] lnet: separate lnet_hdr in msg from that in lnd James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 17/50] lnet: change lnet_prep_send to take net_processid James Simmons
                   ` (33 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

'struct lnet_hdr' now has large-addr nids.  They are converted to
4-byte-addr on transmit, and converted back on receive.
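
For readers following the series, the conversion on transmit and receive
described above can be pictured with a small userspace model. This is only
a sketch and not part of the patch: every model_* type and helper is
invented for illustration, and the layout of the 8-byte NID (network in the
upper 32 bits, address in the lower 32 bits) is an assumption of the model
rather than a statement about the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte legacy NID: net in the high 32 bits, address low. */
typedef uint64_t model_nid4_t;

/* Hypothetical large-address NID, as kept in the in-memory header. */
struct model_nid {
	uint8_t  size;		/* address bytes beyond the first four */
	uint32_t net;
	uint32_t addr[4];
};

/* In-memory header of the model: host byte order, large NIDs. */
struct model_hdr {
	struct model_nid dest_nid;
	struct model_nid src_nid;
};

void model_nid4_to_nid(model_nid4_t nid4, struct model_nid *nid)
{
	memset(nid, 0, sizeof(*nid));
	nid->net = (uint32_t)(nid4 >> 32);
	nid->addr[0] = (uint32_t)nid4;
}

model_nid4_t model_nid_to_nid4(const struct model_nid *nid)
{
	/* Only a 4-byte address fits in the legacy form. */
	assert(nid->size == 0);
	return ((model_nid4_t)nid->net << 32) | nid->addr[0];
}

/* Store/load a 64-bit value as little-endian bytes, like the wire format. */
void model_put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

uint64_t model_get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* Transmit side: large NIDs are narrowed and written little-endian. */
void model_hdr_to_wire(const struct model_hdr *hdr, uint8_t wire[16])
{
	model_put_le64(wire, model_nid_to_nid4(&hdr->dest_nid));
	model_put_le64(wire + 8, model_nid_to_nid4(&hdr->src_nid));
}

/* Receive side: wire bytes are read back and widened to large NIDs. */
void model_hdr_from_wire(struct model_hdr *hdr, const uint8_t wire[16])
{
	model_nid4_to_nid(model_get_le64(wire), &hdr->dest_nid);
	model_nid4_to_nid(model_get_le64(wire + 8), &hdr->src_nid);
}

/* Returns 1 when both NIDs survive a transmit/receive round trip. */
int model_roundtrip_ok(model_nid4_t dest, model_nid4_t src)
{
	struct model_hdr out, in;
	uint8_t wire[16];

	model_nid4_to_nid(dest, &out.dest_nid);
	model_nid4_to_nid(src, &out.src_nid);
	model_hdr_to_wire(&out, wire);
	model_hdr_from_wire(&in, wire);
	return model_nid_to_nid4(&in.dest_nid) == dest &&
	       model_nid_to_nid4(&in.src_nid) == src;
}
```

In this model the in-memory header always carries the wide form, and the
narrowing happens only at the boundary where bytes are produced or consumed,
which is the shape the patch gives lnet_hdr_to_nid4() and
lnet_hdr_from_nid4().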

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 5bb421cdfd4ce6a29 ("LU-10391 lnet: change lnet_hdr to store large nids.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43604
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h       |  8 +++---
 include/uapi/linux/lnet/lnet-idl.h  |  4 +--
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c |  2 +-
 net/lnet/klnds/socklnd/socklnd_cb.c |  2 +-
 net/lnet/lnet/lib-move.c            | 49 ++++++++++++++++---------------------
 net/lnet/lnet/lib-msg.c             | 16 ++++++------
 net/lnet/lnet/lib-ptl.c             |  4 +--
 net/lnet/lnet/net_fault.c           |  8 +++---
 8 files changed, 43 insertions(+), 50 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 3c3a9d2..8c4940f 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -482,8 +482,8 @@ static inline void lnet_hdr_from_nid4(struct lnet_hdr *hdr,
 {
 	const struct _lnet_hdr_nid4 *hdr_nid4 = (void *)vhdr;
 
-	hdr->dest_nid = le64_to_cpu(hdr_nid4->dest_nid);
-	hdr->src_nid = le64_to_cpu(hdr_nid4->src_nid);
+	lnet_nid4_to_nid(le64_to_cpu(hdr_nid4->dest_nid), &hdr->dest_nid);
+	lnet_nid4_to_nid(le64_to_cpu(hdr_nid4->src_nid), &hdr->src_nid);
 	hdr->dest_pid = le32_to_cpu(hdr_nid4->dest_pid);
 	hdr->src_pid = le32_to_cpu(hdr_nid4->src_pid);
 	hdr->type = le32_to_cpu(hdr_nid4->type);
@@ -497,8 +497,8 @@ static inline void lnet_hdr_to_nid4(const struct lnet_hdr *hdr,
 {
 	struct _lnet_hdr_nid4 *hdr_nid4 = (void *)vhdr;
 
-	hdr_nid4->dest_nid = cpu_to_le64(hdr->dest_nid);
-	hdr_nid4->src_nid = cpu_to_le64(hdr->src_nid);
+	hdr_nid4->dest_nid = cpu_to_le64(lnet_nid_to_nid4(&hdr->dest_nid));
+	hdr_nid4->src_nid = cpu_to_le64(lnet_nid_to_nid4(&hdr->src_nid));
 	hdr_nid4->dest_pid = cpu_to_le32(hdr->dest_pid);
 	hdr_nid4->src_pid = cpu_to_le32(hdr->src_pid);
 	hdr_nid4->type = cpu_to_le32(hdr->type);
diff --git a/include/uapi/linux/lnet/lnet-idl.h b/include/uapi/linux/lnet/lnet-idl.h
index a19da76..74036e7 100644
--- a/include/uapi/linux/lnet/lnet-idl.h
+++ b/include/uapi/linux/lnet/lnet-idl.h
@@ -154,8 +154,8 @@ struct lnet_hello {
  *  All fields before the union are in host-byte-order.
  */
 struct lnet_hdr {
-	lnet_nid_t		dest_nid;
-	lnet_nid_t		src_nid;
+	struct lnet_nid		dest_nid;
+	struct lnet_nid		src_nid;
 	lnet_pid_t		dest_pid;
 	lnet_pid_t		src_pid;
 	__u32			type;		/* enum lnet_msg_type */
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 8f24e26..c1be2f7 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1762,7 +1762,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		nob = offsetof(struct kib_msg, ibm_u.immediate.ibim_payload[rlen]);
 		if (nob > rx->rx_nob) {
 			CERROR("Immediate message from %s too big: %d(%d)\n",
-			       libcfs_nid2str(lntmsg->msg_hdr.src_nid),
+			       libcfs_nidstr(&lntmsg->msg_hdr.src_nid),
 			       nob, rx->rx_nob);
 			rc = -EPROTO;
 			break;
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index 40f3e79..925494b 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -1184,7 +1184,7 @@ struct ksock_conn_cb *
 
 			/* Substitute process ID assigned at connection time */
 			hdr.src_pid = id->pid;
-			hdr.src_nid = lnet_nid_to_nid4(&id->nid);
+			hdr.src_nid = id->nid;
 		}
 
 		conn->ksnc_rx_state = SOCKNAL_RX_PARSE;
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index f55b525..f4c24ff 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -528,7 +528,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	memset(&msg->msg_hdr, 0, sizeof(msg->msg_hdr));
 	msg->msg_hdr.type = type;
 	/* dest_nid will be overwritten by lnet_select_pathway() */
-	msg->msg_hdr.dest_nid = target.nid;
+	lnet_nid4_to_nid(target.nid, &msg->msg_hdr.dest_nid);
 	msg->msg_hdr.dest_pid = target.pid;
 	/* src_nid will be set later */
 	msg->msg_hdr.src_pid = the_lnet.ln_pid;
@@ -1626,11 +1626,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 
 	/* No send credit hassles with LOLND */
 	lnet_ni_addref_locked(the_lnet.ln_loni, cpt);
-	msg->msg_hdr.dest_nid =
-		lnet_nid_to_nid4(&the_lnet.ln_loni->ni_nid);
+	msg->msg_hdr.dest_nid = the_lnet.ln_loni->ni_nid;
 	if (!msg->msg_routing)
-		msg->msg_hdr.src_nid =
-			lnet_nid_to_nid4(&the_lnet.ln_loni->ni_nid);
+		msg->msg_hdr.src_nid = the_lnet.ln_loni->ni_nid;
 	msg->msg_target.nid = the_lnet.ln_loni->ni_nid;
 	lnet_msg_commit(msg, cpt);
 	msg->msg_txni = the_lnet.ln_loni;
@@ -1726,8 +1724,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	 * originator and set it here.
 	 */
 	if (!msg->msg_routing)
-		msg->msg_hdr.src_nid =
-			lnet_nid_to_nid4(&msg->msg_txni->ni_nid);
+		msg->msg_hdr.src_nid = msg->msg_txni->ni_nid;
 
 	if (routing) {
 		msg->msg_target_is_router = 1;
@@ -1741,15 +1738,12 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		 * lnet_select_pathway() function and is never changed.
 		 * It's safe to use it here.
 		 */
-		/* FIXME handle large-addr nid */
-		msg->msg_hdr.dest_nid =
-			lnet_nid_to_nid4(&final_dst_lpni->lpni_nid);
+		msg->msg_hdr.dest_nid = final_dst_lpni->lpni_nid;
 	} else {
 		/* if we're not routing set the dest_nid to the best peer
 		 * ni NID that we picked earlier in the algorithm.
 		 */
-		msg->msg_hdr.dest_nid =
-			lnet_nid_to_nid4(&msg->msg_txpeer->lpni_nid);
+		msg->msg_hdr.dest_nid = msg->msg_txpeer->lpni_nid;
 	}
 
 	/* if we have response tracker block update it with the next hop
@@ -1768,10 +1762,10 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	rc = lnet_post_send_locked(msg, 0);
 	if (!rc)
 		CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) %s : %s try# %d\n",
-		       libcfs_nid2str(msg->msg_hdr.src_nid),
+		       libcfs_nidstr(&msg->msg_hdr.src_nid),
 		       libcfs_nidstr(&msg->msg_txni->ni_nid),
 		       libcfs_nidstr(&sd->sd_src_nid),
-		       libcfs_nid2str(msg->msg_hdr.dest_nid),
+		       libcfs_nidstr(&msg->msg_hdr.dest_nid),
 		       libcfs_nidstr(&sd->sd_dst_nid),
 		       libcfs_nidstr(&msg->msg_txpeer->lpni_nid),
 		       libcfs_nidstr(&sd->sd_rtr_nid),
@@ -2789,8 +2783,8 @@ struct lnet_ni *
 		struct lnet_peer *src_lp;
 		struct lnet_peer_ni *src_lpni;
 
-		src_lpni = lnet_nid2peerni_locked(msg->msg_hdr.src_nid,
-						  LNET_NID_ANY, cpt);
+		src_lpni = lnet_peerni_by_nid_locked(&msg->msg_hdr.src_nid,
+						     NULL, cpt);
 		/* We don't fail the send if we hit any errors here. We'll just
 		 * try to send it via non-multi-rail criteria
 		 */
@@ -3104,11 +3098,11 @@ struct lnet_mt_event_info {
 
 		list_del_init(&msg->msg_list);
 
-		lpni = lnet_find_peer_ni_locked(msg->msg_hdr.dest_nid);
+		lpni = lnet_peer_ni_find_locked(&msg->msg_hdr.dest_nid);
 		if (!lpni) {
 			lnet_net_unlock(cpt);
 			CERROR("Expected that a peer is already created for %s\n",
-			       libcfs_nid2str(msg->msg_hdr.dest_nid));
+			       libcfs_nidstr(&msg->msg_hdr.dest_nid));
 			msg->msg_no_resend = true;
 			lnet_finalize(msg, -EFAULT);
 			lnet_net_lock(cpt);
@@ -3994,7 +3988,7 @@ void lnet_monitor_thr_stop(void)
 	le32_to_cpus(&hdr->msg.get.sink_length);
 	le32_to_cpus(&hdr->msg.get.src_offset);
 
-	source_id.nid = hdr->src_nid;
+	source_id.nid = lnet_nid_to_nid4(&hdr->src_nid);
 	source_id.pid = hdr->src_pid;
 	/* Primary peer NID */
 	info.mi_id.nid = msg->msg_initiator;
@@ -4062,7 +4056,7 @@ void lnet_monitor_thr_stop(void)
 	cpt = lnet_cpt_of_cookie(hdr->msg.reply.dst_wmd.wh_object_cookie);
 	lnet_res_lock(cpt);
 
-	src.nid = hdr->src_nid;
+	src.nid = lnet_nid_to_nid4(&hdr->src_nid);
 	src.pid = hdr->src_pid;
 
 	/* NB handles only looked up by creator (no flips) */
@@ -4121,7 +4115,7 @@ void lnet_monitor_thr_stop(void)
 	struct lnet_libmd *md;
 	int cpt;
 
-	src.nid = hdr->src_nid;
+	src.nid = lnet_nid_to_nid4(&hdr->src_nid);
 	src.pid = hdr->src_pid;
 
 	/* Convert ack fields to host byte order */
@@ -4261,8 +4255,8 @@ void lnet_monitor_thr_stop(void)
 	lnet_nid4_to_nid(from_nid4, &from_nid);
 
 	type = hdr->type;
-	src_nid = hdr->src_nid;
-	dest_nid = hdr->dest_nid;
+	src_nid = lnet_nid_to_nid4(&hdr->src_nid);
+	dest_nid = lnet_nid_to_nid4(&hdr->dest_nid);
 	dest_pid = hdr->dest_pid;
 	payload_length = hdr->payload_length;
 
@@ -4554,7 +4548,7 @@ void lnet_monitor_thr_stop(void)
 
 		list_del(&msg->msg_list);
 
-		id.nid = msg->msg_hdr.src_nid;
+		id.nid = lnet_nid_to_nid4(&msg->msg_hdr.src_nid);
 		id.pid = msg->msg_hdr.src_pid;
 
 		LASSERT(!msg->msg_md);
@@ -4599,11 +4593,10 @@ void lnet_monitor_thr_stop(void)
 
 		list_del(&msg->msg_list);
 
-		/*
-		 * md won't disappear under me, since each msg
+		/* md won't disappear under me, since each msg
 		 * holds a ref on it
 		 */
-		id.nid = msg->msg_hdr.src_nid;
+		id.nid = lnet_nid_to_nid4(&msg->msg_hdr.src_nid);
 		id.pid = msg->msg_hdr.src_pid;
 
 		LASSERT(msg->msg_rx_delayed);
@@ -4870,7 +4863,7 @@ struct lnet_msg *
 		getmsg->msg_txpeer->lpni_peer_net->lpn_peer->lp_primary_nid;
 	msg->msg_from = peer_id->nid;
 	msg->msg_type = LNET_MSG_GET; /* flag this msg as an "optimized" GET */
-	msg->msg_hdr.src_nid = lnet_nid_to_nid4(&peer_id->nid);
+	msg->msg_hdr.src_nid = peer_id->nid;
 	msg->msg_hdr.payload_length = getmd->md_length;
 	msg->msg_receiving = 1; /* required by lnet_msg_attach_md */
 
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 62a02ac..9a4e268 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -66,7 +66,7 @@
 
 	if (ev_type == LNET_EVENT_SEND) {
 		/* event for active message */
-		lnet_nid4_to_nid(hdr->dest_nid, &ev->target.nid);
+		ev->target.nid = hdr->dest_nid;
 		ev->target.pid = hdr->dest_pid;
 		ev->initiator.nid = LNET_ANY_NID;
 		ev->initiator.pid = the_lnet.ln_pid;
@@ -76,13 +76,13 @@
 	} else {
 		/* event for passive message */
 		ev->target.pid = hdr->dest_pid;
-		lnet_nid4_to_nid(hdr->dest_nid, &ev->target.nid);
+		ev->target.nid = hdr->dest_nid;
 		ev->initiator.pid = hdr->src_pid;
 		/* Multi-Rail: resolve src_nid to "primary" peer NID */
 		ev->initiator.nid = msg->msg_initiator;
 		/* Multi-Rail: track source NID. */
 		ev->source.pid = hdr->src_pid;
-		lnet_nid4_to_nid(hdr->src_nid, &ev->source.nid);
+		ev->source.nid = hdr->src_nid;
 		ev->rlength = hdr->payload_length;
 		ev->sender = msg->msg_from;
 		ev->mlength = msg->msg_wanted;
@@ -638,15 +638,15 @@
 	 * this message consumed. The message will
 	 * consume another credit when it gets resent.
 	 */
-	lnet_nid4_to_nid(msg->msg_hdr.dest_nid, &msg->msg_target.nid);
+	msg->msg_target.nid = msg->msg_hdr.dest_nid;
 	lnet_msg_decommit_tx(msg, -EAGAIN);
 	msg->msg_sending = 0;
 	msg->msg_receiving = 0;
 	msg->msg_target_is_router = 0;
 
 	CDEBUG(D_NET, "%s->%s:%s:%s - queuing msg (%p) for resend\n",
-	       libcfs_nid2str(msg->msg_hdr.src_nid),
-	       libcfs_nid2str(msg->msg_hdr.dest_nid),
+	       libcfs_nidstr(&msg->msg_hdr.src_nid),
+	       libcfs_nidstr(&msg->msg_hdr.dest_nid),
 	       lnet_msgtyp2str(msg->msg_type),
 	       lnet_health_error2str(msg->msg_health_status), msg);
 
@@ -1116,9 +1116,9 @@
 
 	CDEBUG(D_NET,
 	       "src %s(%s)->dst %s: %s simulate health error: %s\n",
-	       libcfs_nid2str(msg->msg_hdr.src_nid),
+	       libcfs_nidstr(&msg->msg_hdr.src_nid),
 	       libcfs_nidstr(&msg->msg_txni->ni_nid),
-	       libcfs_nid2str(msg->msg_hdr.dest_nid),
+	       libcfs_nidstr(&msg->msg_hdr.dest_nid),
 	       lnet_msgtyp2str(msg->msg_type),
 	       lnet_health_error2str(*hstatus));
 
diff --git a/net/lnet/lnet/lib-ptl.c b/net/lnet/lnet/lib-ptl.c
index 30628e5..0aad9a8 100644
--- a/net/lnet/lnet/lib-ptl.c
+++ b/net/lnet/lnet/lib-ptl.c
@@ -279,8 +279,8 @@ struct lnet_match_table *
 		return mtable;
 
 	/* it's a wildcard portal */
-	routed = LNET_NIDNET(msg->msg_hdr.src_nid) !=
-		 LNET_NIDNET(msg->msg_hdr.dest_nid);
+	routed = LNET_NID_NET(&msg->msg_hdr.src_nid) !=
+		 LNET_NID_NET(&msg->msg_hdr.dest_nid);
 
 	if (portal_rotor == LNET_PTL_ROTOR_OFF ||
 	    (portal_rotor != LNET_PTL_ROTOR_ON && !routed)) {
diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c
index ee45767..1f08b38 100644
--- a/net/lnet/lnet/net_fault.c
+++ b/net/lnet/lnet/net_fault.c
@@ -427,8 +427,8 @@ struct lnet_drop_rule {
 		     lnet_nid_t local_nid,
 		     enum lnet_msg_hstatus *hstatus)
 {
-	lnet_nid_t src = hdr->src_nid;
-	lnet_nid_t dst = hdr->dest_nid;
+	lnet_nid_t src = lnet_nid_to_nid4(&hdr->src_nid);
+	lnet_nid_t dst = lnet_nid_to_nid4(&hdr->dest_nid);
 	unsigned int typ = hdr->type;
 	struct lnet_drop_rule *rule;
 	unsigned int ptl = -1;
@@ -605,8 +605,8 @@ struct delay_daemon_data {
 lnet_delay_rule_match_locked(struct lnet_hdr *hdr, struct lnet_msg *msg)
 {
 	struct lnet_delay_rule *rule;
-	lnet_nid_t src = hdr->src_nid;
-	lnet_nid_t dst = hdr->dest_nid;
+	lnet_nid_t src = lnet_nid_to_nid4(&hdr->src_nid);
+	lnet_nid_t dst = lnet_nid_to_nid4(&hdr->dest_nid);
 	unsigned int typ = hdr->type;
 	unsigned int ptl = -1;
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 17/50] lnet: change lnet_prep_send to take net_processid
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (15 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 16/50] lnet: change lnet_hdr to store large nids James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 18/50] lnet: convert to struct lnet_process_id in lib-move James Simmons
                   ` (32 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Instead of a 'struct lnet_process_id', lnet_prep_send() now takes a
"struct lnet_processid *", which allows larger addresses.

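The effect of the signature change can be sketched in userspace as follows.
This is an illustrative model only, not the kernel code: the model_* names
and field layouts are made up for the example. What it tries to show is that
once the target is passed by pointer and already holds a large NID, both the
message target and the header destination become plain struct copies, with
no widening step.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical large-address NID (up to 16 address bytes). */
struct model_nid {
	uint8_t  size;
	uint8_t  type;
	uint16_t num;
	uint32_t addr[4];
};

/* Hypothetical process id that embeds the large NID. */
struct model_processid {
	struct model_nid nid;
	uint32_t pid;
};

struct model_hdr {
	struct model_nid dest_nid;
	uint32_t dest_pid;
	uint32_t type;
	uint32_t payload_length;
};

struct model_msg {
	int type;
	struct model_processid target;
	unsigned int len;
	unsigned int offset;
	struct model_hdr hdr;
};

/* Model of the new calling convention: target is passed by pointer. */
void model_prep_send(struct model_msg *msg, int type,
		     const struct model_processid *target,
		     unsigned int offset, unsigned int len)
{
	msg->type = type;
	msg->target = *target;		/* whole-struct copy, no widening */
	msg->len = len;
	msg->offset = offset;

	memset(&msg->hdr, 0, sizeof(msg->hdr));
	msg->hdr.type = (uint32_t)type;
	msg->hdr.dest_nid = target->nid;
	msg->hdr.dest_pid = target->pid;
	msg->hdr.payload_length = len;
}
```

A caller in this model fills a struct model_processid once and passes its
address; all address words, including those beyond the first, reach both
msg->target and msg->hdr unchanged.
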
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: c0ccfaf1e35935273 ("LU-10391 lnet: change lnet_prep_send to take net_processid")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43605
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |  2 +-
 net/lnet/lnet/lib-move.c      | 61 +++++++++++++++++++++++--------------------
 net/lnet/lnet/lib-msg.c       |  3 +--
 3 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 8c4940f..ce18897 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -613,7 +613,7 @@ void lnet_msg_attach_md(struct lnet_msg *msg, struct lnet_libmd *md,
 void lnet_msg_decommit(struct lnet_msg *msg, int cpt, int status);
 
 void lnet_prep_send(struct lnet_msg *msg, int type,
-		    struct lnet_process_id target, unsigned int offset,
+		    struct lnet_processid *target, unsigned int offset,
 		    unsigned int len);
 int lnet_send(struct lnet_nid *nid, struct lnet_msg *msg,
 	      struct lnet_nid *rtr_nid);
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index f4c24ff..1c72ea2 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -513,12 +513,11 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 }
 
 void
-lnet_prep_send(struct lnet_msg *msg, int type, struct lnet_process_id target,
+lnet_prep_send(struct lnet_msg *msg, int type, struct lnet_processid *target,
 	       unsigned int offset, unsigned int len)
 {
 	msg->msg_type = type;
-	msg->msg_target.pid = target.pid;
-	lnet_nid4_to_nid(target.nid, &msg->msg_target.nid);
+	msg->msg_target = *target;
 	msg->msg_len = len;
 	msg->msg_offset = offset;
 
@@ -528,8 +527,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	memset(&msg->msg_hdr, 0, sizeof(msg->msg_hdr));
 	msg->msg_hdr.type = type;
 	/* dest_nid will be overwritten by lnet_select_pathway() */
-	lnet_nid4_to_nid(target.nid, &msg->msg_hdr.dest_nid);
-	msg->msg_hdr.dest_pid = target.pid;
+	msg->msg_hdr.dest_nid = target->nid;
+	msg->msg_hdr.dest_pid = target->pid;
 	/* src_nid will be set later */
 	msg->msg_hdr.src_pid = the_lnet.ln_pid;
 	msg->msg_hdr.payload_length = len;
@@ -3978,7 +3977,7 @@ void lnet_monitor_thr_stop(void)
 {
 	struct lnet_match_info info;
 	struct lnet_hdr *hdr = &msg->msg_hdr;
-	struct lnet_process_id source_id;
+	struct lnet_processid source_id;
 	struct lnet_handle_wire reply_wmd;
 	int rc;
 
@@ -3988,7 +3987,7 @@ void lnet_monitor_thr_stop(void)
 	le32_to_cpus(&hdr->msg.get.sink_length);
 	le32_to_cpus(&hdr->msg.get.src_offset);
 
-	source_id.nid = lnet_nid_to_nid4(&hdr->src_nid);
+	source_id.nid = hdr->src_nid;
 	source_id.pid = hdr->src_pid;
 	/* Primary peer NID */
 	info.mi_id.nid = msg->msg_initiator;
@@ -4014,7 +4013,7 @@ void lnet_monitor_thr_stop(void)
 
 	reply_wmd = hdr->msg.get.return_wmd;
 
-	lnet_prep_send(msg, LNET_MSG_REPLY, source_id,
+	lnet_prep_send(msg, LNET_MSG_REPLY, &source_id,
 		       msg->msg_offset, msg->msg_wanted);
 
 	msg->msg_hdr.msg.reply.dst_wmd = reply_wmd;
@@ -4702,11 +4701,12 @@ void lnet_monitor_thr_stop(void)
  */
 int
 LNetPut(lnet_nid_t self4, struct lnet_handle_md mdh, enum lnet_ack_req ack,
-	struct lnet_process_id target, unsigned int portal,
+	struct lnet_process_id target4, unsigned int portal,
 	u64 match_bits, unsigned int offset,
 	u64 hdr_data)
 {
 	struct lnet_rsp_tracker *rspt = NULL;
+	struct lnet_processid target;
 	struct lnet_msg *msg;
 	struct lnet_libmd *md;
 	struct lnet_nid self;
@@ -4716,18 +4716,20 @@ void lnet_monitor_thr_stop(void)
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	lnet_nid4_to_nid(self4, &self);
+	lnet_nid4_to_nid(target4.nid, &target.nid);
+	target.pid = target4.pid;
 
-	if (!list_empty(&the_lnet.ln_test_peers) && /* normally we don't */
-	    fail_peer(target.nid, 1)) { /* shall we now? */
+	if (!list_empty(&the_lnet.ln_test_peers) &&	/* normally we don't */
+	    fail_peer(target4.nid, 1)) {		/* shall we now? */
 		CERROR("Dropping PUT to %s: simulated failure\n",
-		       libcfs_id2str(target));
+		       libcfs_id2str(target4));
 		return -EIO;
 	}
 
 	msg = kmem_cache_zalloc(lnet_msg_cachep, GFP_NOFS);
 	if (!msg) {
 		CERROR("Dropping PUT to %s: ENOMEM on struct lnet_msg\n",
-		       libcfs_id2str(target));
+		       libcfs_id2str(target4));
 		return -ENOMEM;
 	}
 	msg->msg_vmflush = !!(current->flags & PF_MEMALLOC);
@@ -4738,7 +4740,7 @@ void lnet_monitor_thr_stop(void)
 		rspt = lnet_rspt_alloc(cpt);
 		if (!rspt) {
 			CERROR("Dropping PUT to %s: ENOMEM on response tracker\n",
-			       libcfs_id2str(target));
+			       libcfs_id2str(target4));
 			return -ENOMEM;
 		}
 		INIT_LIST_HEAD(&rspt->rspt_on_list);
@@ -4749,7 +4751,7 @@ void lnet_monitor_thr_stop(void)
 	md = lnet_handle2md(&mdh);
 	if (!md || !md->md_threshold || md->md_me) {
 		CERROR("Dropping PUT (%llu:%d:%s): MD (%d) invalid\n",
-		       match_bits, portal, libcfs_id2str(target),
+		       match_bits, portal, libcfs_id2str(target4),
 		       !md ? -1 : md->md_threshold);
 		if (md && md->md_me)
 			CERROR("Source MD also attached to portal %d\n",
@@ -4763,11 +4765,11 @@ void lnet_monitor_thr_stop(void)
 		return -ENOENT;
 	}
 
-	CDEBUG(D_NET, "%s -> %s\n", __func__, libcfs_id2str(target));
+	CDEBUG(D_NET, "%s -> %s\n", __func__, libcfs_id2str(target4));
 
 	lnet_msg_attach_md(msg, md, 0, 0);
 
-	lnet_prep_send(msg, LNET_MSG_PUT, target, 0, md->md_length);
+	lnet_prep_send(msg, LNET_MSG_PUT, &target, 0, md->md_length);
 
 	msg->msg_hdr.msg.put.match_bits = cpu_to_le64(match_bits);
 	msg->msg_hdr.msg.put.ptl_index = cpu_to_le32(portal);
@@ -4804,7 +4806,7 @@ void lnet_monitor_thr_stop(void)
 		rc = lnet_send(&self, msg, NULL);
 	if (rc) {
 		CNETERR("Error sending PUT to %s: %d\n",
-			libcfs_id2str(target), rc);
+			libcfs_id2str(target4), rc);
 		msg->msg_no_resend = true;
 		lnet_finalize(msg, rc);
 	}
@@ -4946,10 +4948,11 @@ struct lnet_msg *
  */
 int
 LNetGet(lnet_nid_t self4, struct lnet_handle_md mdh,
-	struct lnet_process_id target, unsigned int portal,
+	struct lnet_process_id target4, unsigned int portal,
 	u64 match_bits, unsigned int offset, bool recovery)
 {
 	struct lnet_rsp_tracker *rspt;
+	struct lnet_processid target;
 	struct lnet_msg *msg;
 	struct lnet_libmd *md;
 	struct lnet_nid self;
@@ -4959,18 +4962,20 @@ struct lnet_msg *
 	LASSERT(the_lnet.ln_refcount > 0);
 
 	lnet_nid4_to_nid(self4, &self);
+	lnet_nid4_to_nid(target4.nid, &target.nid);
+	target.pid = target4.pid;
 
-	if (!list_empty(&the_lnet.ln_test_peers) && /* normally we don't */
-	    fail_peer(target.nid, 1)) {		/* shall we now? */
+	if (!list_empty(&the_lnet.ln_test_peers) &&	/* normally we don't */
+	    fail_peer(target4.nid, 1)) {		/* shall we now? */
 		CERROR("Dropping GET to %s: simulated failure\n",
-		       libcfs_id2str(target));
+		       libcfs_id2str(target4));
 		return -EIO;
 	}
 
 	msg = kmem_cache_zalloc(lnet_msg_cachep, GFP_NOFS);
 	if (!msg) {
 		CERROR("Dropping GET to %s: ENOMEM on struct lnet_msg\n",
-		       libcfs_id2str(target));
+		       libcfs_id2str(target4));
 		return -ENOMEM;
 	}
 
@@ -4979,7 +4984,7 @@ struct lnet_msg *
 	rspt = lnet_rspt_alloc(cpt);
 	if (!rspt) {
 		CERROR("Dropping GET to %s: ENOMEM on response tracker\n",
-		       libcfs_id2str(target));
+		       libcfs_id2str(target4));
 		return -ENOMEM;
 	}
 	INIT_LIST_HEAD(&rspt->rspt_on_list);
@@ -4991,7 +4996,7 @@ struct lnet_msg *
 	md = lnet_handle2md(&mdh);
 	if (!md || !md->md_threshold || md->md_me) {
 		CERROR("Dropping GET (%llu:%d:%s): MD (%d) invalid\n",
-		       match_bits, portal, libcfs_id2str(target),
+		       match_bits, portal, libcfs_id2str(target4),
 		       !md ? -1 : md->md_threshold);
 		if (md && md->md_me)
 			CERROR("REPLY MD also attached to portal %d\n",
@@ -5004,11 +5009,11 @@ struct lnet_msg *
 		return -ENOENT;
 	}
 
-	CDEBUG(D_NET, "%s -> %s\n", __func__, libcfs_id2str(target));
+	CDEBUG(D_NET, "%s -> %s\n", __func__, libcfs_id2str(target4));
 
 	lnet_msg_attach_md(msg, md, 0, 0);
 
-	lnet_prep_send(msg, LNET_MSG_GET, target, 0, 0);
+	lnet_prep_send(msg, LNET_MSG_GET, &target, 0, 0);
 
 	msg->msg_hdr.msg.get.match_bits = cpu_to_le64(match_bits);
 	msg->msg_hdr.msg.get.ptl_index = cpu_to_le32(portal);
@@ -5033,7 +5038,7 @@ struct lnet_msg *
 	rc = lnet_send(&self, msg, NULL);
 	if (rc < 0) {
 		CNETERR("Error sending GET to %s: %d\n",
-			libcfs_id2str(target), rc);
+			libcfs_id2str(target4), rc);
 		msg->msg_no_resend = true;
 		lnet_finalize(msg, rc);
 	}
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 9a4e268..88f017b 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -390,8 +390,7 @@
 
 		ack_wmd = msg->msg_hdr.msg.put.ack_wmd;
 
-		lnet_prep_send(msg, LNET_MSG_ACK,
-			       lnet_pid_to_pid4(&msg->msg_ev.source), 0, 0);
+		lnet_prep_send(msg, LNET_MSG_ACK, &msg->msg_ev.source, 0, 0);
 
 		msg->msg_hdr.msg.ack.dst_wmd = ack_wmd;
 		msg->msg_hdr.msg.ack.match_bits = msg->msg_ev.match_bits;
-- 
1.8.3.1

* [lustre-devel] [PATCH 18/50] lnet: convert to struct lnet_process_id in lib-move
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (16 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 17/50] lnet: change lnet_prep_send to take net_processid James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 19/50] lnet: convert LNetGetID to return an large-addr pid James Simmons
                   ` (31 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Various functions in lib-move.c create a 'struct lnet_process_id' just
for the purpose of reporting it in error/debug messages.

Change these to 'struct lnet_processid' with larger address support.
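
The effect on messages can be sketched as follows: an id that keeps the
full address can be formatted without first narrowing it to four bytes.
The demo_* types and the output format here are invented for
illustration; the real libcfs_idstr() may format ids quite differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for illustration; NOT the real LNet definitions. */
struct demo_nid {
	uint8_t  size;		/* assumed: address bytes beyond the first four */
	uint8_t  type;
	uint16_t num;
	uint32_t addr[4];
};

struct demo_processid {
	struct demo_nid nid;
	uint32_t	pid;
};

/* Hypothetical formatter: "pid-word[.word...]@type:num".
 * Prints every address word in use instead of only the first one.
 */
static const char *demo_idstr(const struct demo_processid *id,
			      char *buf, size_t len)
{
	size_t words = 1 + id->nid.size / 4;
	size_t off;
	size_t i;
	int n;

	if (words > 4)
		words = 4;

	n = snprintf(buf, len, "%u-", (unsigned int)id->pid);
	if (n < 0 || (size_t)n >= len)
		return buf;
	off = (size_t)n;

	for (i = 0; i < words; i++) {
		n = snprintf(buf + off, len - off, "%s%08x",
			     i ? "." : "", (unsigned int)id->nid.addr[i]);
		if (n < 0 || (size_t)n >= len - off)
			return buf;
		off += (size_t)n;
	}

	snprintf(buf + off, len - off, "@%u:%u",
		 (unsigned int)id->nid.type, (unsigned int)id->nid.num);
	return buf;
}
```

Taking the id by pointer mirrors the libcfs_idstr(&src) calls in the
diff; the caller-supplied buffer is a choice made for this sketch only.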

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 9feddf7e5d01be437 ("LU-10391 lnet: convert to struct lnet_process_id in lib-move")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43606
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 1c72ea2..aa230d7 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -4046,7 +4046,7 @@ void lnet_monitor_thr_stop(void)
 {
 	void *private = msg->msg_private;
 	struct lnet_hdr *hdr = &msg->msg_hdr;
-	struct lnet_process_id src = { 0 };
+	struct lnet_processid src = {};
 	struct lnet_libmd *md;
 	int rlength;
 	int mlength;
@@ -4055,14 +4055,14 @@ void lnet_monitor_thr_stop(void)
 	cpt = lnet_cpt_of_cookie(hdr->msg.reply.dst_wmd.wh_object_cookie);
 	lnet_res_lock(cpt);
 
-	src.nid = lnet_nid_to_nid4(&hdr->src_nid);
+	src.nid = hdr->src_nid;
 	src.pid = hdr->src_pid;
 
 	/* NB handles only looked up by creator (no flips) */
 	md = lnet_wire_handle2md(&hdr->msg.reply.dst_wmd);
 	if (!md || !md->md_threshold || md->md_me) {
 		CNETERR("%s: Dropping REPLY from %s for %s MD %#llx.%#llx\n",
-			libcfs_nidstr(&ni->ni_nid), libcfs_id2str(src),
+			libcfs_nidstr(&ni->ni_nid), libcfs_idstr(&src),
 			!md ? "invalid" : "inactive",
 			hdr->msg.reply.dst_wmd.wh_interface_cookie,
 			hdr->msg.reply.dst_wmd.wh_object_cookie);
@@ -4082,7 +4082,7 @@ void lnet_monitor_thr_stop(void)
 	if (mlength < rlength &&
 	    !(md->md_options & LNET_MD_TRUNCATE)) {
 		CNETERR("%s: Dropping REPLY from %s length %d for MD %#llx would overflow (%d)\n",
-			libcfs_nidstr(&ni->ni_nid), libcfs_id2str(src),
+			libcfs_nidstr(&ni->ni_nid), libcfs_idstr(&src),
 			rlength, hdr->msg.reply.dst_wmd.wh_object_cookie,
 			mlength);
 		lnet_res_unlock(cpt);
@@ -4090,7 +4090,7 @@ void lnet_monitor_thr_stop(void)
 	}
 
 	CDEBUG(D_NET, "%s: Reply from %s of length %d/%d into md %#llx\n",
-	       libcfs_nidstr(&ni->ni_nid), libcfs_id2str(src),
+	       libcfs_nidstr(&ni->ni_nid), libcfs_idstr(&src),
 	       mlength, rlength, hdr->msg.reply.dst_wmd.wh_object_cookie);
 
 	lnet_msg_attach_md(msg, md, 0, mlength);
@@ -4110,11 +4110,11 @@ void lnet_monitor_thr_stop(void)
 lnet_parse_ack(struct lnet_ni *ni, struct lnet_msg *msg)
 {
 	struct lnet_hdr *hdr = &msg->msg_hdr;
-	struct lnet_process_id src = { 0 };
+	struct lnet_processid src = {};
 	struct lnet_libmd *md;
 	int cpt;
 
-	src.nid = lnet_nid_to_nid4(&hdr->src_nid);
+	src.nid = hdr->src_nid;
 	src.pid = hdr->src_pid;
 
 	/* Convert ack fields to host byte order */
@@ -4130,7 +4130,7 @@ void lnet_monitor_thr_stop(void)
 		/* Don't moan; this is expected */
 		CDEBUG(D_NET,
 		       "%s: Dropping ACK from %s to %s MD %#llx.%#llx\n",
-		       libcfs_nidstr(&ni->ni_nid), libcfs_id2str(src),
+		       libcfs_nidstr(&ni->ni_nid), libcfs_idstr(&src),
 		       !md ? "invalid" : "inactive",
 		       hdr->msg.ack.dst_wmd.wh_interface_cookie,
 		       hdr->msg.ack.dst_wmd.wh_object_cookie);
@@ -4143,7 +4143,7 @@ void lnet_monitor_thr_stop(void)
 	}
 
 	CDEBUG(D_NET, "%s: ACK from %s into md %#llx\n",
-	       libcfs_nidstr(&ni->ni_nid), libcfs_id2str(src),
+	       libcfs_nidstr(&ni->ni_nid), libcfs_idstr(&src),
 	       hdr->msg.ack.dst_wmd.wh_object_cookie);
 
 	lnet_msg_attach_md(msg, md, 0, 0);
@@ -4543,11 +4543,11 @@ void lnet_monitor_thr_stop(void)
 
 	while ((msg = list_first_entry_or_null(head, struct lnet_msg,
 					       msg_list)) != NULL) {
-		struct lnet_process_id id = { 0 };
+		struct lnet_processid id = {};
 
 		list_del(&msg->msg_list);
 
-		id.nid = lnet_nid_to_nid4(&msg->msg_hdr.src_nid);
+		id.nid = msg->msg_hdr.src_nid;
 		id.pid = msg->msg_hdr.src_pid;
 
 		LASSERT(!msg->msg_md);
@@ -4556,7 +4556,7 @@ void lnet_monitor_thr_stop(void)
 		LASSERT(msg->msg_hdr.type == LNET_MSG_PUT);
 
 		CWARN("Dropping delayed PUT from %s portal %d match %llu offset %d length %d: %s\n",
-		      libcfs_id2str(id),
+		      libcfs_idstr(&id),
 		      msg->msg_hdr.msg.put.ptl_index,
 		      msg->msg_hdr.msg.put.match_bits,
 		      msg->msg_hdr.msg.put.offset,
@@ -4588,14 +4588,14 @@ void lnet_monitor_thr_stop(void)
 
 	while ((msg = list_first_entry_or_null(head, struct lnet_msg,
 					       msg_list)) != NULL) {
-		struct lnet_process_id id;
+		struct lnet_processid id;
 
 		list_del(&msg->msg_list);
 
 		/* md won't disappear under me, since each msg
 		 * holds a ref on it
 		 */
-		id.nid = lnet_nid_to_nid4(&msg->msg_hdr.src_nid);
+		id.nid = msg->msg_hdr.src_nid;
 		id.pid = msg->msg_hdr.src_pid;
 
 		LASSERT(msg->msg_rx_delayed);
@@ -4605,7 +4605,7 @@ void lnet_monitor_thr_stop(void)
 		LASSERT(msg->msg_hdr.type == LNET_MSG_PUT);
 
 		CDEBUG(D_NET, "Resuming delayed PUT from %s portal %d match %llu offset %d length %d.\n",
-		       libcfs_id2str(id), msg->msg_hdr.msg.put.ptl_index,
+		       libcfs_idstr(&id), msg->msg_hdr.msg.put.ptl_index,
 		       msg->msg_hdr.msg.put.match_bits,
 		       msg->msg_hdr.msg.put.offset,
 		       msg->msg_hdr.payload_length);
-- 
1.8.3.1

* [lustre-devel] [PATCH 19/50] lnet: convert LNetGetID to return an large-addr pid
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (17 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 18/50] lnet: convert to struct lnet_process_id in lib-move James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 20/50] lnet: alter lnd_notify_peer_down() to take lnet_nid James Simmons
                   ` (30 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

LNetGetId() now returns a 'struct lnet_processid' containing a
large-address nid.

Various places still convert it to a 4-byte-addr nid for use.
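
A minimal sketch of that caller pattern: enumerate local ids, skip the
loopback interface, and narrow the first remaining nid to the 4-byte
form. Every demo_* name, the interface table, and the bit layout of the
narrowed value are assumptions made for this example. The sketch also
keeps address words in host byte order, whereas the patch applies
ntohl() to nid_addr[0] in lmv_setup().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LOLND	9	/* assumed loopback network type */
#define DEMO_SOCKLND	2	/* assumed TCP network type */

/* Simplified stand-ins for illustration; NOT the real LNet definitions. */
struct demo_nid {
	uint8_t  size;
	uint8_t  type;
	uint16_t num;
	uint32_t addr[4];
};

struct demo_processid {
	struct demo_nid nid;
	uint32_t	pid;
};

/* Hypothetical local interfaces: loopback first, then one TCP nid. */
static const struct demo_nid demo_interfaces[] = {
	{ .size = 0, .type = DEMO_LOLND,   .num = 0, .addr = { 0 } },
	{ .size = 0, .type = DEMO_SOCKLND, .num = 0, .addr = { 0xc0a80101 } },
};

/* Fills *id with the index-th local id, or returns -ENOENT. */
static int demo_get_id(unsigned int index, struct demo_processid *id)
{
	if (index >= sizeof(demo_interfaces) / sizeof(demo_interfaces[0]))
		return -ENOENT;
	id->nid = demo_interfaces[index];
	id->pid = 12345;
	return 0;
}

static bool demo_nid_is_lo0(const struct demo_nid *nid)
{
	return nid->type == DEMO_LOLND && nid->num == 0 &&
	       nid->size == 0 && nid->addr[0] == 0;
}

/* Narrow to 64 bits: network (type, num) in the high word,
 * first address word in the low word.
 */
static uint64_t demo_nid_to_nid4(const struct demo_nid *nid)
{
	uint64_t net = ((uint64_t)nid->type << 16) | nid->num;

	return (net << 32) | nid->addr[0];
}

/* Returns 0 and stores the first non-loopback nid, else -ENOENT. */
static int demo_first_real_nid4(uint64_t *nid4)
{
	struct demo_processid id;
	unsigned int i = 0;

	while (demo_get_id(i++, &id) != -ENOENT) {
		if (demo_nid_is_lo0(&id.nid))
			continue;
		*nid4 = demo_nid_to_nid4(&id.nid);
		return 0;
	}
	return -ENOENT;
}
```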

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: a42e0f6471bf5aad3 ("LU-10391 lnet: convert LNetGetID to return an large-addr pid")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43607
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_lib.c |  7 ++++---
 fs/lustre/lmv/lmv_obd.c     | 12 ++++++------
 include/linux/lnet/api.h    |  2 +-
 net/lnet/lnet/api-ni.c      | 27 ++++++++++++++-------------
 net/lnet/selftest/console.c |  4 ++--
 5 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 1121652..4c91a78 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -3400,7 +3400,7 @@ ssize_t ll_copy_user_md(const struct lov_user_md __user *md,
 void ll_compute_rootsquash_state(struct ll_sb_info *sbi)
 {
 	struct root_squash_info *squash = &sbi->ll_squash;
-	struct lnet_process_id id;
+	struct lnet_processid id;
 	bool matched;
 	int i;
 
@@ -3416,9 +3416,10 @@ void ll_compute_rootsquash_state(struct ll_sb_info *sbi)
 		i = 0;
 
 		while (LNetGetId(i++, &id) != -ENOENT) {
-			if (id.nid == LNET_NID_LO_0)
+			if (nid_is_lo0(&id.nid))
 				continue;
-			if (cfs_match_nid(id.nid, &squash->rsi_nosquash_nids)) {
+			if (cfs_match_nid(lnet_nid_to_nid4(&id.nid),
+					  &squash->rsi_nosquash_nids)) {
 				matched = true;
 				break;
 			}
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 5fd00d3..5b43cfd 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1084,7 +1084,7 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 {
 	struct lmv_obd *lmv = &obd->u.lmv;
 	struct lmv_desc *desc;
-	struct lnet_process_id lnet_id;
+	struct lnet_processid lnet_id;
 	int i = 0;
 	int rc;
 
@@ -1116,8 +1116,8 @@ static int lmv_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 	 * can distribute subdirs evenly from the beginning.
 	 */
 	while (LNetGetId(i++, &lnet_id) != -ENOENT) {
-		if (lnet_id.nid != LNET_NID_LO_0) {
-			lmv->lmv_qos_rr_index = (u32)lnet_id.nid;
+		if (!nid_is_lo0(&lnet_id.nid)) {
+			lmv->lmv_qos_rr_index = ntohl(lnet_id.nid.nid_addr[0]);
 			break;
 		}
 	}
@@ -1205,16 +1205,16 @@ static int lmv_select_statfs_mdt(struct lmv_obd *lmv, u32 flags)
 
 	/* choose initial MDT for this client */
 	for (i = 0;; i++) {
-		struct lnet_process_id lnet_id;
+		struct lnet_processid lnet_id;
 
 		if (LNetGetId(i, &lnet_id) == -ENOENT)
 			break;
 
-		if (lnet_id.nid != LNET_NID_LO_0) {
+		if (!nid_is_lo0(&lnet_id.nid)) {
 			/* We dont need a full 64-bit modulus, just enough
 			 * to distribute the requests across MDTs evenly.
 			 */
-			lmv->lmv_statfs_start = (u32)lnet_id.nid %
+			lmv->lmv_statfs_start = nidhash(&lnet_id.nid) %
 						lmv->lmv_mdt_count;
 			break;
 		}
diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h
index ee0a9a6..6d8e915 100644
--- a/include/linux/lnet/api.h
+++ b/include/linux/lnet/api.h
@@ -75,7 +75,7 @@
  * \see LNetMEAttach
  * @{
  */
-int LNetGetId(unsigned int index, struct lnet_process_id *id);
+int LNetGetId(unsigned int index, struct lnet_processid *id);
 int LNetDist(lnet_nid_t nid, lnet_nid_t *srcnid, u32 *order);
 lnet_nid_t LNetPrimaryNID(lnet_nid_t nid);
 
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 2221b19..d7ada85 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -205,7 +205,7 @@ static void lnet_set_lnd_timeout(void)
  */
 static atomic_t lnet_dlc_seq_no = ATOMIC_INIT(0);
 
-static int lnet_ping(struct lnet_process_id id, lnet_nid_t src_nid,
+static int lnet_ping(struct lnet_process_id id, struct lnet_nid *src_nid,
 		     signed long timeout, struct lnet_process_id __user *ids,
 		     int n_ids);
 
@@ -3865,7 +3865,8 @@ u32 lnet_get_dlc_seq_locked(void)
 {
 	struct libcfs_ioctl_data *data = arg;
 	struct lnet_ioctl_config_data *config;
-	struct lnet_process_id id = { 0 };
+	struct lnet_process_id id4 = {};
+	struct lnet_processid id = {};
 	struct lnet_ni *ni;
 	struct lnet_nid nid;
 	int rc;
@@ -3877,7 +3878,7 @@ u32 lnet_get_dlc_seq_locked(void)
 	switch (cmd) {
 	case IOC_LIBCFS_GET_NI:
 		rc = LNetGetId(data->ioc_count, &id);
-		data->ioc_nid = id.nid;
+		data->ioc_nid = lnet_nid_to_nid4(&id.nid);
 		return rc;
 
 	case IOC_LIBCFS_FAIL_NID:
@@ -4258,8 +4259,8 @@ u32 lnet_get_dlc_seq_locked(void)
 	case IOC_LIBCFS_PING: {
 		signed long timeout;
 
-		id.nid = data->ioc_nid;
-		id.pid = data->ioc_u32[0];
+		id4.nid = data->ioc_nid;
+		id4.pid = data->ioc_u32[0];
 
 		/* If timeout is negative then set default of 3 minutes */
 		if (((s32)data->ioc_u32[1] <= 0) ||
@@ -4268,7 +4269,7 @@ u32 lnet_get_dlc_seq_locked(void)
 		else
 			timeout = msecs_to_jiffies(data->ioc_u32[1]);
 
-		rc = lnet_ping(id, LNET_NID_ANY, timeout, data->ioc_pbuf1,
+		rc = lnet_ping(id4, &LNET_ANY_NID, timeout, data->ioc_pbuf1,
 			       data->ioc_plen1 / sizeof(struct lnet_process_id));
 
 		if (rc < 0)
@@ -4280,9 +4281,9 @@ u32 lnet_get_dlc_seq_locked(void)
 
 	case IOC_LIBCFS_PING_PEER: {
 		struct lnet_ioctl_ping_data *ping = arg;
+		struct lnet_nid src_nid = LNET_ANY_NID;
 		struct lnet_peer *lp;
 		signed long timeout;
-		lnet_nid_t src_nid = LNET_NID_ANY;
 
 		/* Check if the supplied ping data supports source nid
 		 * NB: This check is sufficient if lnet_ioctl_ping_data has
@@ -4294,7 +4295,7 @@ u32 lnet_get_dlc_seq_locked(void)
 		 */
 		if (ping->ping_hdr.ioc_len >=
 		    sizeof(struct lnet_ioctl_ping_data))
-			src_nid = ping->ping_src;
+			lnet_nid4_to_nid(ping->ping_src, &src_nid);
 
 		/* If timeout is negative then set default of 3 minutes */
 		if (((s32)ping->op_param) <= 0 ||
@@ -4303,7 +4304,7 @@ u32 lnet_get_dlc_seq_locked(void)
 		else
 			timeout = msecs_to_jiffies(ping->op_param);
 
-		rc = lnet_ping(ping->ping_id, src_nid, timeout,
+		rc = lnet_ping(ping->ping_id, &src_nid, timeout,
 			       ping->ping_buf,
 			       ping->ping_count);
 		if (rc < 0)
@@ -4482,7 +4483,7 @@ void LNetDebugPeer(struct lnet_processid *id)
  *		-ENOENT If no interface has been found.
  */
 int
-LNetGetId(unsigned int index, struct lnet_process_id *id)
+LNetGetId(unsigned int index, struct lnet_processid *id)
 {
 	struct lnet_ni *ni;
 	struct lnet_net *net;
@@ -4501,7 +4502,7 @@ void LNetDebugPeer(struct lnet_processid *id)
 			if (index-- != 0)
 				continue;
 
-			id->nid = lnet_nid_to_nid4(&ni->ni_nid);
+			id->nid = ni->ni_nid;
 			id->pid = the_lnet.ln_pid;
 			rc = 0;
 			break;
@@ -4540,7 +4541,7 @@ struct ping_data {
 		complete(&pd->completion);
 }
 
-static int lnet_ping(struct lnet_process_id id, lnet_nid_t src_nid,
+static int lnet_ping(struct lnet_process_id id, struct lnet_nid *src_nid,
 		     signed long timeout, struct lnet_process_id __user *ids,
 		     int n_ids)
 {
@@ -4587,7 +4588,7 @@ static int lnet_ping(struct lnet_process_id id, lnet_nid_t src_nid,
 		goto fail_ping_buffer_decref;
 	}
 
-	rc = LNetGet(src_nid, pd.mdh, id,
+	rc = LNetGet(lnet_nid_to_nid4(src_nid), pd.mdh, id,
 		     LNET_RESERVED_PORTAL,
 		     LNET_PROTO_PING_MATCHBITS, 0, false);
 	if (rc) {
diff --git a/net/lnet/selftest/console.c b/net/lnet/selftest/console.c
index 38b169f..85e9300 100644
--- a/net/lnet/selftest/console.c
+++ b/net/lnet/selftest/console.c
@@ -1688,12 +1688,12 @@ static void lstcon_group_ndlink_release(struct lstcon_group *,
 static void
 lstcon_new_session_id(struct lst_sid *sid)
 {
-	struct lnet_process_id id;
+	struct lnet_processid id;
 
 	LASSERT(console_session.ses_state == LST_SESSION_NONE);
 
 	LNetGetId(1, &id);
-	sid->ses_nid = id.nid;
+	sid->ses_nid = lnet_nid_to_nid4(&id.nid);
 	sid->ses_stamp = div_u64(ktime_get_ns(), NSEC_PER_MSEC);
 }
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 20/50] lnet: alter lnd_notify_peer_down() to take lnet_nid
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (18 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 19/50] lnet: convert LNetGetID to return an large-addr pid James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 21/50] lnet: socklnd: move lnet_hdr unpack into ->pro_unpack James Simmons
                   ` (29 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The lnd_notify_peer_down() interface now takes a large nid.
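
The shape of that interface can be sketched as an optional callback in
an operations table that receives a pointer to the large nid, with the
handler copying it into an id for connection matching. The demo_*
types, the PID wildcard value, and the recording helper are assumptions
for this example, not the real socklnd code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PID_ANY	0xffffffffu	/* assumed wildcard pid */

/* Simplified stand-ins for illustration; NOT the real LNet definitions. */
struct demo_nid {
	uint8_t  size;
	uint8_t  type;
	uint16_t num;
	uint32_t addr[4];
};

struct demo_processid {
	struct demo_nid nid;
	uint32_t	pid;
};

/* Operations table with an optional peer-down notification hook. */
struct demo_lnd {
	void (*notify_peer_down)(const struct demo_nid *peer);
};

static struct demo_processid demo_last_closed;
static int demo_close_calls;

/* Stand-in for closing matching connections: only records the id. */
static void demo_close_matching_conns(const struct demo_processid *id)
{
	demo_last_closed = *id;
	demo_close_calls++;
}

/* Handler: build the id directly from the large nid, no widening step. */
static void demo_notify_gw_down(const struct demo_nid *gw_nid)
{
	struct demo_processid id = {
		.nid = *gw_nid,
		.pid = DEMO_PID_ANY,
	};

	demo_close_matching_conns(&id);
}

/* Dispatcher: call the hook only if the driver provides one. */
static void demo_notify_peer_down(const struct demo_lnd *lnd,
				  const struct demo_nid *nid)
{
	if (lnd->notify_peer_down)
		lnd->notify_peer_down(nid);
}
```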

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 6a9bdf59e6306d49c ("LU-10391 lnet: alter lnd_notify_peer_down() to take lnet_nid")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43608
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-types.h   |  2 +-
 net/lnet/klnds/socklnd/socklnd.c | 12 +++++-------
 net/lnet/lnet/router.c           |  4 ++--
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 40767e6..f7f0b0b 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -328,7 +328,7 @@ struct lnet_lnd {
 			      struct lnet_msg *msg, void **new_privatep);
 
 	/* notification of peer down */
-	void (*lnd_notify_peer_down)(lnet_nid_t peer);
+	void (*lnd_notify_peer_down)(struct lnet_nid *peer);
 
 	/* accept a new connection */
 	int (*lnd_accept)(struct lnet_ni *ni, struct socket *sock);
diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index 6d1f85c..e3201d1 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -1670,25 +1670,23 @@ struct ksock_peer_ni *
 }
 
 void
-ksocknal_notify_gw_down(lnet_nid_t gw_nid)
+ksocknal_notify_gw_down(struct lnet_nid *gw_nid)
 {
-	/*
-	 * The router is telling me she's been notified of a change in
+	/* The router is telling me she's been notified of a change in
 	 * gateway state....
 	 */
 	struct lnet_processid id = {
 		.pid	= LNET_PID_ANY,
+		.nid	= *gw_nid,
 	};
 
-	CDEBUG(D_NET, "gw %s down\n", libcfs_nid2str(gw_nid));
+	CDEBUG(D_NET, "gw %s down\n", libcfs_nidstr(gw_nid));
 
-	lnet_nid4_to_nid(gw_nid, &id.nid);
 	/* If the gateway crashed, close all open connections... */
 	ksocknal_close_matching_conns(&id, 0);
 	return;
 
-	/*
-	 * We can only establish new connections
+	/* We can only establish new connections
 	 * if we have autroutes, and these connect on demand.
 	 */
 }
diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index 97e5ab2..87ae1f9 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -1681,7 +1681,7 @@ bool lnet_router_checker_active(void)
 }
 
 static inline void
-lnet_notify_peer_down(struct lnet_ni *ni, lnet_nid_t nid)
+lnet_notify_peer_down(struct lnet_ni *ni, struct lnet_nid *nid)
 {
 	if (ni->ni_net->net_lnd->lnd_notify_peer_down)
 		ni->ni_net->net_lnd->lnd_notify_peer_down(nid);
@@ -1796,7 +1796,7 @@ bool lnet_router_checker_active(void)
 	lnet_net_unlock(0);
 
 	if (ni && !alive)
-		lnet_notify_peer_down(ni, lnet_nid_to_nid4(&lpni->lpni_nid));
+		lnet_notify_peer_down(ni, &lpni->lpni_nid);
 
 	cpt = lpni->lpni_cpt;
 	lnet_net_lock(cpt);
-- 
1.8.3.1

* [lustre-devel] [PATCH 21/50] lnet: socklnd: move lnet_hdr unpack into ->pro_unpack
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (19 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 20/50] lnet: alter lnd_notify_peer_down() to take lnet_nid James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 22/50] lnet: socklnd: Change ksock_hello_msg to struct lnet_nid James Simmons
                   ` (28 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Converting the lnet_hdr from network-format to host-format
is currently done in ksocknal_process_recv().
Move it to ->pro_unpack() so that a different protocol
can send it in a different format.
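
One way to picture the change: the receive path no longer converts the
header itself but hands both the wire message and the host header to a
per-protocol unpack hook. The demo_* types, the wire layout, and the
omission of byte-order handling are simplifications assumed for this
sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MSG_LNET	0xc1	/* assumed message type value */

/* Simplified stand-ins for illustration; NOT the real LNet definitions. */
struct demo_nid {
	uint8_t  size;
	uint8_t  type;
	uint16_t num;
	uint32_t addr[4];
};

/* Header as carried on the wire by the old protocols (4-byte address). */
struct demo_hdr_nid4 {
	uint64_t src_nid;
	uint32_t src_pid;
};

/* Header as used internally (large address). */
struct demo_hdr {
	struct demo_nid src_nid;
	uint32_t	src_pid;
};

struct demo_wire_msg {
	uint32_t		csum;
	uint32_t		type;
	struct demo_hdr_nid4	hdr4;
};

struct demo_proto {
	void (*unpack)(struct demo_wire_msg *msg, struct demo_hdr *hdr);
};

/* Widen the 4-byte-address wire header into the internal header. */
static void demo_hdr_from_nid4(struct demo_hdr *hdr,
			       const struct demo_hdr_nid4 *h4)
{
	uint32_t net = (uint32_t)(h4->src_nid >> 32);

	memset(hdr, 0, sizeof(*hdr));
	hdr->src_nid.type = (uint8_t)((net >> 16) & 0xff);
	hdr->src_nid.num = (uint16_t)(net & 0xffff);
	hdr->src_nid.addr[0] = (uint32_t)h4->src_nid;
	hdr->src_pid = h4->src_pid;
}

/* v1: reset fields the old format does not carry, then convert. */
static void demo_unpack_v1(struct demo_wire_msg *msg, struct demo_hdr *hdr)
{
	msg->csum = 0;
	msg->type = DEMO_MSG_LNET;
	demo_hdr_from_nid4(hdr, &msg->hdr4);
}

/* v2: only the header conversion is needed. */
static void demo_unpack_v2(struct demo_wire_msg *msg, struct demo_hdr *hdr)
{
	demo_hdr_from_nid4(hdr, &msg->hdr4);
}

static const struct demo_proto demo_proto_v1 = { .unpack = demo_unpack_v1 };
static const struct demo_proto demo_proto_v2 = { .unpack = demo_unpack_v2 };

/* Receive path: format knowledge lives entirely behind the hook. */
static void demo_process_recv(const struct demo_proto *proto,
			      struct demo_wire_msg *msg, struct demo_hdr *hdr)
{
	proto->unpack(msg, hdr);
}
```

A later protocol could then supply its own hook that reads a wider wire
header without any change to the receive path.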

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 15365f3de34ed7d25 ("LU-10391 socklnd: move lnet_hdr unpack into ->pro_unpack")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43609
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/socklnd/socklnd.h       | 2 +-
 net/lnet/klnds/socklnd/socklnd_cb.c    | 4 +---
 net/lnet/klnds/socklnd/socklnd_proto.c | 7 ++++---
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h
index 5d0be68..bd38ee2 100644
--- a/net/lnet/klnds/socklnd/socklnd.h
+++ b/net/lnet/klnds/socklnd/socklnd.h
@@ -473,7 +473,7 @@ struct ksock_proto {
 	void	(*pro_pack)(struct ksock_tx *);
 
 	/* message unpack */
-	void	(*pro_unpack)(struct ksock_msg *);
+	void	(*pro_unpack)(struct ksock_msg *msg, struct lnet_hdr *hdr);
 
 	/* queue tx on the connection */
 	struct ksock_tx *(*pro_queue_tx_msg)(struct ksock_conn *, struct ksock_tx *);
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index 925494b..822de50 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -1174,9 +1174,7 @@ struct ksock_conn_cb *
 
 	case SOCKNAL_RX_LNET_HEADER:
 		/* unpack message header */
-		conn->ksnc_proto->pro_unpack(&conn->ksnc_msg);
-
-		lnet_hdr_from_nid4(&hdr, &conn->ksnc_msg.ksm_u.lnetmsg_nid4);
+		conn->ksnc_proto->pro_unpack(&conn->ksnc_msg, &hdr);
 
 		if (conn->ksnc_peer->ksnp_id.pid & LNET_PID_USERFLAG) {
 			/* Userspace peer_ni */
diff --git a/net/lnet/klnds/socklnd/socklnd_proto.c b/net/lnet/klnds/socklnd/socklnd_proto.c
index 20a582b..14b1394 100644
--- a/net/lnet/klnds/socklnd/socklnd_proto.c
+++ b/net/lnet/klnds/socklnd/socklnd_proto.c
@@ -760,18 +760,19 @@
 }
 
 static void
-ksocknal_unpack_msg_v1(struct ksock_msg *msg)
+ksocknal_unpack_msg_v1(struct ksock_msg *msg, struct lnet_hdr *hdr)
 {
 	msg->ksm_csum = 0;
 	msg->ksm_type = KSOCK_MSG_LNET;
 	msg->ksm_zc_cookies[0] = 0;
 	msg->ksm_zc_cookies[1] = 0;
+	lnet_hdr_from_nid4(hdr, &msg->ksm_u.lnetmsg_nid4);
 }
 
 static void
-ksocknal_unpack_msg_v2(struct ksock_msg *msg)
+ksocknal_unpack_msg_v2(struct ksock_msg *msg, struct lnet_hdr *hdr)
 {
-	return;  /* Do nothing */
+	lnet_hdr_from_nid4(hdr, &msg->ksm_u.lnetmsg_nid4);
 }
 
 const struct ksock_proto ksocknal_protocol_v1x = {
-- 
1.8.3.1

* [lustre-devel] [PATCH 22/50] lnet: socklnd: Change ksock_hello_msg to struct lnet_nid
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (20 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 21/50] lnet: socklnd: move lnet_hdr unpack into ->pro_unpack James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 23/50] lnet: socklnd: add hello message version 4 James Simmons
                   ` (27 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

'struct ksock_hello_msg' now stores 'struct lnet_nid'. For transmit
it is converted to 'struct ksock_hello_msg_nid4' (the old format),
and it is converted back on receive.

This opens the way for a new version of the hello protocol
which will use 16-byte addresses.
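
As a rough illustration of that round trip (a userspace sketch, not code
from this patch): the names demo_nid, demo_nid_to_nid4() and
demo_nid4_to_nid() are hypothetical, the field layout is an assumption,
and host byte order is used throughout for simplicity.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a large-address NID (not struct lnet_nid). */
struct demo_nid {
	uint8_t  size;     /* extra address bytes beyond the first word */
	uint8_t  type;     /* network type */
	uint16_t num;      /* network number */
	uint32_t addr[4];  /* up to 16 bytes of address */
};

/* True if the address fits in the first 32-bit word only. */
static int demo_nid_is_nid4(const struct demo_nid *nid)
{
	return nid->size == 0 && nid->addr[1] == 0 &&
	       nid->addr[2] == 0 && nid->addr[3] == 0;
}

/* Narrow to the assumed legacy layout: type << 48 | num << 32 | addr[0]. */
static uint64_t demo_nid_to_nid4(const struct demo_nid *nid)
{
	return ((uint64_t)nid->type << 48) |
	       ((uint64_t)nid->num << 32) |
	       (uint64_t)nid->addr[0];
}

/* Widen a legacy 64-bit value back into the large-address form. */
static void demo_nid4_to_nid(uint64_t nid4, struct demo_nid *nid)
{
	memset(nid, 0, sizeof(*nid));
	nid->type = (uint8_t)((nid4 >> 48) & 0xff);
	nid->num = (uint16_t)((nid4 >> 32) & 0xffff);
	nid->addr[0] = (uint32_t)(nid4 & 0xffffffffu);
}
```

In this sketch only NIDs for which demo_nid_is_nid4() is true survive
the round trip; anything wider is truncated on transmit.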

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: d1fb459cca931f84f ("LU-10391 socklnd: Change ksock_hello_msg to struct lnet_nid")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43610
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/socklnd.h           | 24 +++++++++---
 net/lnet/klnds/socklnd/socklnd.c       | 35 ++++++++---------
 net/lnet/klnds/socklnd/socklnd.h       |  5 ++-
 net/lnet/klnds/socklnd/socklnd_cb.c    | 33 ++++++++--------
 net/lnet/klnds/socklnd/socklnd_proto.c | 70 +++++++++++++++++++++++++++-------
 5 files changed, 110 insertions(+), 57 deletions(-)

diff --git a/include/linux/lnet/socklnd.h b/include/linux/lnet/socklnd.h
index 025112b..ddfcf76 100644
--- a/include/linux/lnet/socklnd.h
+++ b/include/linux/lnet/socklnd.h
@@ -39,17 +39,31 @@
 #include <uapi/linux/lnet/socklnd.h>
 
 struct ksock_hello_msg {
-	u32		kshm_magic;	/* magic number of socklnd message */
-	u32		kshm_version;	/* version of socklnd message */
+	u32		kshm_magic;	/* LNET_PROTO_MAGIC */
+	u32		kshm_version;	/* KSOCK_PROTO_V* */
+	struct lnet_nid	kshm_src_nid;	/* sender's nid */
+	struct lnet_nid	kshm_dst_nid;	/* destination nid */
+	lnet_pid_t	kshm_src_pid;	/* sender's pid */
+	lnet_pid_t	kshm_dst_pid;   /* destination pid */
+	u64		kshm_src_incarnation; /* sender's incarnation */
+	u64		kshm_dst_incarnation; /* destination's incarnation */
+	u32		kshm_ctype;	/* SOCKLND_CONN_* */
+	u32		kshm_nips;	/* always sent as zero */
+	u32		kshm_ips[0];	/* deprecated */
+} __packed;
+
+struct ksock_hello_msg_nid4 {
+	u32		kshm_magic;	/* LNET_PROTO_MAGIC */
+	u32		kshm_version;	/* KSOCK_PROTO_V* */
 	lnet_nid_t	kshm_src_nid;	/* sender's nid */
 	lnet_nid_t	kshm_dst_nid;	/* destination nid */
 	lnet_pid_t	kshm_src_pid;	/* sender's pid */
 	lnet_pid_t	kshm_dst_pid;	/* destination pid */
 	u64		kshm_src_incarnation; /* sender's incarnation */
 	u64		kshm_dst_incarnation; /* destination's incarnation */
-	u32		kshm_ctype;	/* connection type */
-	u32		kshm_nips;	/* # IP addrs */
-	u32		kshm_ips[0];	/* IP addrs */
+	u32		kshm_ctype;	/* SOCKLND_CONN_* */
+	u32		kshm_nips;	/* sent as zero */
+	u32		kshm_ips[0];	/* deprecated */
 } __packed;
 
 struct ksock_msg_hdr {
diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index e3201d1..4267832 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -854,7 +854,7 @@ struct ksock_peer_ni *
 {
 	rwlock_t *global_lock = &ksocknal_data.ksnd_global_lock;
 	LIST_HEAD(zombies);
-	struct lnet_process_id peerid4;
+	struct lnet_processid peerid;
 	u64 incarnation;
 	struct ksock_conn *conn;
 	struct ksock_conn *conn2;
@@ -928,7 +928,7 @@ struct ksock_peer_ni *
 
 		/* Active connection sends HELLO eagerly */
 		hello->kshm_nips =  0;
-		peerid4 = lnet_pid_to_pid4(&peer_ni->ksnp_id);
+		peerid = peer_ni->ksnp_id;
 
 		write_lock_bh(global_lock);
 		conn->ksnc_proto = peer_ni->ksnp_proto;
@@ -944,34 +944,31 @@ struct ksock_peer_ni *
 #endif
 		}
 
-		rc = ksocknal_send_hello(ni, conn, peerid4.nid, hello);
+		rc = ksocknal_send_hello(ni, conn, &peerid.nid, hello);
 		if (rc)
 			goto failed_1;
 	} else {
-		peerid4.nid = LNET_NID_ANY;
-		peerid4.pid = LNET_PID_ANY;
+		peerid.nid = LNET_ANY_NID;
+		peerid.pid = LNET_PID_ANY;
 
 		/* Passive, get protocol from peer_ni */
 		conn->ksnc_proto = NULL;
 	}
 
-	rc = ksocknal_recv_hello(ni, conn, hello, &peerid4, &incarnation);
+	rc = ksocknal_recv_hello(ni, conn, hello, &peerid, &incarnation);
 	if (rc < 0)
 		goto failed_1;
 
 	LASSERT(!rc || active);
 	LASSERT(conn->ksnc_proto);
-	LASSERT(peerid4.nid != LNET_NID_ANY);
+	LASSERT(!LNET_NID_IS_ANY(&peerid.nid));
 
-	cpt = lnet_cpt_of_nid(peerid4.nid, ni);
+	cpt = lnet_nid2cpt(&peerid.nid, ni);
 
 	if (active) {
 		ksocknal_peer_addref(peer_ni);
 		write_lock_bh(global_lock);
 	} else {
-		struct lnet_processid peerid;
-
-		lnet_pid4_to_pid(peerid4, &peerid);
 		peer_ni = ksocknal_create_peer(ni, &peerid);
 		if (IS_ERR(peer_ni)) {
 			rc = PTR_ERR(peer_ni);
@@ -1004,7 +1001,7 @@ struct ksock_peer_ni *
 		 * Am I already connecting to this guy?  Resolve in
 		 * favour of higher NID...
 		 */
-		if (peerid4.nid < lnet_nid_to_nid4(&ni->ni_nid) &&
+		if (memcmp(&peerid.nid, &ni->ni_nid, sizeof(peerid.nid)) < 0 &&
 		    ksocknal_connecting(peer_ni->ksnp_conn_cb,
 					((struct sockaddr *)&conn->ksnc_peeraddr))) {
 			rc = EALREADY;
@@ -1164,9 +1161,7 @@ struct ksock_peer_ni *
 	}
 
 	write_unlock_bh(global_lock);
-
-	/*
-	 * We've now got a new connection.  Any errors from here on are just
+	/* We've now got a new connection.  Any errors from here on are just
 	 * like "normal" comms errors and we close the connection normally.
 	 * NB (a) we still have to send the reply HELLO for passive
 	 *	connections,
@@ -1175,13 +1170,13 @@ struct ksock_peer_ni *
 	 */
 	CDEBUG(D_NET,
 	       "New conn %s p %d.x %pIS -> %pISp incarnation:%lld sched[%d]\n",
-	       libcfs_id2str(peerid4), conn->ksnc_proto->pro_version,
+	       libcfs_idstr(&peerid), conn->ksnc_proto->pro_version,
 	       &conn->ksnc_myaddr, &conn->ksnc_peeraddr,
 	       incarnation, cpt);
 
 	if (!active) {
 		hello->kshm_nips = 0;
-		rc = ksocknal_send_hello(ni, conn, peerid4.nid, hello);
+		rc = ksocknal_send_hello(ni, conn, &peerid.nid, hello);
 	}
 
 	kvfree(hello);
@@ -1237,10 +1232,10 @@ struct ksock_peer_ni *
 	if (warn) {
 		if (rc < 0)
 			CERROR("Not creating conn %s type %d: %s\n",
-			       libcfs_id2str(peerid4), conn->ksnc_type, warn);
+			       libcfs_idstr(&peerid), conn->ksnc_type, warn);
 		else
 			CDEBUG(D_NET, "Not creating conn %s type %d: %s\n",
-			       libcfs_id2str(peerid4), conn->ksnc_type, warn);
+			       libcfs_idstr(&peerid), conn->ksnc_type, warn);
 	}
 
 	if (!active) {
@@ -1251,7 +1246,7 @@ struct ksock_peer_ni *
 			 */
 			conn->ksnc_type = SOCKLND_CONN_NONE;
 			hello->kshm_nips = 0;
-			ksocknal_send_hello(ni, conn, peerid4.nid, hello);
+			ksocknal_send_hello(ni, conn, &peerid.nid, hello);
 		}
 
 		write_lock_bh(global_lock);
diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h
index bd38ee2..094f635 100644
--- a/net/lnet/klnds/socklnd/socklnd.h
+++ b/net/lnet/klnds/socklnd/socklnd.h
@@ -674,10 +674,11 @@ struct ksock_conn_cb *
 int ksocknal_connd(void *arg);
 int ksocknal_reaper(void *arg);
 int ksocknal_send_hello(struct lnet_ni *ni, struct ksock_conn *conn,
-			lnet_nid_t peer_nid, struct ksock_hello_msg *hello);
+			struct lnet_nid *peer_nid,
+			struct ksock_hello_msg *hello);
 int ksocknal_recv_hello(struct lnet_ni *ni, struct ksock_conn *conn,
 			struct ksock_hello_msg *hello,
-			struct lnet_process_id *id,
+			struct lnet_processid *id,
 			u64 *incarnation);
 void ksocknal_read_callback(struct ksock_conn *conn);
 void ksocknal_write_callback(struct ksock_conn *conn);
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index 822de50..c93f43f 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -1580,7 +1580,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 
 int
 ksocknal_send_hello(struct lnet_ni *ni, struct ksock_conn *conn,
-		    lnet_nid_t peer_nid, struct ksock_hello_msg *hello)
+		    struct lnet_nid *peer_nid, struct ksock_hello_msg *hello)
 {
 	/* CAVEAT EMPTOR: this byte flips 'ipaddrs' */
 	struct ksock_net *net = (struct ksock_net *)ni->ni_data;
@@ -1590,8 +1590,8 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 	/* rely on caller to hold a ref on socket so it wouldn't disappear */
 	LASSERT(conn->ksnc_proto);
 
-	hello->kshm_src_nid = lnet_nid_to_nid4(&ni->ni_nid);
-	hello->kshm_dst_nid = peer_nid;
+	hello->kshm_src_nid = ni->ni_nid;
+	hello->kshm_dst_nid = *peer_nid;
 	hello->kshm_src_pid = the_lnet.ln_pid;
 
 	hello->kshm_src_incarnation = net->ksnn_incarnation;
@@ -1619,7 +1619,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 int
 ksocknal_recv_hello(struct lnet_ni *ni, struct ksock_conn *conn,
 		    struct ksock_hello_msg *hello,
-		    struct lnet_process_id *peerid,
+		    struct lnet_processid *peerid,
 		    u64 *incarnation)
 {
 	/* Return < 0	fatal error
@@ -1633,7 +1633,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 	int proto_match;
 	int rc;
 	const struct ksock_proto *proto;
-	struct lnet_process_id recv_id;
+	struct lnet_processid recv_id;
 
 	/* socket type set on active connections - not set on passive */
 	LASSERT(!active == !(conn->ksnc_type != SOCKLND_CONN_NONE));
@@ -1683,8 +1683,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 				conn->ksnc_proto = &ksocknal_protocol_v1x;
 #endif
 			hello->kshm_nips = 0;
-			ksocknal_send_hello(ni, conn,
-					    lnet_nid_to_nid4(&ni->ni_nid),
+			ksocknal_send_hello(ni, conn, &ni->ni_nid,
 					    hello);
 		}
 
@@ -1709,7 +1708,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 
 	*incarnation = hello->kshm_src_incarnation;
 
-	if (hello->kshm_src_nid == LNET_NID_ANY) {
+	if (LNET_NID_IS_ANY(&hello->kshm_src_nid)) {
 		CERROR("Expecting a HELLO hdr with a NID, but got LNET_NID_ANY from %pIS\n",
 		       &conn->ksnc_peeraddr);
 		return -EPROTO;
@@ -1722,9 +1721,11 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 		recv_id.pid = rpc_get_port((struct sockaddr *)
 					   &conn->ksnc_peeraddr) |
 					   LNET_PID_USERFLAG;
-		recv_id.nid = LNET_MKNID(LNET_NID_NET(&ni->ni_nid),
-					 ntohl(((struct sockaddr_in *)
-					 &conn->ksnc_peeraddr)->sin_addr.s_addr));
+		memset(&recv_id.nid, 0, sizeof(recv_id.nid));
+		recv_id.nid.nid_type = ni->ni_nid.nid_type;
+		recv_id.nid.nid_num = ni->ni_nid.nid_num;
+		recv_id.nid.nid_addr[0] =
+			((struct sockaddr_in *)&conn->ksnc_peeraddr)->sin_addr.s_addr;
 	} else {
 		recv_id.nid = hello->kshm_src_nid;
 		recv_id.pid = hello->kshm_src_pid;
@@ -1737,7 +1738,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 		conn->ksnc_type = ksocknal_invert_type(hello->kshm_ctype);
 		if (conn->ksnc_type == SOCKLND_CONN_NONE) {
 			CERROR("Unexpected type %d from %s ip %pIS\n",
-			       hello->kshm_ctype, libcfs_id2str(*peerid),
+			       hello->kshm_ctype, libcfs_idstr(peerid),
 			       &conn->ksnc_peeraddr);
 			return -EPROTO;
 		}
@@ -1746,12 +1747,12 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 	}
 
 	if (peerid->pid != recv_id.pid ||
-	    peerid->nid != recv_id.nid) {
+	    !nid_same(&peerid->nid,  &recv_id.nid)) {
 		LCONSOLE_ERROR_MSG(0x130,
 				   "Connected successfully to %s on host %pIS, but they claimed they were %s; please check your Lustre configuration.\n",
-				   libcfs_id2str(*peerid),
+				   libcfs_idstr(peerid),
 				   &conn->ksnc_peeraddr,
-				   libcfs_id2str(recv_id));
+				   libcfs_idstr(&recv_id));
 		return -EPROTO;
 	}
 
@@ -1762,7 +1763,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 
 	if (ksocknal_invert_type(hello->kshm_ctype) != conn->ksnc_type) {
 		CERROR("Mismatched types: me %d, %s ip %pIS %d\n",
-		       conn->ksnc_type, libcfs_id2str(*peerid),
+		       conn->ksnc_type, libcfs_idstr(peerid),
 		       &conn->ksnc_peeraddr, hello->kshm_ctype);
 		return -EPROTO;
 	}
diff --git a/net/lnet/klnds/socklnd/socklnd_proto.c b/net/lnet/klnds/socklnd/socklnd_proto.c
index 14b1394..783c62f 100644
--- a/net/lnet/klnds/socklnd/socklnd_proto.c
+++ b/net/lnet/klnds/socklnd/socklnd_proto.c
@@ -493,7 +493,7 @@
 			hmv->magic = LNET_PROTO_MAGIC;
 	}
 
-	hdr->src_nid = cpu_to_le64(hello->kshm_src_nid);
+	hdr->src_nid = cpu_to_le64(lnet_nid_to_nid4(&hello->kshm_src_nid));
 	hdr->src_pid = cpu_to_le32(hello->kshm_src_pid);
 	hdr->type = cpu_to_le32(LNET_MSG_HELLO);
 	hdr->payload_length = cpu_to_le32(hello->kshm_nips * sizeof(u32));
@@ -531,19 +531,49 @@
 ksocknal_send_hello_v2(struct ksock_conn *conn, struct ksock_hello_msg *hello)
 {
 	struct socket *sock = conn->ksnc_sock;
+	struct ksock_hello_msg_nid4 *hello4;
 	int rc;
 
+	hello4 = kzalloc(sizeof(*hello4), GFP_NOFS);
+	if (!hello4) {
+		CERROR("Can't allocate struct ksock_hello_msg_nid4\n");
+		return -ENOMEM;
+	}
+
 	hello->kshm_magic = LNET_PROTO_MAGIC;
 	hello->kshm_version = conn->ksnc_proto->pro_version;
 
+	hello4->kshm_magic = LNET_PROTO_MAGIC;
+	hello4->kshm_version = conn->ksnc_proto->pro_version;
+	hello4->kshm_src_nid = lnet_nid_to_nid4(&hello->kshm_src_nid);
+	hello4->kshm_dst_nid = lnet_nid_to_nid4(&hello->kshm_dst_nid);
+	hello4->kshm_src_pid = hello->kshm_src_pid;
+	hello4->kshm_dst_pid = hello->kshm_dst_pid;
+	hello4->kshm_src_incarnation = hello->kshm_src_incarnation;
+	hello4->kshm_dst_incarnation = hello->kshm_dst_incarnation;
+	hello4->kshm_ctype = hello->kshm_ctype;
+	hello4->kshm_nips = hello->kshm_nips;
+
 	if (the_lnet.ln_testprotocompat) {
 		/* single-shot proto check */
 		if (test_and_clear_bit(0, &the_lnet.ln_testprotocompat))
 			hello->kshm_version++;   /* just different! */
 	}
 
-	rc = lnet_sock_write(sock, hello, offsetof(struct ksock_hello_msg, kshm_ips),
+	hello4->kshm_magic = LNET_PROTO_MAGIC;
+	hello4->kshm_version = hello->kshm_version;
+	hello4->kshm_src_nid = lnet_nid_to_nid4(&hello->kshm_src_nid);
+	hello4->kshm_dst_nid = lnet_nid_to_nid4(&hello->kshm_dst_nid);
+	hello4->kshm_src_pid = hello->kshm_src_pid;
+	hello4->kshm_dst_pid = hello->kshm_dst_pid;
+	hello4->kshm_src_incarnation = hello->kshm_src_incarnation;
+	hello4->kshm_dst_incarnation = hello->kshm_dst_incarnation;
+	hello4->kshm_ctype = hello->kshm_ctype;
+	hello4->kshm_nips = hello->kshm_nips;
+
+	rc = lnet_sock_write(sock, hello4, sizeof(*hello4),
 			     lnet_acceptor_timeout());
+	kfree(hello4);
 	if (rc) {
 		CNETERR("Error %d sending HELLO hdr to %pISp\n",
 			rc, &conn->ksnc_peeraddr);
@@ -600,7 +630,7 @@
 		goto out;
 	}
 
-	hello->kshm_src_nid = le64_to_cpu(hdr->src_nid);
+	lnet_nid4_to_nid(le64_to_cpu(hdr->src_nid), &hello->kshm_src_nid);
 	hello->kshm_src_pid = le32_to_cpu(hdr->src_pid);
 	hello->kshm_src_incarnation = le64_to_cpu(hdr->msg.hello.incarnation);
 	hello->kshm_ctype = le32_to_cpu(hdr->msg.hello.type);
@@ -646,6 +676,7 @@
 		       int timeout)
 {
 	struct socket *sock = conn->ksnc_sock;
+	struct ksock_hello_msg_nid4 *hello4 = (void *)hello;
 	int rc;
 	int i;
 
@@ -654,9 +685,9 @@
 	else
 		conn->ksnc_flip = 1;
 
-	rc = lnet_sock_read(sock, &hello->kshm_src_nid,
-			    offsetof(struct ksock_hello_msg, kshm_ips) -
-				     offsetof(struct ksock_hello_msg, kshm_src_nid),
+	rc = lnet_sock_read(sock, &hello4->kshm_src_nid,
+			    offsetof(struct ksock_hello_msg_nid4, kshm_ips) -
+			    offsetof(struct ksock_hello_msg_nid4, kshm_src_nid),
 			    timeout);
 	if (rc) {
 		CERROR("Error %d reading HELLO from %pIS\n",
@@ -666,14 +697,25 @@
 	}
 
 	if (conn->ksnc_flip) {
-		__swab32s(&hello->kshm_src_pid);
-		__swab64s(&hello->kshm_src_nid);
-		__swab32s(&hello->kshm_dst_pid);
-		__swab64s(&hello->kshm_dst_nid);
-		__swab64s(&hello->kshm_src_incarnation);
-		__swab64s(&hello->kshm_dst_incarnation);
-		__swab32s(&hello->kshm_ctype);
-		__swab32s(&hello->kshm_nips);
+		/* These must be copied in reverse order to avoid corruption. */
+		hello->kshm_nips = __swab32(hello4->kshm_nips);
+		hello->kshm_ctype = __swab32(hello4->kshm_ctype);
+		hello->kshm_dst_incarnation = __swab64(hello4->kshm_dst_incarnation);
+		hello->kshm_src_incarnation = __swab64(hello4->kshm_src_incarnation);
+		hello->kshm_dst_pid = __swab32(hello4->kshm_dst_pid);
+		hello->kshm_src_pid = __swab32(hello4->kshm_src_pid);
+		lnet_nid4_to_nid(hello4->kshm_dst_nid, &hello->kshm_dst_nid);
+		lnet_nid4_to_nid(hello4->kshm_src_nid, &hello->kshm_src_nid);
+	} else {
+		/* These must be copied in reverse order to avoid corruption. */
+		hello->kshm_nips = hello4->kshm_nips;
+		hello->kshm_ctype = hello4->kshm_ctype;
+		hello->kshm_dst_incarnation = hello4->kshm_dst_incarnation;
+		hello->kshm_src_incarnation = hello4->kshm_src_incarnation;
+		hello->kshm_dst_pid = hello4->kshm_dst_pid;
+		hello->kshm_src_pid = hello4->kshm_src_pid;
+		lnet_nid4_to_nid(hello4->kshm_dst_nid, &hello->kshm_dst_nid);
+		lnet_nid4_to_nid(hello4->kshm_src_nid, &hello->kshm_src_nid);
 	}
 
 	if (hello->kshm_nips > LNET_INTERFACES_NUM) {
-- 
1.8.3.1

* [lustre-devel] [PATCH 23/50] lnet: socklnd: add hello message version 4
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (21 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 22/50] lnet: socklnd: Change ksock_hello_msg to struct lnet_nid James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 24/50] lnet: Convert ping to support 16-bytes address James Simmons
                   ` (26 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

KSOCK_PROTO_V4 uses a 'hello' message with 16-byte addresses
('struct lnet_nid'), and its LNet messages carry 'struct lnet_hdr_nid16'.
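
To make the size difference concrete, here is a small userspace sketch
(not code from this patch). It assumes a 20-byte NID and a packed hello
layout modelled on the structs in this series; all demo_* names are
hypothetical, and the packed attribute is a GCC/Clang extension. It
computes how many bytes remain to be read once the magic and version
words have been consumed, which is presumably what the receive paths do
with offsetof().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 20-byte large-address NID. */
struct demo_nid16 {
	uint8_t  size;
	uint8_t  type;
	uint16_t num;
	uint32_t addr[4];
} __attribute__((packed));

/* Old-style hello: 64-bit NIDs on the wire. */
struct demo_hello_nid4 {
	uint32_t magic;
	uint32_t version;
	uint64_t src_nid;
	uint64_t dst_nid;
	uint32_t src_pid;
	uint32_t dst_pid;
	uint64_t src_incarnation;
	uint64_t dst_incarnation;
	uint32_t ctype;
	uint32_t nips;
	uint32_t ips[];    /* deprecated trailing array, never counted */
} __attribute__((packed));

/* New-style hello: large-address NIDs on the wire. */
struct demo_hello_nid16 {
	uint32_t magic;
	uint32_t version;
	struct demo_nid16 src_nid;
	struct demo_nid16 dst_nid;
	uint32_t src_pid;
	uint32_t dst_pid;
	uint64_t src_incarnation;
	uint64_t dst_incarnation;
	uint32_t ctype;
	uint32_t nips;
	uint32_t ips[];
} __attribute__((packed));

/* Bytes left to read after magic and version, old format. */
static size_t demo_hello_rest_nid4(void)
{
	return offsetof(struct demo_hello_nid4, ips) -
	       offsetof(struct demo_hello_nid4, src_nid);
}

/* Bytes left to read after magic and version, new format. */
static size_t demo_hello_rest_nid16(void)
{
	return sizeof(struct demo_hello_nid16) -
	       offsetof(struct demo_hello_nid16, src_nid);
}
```

Under these assumptions the old format leaves 48 bytes to read and the
new one 72, so a receiver has to know the protocol version before it
reads the rest of the hello.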

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 7b31ef0bbac99bfd0 ("LU-10391 socklnd: add hello message version 4")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43611
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h          |  26 ++++++
 include/linux/lnet/socklnd.h           |   5 ++
 include/uapi/linux/lnet/lnet-types.h   |   4 +
 net/lnet/klnds/socklnd/socklnd.h       |   1 +
 net/lnet/klnds/socklnd/socklnd_cb.c    |   9 +-
 net/lnet/klnds/socklnd/socklnd_proto.c | 154 ++++++++++++++++++++++++++++++++-
 6 files changed, 194 insertions(+), 5 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index ce18897..0155111 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -507,6 +507,32 @@ static inline void lnet_hdr_to_nid4(const struct lnet_hdr *hdr,
 	hdr_nid4->msg = hdr->msg;
 }
 
+static inline void lnet_hdr_from_nid16(struct lnet_hdr *hdr,
+				       const struct lnet_hdr_nid16 *vhdr)
+{
+	const struct lnet_hdr *hdr16 = (void *)vhdr;
+
+	hdr->dest_nid = hdr16->dest_nid;
+	hdr->src_nid = hdr16->src_nid;
+	hdr->dest_pid = le32_to_cpu(hdr16->dest_pid);
+	hdr->src_pid = le32_to_cpu(hdr16->src_pid);
+	hdr->type = le32_to_cpu(hdr16->type);
+	hdr->payload_length = le32_to_cpu(hdr16->payload_length);
+}
+
+static inline void lnet_hdr_to_nid16(const struct lnet_hdr *hdr,
+				     struct lnet_hdr_nid16 *vhdr)
+{
+	struct lnet_hdr *hdr16 = (void *)vhdr;
+
+	hdr16->dest_nid = hdr->dest_nid;
+	hdr16->src_nid = hdr->src_nid;
+	hdr16->dest_pid = cpu_to_le32(hdr->dest_pid);
+	hdr16->src_pid = cpu_to_le32(hdr->src_pid);
+	hdr16->type = cpu_to_le32(hdr->type);
+	hdr16->payload_length = cpu_to_le32(hdr->payload_length);
+}
+
 extern struct lnet_lnd the_lolnd;
 extern int avoid_asym_router_failure;
 
diff --git a/include/linux/lnet/socklnd.h b/include/linux/lnet/socklnd.h
index ddfcf76..092ba6e 100644
--- a/include/linux/lnet/socklnd.h
+++ b/include/linux/lnet/socklnd.h
@@ -84,6 +84,10 @@ struct ksock_msg {
 		/* - nothing */
 		/* case ksm_kh.ksh_type == KSOCK_MSG_LNET */
 		struct lnet_hdr_nid4 lnetmsg_nid4;
+		/* case ksm_kh.ksh_type == KSOCK_MSG_LNET &&
+		 *      kshm_version >= KSOCK_PROTO_V4
+		 */
+		struct lnet_hdr_nid16 lnetmsg_nid16;
 	} __packed ksm_u;
 } __packed;
 #define ksm_type ksm_kh.ksh_type
@@ -95,5 +99,6 @@ struct ksock_msg {
  */
 #define KSOCK_PROTO_V2	2
 #define KSOCK_PROTO_V3	3
+#define KSOCK_PROTO_V4	4
 
 #endif
diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h
index 4818271..eacc401 100644
--- a/include/uapi/linux/lnet/lnet-types.h
+++ b/include/uapi/linux/lnet/lnet-types.h
@@ -308,6 +308,10 @@ enum lnet_ins_pos {
  * @{
  */
 
+struct lnet_hdr_nid16 {
+	char	_bytes[sizeof(struct lnet_hdr)];
+} __attribute__((packed));
+
 /**
  * Event queue handler function type.
  *
diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h
index 094f635..13abe20 100644
--- a/net/lnet/klnds/socklnd/socklnd.h
+++ b/net/lnet/klnds/socklnd/socklnd.h
@@ -500,6 +500,7 @@ struct ksock_proto {
 extern const struct ksock_proto ksocknal_protocol_v1x;
 extern const struct ksock_proto ksocknal_protocol_v2x;
 extern const struct ksock_proto ksocknal_protocol_v3x;
+extern const struct ksock_proto ksocknal_protocol_v4x;
 
 #define KSOCK_PROTO_V1_MAJOR	LNET_PROTO_TCP_VERSION_MAJOR
 #define KSOCK_PROTO_V1_MINOR	LNET_PROTO_TCP_VERSION_MINOR
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index c93f43f..af35c49 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -1925,11 +1925,11 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 		if (!list_empty(&peer_ni->ksnp_conns)) {
 			conn = list_first_entry(&peer_ni->ksnp_conns,
 						struct ksock_conn, ksnc_list);
-			LASSERT(conn->ksnc_proto == &ksocknal_protocol_v3x);
+			LASSERT(conn->ksnc_proto == &ksocknal_protocol_v3x ||
+				conn->ksnc_proto == &ksocknal_protocol_v4x);
 		}
 
-		/*
-		 * take all the blocked packets while I've got the lock and
+		/* take all the blocked packets while I've got the lock and
 		 * complete below...
 		 */
 		list_splice_init(&peer_ni->ksnp_tx_queue, &zombies);
@@ -2297,7 +2297,8 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 	if (list_empty(&peer_ni->ksnp_conns))
 		return 0;
 
-	if (peer_ni->ksnp_proto != &ksocknal_protocol_v3x)
+	if (peer_ni->ksnp_proto != &ksocknal_protocol_v3x &&
+	    peer_ni->ksnp_proto != &ksocknal_protocol_v4x)
 		return 0;
 
 	if (*ksocknal_tunables.ksnd_keepalive <= 0 ||
diff --git a/net/lnet/klnds/socklnd/socklnd_proto.c b/net/lnet/klnds/socklnd/socklnd_proto.c
index 783c62f..0a93d57 100644
--- a/net/lnet/klnds/socklnd/socklnd_proto.c
+++ b/net/lnet/klnds/socklnd/socklnd_proto.c
@@ -365,6 +365,51 @@
 	}
 }
 
+static int
+ksocknal_match_tx_v4(struct ksock_conn *conn, struct ksock_tx *tx, int nonblk)
+{
+	int nob;
+
+	if (!tx || !tx->tx_lnetmsg)
+		nob = sizeof(struct ksock_msg_hdr);
+	else
+		nob = sizeof(struct ksock_msg_hdr) +
+			sizeof(struct lnet_hdr_nid16) +
+			tx->tx_lnetmsg->msg_len;
+
+	switch (conn->ksnc_type) {
+	default:
+		CERROR("ksnc_type bad: %u\n", conn->ksnc_type);
+		LBUG();
+	case SOCKLND_CONN_ANY:
+		return SOCKNAL_MATCH_NO;
+
+	case SOCKLND_CONN_ACK:
+		if (nonblk)
+			return SOCKNAL_MATCH_YES;
+		else if (!tx || !tx->tx_lnetmsg)
+			return SOCKNAL_MATCH_MAY;
+		else
+			return SOCKNAL_MATCH_NO;
+
+	case SOCKLND_CONN_BULK_OUT:
+		if (nonblk)
+			return SOCKNAL_MATCH_NO;
+		else if (nob < *ksocknal_tunables.ksnd_min_bulk)
+			return SOCKNAL_MATCH_MAY;
+		else
+			return SOCKNAL_MATCH_YES;
+
+	case SOCKLND_CONN_CONTROL:
+		if (nonblk)
+			return SOCKNAL_MATCH_NO;
+		else if (nob >= *ksocknal_tunables.ksnd_min_bulk)
+			return SOCKNAL_MATCH_MAY;
+		else
+			return SOCKNAL_MATCH_YES;
+	}
+}
+
 /* (Sink) handle incoming ZC request from sender */
 static int
 ksocknal_handle_zcreq(struct ksock_conn *c, u64 cookie, int remote)
@@ -425,7 +470,8 @@
 	count = (cookie1 > cookie2) ? 2 : (cookie2 - cookie1 + 1);
 
 	if (cookie2 == SOCKNAL_KEEPALIVE_PING &&
-	    conn->ksnc_proto == &ksocknal_protocol_v3x) {
+	    (conn->ksnc_proto == &ksocknal_protocol_v3x ||
+	     conn->ksnc_proto == &ksocknal_protocol_v4x)) {
 		/* keepalive PING for V3.x, just ignore it */
 		return count == 1 ? 0 : -EPROTO;
 	}
@@ -596,6 +642,24 @@
 }
 
 static int
+ksocknal_send_hello_v4(struct ksock_conn *conn, struct ksock_hello_msg *hello)
+{
+	struct socket *sock = conn->ksnc_sock;
+	int rc;
+
+	hello->kshm_magic   = LNET_PROTO_MAGIC;
+	hello->kshm_version = conn->ksnc_proto->pro_version;
+
+	rc = lnet_sock_write(sock, hello, sizeof(*hello),
+			     lnet_acceptor_timeout());
+
+	if (rc != 0)
+		CNETERR("Error %d sending HELLO hdr to %pISp\n",
+			rc, &conn->ksnc_peeraddr);
+	return rc;
+}
+
+static int
 ksocknal_recv_hello_v1(struct ksock_conn *conn, struct ksock_hello_msg *hello,
 		       int timeout)
 {
@@ -750,6 +814,40 @@
 	return 0;
 }
 
+static int
+ksocknal_recv_hello_v4(struct ksock_conn *conn, struct ksock_hello_msg *hello,
+		       int timeout)
+{
+	struct socket *sock = conn->ksnc_sock;
+	int rc;
+
+	if (hello->kshm_magic == LNET_PROTO_MAGIC)
+		conn->ksnc_flip = 0;
+	else
+		conn->ksnc_flip = 1;
+
+	rc = lnet_sock_read(sock, &hello->kshm_src_nid,
+			    sizeof(*hello) -
+			    offsetof(struct ksock_hello_msg, kshm_src_nid),
+			    timeout);
+	if (rc) {
+		CERROR("Error %d reading HELLO from %pIS\n",
+		       rc, &conn->ksnc_peeraddr);
+		LASSERT(rc < 0 && rc != -EALREADY);
+		return rc;
+	}
+
+	if (conn->ksnc_flip) {
+		__swab32s(&hello->kshm_src_pid);
+		__swab32s(&hello->kshm_dst_pid);
+		__swab64s(&hello->kshm_src_incarnation);
+		__swab64s(&hello->kshm_dst_incarnation);
+		__swab32s(&hello->kshm_ctype);
+	}
+
+	return 0;
+}
+
 static void
 ksocknal_pack_msg_v1(struct ksock_tx *tx)
 {
@@ -802,6 +900,41 @@
 }
 
 static void
+ksocknal_pack_msg_v4(struct ksock_tx *tx)
+{
+	int hdr_size;
+
+	tx->tx_hdr.iov_base = (void *)&tx->tx_msg;
+
+	switch (tx->tx_msg.ksm_type) {
+	case KSOCK_MSG_LNET:
+		LASSERT(tx->tx_lnetmsg);
+		hdr_size = (sizeof(struct ksock_msg_hdr) +
+				sizeof(struct lnet_hdr_nid16));
+
+		lnet_hdr_to_nid16(&tx->tx_lnetmsg->msg_hdr,
+				  &tx->tx_msg.ksm_u.lnetmsg_nid16);
+		tx->tx_hdr.iov_len = hdr_size;
+		tx->tx_resid = hdr_size + tx->tx_lnetmsg->msg_len;
+		tx->tx_nob = hdr_size + tx->tx_lnetmsg->msg_len;
+		break;
+	case KSOCK_MSG_NOOP:
+		LASSERT(!tx->tx_lnetmsg);
+		hdr_size = sizeof(struct ksock_msg_hdr);
+
+		tx->tx_hdr.iov_len = hdr_size;
+		tx->tx_resid = hdr_size;
+		tx->tx_nob = hdr_size;
+		break;
+	default:
+		LASSERT(0);
+	}
+	/* Don't checksum before start sending, because packet can be
+	 * piggybacked with ACK
+	 */
+}
+
+static void
 ksocknal_unpack_msg_v1(struct ksock_msg *msg, struct lnet_hdr *hdr)
 {
 	msg->ksm_csum = 0;
@@ -817,6 +950,12 @@
 	lnet_hdr_from_nid4(hdr, &msg->ksm_u.lnetmsg_nid4);
 }
 
+static void
+ksocknal_unpack_msg_v4(struct ksock_msg *msg, struct lnet_hdr *hdr)
+{
+	lnet_hdr_from_nid16(hdr, &msg->ksm_u.lnetmsg_nid16);
+}
+
 const struct ksock_proto ksocknal_protocol_v1x = {
 	.pro_version		= KSOCK_PROTO_V1,
 	.pro_send_hello		= ksocknal_send_hello_v1,
@@ -855,3 +994,16 @@
 	.pro_handle_zcack	= ksocknal_handle_zcack,
 	.pro_match_tx		= ksocknal_match_tx_v3
 };
+
+const struct ksock_proto ksocknal_protocol_v4x = {
+	.pro_version		= KSOCK_PROTO_V4,
+	.pro_send_hello		= ksocknal_send_hello_v4,
+	.pro_recv_hello		= ksocknal_recv_hello_v4,
+	.pro_pack		= ksocknal_pack_msg_v4,
+	.pro_unpack		= ksocknal_unpack_msg_v4,
+	.pro_queue_tx_msg	= ksocknal_queue_tx_msg_v2,
+	.pro_queue_tx_zcack	= ksocknal_queue_tx_zcack_v3,
+	.pro_handle_zcreq	= ksocknal_handle_zcreq,
+	.pro_handle_zcack	= ksocknal_handle_zcack,
+	.pro_match_tx		= ksocknal_match_tx_v4,
+};
-- 
1.8.3.1

* [lustre-devel] [PATCH 24/50] lnet: Convert ping to support 16-bytes address
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (22 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 23/50] lnet: socklnd: add hello message version 4 James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 25/50] lnet: convert nids in lnet_parse to lnet_nid James Simmons
                   ` (25 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Now that ksocknal can send hello messages with 16-byte addresses, we can
change lnet_send_ping() to ping hosts with large-address nids.

Note that this doesn't change the addresses in the ping message sent,
only the sending and receiving of the message.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 3e37ac8bb7e068a30 ("LU-10391 lnet: Convert ping to support 16-bytes address")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43612
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |  5 +++--
 net/lnet/lnet/lib-move.c      | 42 ++++++++++++++++++++----------------------
 net/lnet/lnet/peer.c          |  3 +--
 3 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 0155111..297e5ef 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -643,8 +643,9 @@ void lnet_prep_send(struct lnet_msg *msg, int type,
 		    unsigned int len);
 int lnet_send(struct lnet_nid *nid, struct lnet_msg *msg,
 	      struct lnet_nid *rtr_nid);
-int lnet_send_ping(lnet_nid_t dest_nid, struct lnet_handle_md *mdh, int nnis,
-		   void *user_ptr, lnet_handler_t handler, bool recovery);
+int lnet_send_ping(struct lnet_nid *dest_nid, struct lnet_handle_md *mdh,
+		   int nnis, void *user_ptr, lnet_handler_t handler,
+		   bool recovery);
 void lnet_return_tx_credits_locked(struct lnet_msg *msg);
 void lnet_return_rx_credits_locked(struct lnet_msg *msg);
 void lnet_schedule_blocked_locked(struct lnet_rtrbufpool *rbp);
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index aa230d7..496c895 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -2891,8 +2891,8 @@ enum lnet_mt_event_type {
 };
 
 struct lnet_mt_event_info {
-	enum lnet_mt_event_type mt_type;
-	lnet_nid_t mt_nid;
+	enum lnet_mt_event_type	mt_type;
+	struct lnet_nid		mt_nid;
 };
 
 /* called with res_lock held */
@@ -3176,7 +3176,7 @@ struct lnet_mt_event_info {
 	struct lnet_handle_md mdh;
 	struct lnet_ni *tmp;
 	struct lnet_ni *ni;
-	lnet_nid_t nid;
+	struct lnet_nid nid;
 	int healthv;
 	int rc;
 	time64_t now;
@@ -3258,8 +3258,7 @@ struct lnet_mt_event_info {
 			 * We'll unlink the mdh in this case below.
 			 */
 			LNetInvalidateMDHandle(&ni->ni_ping_mdh);
-			/* FIXME need to handle large-addr nid */
-			nid = lnet_nid_to_nid4(&ni->ni_nid);
+			nid = ni->ni_nid;
 
 			/* remove the NI from the local queue and drop the
 			 * reference count to it while we're recovering
@@ -3284,12 +3283,12 @@ struct lnet_mt_event_info {
 
 			ev_info->mt_type = MT_TYPE_LOCAL_NI;
 			ev_info->mt_nid = nid;
-			rc = lnet_send_ping(nid, &mdh, LNET_INTERFACES_MIN,
+			rc = lnet_send_ping(&nid, &mdh, LNET_INTERFACES_MIN,
 					    ev_info, the_lnet.ln_mt_handler,
 					    true);
 			/* lookup the nid again */
 			lnet_net_lock(0);
-			ni = lnet_nid2ni_locked(nid, 0);
+			ni = lnet_nid_to_ni_locked(&nid, 0);
 			if (!ni) {
 				/* the NI has been deleted when we dropped
 				 * the ref count
@@ -3430,7 +3429,7 @@ struct lnet_mt_event_info {
 	struct lnet_handle_md mdh;
 	struct lnet_peer_ni *lpni;
 	struct lnet_peer_ni *tmp;
-	lnet_nid_t nid;
+	struct lnet_nid nid;
 	time64_t now;
 	int healthv;
 	int rc;
@@ -3504,9 +3503,8 @@ struct lnet_mt_event_info {
 
 			/* look at the comments in lnet_recover_local_nis() */
 			mdh = lpni->lpni_recovery_ping_mdh;
+			nid = lpni->lpni_nid;
 			LNetInvalidateMDHandle(&lpni->lpni_recovery_ping_mdh);
-			/* FIXME handle large-addr nid */
-			nid = lnet_nid_to_nid4(&lpni->lpni_nid);
 			lnet_net_lock(0);
 			list_del_init(&lpni->lpni_recovery);
 			lnet_peer_ni_decref_locked(lpni);
@@ -3514,14 +3512,14 @@ struct lnet_mt_event_info {
 
 			ev_info->mt_type = MT_TYPE_PEER_NI;
 			ev_info->mt_nid = nid;
-			rc = lnet_send_ping(nid, &mdh, LNET_INTERFACES_MIN,
+			rc = lnet_send_ping(&nid, &mdh, LNET_INTERFACES_MIN,
 					    ev_info, the_lnet.ln_mt_handler,
 					    true);
 			lnet_net_lock(0);
 			/* lnet_find_peer_ni_locked() grabs a refcount for
 			 * us. No need to take it explicitly.
 			 */
-			lpni = lnet_find_peer_ni_locked(nid);
+			lpni = lnet_peer_ni_find_locked(&nid);
 			if (!lpni) {
 				lnet_net_unlock(0);
 				LNetMDUnlink(mdh);
@@ -3622,7 +3620,7 @@ struct lnet_mt_event_info {
  * Returns < 0 if LNetGet fails
  */
 int
-lnet_send_ping(lnet_nid_t dest_nid,
+lnet_send_ping(struct lnet_nid *dest_nid,
 	       struct lnet_handle_md *mdh, int nnis,
 	       void *user_data, lnet_handler_t handler, bool recovery)
 {
@@ -3631,7 +3629,7 @@ struct lnet_mt_event_info {
 	struct lnet_ping_buffer *pbuf;
 	int rc;
 
-	if (dest_nid == LNET_NID_ANY) {
+	if (LNET_NID_IS_ANY(dest_nid)) {
 		rc = -EHOSTUNREACH;
 		goto fail_error;
 	}
@@ -3659,7 +3657,7 @@ struct lnet_mt_event_info {
 		goto fail_error;
 	}
 	id.pid = LNET_PID_LUSTRE;
-	id.nid = dest_nid;
+	id.nid = lnet_nid_to_nid4(dest_nid);
 
 	rc = LNetGet(LNET_NID_ANY, *mdh, id,
 		     LNET_RESERVED_PORTAL,
@@ -3680,13 +3678,13 @@ struct lnet_mt_event_info {
 lnet_handle_recovery_reply(struct lnet_mt_event_info *ev_info,
 			   int status, bool send, bool unlink_event)
 {
-	lnet_nid_t nid = ev_info->mt_nid;
+	struct lnet_nid *nid = &ev_info->mt_nid;
 
 	if (ev_info->mt_type == MT_TYPE_LOCAL_NI) {
 		struct lnet_ni *ni;
 
 		lnet_net_lock(0);
-		ni = lnet_nid2ni_locked(nid, 0);
+		ni = lnet_nid_to_ni_locked(nid, 0);
 		if (!ni) {
 			lnet_net_unlock(0);
 			return;
@@ -3701,7 +3699,7 @@ struct lnet_mt_event_info {
 
 		if (status != 0) {
 			CERROR("local NI (%s) recovery failed with %d\n",
-			       libcfs_nid2str(nid), status);
+			       libcfs_nidstr(nid), status);
 			return;
 		}
 		/* need to increment healthv for the ni here, because in
@@ -3718,7 +3716,7 @@ struct lnet_mt_event_info {
 		int cpt;
 
 		cpt = lnet_net_lock_current();
-		lpni = lnet_find_peer_ni_locked(nid);
+		lpni = lnet_peer_ni_find_locked(nid);
 		if (!lpni) {
 			lnet_net_unlock(cpt);
 			return;
@@ -3733,7 +3731,7 @@ struct lnet_mt_event_info {
 
 		if (status != 0)
 			CERROR("peer NI (%s) recovery failed with %d\n",
-			       libcfs_nid2str(nid), status);
+			       libcfs_nidstr(nid), status);
 	}
 }
 
@@ -3754,7 +3752,7 @@ struct lnet_mt_event_info {
 	switch (event->type) {
 	case LNET_EVENT_UNLINK:
 		CDEBUG(D_NET, "%s recovery ping unlinked\n",
-		       libcfs_nid2str(ev_info->mt_nid));
+		       libcfs_nidstr(&ev_info->mt_nid));
 		/* fall-through */
 	case LNET_EVENT_REPLY:
 		lnet_handle_recovery_reply(ev_info, event->status, false,
@@ -3762,7 +3760,7 @@ struct lnet_mt_event_info {
 		break;
 	case LNET_EVENT_SEND:
 		CDEBUG(D_NET, "%s recovery message sent %s:%d\n",
-		       libcfs_nid2str(ev_info->mt_nid),
+		       libcfs_nidstr(&ev_info->mt_nid),
 		       (event->status) ? "unsuccessfully" :
 		       "successfully", event->status);
 		lnet_handle_recovery_reply(ev_info, event->status, true, false);
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index d0b7bc8..494b7ef 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -3471,8 +3471,7 @@ static int lnet_peer_send_ping(struct lnet_peer *lp)
 
 	nnis = max_t(int, lp->lp_data_nnis, LNET_INTERFACES_MIN);
 
-	rc = lnet_send_ping(lnet_nid_to_nid4(&lp->lp_primary_nid),
-			    &lp->lp_ping_mdh, nnis, lp,
+	rc = lnet_send_ping(&lp->lp_primary_nid, &lp->lp_ping_mdh, nnis, lp,
 			    the_lnet.ln_dc_handler, false);
 	/* if LNetMDBind in lnet_send_ping fails we need to decrement the
 	 * refcount on the peer, otherwise LNetMDUnlink will be called
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 25/50] lnet: convert nids in lnet_parse to lnet_nid
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (23 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 24/50] lnet: Convert ping to support 16-bytes address James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 26/50] lnet: change src_nid arg to lnet_parse() to 16byte James Simmons
                   ` (24 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

src_nid and dest_nid in lnet_parse() are changed to
struct lnet_nid, and this change propagates out to
affect a few support functions.
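
As a rough illustration of why the comparison sites change, the sketch
below contrasts a struct-based NID comparison with the old integer
compare. The demo_nid layout and the demo_* helpers are hypothetical
stand-ins written for this note; they are not the kernel definitions
of struct lnet_nid or nid_same().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a large-address NID; not the real layout. */
struct demo_nid {
	uint8_t  nid_size;
	uint8_t  nid_type;
	uint16_t nid_num;
	uint32_t nid_addr[4];
};

/* Compare every field, so addresses longer than 4 bytes are honoured. */
static bool demo_nid_same(const struct demo_nid *a, const struct demo_nid *b)
{
	assert(a != NULL && b != NULL);
	return a->nid_size == b->nid_size &&
	       a->nid_type == b->nid_type &&
	       a->nid_num == b->nid_num &&
	       memcmp(a->nid_addr, b->nid_addr, sizeof(a->nid_addr)) == 0;
}

/*
 * Equivalent of the "for_me" test in this sketch: the message is for
 * this interface only if the whole destination NID matches, not just
 * a 4-byte projection of it.
 */
static bool demo_for_me(const struct demo_nid *ni_nid,
			const struct demo_nid *dest_nid)
{
	return demo_nid_same(ni_nid, dest_nid);
}
```

Two NIDs that differ only in the last address word compare as
different here, which a comparison of truncated 4-byte values could
miss.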

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: ac439ce87b0acbff3 ("LU-10391 lnet: convert nids in lnet_parse to lnet_nid")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43613
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |  4 +--
 net/lnet/lnet/api-ni.c        | 13 -------
 net/lnet/lnet/lib-move.c      | 80 ++++++++++++++++++++-----------------------
 net/lnet/lnet/peer.c          |  7 ++--
 4 files changed, 43 insertions(+), 61 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 297e5ef..fe2fd83 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -626,7 +626,6 @@ void lnet_rtr_transfer_to_peer(struct lnet_peer *src,
 void lnet_net_clr_pref_rtrs(struct lnet_net *net);
 int lnet_net_add_pref_rtr(struct lnet_net *net, struct lnet_nid *gw_nid);
 
-int lnet_islocalnid4(lnet_nid_t nid);
 int lnet_islocalnid(struct lnet_nid *nid);
 int lnet_islocalnet(u32 net);
 int lnet_islocalnet_locked(u32 net);
@@ -912,7 +911,8 @@ struct lnet_peer_ni *lnet_peer_ni_get_locked(struct lnet_peer *lp,
 struct lnet_peer *lnet_find_peer4(lnet_nid_t nid);
 struct lnet_peer *lnet_find_peer(struct lnet_nid *nid);
 void lnet_peer_net_added(struct lnet_net *net);
-void lnet_peer_primary_nid_locked(lnet_nid_t nid, struct lnet_nid *result);
+void lnet_peer_primary_nid_locked(struct lnet_nid *nid,
+				  struct lnet_nid *result);
 int lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt, bool block);
 void lnet_peer_queue_message(struct lnet_peer *lp, struct lnet_msg *msg);
 int lnet_peer_discovery_start(void);
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index d7ada85..0389a89 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -1636,19 +1636,6 @@ struct lnet_ni *
 EXPORT_SYMBOL(lnet_nid_to_ni_addref);
 
 int
-lnet_islocalnid4(lnet_nid_t nid)
-{
-	struct lnet_ni *ni;
-	int cpt;
-
-	cpt = lnet_net_lock_current();
-	ni = lnet_nid2ni_locked(nid, cpt);
-	lnet_net_unlock(cpt);
-
-	return !!ni;
-}
-
-int
 lnet_islocalnid(struct lnet_nid *nid)
 {
 	struct lnet_ni	*ni;
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 496c895..051bea1 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -239,16 +239,14 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 }
 
 static int
-fail_peer(lnet_nid_t nid4, int outgoing)
+fail_peer(struct lnet_nid *nid, int outgoing)
 {
 	struct lnet_test_peer *tp;
 	struct list_head *el;
 	struct list_head *next;
-	struct lnet_nid nid;
 	LIST_HEAD(cull);
 	int fail = 0;
 
-	lnet_nid4_to_nid(nid4, &nid);
 	/* NB: use lnet_net_lock(0) to serialize operations on test peers */
 	lnet_net_lock(0);
 
@@ -269,7 +267,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		}
 
 		if (LNET_NID_IS_ANY(&tp->tp_nid) ||	/* fail every peer */
-		    nid_same(&nid, &tp->tp_nid)) {	/* fail this peer */
+		    nid_same(nid, &tp->tp_nid)) {	/* fail this peer */
 			fail = 1;
 
 			if (tp->tp_threshold != LNET_MD_THRESH_INF) {
@@ -4238,8 +4236,8 @@ void lnet_monitor_thr_stop(void)
 	struct lnet_msg *msg;
 	u32 payload_length;
 	lnet_pid_t dest_pid;
-	lnet_nid_t dest_nid;
-	lnet_nid_t src_nid;
+	struct lnet_nid dest_nid;
+	struct lnet_nid src_nid;
 	struct lnet_nid from_nid;
 	bool push = false;
 	int for_me;
@@ -4252,19 +4250,18 @@ void lnet_monitor_thr_stop(void)
 	lnet_nid4_to_nid(from_nid4, &from_nid);
 
 	type = hdr->type;
-	src_nid = lnet_nid_to_nid4(&hdr->src_nid);
-	dest_nid = lnet_nid_to_nid4(&hdr->dest_nid);
+	src_nid = hdr->src_nid;
+	dest_nid = hdr->dest_nid;
 	dest_pid = hdr->dest_pid;
 	payload_length = hdr->payload_length;
 
-	/* FIXME handle large-addr nids */
-	for_me = (lnet_nid_to_nid4(&ni->ni_nid) == dest_nid);
-	cpt = lnet_cpt_of_nid(from_nid4, ni);
+	for_me = nid_same(&ni->ni_nid, &dest_nid);
+	cpt = lnet_nid2cpt(&from_nid, ni);
 
 	CDEBUG(D_NET, "TRACE: %s(%s) <- %s : %s\n",
-	       libcfs_nid2str(dest_nid),
+	       libcfs_nidstr(&dest_nid),
 	       libcfs_nidstr(&ni->ni_nid),
-	       libcfs_nid2str(src_nid),
+	       libcfs_nidstr(&src_nid),
 	       lnet_msgtyp2str(type));
 
 	switch (type) {
@@ -4273,7 +4270,7 @@ void lnet_monitor_thr_stop(void)
 		if (payload_length > 0) {
 			CERROR("%s, src %s: bad %s payload %d (0 expected)\n",
 			       libcfs_nid2str(from_nid4),
-			       libcfs_nid2str(src_nid),
+			       libcfs_nidstr(&src_nid),
 			       lnet_msgtyp2str(type), payload_length);
 			return -EPROTO;
 		}
@@ -4285,7 +4282,7 @@ void lnet_monitor_thr_stop(void)
 		   (u32)(for_me ? LNET_MAX_PAYLOAD : LNET_MTU)) {
 			CERROR("%s, src %s: bad %s payload %d (%d max expected)\n",
 			       libcfs_nid2str(from_nid4),
-			       libcfs_nid2str(src_nid),
+			       libcfs_nidstr(&src_nid),
 			       lnet_msgtyp2str(type),
 			       payload_length,
 			       for_me ? LNET_MAX_PAYLOAD : LNET_MTU);
@@ -4296,7 +4293,7 @@ void lnet_monitor_thr_stop(void)
 	default:
 		CERROR("%s, src %s: Bad message type 0x%x\n",
 		       libcfs_nid2str(from_nid4),
-		       libcfs_nid2str(src_nid), type);
+		       libcfs_nidstr(&src_nid), type);
 		return -EPROTO;
 	}
 
@@ -4319,40 +4316,39 @@ void lnet_monitor_thr_stop(void)
 	 * or malicious so we chop them off at the knees :)
 	 */
 	if (!for_me) {
-		if (LNET_NIDNET(dest_nid) == LNET_NID_NET(&ni->ni_nid)) {
+		if (LNET_NID_NET(&dest_nid) == LNET_NID_NET(&ni->ni_nid)) {
 			/* should have gone direct */
 			CERROR("%s, src %s: Bad dest nid %s (should have been sent direct)\n",
 			       libcfs_nid2str(from_nid4),
-			       libcfs_nid2str(src_nid),
-			       libcfs_nid2str(dest_nid));
+			       libcfs_nidstr(&src_nid),
+			       libcfs_nidstr(&dest_nid));
 			return -EPROTO;
 		}
 
-		if (lnet_islocalnid4(dest_nid)) {
-			/*
-			 * dest is another local NI; sender should have used
+		if (lnet_islocalnid(&dest_nid)) {
+			/* dest is another local NI; sender should have used
 			 * this node's NID on its own network
 			 */
 			CERROR("%s, src %s: Bad dest nid %s (it's my nid but on a different network)\n",
 			       libcfs_nid2str(from_nid4),
-			       libcfs_nid2str(src_nid),
-			       libcfs_nid2str(dest_nid));
+			       libcfs_nidstr(&src_nid),
+			       libcfs_nidstr(&dest_nid));
 			return -EPROTO;
 		}
 
 		if (rdma_req && type == LNET_MSG_GET) {
 			CERROR("%s, src %s: Bad optimized GET for %s (final destination must be me)\n",
 			       libcfs_nid2str(from_nid4),
-			       libcfs_nid2str(src_nid),
-			       libcfs_nid2str(dest_nid));
+			       libcfs_nidstr(&src_nid),
+			       libcfs_nidstr(&dest_nid));
 			return -EPROTO;
 		}
 
 		if (!the_lnet.ln_routing) {
 			CERROR("%s, src %s: Dropping message for %s (routing not enabled)\n",
 			       libcfs_nid2str(from_nid4),
-			       libcfs_nid2str(src_nid),
-			       libcfs_nid2str(dest_nid));
+			       libcfs_nidstr(&src_nid),
+			       libcfs_nidstr(&dest_nid));
 			goto drop;
 		}
 	}
@@ -4361,10 +4357,10 @@ void lnet_monitor_thr_stop(void)
 	 * Message looks OK; we're not going to return an error, so we MUST
 	 * call back lnd_recv() come what may...
 	 */
-	if (!list_empty(&the_lnet.ln_test_peers) && /* normally we don't */
-	    fail_peer(src_nid, 0)) {		/* shall we now? */
+	if (!list_empty(&the_lnet.ln_test_peers) &&	/* normally we don't */
+	    fail_peer(&src_nid, 0)) {			/* shall we now? */
 		CERROR("%s, src %s: Dropping %s to simulate failure\n",
-		       libcfs_nid2str(from_nid4), libcfs_nid2str(src_nid),
+		       libcfs_nid2str(from_nid4), libcfs_nidstr(&src_nid),
 		       lnet_msgtyp2str(type));
 		goto drop;
 	}
@@ -4373,15 +4369,15 @@ void lnet_monitor_thr_stop(void)
 	if (!list_empty(&the_lnet.ln_drop_rules) &&
 	    lnet_drop_rule_match(hdr, lnet_nid_to_nid4(&ni->ni_nid), NULL)) {
 		CDEBUG(D_NET, "%s, src %s, dst %s: Dropping %s to simulate silent message loss\n",
-		       libcfs_nid2str(from_nid4), libcfs_nid2str(src_nid),
-		       libcfs_nid2str(dest_nid), lnet_msgtyp2str(type));
+		       libcfs_nid2str(from_nid4), libcfs_nidstr(&src_nid),
+		       libcfs_nidstr(&dest_nid), lnet_msgtyp2str(type));
 		goto drop;
 	}
 
 	msg = kmem_cache_zalloc(lnet_msg_cachep, GFP_NOFS);
 	if (!msg) {
 		CERROR("%s, src %s: Dropping %s (out of memory)\n",
-		       libcfs_nid2str(from_nid4), libcfs_nid2str(src_nid),
+		       libcfs_nid2str(from_nid4), libcfs_nidstr(&src_nid),
 		       lnet_msgtyp2str(type));
 		goto drop;
 	}
@@ -4401,7 +4397,7 @@ void lnet_monitor_thr_stop(void)
 	msg->msg_from = from_nid;
 	if (!for_me) {
 		msg->msg_target.pid = dest_pid;
-		lnet_nid4_to_nid(dest_nid, &msg->msg_target.nid);
+		msg->msg_target.nid = dest_nid;
 		msg->msg_routing = 1;
 	}
 
@@ -4411,7 +4407,7 @@ void lnet_monitor_thr_stop(void)
 		lnet_net_unlock(cpt);
 		rc = PTR_ERR(lpni);
 		CERROR("%s, src %s: Dropping %s (error %d looking up sender)\n",
-		       libcfs_nid2str(from_nid4), libcfs_nid2str(src_nid),
+		       libcfs_nid2str(from_nid4), libcfs_nidstr(&src_nid),
 		       lnet_msgtyp2str(type), rc);
 		kfree(msg);
 		if (rc == -ESHUTDOWN)
@@ -4426,8 +4422,8 @@ void lnet_monitor_thr_stop(void)
 	 */
 	if (((lnet_drop_asym_route && for_me) ||
 	     !lpni->lpni_peer_net->lpn_peer->lp_alive) &&
-	    LNET_NIDNET(src_nid) != LNET_NIDNET(from_nid4)) {
-		u32 src_net_id = LNET_NIDNET(src_nid);
+	    LNET_NID_NET(&src_nid) != LNET_NIDNET(from_nid4)) {
+		u32 src_net_id = LNET_NID_NET(&src_nid);
 		struct lnet_peer *gw = lpni->lpni_peer_net->lpn_peer;
 		struct lnet_route *route;
 		bool found = false;
@@ -4462,7 +4458,7 @@ void lnet_monitor_thr_stop(void)
 			 */
 			CERROR("%s, src %s: Dropping asymmetrical route %s\n",
 			       libcfs_nid2str(from_nid4),
-			       libcfs_nid2str(src_nid), lnet_msgtyp2str(type));
+			       libcfs_nidstr(&src_nid), lnet_msgtyp2str(type));
 			kfree(msg);
 			goto drop;
 		}
@@ -4486,7 +4482,7 @@ void lnet_monitor_thr_stop(void)
 	msg->msg_rxni = ni;
 	lnet_ni_addref_locked(ni, cpt);
 	/* Multi-Rail: Primary NID of source. */
-	lnet_peer_primary_nid_locked(src_nid, &msg->msg_initiator);
+	lnet_peer_primary_nid_locked(&src_nid, &msg->msg_initiator);
 
 	/* mark the status of this lpni as UP since we received a message
 	 * from it. The ping response reports back the ns_status which is
@@ -4718,7 +4714,7 @@ void lnet_monitor_thr_stop(void)
 	target.pid = target4.pid;
 
 	if (!list_empty(&the_lnet.ln_test_peers) &&	/* normally we don't */
-	    fail_peer(target4.nid, 1)) {		/* shall we now? */
+	    fail_peer(&target.nid, 1)) {		/* shall we now? */
 		CERROR("Dropping PUT to %s: simulated failure\n",
 		       libcfs_id2str(target4));
 		return -EIO;
@@ -4964,7 +4960,7 @@ struct lnet_msg *
 	target.pid = target4.pid;
 
 	if (!list_empty(&the_lnet.ln_test_peers) &&	/* normally we don't */
-	    fail_peer(target4.nid, 1)) {		/* shall we now? */
+	    fail_peer(&target.nid, 1)) {		/* shall we now? */
 		CERROR("Dropping GET to %s: simulated failure\n",
 		       libcfs_id2str(target4));
 		return -EIO;
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 494b7ef..8e7f44c 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -1345,13 +1345,12 @@ struct lnet_peer_ni *
 }
 
 void
-lnet_peer_primary_nid_locked(lnet_nid_t nid, struct lnet_nid *result)
+lnet_peer_primary_nid_locked(struct lnet_nid *nid, struct lnet_nid *result)
 {
-	/* FIXME handle large-addr nid */
 	struct lnet_peer_ni *lpni;
 
-	lnet_nid4_to_nid(nid, result);
-	lpni = lnet_find_peer_ni_locked(nid);
+	*result = *nid;
+	lpni = lnet_peer_ni_find_locked(nid);
 	if (lpni) {
 		*result = lpni->lpni_peer_net->lpn_peer->lp_primary_nid;
 		lnet_peer_ni_decref_locked(lpni);
-- 
1.8.3.1


* [lustre-devel] [PATCH 26/50] lnet: change src_nid arg to lnet_parse() to 16byte
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (24 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 25/50] lnet: convert nids in lnet_parse to lnet_nid James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 27/50] lnet: Fix NULL-deref in lnet_nidstr_r() James Simmons
                   ` (23 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

lnet_parse() now gets the source nid as 'struct lnet_nid *'.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: eb0eedbb1a68297b8 ("LU-10391 lnet: change src_nid arg to lnet_parse() to 16byte")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43614
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h       |  2 +-
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 10 +++++++---
 net/lnet/klnds/socklnd/socklnd_cb.c |  2 +-
 net/lnet/lnet/lib-move.c            | 39 +++++++++++++++++--------------------
 net/lnet/lnet/lo.c                  |  3 +--
 5 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index fe2fd83..33fee17 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -706,7 +706,7 @@ void lnet_ptl_attach_md(struct lnet_me *me, struct lnet_libmd *md,
 
 /* message functions */
 int lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr,
-	       lnet_nid_t fromnid, void *private, int rdma_req);
+	       struct lnet_nid *fromnid, void *private, int rdma_req);
 int lnet_parse_local(struct lnet_ni *ni, struct lnet_msg *msg);
 int lnet_parse_forward_locked(struct lnet_ni *ni, struct lnet_msg *msg);
 
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index c1be2f7..983599f 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -323,6 +323,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 	int rc2;
 	int post_credit;
 	struct lnet_hdr hdr;
+	struct lnet_nid srcnid;
 
 	LASSERT(conn->ibc_state >= IBLND_CONN_ESTABLISHED);
 
@@ -382,7 +383,8 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 	case IBLND_MSG_IMMEDIATE:
 		post_credit = IBLND_POSTRX_DONT_POST;
 		lnet_hdr_from_nid4(&hdr, &msg->ibm_u.immediate.ibim_hdr);
-		rc = lnet_parse(ni, &hdr, msg->ibm_srcnid, rx, 0);
+		lnet_nid4_to_nid(msg->ibm_srcnid, &srcnid);
+		rc = lnet_parse(ni, &hdr, &srcnid, rx, 0);
 		if (rc < 0)		/* repost on error */
 			post_credit = IBLND_POSTRX_PEER_CREDIT;
 		break;
@@ -390,7 +392,8 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 	case IBLND_MSG_PUT_REQ:
 		post_credit = IBLND_POSTRX_DONT_POST;
 		lnet_hdr_from_nid4(&hdr, &msg->ibm_u.putreq.ibprm_hdr);
-		rc = lnet_parse(ni, &hdr, msg->ibm_srcnid, rx, 1);
+		lnet_nid4_to_nid(msg->ibm_srcnid, &srcnid);
+		rc = lnet_parse(ni, &hdr, &srcnid, rx, 1);
 		if (rc < 0)		/* repost on error */
 			post_credit = IBLND_POSTRX_PEER_CREDIT;
 		break;
@@ -454,7 +457,8 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 	case IBLND_MSG_GET_REQ:
 		post_credit = IBLND_POSTRX_DONT_POST;
 		lnet_hdr_from_nid4(&hdr, &msg->ibm_u.get.ibgm_hdr);
-		rc = lnet_parse(ni, &hdr, msg->ibm_srcnid, rx, 1);
+		lnet_nid4_to_nid(msg->ibm_srcnid, &srcnid);
+		rc = lnet_parse(ni, &hdr, &srcnid, rx, 1);
 		if (rc < 0)		/* repost on error */
 			post_credit = IBLND_POSTRX_PEER_CREDIT;
 		break;
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index af35c49..adec183 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -1189,7 +1189,7 @@ struct ksock_conn_cb *
 		ksocknal_conn_addref(conn);     /* ++ref while parsing */
 
 		rc = lnet_parse(conn->ksnc_peer->ksnp_ni, &hdr,
-				lnet_nid_to_nid4(&conn->ksnc_peer->ksnp_id.nid),
+				&conn->ksnc_peer->ksnp_id.nid,
 				conn, 0);
 		if (rc < 0) {
 			/* I just received garbage: give up on this conn */
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 051bea1..8a90822 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -4229,8 +4229,8 @@ void lnet_monitor_thr_stop(void)
 EXPORT_SYMBOL(lnet_msgtyp2str);
 
 int
-lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid4,
-	   void *private, int rdma_req)
+lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr,
+	   struct lnet_nid *from_nid, void *private, int rdma_req)
 {
 	struct lnet_peer_ni *lpni;
 	struct lnet_msg *msg;
@@ -4238,7 +4238,6 @@ void lnet_monitor_thr_stop(void)
 	lnet_pid_t dest_pid;
 	struct lnet_nid dest_nid;
 	struct lnet_nid src_nid;
-	struct lnet_nid from_nid;
 	bool push = false;
 	int for_me;
 	u32 type;
@@ -4247,8 +4246,6 @@ void lnet_monitor_thr_stop(void)
 
 	LASSERT(!in_interrupt());
 
-	lnet_nid4_to_nid(from_nid4, &from_nid);
-
 	type = hdr->type;
 	src_nid = hdr->src_nid;
 	dest_nid = hdr->dest_nid;
@@ -4256,7 +4253,7 @@ void lnet_monitor_thr_stop(void)
 	payload_length = hdr->payload_length;
 
 	for_me = nid_same(&ni->ni_nid, &dest_nid);
-	cpt = lnet_nid2cpt(&from_nid, ni);
+	cpt = lnet_nid2cpt(from_nid, ni);
 
 	CDEBUG(D_NET, "TRACE: %s(%s) <- %s : %s\n",
 	       libcfs_nidstr(&dest_nid),
@@ -4269,7 +4266,7 @@ void lnet_monitor_thr_stop(void)
 	case LNET_MSG_GET:
 		if (payload_length > 0) {
 			CERROR("%s, src %s: bad %s payload %d (0 expected)\n",
-			       libcfs_nid2str(from_nid4),
+			       libcfs_nidstr(from_nid),
 			       libcfs_nidstr(&src_nid),
 			       lnet_msgtyp2str(type), payload_length);
 			return -EPROTO;
@@ -4281,7 +4278,7 @@ void lnet_monitor_thr_stop(void)
 		if (payload_length >
 		   (u32)(for_me ? LNET_MAX_PAYLOAD : LNET_MTU)) {
 			CERROR("%s, src %s: bad %s payload %d (%d max expected)\n",
-			       libcfs_nid2str(from_nid4),
+			       libcfs_nidstr(from_nid),
 			       libcfs_nidstr(&src_nid),
 			       lnet_msgtyp2str(type),
 			       payload_length,
@@ -4292,7 +4289,7 @@ void lnet_monitor_thr_stop(void)
 
 	default:
 		CERROR("%s, src %s: Bad message type 0x%x\n",
-		       libcfs_nid2str(from_nid4),
+		       libcfs_nidstr(from_nid),
 		       libcfs_nidstr(&src_nid), type);
 		return -EPROTO;
 	}
@@ -4319,7 +4316,7 @@ void lnet_monitor_thr_stop(void)
 		if (LNET_NID_NET(&dest_nid) == LNET_NID_NET(&ni->ni_nid)) {
 			/* should have gone direct */
 			CERROR("%s, src %s: Bad dest nid %s (should have been sent direct)\n",
-			       libcfs_nid2str(from_nid4),
+			       libcfs_nidstr(from_nid),
 			       libcfs_nidstr(&src_nid),
 			       libcfs_nidstr(&dest_nid));
 			return -EPROTO;
@@ -4330,7 +4327,7 @@ void lnet_monitor_thr_stop(void)
 			 * this node's NID on its own network
 			 */
 			CERROR("%s, src %s: Bad dest nid %s (it's my nid but on a different network)\n",
-			       libcfs_nid2str(from_nid4),
+			       libcfs_nidstr(from_nid),
 			       libcfs_nidstr(&src_nid),
 			       libcfs_nidstr(&dest_nid));
 			return -EPROTO;
@@ -4338,7 +4335,7 @@ void lnet_monitor_thr_stop(void)
 
 		if (rdma_req && type == LNET_MSG_GET) {
 			CERROR("%s, src %s: Bad optimized GET for %s (final destination must be me)\n",
-			       libcfs_nid2str(from_nid4),
+			       libcfs_nidstr(from_nid),
 			       libcfs_nidstr(&src_nid),
 			       libcfs_nidstr(&dest_nid));
 			return -EPROTO;
@@ -4346,7 +4343,7 @@ void lnet_monitor_thr_stop(void)
 
 		if (!the_lnet.ln_routing) {
 			CERROR("%s, src %s: Dropping message for %s (routing not enabled)\n",
-			       libcfs_nid2str(from_nid4),
+			       libcfs_nidstr(from_nid),
 			       libcfs_nidstr(&src_nid),
 			       libcfs_nidstr(&dest_nid));
 			goto drop;
@@ -4360,7 +4357,7 @@ void lnet_monitor_thr_stop(void)
 	if (!list_empty(&the_lnet.ln_test_peers) &&	/* normally we don't */
 	    fail_peer(&src_nid, 0)) {			/* shall we now? */
 		CERROR("%s, src %s: Dropping %s to simulate failure\n",
-		       libcfs_nid2str(from_nid4), libcfs_nidstr(&src_nid),
+		       libcfs_nidstr(from_nid), libcfs_nidstr(&src_nid),
 		       lnet_msgtyp2str(type));
 		goto drop;
 	}
@@ -4369,7 +4366,7 @@ void lnet_monitor_thr_stop(void)
 	if (!list_empty(&the_lnet.ln_drop_rules) &&
 	    lnet_drop_rule_match(hdr, lnet_nid_to_nid4(&ni->ni_nid), NULL)) {
 		CDEBUG(D_NET, "%s, src %s, dst %s: Dropping %s to simulate silent message loss\n",
-		       libcfs_nid2str(from_nid4), libcfs_nidstr(&src_nid),
+		       libcfs_nidstr(from_nid), libcfs_nidstr(&src_nid),
 		       libcfs_nidstr(&dest_nid), lnet_msgtyp2str(type));
 		goto drop;
 	}
@@ -4377,7 +4374,7 @@ void lnet_monitor_thr_stop(void)
 	msg = kmem_cache_zalloc(lnet_msg_cachep, GFP_NOFS);
 	if (!msg) {
 		CERROR("%s, src %s: Dropping %s (out of memory)\n",
-		       libcfs_nid2str(from_nid4), libcfs_nidstr(&src_nid),
+		       libcfs_nidstr(from_nid), libcfs_nidstr(&src_nid),
 		       lnet_msgtyp2str(type));
 		goto drop;
 	}
@@ -4394,7 +4391,7 @@ void lnet_monitor_thr_stop(void)
 	msg->msg_offset = 0;
 	msg->msg_hdr = *hdr;
 	/* for building message event */
-	msg->msg_from = from_nid;
+	msg->msg_from = *from_nid;
 	if (!for_me) {
 		msg->msg_target.pid = dest_pid;
 		msg->msg_target.nid = dest_nid;
@@ -4402,12 +4399,12 @@ void lnet_monitor_thr_stop(void)
 	}
 
 	lnet_net_lock(cpt);
-	lpni = lnet_peerni_by_nid_locked(&from_nid, &ni->ni_nid, cpt);
+	lpni = lnet_peerni_by_nid_locked(from_nid, &ni->ni_nid, cpt);
 	if (IS_ERR(lpni)) {
 		lnet_net_unlock(cpt);
 		rc = PTR_ERR(lpni);
 		CERROR("%s, src %s: Dropping %s (error %d looking up sender)\n",
-		       libcfs_nid2str(from_nid4), libcfs_nidstr(&src_nid),
+		       libcfs_nidstr(from_nid), libcfs_nidstr(&src_nid),
 		       lnet_msgtyp2str(type), rc);
 		kfree(msg);
 		if (rc == -ESHUTDOWN)
@@ -4422,7 +4419,7 @@ void lnet_monitor_thr_stop(void)
 	 */
 	if (((lnet_drop_asym_route && for_me) ||
 	     !lpni->lpni_peer_net->lpn_peer->lp_alive) &&
-	    LNET_NID_NET(&src_nid) != LNET_NIDNET(from_nid4)) {
+	    LNET_NID_NET(&src_nid) != LNET_NID_NET(from_nid)) {
 		u32 src_net_id = LNET_NID_NET(&src_nid);
 		struct lnet_peer *gw = lpni->lpni_peer_net->lpn_peer;
 		struct lnet_route *route;
@@ -4457,7 +4454,7 @@ void lnet_monitor_thr_stop(void)
 			 * => asymmetric routing detected but forbidden
 			 */
 			CERROR("%s, src %s: Dropping asymmetrical route %s\n",
-			       libcfs_nid2str(from_nid4),
+			       libcfs_nidstr(from_nid),
 			       libcfs_nidstr(&src_nid), lnet_msgtyp2str(type));
 			kfree(msg);
 			goto drop;
diff --git a/net/lnet/lnet/lo.c b/net/lnet/lnet/lo.c
index 3d3dcf8..90155b5 100644
--- a/net/lnet/lnet/lo.c
+++ b/net/lnet/lnet/lo.c
@@ -40,8 +40,7 @@
 	LASSERT(!lntmsg->msg_routing);
 	LASSERT(!lntmsg->msg_target_is_router);
 
-	return lnet_parse(ni, &lntmsg->msg_hdr,
-			  lnet_nid_to_nid4(&ni->ni_nid), lntmsg, 0);
+	return lnet_parse(ni, &lntmsg->msg_hdr, &ni->ni_nid, lntmsg, 0);
 }
 
 static int
-- 
1.8.3.1


* [lustre-devel] [PATCH 27/50] lnet: Fix NULL-deref in lnet_nidstr_r()
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (25 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 26/50] lnet: change src_nid arg to lnet_parse() to 16byte James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 28/50] lnet: change lnet_del_route() to take lnet_nid James Simmons
                   ` (22 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

It is valid to pass NULL as the nid for lnet_nidstr_r() - it indicates
the "any" nid.  LNET_NID_IS_ANY() tests for this and the function exits
early.

However, 'lnd' is assigned from "nid->nid_type" and 'nnum' from
"nid->nid_num", causing a NULL-pointer dereference.

So move these assignments later.
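
The pattern can be sketched in isolation as below. This is only an
illustration under assumptions: demo_nid, demo_nid_is_any() and
demo_nidstr_r() are hypothetical names, the "any" test is reduced to a
NULL check, and the output format is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct lnet_nid; only the fields used here. */
struct demo_nid {
	uint8_t  nid_size;
	uint8_t  nid_type;
	uint16_t nid_num;
};

/* Assumption for this sketch: a NULL pointer means the "any" NID. */
static int demo_nid_is_any(const struct demo_nid *nid)
{
	return nid == NULL;
}

static char *demo_nidstr_r(const struct demo_nid *nid, char *buf,
			   size_t buf_size)
{
	unsigned int nnum;
	unsigned int lnd;

	assert(buf != NULL && buf_size > 0);

	/* Handle the NULL ("any") case before touching *nid. */
	if (demo_nid_is_any(nid)) {
		snprintf(buf, buf_size, "<?>");
		return buf;
	}

	/* Only now is it safe to read fields through the pointer. */
	nnum = nid->nid_num;
	lnd = nid->nid_type;
	snprintf(buf, buf_size, "type%u-net%u", lnd, nnum);
	return buf;
}
```

Initialising nnum and lnd in their declarations, as the old code did,
would read through the pointer before the NULL check had a chance to
run.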

Fixes: a2cfedead042 ("lnet: introduce struct lnet_nid")
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 8ed370864c4747281 ("LU-10391 lnet: Fix NULL-deref in lnet_nidstr_r()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/44838
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/nidstrings.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/lnet/lnet/nidstrings.c b/net/lnet/lnet/nidstrings.c
index dfd6744..3523b78 100644
--- a/net/lnet/lnet/nidstrings.c
+++ b/net/lnet/lnet/nidstrings.c
@@ -967,8 +967,8 @@ int cfs_print_nidlist(char *buffer, int count, struct list_head *nidlist)
 char *
 libcfs_nidstr_r(const struct lnet_nid *nid, char *buf, size_t buf_size)
 {
-	u32 nnum = be16_to_cpu(nid->nid_num);
-	u32 lnd  = nid->nid_type;
+	u32 nnum;
+	u32 lnd;
 	struct netstrfns *nf;
 
 	if (LNET_NID_IS_ANY(nid)) {
@@ -977,6 +977,8 @@ int cfs_print_nidlist(char *buffer, int count, struct list_head *nidlist)
 		return buf;
 	}
 
+	nnum = be16_to_cpu(nid->nid_num);
+	lnd = nid->nid_type;
 	nf = libcfs_lnd2netstrfns(lnd);
 	if (nf) {
 		size_t addr_len;
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 28/50] lnet: change lnet_del_route() to take lnet_nid
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (26 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 27/50] lnet: Fix NULL-deref in lnet_nidstr_r() James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 29/50] lustre: llite: Move free user pages James Simmons
                   ` (21 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The gateway NID passed to lnet_del_route is now a struct lnet_nid.
Instead of passing LNET_NID_ANY as a wildcard, we pass
a NULL pointer.
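
A minimal sketch of the NULL-as-wildcard convention follows. All names
(struct demo_nid, struct demo_route, demo_gw_matches(),
demo_del_routes()) are hypothetical and only mirror the shape of the
check; the real code compares with nid_same() and moves routes onto a
zombie list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 16-byte address standing in for struct lnet_nid. */
struct demo_nid {
	unsigned char addr[16];
};

struct demo_route {
	struct demo_nid	gw;
	bool		zombie;	/* marked for deletion */
};

/* A NULL @want matches every gateway; otherwise compare addresses. */
static bool demo_gw_matches(const struct demo_nid *want,
			    const struct demo_nid *gw)
{
	return !want ||
	       memcmp(want->addr, gw->addr, sizeof(want->addr)) == 0;
}

/* Mark every route via @want (or all routes if NULL); return the count. */
static size_t demo_del_routes(struct demo_route *routes, size_t n,
			      const struct demo_nid *want)
{
	size_t i, hit = 0;

	for (i = 0; i < n; i++) {
		if (!demo_gw_matches(want, &routes[i].gw))
			continue;
		routes[i].zombie = true;
		hit++;
	}
	return hit;
}
```

With a pointer argument the wildcard no longer needs a reserved
address value, which is why LNET_NID_ANY can be dropped from this
call path.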

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 58993799f29d12c1a ("LU-10391 lnet: change lnet_del_route() to take lnet_nid")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/43615
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |  2 +-
 net/lnet/lnet/api-ni.c        |  3 ++-
 net/lnet/lnet/peer.c          |  7 +++----
 net/lnet/lnet/router.c        | 29 +++++++++++++++++------------
 4 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 33fee17..b6a7a54 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -576,7 +576,7 @@ void lnet_notify_locked(struct lnet_peer_ni *lp, int notifylnd, int alive,
 			time64_t when);
 int lnet_add_route(u32 net, u32 hops, struct lnet_nid *gateway,
 		   u32 priority, u32 sensitivity);
-int lnet_del_route(u32 net, lnet_nid_t gw_nid);
+int lnet_del_route(u32 net, struct lnet_nid *gw_nid);
 void lnet_move_route(struct lnet_route *route, struct lnet_peer *lp,
 		     struct list_head *rt_list);
 void lnet_destroy_routes(void);
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 0389a89..7a05752 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -3901,8 +3901,9 @@ u32 lnet_get_dlc_seq_locked(void)
 		if (config->cfg_hdr.ioc_len < sizeof(*config))
 			return -EINVAL;
 
+		lnet_nid4_to_nid(config->cfg_nid, &nid);
 		mutex_lock(&the_lnet.ln_api_mutex);
-		rc = lnet_del_route(config->cfg_net, config->cfg_nid);
+		rc = lnet_del_route(config->cfg_net, &nid);
 		mutex_unlock(&the_lnet.ln_api_mutex);
 		return rc;
 
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 8e7f44c..f70ceb5 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -627,7 +627,7 @@ static void lnet_peer_cancel_discovery(struct lnet_peer *lp)
 {
 	struct lnet_peer_ni *lp;
 	struct lnet_peer_ni *tmp;
-	lnet_nid_t gw_nid;
+	struct lnet_nid gw_nid;
 	int i;
 
 	for (i = 0; i < LNET_PEER_HASH_SIZE; i++) {
@@ -639,10 +639,9 @@ static void lnet_peer_cancel_discovery(struct lnet_peer *lp)
 			if (!lnet_isrouter(lp))
 				continue;
 
-			/* FIXME handle large-addr nid */
-			gw_nid = lnet_nid_to_nid4(&lp->lpni_peer_net->lpn_peer->lp_primary_nid);
+			gw_nid = lp->lpni_peer_net->lpn_peer->lp_primary_nid;
 			lnet_net_unlock(LNET_LOCK_EX);
-			lnet_del_route(LNET_NET_ANY, gw_nid);
+			lnet_del_route(LNET_NET_ANY, &gw_nid);
 			lnet_net_lock(LNET_LOCK_EX);
 		}
 	}
diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index 87ae1f9..beded3e 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -116,7 +116,7 @@ static int rtr_sensitivity_set(const char *val,
 
 static void lnet_add_route_to_rnet(struct lnet_remotenet *rnet,
 				   struct lnet_route *route);
-static void lnet_del_route_from_rnet(lnet_nid_t gw_nid,
+static void lnet_del_route_from_rnet(struct lnet_nid *gw_nid,
 				     struct list_head *route_list,
 				     struct list_head *zombies);
 
@@ -175,7 +175,7 @@ static void lnet_del_route_from_rnet(lnet_nid_t gw_nid,
 	/* use the gateway's lp_primary_nid to delete the route as the
 	 * lr_nid can be a constituent NID of the peer
 	 */
-	lnet_del_route_from_rnet(lnet_nid_to_nid4(&route->lr_gateway->lp_primary_nid),
+	lnet_del_route_from_rnet(&route->lr_gateway->lp_primary_nid,
 				 &rnet->lrn_routes, l);
 
 	if (lp) {
@@ -788,7 +788,8 @@ static void lnet_shuffle_seed(void)
 }
 
 void
-lnet_del_route_from_rnet(lnet_nid_t gw_nid, struct list_head *route_list,
+lnet_del_route_from_rnet(struct lnet_nid *gw_nid,
+			 struct list_head *route_list,
 			 struct list_head *zombies)
 {
 	struct lnet_peer *gateway;
@@ -797,8 +798,7 @@ static void lnet_shuffle_seed(void)
 
 	list_for_each_entry_safe(route, tmp, route_list, lr_list) {
 		gateway = route->lr_gateway;
-		if (gw_nid != LNET_NID_ANY &&
-		    gw_nid != lnet_nid_to_nid4(&gateway->lp_primary_nid))
+		if (gw_nid && !nid_same(gw_nid, &gateway->lp_primary_nid))
 			continue;
 
 		/* move to zombie to delete outside the lock
@@ -817,7 +817,7 @@ static void lnet_shuffle_seed(void)
 }
 
 int
-lnet_del_route(u32 net, lnet_nid_t gw_nid)
+lnet_del_route(u32 net, struct lnet_nid *gw)
 {
 	LIST_HEAD(rnet_zombies);
 	struct lnet_remotenet *rnet;
@@ -825,12 +825,13 @@ static void lnet_shuffle_seed(void)
 	struct list_head *rn_list;
 	struct lnet_peer_ni *lpni;
 	struct lnet_route *route;
+	struct lnet_nid gw_nid;
 	LIST_HEAD(zombies);
 	struct lnet_peer *lp = NULL;
 	int i = 0;
 
 	CDEBUG(D_NET, "Del route: net %s : gw %s\n",
-	       libcfs_net2str(net), libcfs_nid2str(gw_nid));
+	       libcfs_net2str(net), libcfs_nidstr(gw));
 
 	/* NB Caller may specify either all routes via the given gateway
 	 * or a specific route entry actual NIDs)
@@ -838,11 +839,15 @@ static void lnet_shuffle_seed(void)
 
 	lnet_net_lock(LNET_LOCK_EX);
 
-	lpni = lnet_find_peer_ni_locked(gw_nid);
+	if (gw)
+		lpni = lnet_peer_ni_find_locked(gw);
+	else
+		lpni = NULL;
 	if (lpni) {
 		lp = lpni->lpni_peer_net->lpn_peer;
 		LASSERT(lp);
-		gw_nid = lnet_nid_to_nid4(&lp->lp_primary_nid);
+		gw_nid = lp->lp_primary_nid;
+		gw = &gw_nid;
 		lnet_peer_ni_decref_locked(lpni);
 	}
 
@@ -852,7 +857,7 @@ static void lnet_shuffle_seed(void)
 			lnet_net_unlock(LNET_LOCK_EX);
 			return -ENOENT;
 		}
-		lnet_del_route_from_rnet(gw_nid, &rnet->lrn_routes,
+		lnet_del_route_from_rnet(gw, &rnet->lrn_routes,
 					 &zombies);
 		if (list_empty(&rnet->lrn_routes))
 			list_move(&rnet->lrn_list, &rnet_zombies);
@@ -863,7 +868,7 @@ static void lnet_shuffle_seed(void)
 		rn_list = &the_lnet.ln_remote_nets_hash[i];
 
 		list_for_each_entry_safe(rnet, tmp, rn_list, lrn_list) {
-			lnet_del_route_from_rnet(gw_nid, &rnet->lrn_routes,
+			lnet_del_route_from_rnet(gw, &rnet->lrn_routes,
 						 &zombies);
 			if (list_empty(&rnet->lrn_routes))
 				list_move(&rnet->lrn_list, &rnet_zombies);
@@ -903,7 +908,7 @@ static void lnet_shuffle_seed(void)
 void
 lnet_destroy_routes(void)
 {
-	lnet_del_route(LNET_NET_ANY, LNET_NID_ANY);
+	lnet_del_route(LNET_NET_ANY, NULL);
 }
 
 int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg)
-- 
1.8.3.1

* [lustre-devel] [PATCH 29/50] lustre: llite: Move free user pages
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (27 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 28/50] lnet: change lnet_del_route() to take lnet_nid James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 30/50] lustre: llite: Do not get/put DIO pages James Simmons
                   ` (20 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

It is incorrect to release our reference on the user pages
before we're done with them.  We need to keep it until the
i/o is complete; otherwise we access the pages after
releasing our reference.  This has not caused any known bugs
so far, but it's still wrong.

So only drop these references when we free the aio struct,
which is only freed once i/o is complete.

Also rename free_user_pages to release_user_pages, because
it does not free them - it just releases our reference.
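
The ownership change can be sketched as follows, assuming a toy
reference count in place of the kernel's page refcount. struct
demo_page, struct demo_aio and both functions are hypothetical; they
only show the page array travelling with the aio struct and being
released from the completion path.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical page with a plain counter instead of a real refcount. */
struct demo_page {
	int refs;
};

/* Hypothetical aio struct that owns the pinned page array. */
struct demo_aio {
	struct demo_page	**pages;
	size_t			count;
};

/* Drop our reference on each page, then free the array itself. */
static void demo_release_user_pages(struct demo_page **pages, size_t npages)
{
	size_t i;

	for (i = 0; i < npages; i++) {
		if (!pages[i])
			break;
		pages[i]->refs--;
	}
	free(pages);
}

/* Completion path: the references are dropped only once i/o is done. */
static void demo_aio_end(struct demo_aio *aio)
{
	demo_release_user_pages(aio->pages, aio->count);
	aio->pages = NULL;
	aio->count = 0;
}
```

Because the array lives in the aio struct, the submitting thread has
nothing left to release after submission.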

This also helps performance by moving free_user_pages to
the daemon threads.  This is a 5-10% boost.

This patch reduces i/o time in ms/GiB by:
Write: 18 ms/GiB
Read: 19 ms/GiB

Totals:
Write: 180 ms/GiB
Read: 178 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write           5183 MiB/s
read            5201 MiB/s

Plus this patch:
write           5702 MiB/s
read            5756 MiB/s
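
The ms/GiB figures above appear to follow from the IOR throughput
numbers, taking 1 GiB as 1024 MiB. A small sketch of that conversion
(the helper names are made up for illustration):

```c
/* Time to move one GiB (1024 MiB), in milliseconds, at @mib_per_s. */
static double demo_ms_per_gib(double mib_per_s)
{
	return 1024.0 * 1000.0 / mib_per_s;
}

/* ms/GiB saved when throughput rises from @before to @after (MiB/s). */
static double demo_ms_saved(double before, double after)
{
	return demo_ms_per_gib(before) - demo_ms_per_gib(after);
}
```

For the write case, 5183 MiB/s is roughly 197.6 ms/GiB and 5702 MiB/s
is roughly 179.6 ms/GiB, a difference of about 18 ms/GiB, which
matches the quoted reduction and total after rounding.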

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: 7f9b8465bc1125e51 ("LU-13799 llite: Move free user pages")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39443
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h | 18 +++++++++++++++
 fs/lustre/llite/rw26.c        | 52 +++++++++----------------------------------
 fs/lustre/obdclass/cl_io.c    | 21 ++++++++++++++++-
 3 files changed, 49 insertions(+), 42 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 9815b19..af708cc 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -2620,6 +2620,21 @@ struct cl_sync_io {
 	struct cl_dio_aio	*csi_aio;
 };
 
+
+/* direct IO pages */
+struct ll_dio_pages {
+	struct cl_dio_aio	*ldp_aio;
+	/*
+	 * page array to be written. we don't support
+	 * partial pages except the last one.
+	 */
+	struct page		**ldp_pages;
+	/* # of pages in the array. */
+	size_t			ldp_count;
+	/* the file offset of the first page. */
+	loff_t			ldp_file_offset;
+};
+
 /* To support Direct AIO */
 struct cl_dio_aio {
 	struct cl_sync_io	cda_sync;
@@ -2628,10 +2643,13 @@ struct cl_dio_aio {
 	struct kiocb		*cda_iocb;
 	ssize_t			cda_bytes;
 	struct cl_dio_aio	*cda_ll_aio;
+	struct ll_dio_pages	cda_dio_pages;
 	unsigned int		cda_no_aio_complete:1,
 				cda_no_aio_free:1;
 };
 
+void ll_release_user_pages(struct page **pages, int npages);
+
 /** @} cl_sync_io */
 
 /** \defgroup cl_env cl_env
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index 16cccfa..a5cdb01 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -150,22 +150,6 @@ static int ll_releasepage(struct page *vmpage, gfp_t gfp_mask)
 	return result;
 }
 
-/*
- * ll_free_user_pages - tear down page struct array
- * @pages: array of page struct pointers underlying target buffer
- */
-static void ll_free_user_pages(struct page **pages, int npages)
-{
-	int i;
-
-	for (i = 0; i < npages; i++) {
-		if (!pages[i])
-			break;
-		put_page(pages[i]);
-	}
-	kvfree(pages);
-}
-
 static ssize_t ll_get_user_pages(int rw, struct iov_iter *iter,
 				struct page ***pages, ssize_t *npages,
 				size_t maxsize)
@@ -209,28 +193,15 @@ static unsigned long ll_iov_iter_alignment(struct iov_iter *i)
 	return res;
 }
 
-/* direct IO pages */
-struct ll_dio_pages {
-	struct cl_dio_aio	*ldp_aio;
-	/*
-	 * page array to be written. we don't support
-	 * partial pages except the last one.
-	 */
-	struct page		**ldp_pages;
-	/* # of pages in the array. */
-	size_t			ldp_count;
-	/* the file offset of the first page. */
-	loff_t			ldp_file_offset;
-};
-
 static int
 ll_direct_rw_pages(const struct lu_env *env, struct cl_io *io, size_t size,
-		   int rw, struct inode *inode, struct ll_dio_pages *pv)
+		   int rw, struct inode *inode, struct cl_dio_aio *aio)
 {
+	struct ll_dio_pages *pv = &aio->cda_dio_pages;
 	struct cl_page *page;
 	struct cl_2queue *queue = &io->ci_queue;
 	struct cl_object *obj = io->ci_obj;
-	struct cl_sync_io *anchor = &pv->ldp_aio->cda_sync;
+	struct cl_sync_io *anchor = &aio->cda_sync;
 	loff_t offset = pv->ldp_file_offset;
 	int io_pages = 0;
 	size_t page_size = cl_page_size(obj);
@@ -290,8 +261,7 @@ struct ll_dio_pages {
 		smp_mb();
 		rc = cl_io_submit_rw(env, io, iot, queue);
 		if (rc == 0) {
-			cl_page_list_splice(&queue->c2_qout,
-					&pv->ldp_aio->cda_pages);
+			cl_page_list_splice(&queue->c2_qout, &aio->cda_pages);
 		} else {
 			atomic_add(-queue->c2_qin.pl_nr,
 				   &anchor->csi_sync_nr);
@@ -371,7 +341,7 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	LASSERT(ll_aio->cda_iocb == iocb);
 
 	while (iov_iter_count(iter)) {
-		struct ll_dio_pages pvec = {};
+		struct ll_dio_pages *pvec;
 		struct page **pages;
 
 		count = min_t(size_t, iov_iter_count(iter), MAX_DIO_SIZE);
@@ -392,26 +362,26 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 			result = -ENOMEM;
 			goto out;
 		}
-		pvec.ldp_aio = ldp_aio;
+
+		pvec = &ldp_aio->cda_dio_pages;
 
 		result = ll_get_user_pages(rw, iter, &pages,
-					   &pvec.ldp_count, count);
+					   &pvec->ldp_count, count);
 		if (unlikely(result <= 0)) {
 			cl_sync_io_note(env, &ldp_aio->cda_sync, result);
 			goto out;
 		}
 
 		count = result;
-		pvec.ldp_file_offset = file_offset;
-		pvec.ldp_pages = pages;
+		pvec->ldp_file_offset = file_offset;
+		pvec->ldp_pages = pages;
 
 		result = ll_direct_rw_pages(env, io, count,
-					    rw, inode, &pvec);
+					    rw, inode, ldp_aio);
 		/* We've submitted pages and can now remove the extra
 		 * reference for that
 		 */
 		cl_sync_io_note(env, &ldp_aio->cda_sync, result);
-		ll_free_user_pages(pages, pvec.ldp_count);
 
 		if (unlikely(result < 0))
 			goto out;
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index 038ab4c..e4fc795 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -1139,8 +1139,11 @@ static void cl_aio_end(const struct lu_env *env, struct cl_sync_io *anchor)
 		aio->cda_iocb->ki_complete(aio->cda_iocb,
 					   ret ?: aio->cda_bytes, 0);
 
-	if (aio->cda_ll_aio)
+	if (aio->cda_ll_aio) {
+		ll_release_user_pages(aio->cda_dio_pages.ldp_pages,
+				      aio->cda_dio_pages.ldp_count);
 		cl_sync_io_note(env, &aio->cda_ll_aio->cda_sync, ret);
+	}
 }
 
 struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj,
@@ -1195,6 +1198,22 @@ void cl_aio_free(const struct lu_env *env, struct cl_dio_aio *aio)
 }
 EXPORT_SYMBOL(cl_aio_free);
 
+/*
+ * ll_release_user_pages - tear down page struct array
+ * @pages: array of page struct pointers underlying target buffer
+ */
+void ll_release_user_pages(struct page **pages, int npages)
+{
+	int i;
+
+	for (i = 0; i < npages; i++) {
+		if (!pages[i])
+			break;
+		put_page(pages[i]);
+	}
+	kvfree(pages);
+}
+
 /**
  * Indicate that transfer of a single page completed.
  */
-- 
1.8.3.1

* [lustre-devel] [PATCH 30/50] lustre: llite: Do not get/put DIO pages
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (28 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 29/50] lustre: llite: Move free user pages James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 31/50] lustre: llite: Remove unnecessary page get/put James Simmons
                   ` (19 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

We've already told the kernel we're working with these pages
using the get/put_user_pages functions, and userspace must
hold references on them throughout the i/o anyway.

So getting/putting these vmpages is unnecessary.  This
saves around 7% of the time in DIO page submission, netting
about that much of a performance improvement.

This patch reduces i/o time in ms/GiB by:
Write: 22 ms/GiB
Read: 19 ms/GiB

Totals:
Write: 135 ms/GiB
Read: 143 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write           6470 MiB/s
read            6354 MiB/s

Plus this patch:
write           7531 MiB/s
read            7179 MiB/s

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: 881b4c722296ff7ac ("LU-13799 llite: Do not get/put DIO pages")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39438
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/vvp_page.c | 34 ++++++++--------------------------
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c
index 60a28d6..ae51ba0 100644
--- a/fs/lustre/llite/vvp_page.c
+++ b/fs/lustre/llite/vvp_page.c
@@ -52,20 +52,6 @@
  * Page operations.
  *
  */
-
-static void vvp_page_fini_common(struct vvp_page *vpg, struct pagevec *pvec)
-{
-	struct page *vmpage = vpg->vpg_page;
-
-	LASSERT(vmpage);
-	if (pvec) {
-		if (!pagevec_add(pvec, vmpage))
-			pagevec_release(pvec);
-	} else {
-		put_page(vmpage);
-	}
-}
-
 static void vvp_page_fini(const struct lu_env *env,
 			  struct cl_page_slice *slice,
 			  struct pagevec *pvec)
@@ -78,7 +64,13 @@ static void vvp_page_fini(const struct lu_env *env,
 	 * VPG_FREEING state.
 	 */
 	LASSERT((struct cl_page *)vmpage->private != slice->cpl_page);
-	vvp_page_fini_common(vpg, pvec);
+	LASSERT(vmpage);
+	if (pvec) {
+		if (!pagevec_add(pvec, vmpage))
+			pagevec_release(pvec);
+	} else {
+		put_page(vmpage);
+	}
 }
 
 static int vvp_page_own(const struct lu_env *env,
@@ -432,18 +424,8 @@ static int vvp_transient_page_is_vmlocked(const struct lu_env *env,
 	return -EBUSY;
 }
 
-static void vvp_transient_page_fini(const struct lu_env *env,
-				    struct cl_page_slice *slice,
-				    struct pagevec *pvec)
-{
-	struct vvp_page *vpg = cl2vvp_page(slice);
-
-	vvp_page_fini_common(vpg, pvec);
-}
-
 static const struct cl_page_operations vvp_transient_page_ops = {
 	.cpo_discard		= vvp_transient_page_discard,
-	.cpo_fini		= vvp_transient_page_fini,
 	.cpo_is_vmlocked	= vvp_transient_page_is_vmlocked,
 	.cpo_print		= vvp_page_print,
 };
@@ -457,7 +439,6 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
 	vpg->vpg_page = vmpage;
-	get_page(vmpage);
 
 	if (page->cp_type == CPT_TRANSIENT) {
 		/* DIO pages are referenced by userspace, we don't need to take
@@ -466,6 +447,7 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 		cl_page_slice_add(page, &vpg->vpg_cl, obj,
 				  &vvp_transient_page_ops);
 	} else {
+		get_page(vmpage);
 		/* in cache, decref in vvp_page_delete */
 		refcount_inc(&page->cp_ref);
 		SetPagePrivate(vmpage);
-- 
1.8.3.1

* [lustre-devel] [PATCH 31/50] lustre: llite: Remove unnecessary page get/put
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (29 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 30/50] lustre: llite: Do not get/put DIO pages James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 32/50] lustre: llite: LL_IOC_LMV_GETSTRIPE 'default' shows inherit layout James Simmons
                   ` (18 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

Part of the aio cleanup code has the slightly strange
behavior of doing get on every page before calling page
cleanup, then doing a put after.

This was required because we call cl_page_list_del before
calling cl_page_delete, and cl_page_list_del was holding
the last reference on the page struct.

If we reverse the order, then we don't need the extra
get/put to keep the pages live.  This should save
significant CPU time in the ptlrpcd threads when finishing
i/o, since this removes a get/put on every page.
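
The ordering argument can be sketched with a toy refcount, assuming
the list holds the last reference. Everything here (struct demo_page,
demo_freed, the demo_* functions) is hypothetical and not the cl_page
API.

```c
#include <stdlib.h>

/* Hypothetical refcounted page; the list is assumed to hold one ref. */
struct demo_page {
	int refs;
	int deleted;
};

/* Counts how many pages have been freed, for inspection by callers. */
static int demo_freed;

static void demo_page_put(struct demo_page *pg)
{
	if (--pg->refs == 0) {
		demo_freed++;
		free(pg);
	}
}

/* Needs a live page: it writes to the structure. */
static void demo_page_delete(struct demo_page *pg)
{
	pg->deleted = 1;
}

/* Removing the page from the list drops the list's reference. */
static void demo_list_del(struct demo_page *pg)
{
	demo_page_put(pg);
}

/* Delete first, while the list's reference still pins the page. */
static void demo_cleanup(struct demo_page *pg)
{
	demo_page_delete(pg);
	demo_list_del(pg);
}
```

In the opposite order, demo_list_del() could free the page before
demo_page_delete() touches it, which is what the extra get/put used
to guard against.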

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: c2e94f08cf3ff000b ("LU-13799 llite: Remove unnecessary page get/put")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44293
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdclass/cl_io.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index e4fc795..6dd029a 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -1129,10 +1129,8 @@ static void cl_aio_end(const struct lu_env *env, struct cl_sync_io *anchor)
 	while (aio->cda_pages.pl_nr > 0) {
 		struct cl_page *page = cl_page_list_first(&aio->cda_pages);
 
-		cl_page_get(page);
-		cl_page_list_del(env, &aio->cda_pages, page);
 		cl_page_delete(env, page);
-		cl_page_put(env, page);
+		cl_page_list_del(env, &aio->cda_pages, page);
 	}
 
 	if (!aio->cda_no_aio_complete)
-- 
1.8.3.1

* [lustre-devel] [PATCH 32/50] lustre: llite: LL_IOC_LMV_GETSTRIPE 'default' shows inherit layout
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (30 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 31/50] lustre: llite: Remove unnecessary page get/put James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 33/50] lustre: hsm: update size upon completion of data version James Simmons
                   ` (17 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

Once a system-wide default LMV is set, "lfs getdirstripe -D subdir"
should show the layout inherited from it.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15200
Lustre-commit: 61b1fad9e3fb21edc ("LU-15200 llite: "lfs getdirstripe -D" shows inherit layout")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45570
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index b4870d9..d3df4d0 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -759,7 +759,7 @@ int ll_dir_getstripe_default(struct inode *inode, void **plmm, int *plmm_size,
 	rc = ll_dir_get_default_layout(inode, (void **)&lmm, &lmm_size,
 				       &req, valid, 0);
 	if (rc == -ENODATA && !fid_is_root(ll_inode2fid(inode)) &&
-	    !(valid & (OBD_MD_MEA|OBD_MD_DEFAULT_MEA)) && root_request) {
+	    !(valid & OBD_MD_MEA) && root_request) {
 		int rc2 = ll_dir_get_default_layout(inode, (void **)&lmm,
 						    &lmm_size, &root_req, valid,
 						    GET_DEFAULT_LAYOUT_ROOT);
@@ -1602,6 +1602,43 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 				goto finish_req;
 			}
 
+			if (root_request) {
+				struct lmv_user_md *lum;
+				struct ll_inode_info *lli;
+
+				lum = (struct lmv_user_md *)lmm;
+				lli = ll_i2info(inode);
+				if (lum->lum_max_inherit == LMV_INHERIT_NONE ||
+				    (lum->lum_max_inherit > 0 &&
+				     lum->lum_max_inherit < lli->lli_dir_depth)) {
+					rc = -ENODATA;
+					goto finish_req;
+				}
+
+				if (lum->lum_max_inherit ==
+				    lli->lli_dir_depth) {
+					lum->lum_max_inherit = LMV_INHERIT_NONE;
+					lum->lum_max_inherit_rr =
+						LMV_INHERIT_RR_NONE;
+					goto out_copy;
+				}
+				if (lum->lum_max_inherit > lli->lli_dir_depth &&
+				    lum->lum_max_inherit <= LMV_INHERIT_MAX)
+					lum->lum_max_inherit -=
+						lli->lli_dir_depth;
+
+				if (lum->lum_max_inherit_rr >
+					lli->lli_dir_depth &&
+				    lum->lum_max_inherit_rr <=
+					LMV_INHERIT_RR_MAX)
+					lum->lum_max_inherit_rr -=
+						lli->lli_dir_depth;
+				else if (lum->lum_max_inherit_rr ==
+						lli->lli_dir_depth)
+					lum->lum_max_inherit_rr =
+						LMV_INHERIT_RR_NONE;
+			}
+out_copy:
 			if (copy_to_user(ulmv, lmm, lmmsize))
 				rc = -EFAULT;
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 33/50] lustre: hsm: update size upon completion of data version
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (31 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 32/50] lustre: llite: LL_IOC_LMV_GETSTRIPE 'default' shows inherit layout James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 34/50] lustre: llite: Delay dput in ll_dirty_page_discard_warn James Simmons
                   ` (16 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Qian Yingjin <qian@ddn.com>

During testing we found that an HSM restore followed by an HSM
release wrongly sets the file size to 0.
The reason is that the file size and blocks information obtained
via ll_merge_attr() is incorrect.
The data version operation flushes dirty pages from all clients,
so the size and blocks information returned from the Lustre OST
is correct.
In this patch, we update the size and blocks attributes of a file
upon completion of the data version operation.
This way, HSM release sets the size and blocks information
correctly after the data version ioctl operation.
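
The flag-gated update can be sketched as below. The DEMO_* constants
and both structs are hypothetical stand-ins (the bit positions of the
DEMO_DV_* flags copy the LL_DV_* enum in the diff, the validity bits
are invented); the sketch only shows that attributes are copied when
the caller asked for it and the server marked the field valid.

```c
/* Hypothetical copies of the data version flags. */
enum demo_dv_flags {
	DEMO_DV_RD_FLUSH	= 1 << 0,
	DEMO_DV_WR_FLUSH	= 1 << 1,
	DEMO_DV_SZ_UPDATE	= 1 << 2,
};

/* Hypothetical validity bits for the server reply. */
enum demo_valid {
	DEMO_VALID_SIZE		= 1 << 0,
	DEMO_VALID_BLOCKS	= 1 << 1,
};

struct demo_reply {
	unsigned int		valid;
	unsigned long long	size;
	unsigned long long	blocks;
};

struct demo_attr {
	unsigned long long	size;
	unsigned long long	blocks;
};

/* Copy size/blocks from @r into @a; return the mask of updated fields. */
static unsigned int demo_dv_end(unsigned int flags,
				const struct demo_reply *r,
				struct demo_attr *a)
{
	unsigned int updated = 0;

	if (!(flags & DEMO_DV_SZ_UPDATE))
		return 0;

	if (r->valid & DEMO_VALID_SIZE) {
		a->size = r->size;
		updated |= DEMO_VALID_SIZE;
	}
	if (r->valid & DEMO_VALID_BLOCKS) {
		a->blocks = r->blocks;
		updated |= DEMO_VALID_BLOCKS;
	}
	return updated;
}
```

Callers that pass only a flush flag keep their existing attributes,
so the new behaviour is opt-in.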

WC-bug-id: https://jira.whamcloud.com/browse/LU-15381
Lustre-commit: dd3b5601ec6905b00 ("LU-15381 hsm: update size upon completion of data version")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/45935
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c                  |  5 +++--
 fs/lustre/osc/osc_io.c                  | 30 +++++++++++++++++++++++++-----
 include/uapi/linux/lustre/lustre_user.h |  1 +
 3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 6b95133..373efdd 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -2771,8 +2771,8 @@ int ll_hsm_release(struct inode *inode)
 	struct lu_env *env;
 	struct obd_client_handle *och = NULL;
 	u64 data_version = 0;
-	int rc;
 	u16 refcheck;
+	int rc;
 
 	CDEBUG(D_INODE, "%s: Releasing file " DFID ".\n",
 	       ll_i2sbi(inode)->ll_fsname,
@@ -2785,7 +2785,8 @@ int ll_hsm_release(struct inode *inode)
 	}
 
 	/* Grab latest data_version and [am]time values */
-	rc = ll_data_version(inode, &data_version, LL_DV_WR_FLUSH);
+	rc = ll_data_version(inode, &data_version,
+			     LL_DV_WR_FLUSH | LL_DV_SZ_UPDATE);
 	if (rc != 0)
 		goto out;
 
diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c
index b84022b..db91bf2 100644
--- a/fs/lustre/osc/osc_io.c
+++ b/fs/lustre/osc/osc_io.c
@@ -833,7 +833,11 @@ static void osc_io_data_version_end(const struct lu_env *env,
 {
 	struct cl_data_version_io *dv = &slice->cis_io->u.ci_data_version;
 	struct osc_io *oio = cl2osc_io(env, slice);
+	struct cl_object *obj = slice->cis_obj;
 	struct osc_async_cbargs *cbargs = &oio->oi_cbarg;
+	struct cl_attr *attr = &osc_env_info(env)->oti_attr;
+	struct obdo *oa = &oio->oi_oa;
+	unsigned int cl_valid = 0;
 
 	wait_for_completion(&cbargs->opc_sync);
 
@@ -841,14 +845,30 @@ static void osc_io_data_version_end(const struct lu_env *env,
 		slice->cis_io->ci_result = cbargs->opc_rc;
 	} else {
 		slice->cis_io->ci_result = 0;
-		if (!(oio->oi_oa.o_valid &
+		if (!(oa->o_valid &
 		      (OBD_MD_LAYOUT_VERSION | OBD_MD_FLDATAVERSION)))
 			slice->cis_io->ci_result = -ENOTSUPP;
 
-		if (oio->oi_oa.o_valid & OBD_MD_LAYOUT_VERSION)
-			dv->dv_layout_version = oio->oi_oa.o_layout_version;
-		if (oio->oi_oa.o_valid & OBD_MD_FLDATAVERSION)
-			dv->dv_data_version = oio->oi_oa.o_data_version;
+		if (oa->o_valid & OBD_MD_LAYOUT_VERSION)
+			dv->dv_layout_version = oa->o_layout_version;
+		if (oa->o_valid & OBD_MD_FLDATAVERSION)
+			dv->dv_data_version = oa->o_data_version;
+
+		if (dv->dv_flags & LL_DV_SZ_UPDATE) {
+			if (oa->o_valid & OBD_MD_FLSIZE) {
+				attr->cat_size = oa->o_size;
+				cl_valid |= CAT_SIZE;
+			}
+
+			if (oa->o_valid & OBD_MD_FLBLOCKS) {
+				attr->cat_blocks = oa->o_blocks;
+				cl_valid |= CAT_BLOCKS;
+			}
+
+			cl_object_attr_lock(obj);
+			cl_object_attr_update(env, obj, attr, cl_valid);
+			cl_object_attr_unlock(obj);
+		}
 	}
 }
 
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 3b53a5b..2fdd7df 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -1848,6 +1848,7 @@ struct ioc_data_version {
 enum ioc_data_version_flags {
 	LL_DV_RD_FLUSH	= (1 << 0), /* Flush dirty pages from clients */
 	LL_DV_WR_FLUSH	= (1 << 1), /* Flush all caching pages from clients */
+	LL_DV_SZ_UPDATE	= (1 << 2), /* Update the file size on the client */
 };
 
 #ifndef offsetof
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [lustre-devel] [PATCH 34/50] lustre: llite: Delay dput in ll_dirty_page_discard_warn
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (32 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 33/50] lustre: hsm: update size upon completion of data version James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 35/50] lnet: libcfs: Use FAIL_CHECK_QUIET for fake i/o James Simmons
                   ` (15 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

Otherwise we can end up doing the final dput and need to wait for
pages to clear. That is bad because this is called from ptlrpcd,
which is not supposed to block, especially on network traffic: it
can cause livelocks if ptlrpcd happens to be needed to kill the very
same RPC we are waiting on.

Additionally, pass in the inode from the IO, since the page we are
using might come from direct I/O, in which case page->mapping->host
is probably not even a valid inode.
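
The deferral idea described above can be sketched in plain userspace C.
This is only an illustration under simplifying assumptions, not the
kernel code: `deferred_item`, `defer_release()` and `run_deferred()` are
hypothetical names standing in for the work item, schedule_work() and
the worker that later runs the callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a work item: one pending release. */
struct deferred_item {
	struct deferred_item *next;
	void (*release)(void *obj);	/* may block, like a final dput() */
	void *obj;
};

static struct deferred_item *deferred_head;

/* Called from the context that must not block: only queues the item. */
static void defer_release(struct deferred_item *item,
			  void (*release)(void *obj), void *obj)
{
	assert(item && release);
	item->release = release;
	item->obj = obj;
	item->next = deferred_head;
	deferred_head = item;
}

/* Called later from a context that may block; returns items processed. */
static int run_deferred(void)
{
	int n = 0;

	while (deferred_head) {
		struct deferred_item *item = deferred_head;

		deferred_head = item->next;
		item->release(item->obj);
		n++;
	}
	return n;
}

/* Example release callback: counts how many objects were released. */
static void count_release(void *obj)
{
	(*(int *)obj)++;
}
```

Nothing is released at the time defer_release() is called; the release
only happens once run_deferred() is invoked from a context that is
allowed to wait.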

WC-bug-id: https://jira.whamcloud.com/browse/LU-15340
Lustre-commit: a1d75780ba19cfca5 ("LU-15340 llite: Delay dput in ll_dirty_page_discard_warn")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45784
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_internal.h |  2 +-
 fs/lustre/llite/llite_lib.c      | 49 +++++++++++++++++++++++++++++++---------
 fs/lustre/llite/vvp_page.c       |  2 +-
 3 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index a8d43bd..f51ab19 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1208,7 +1208,7 @@ int ll_iocontrol(struct inode *inode, struct file *file,
 void ll_umount_begin(struct super_block *sb);
 int ll_remount_fs(struct super_block *sb, int *flags, char *data);
 int ll_show_options(struct seq_file *seq, struct dentry *dentry);
-void ll_dirty_page_discard_warn(struct page *page, int ioret);
+void ll_dirty_page_discard_warn(struct inode *inode, int ioret);
 int ll_prep_inode(struct inode **inode, struct req_capsule *pill,
 		  struct super_block *sb, struct lookup_intent *it);
 int ll_obd_statfs(struct inode *inode, void __user *arg);
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 4c91a78..423c531 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -3334,18 +3334,36 @@ int ll_get_obd_name(struct inode *inode, unsigned int cmd, unsigned long arg)
 	return 0;
 }
 
-void ll_dirty_page_discard_warn(struct page *page, int ioret)
+struct dname_buf {
+	struct work_struct db_work;
+	struct dentry *db_dentry;
+	/* Let's hope the path is not too long, 32 bytes for the work struct
+	 * on my kernel
+	 */
+	char buf[PAGE_SIZE - sizeof(struct work_struct) - sizeof(void *)];
+};
+
+static void ll_dput_later(struct work_struct *work)
 {
-	char *buf, *path = NULL;
+	struct dname_buf *db = container_of(work, struct dname_buf, db_work);
+
+	dput(db->db_dentry);
+	free_page((unsigned long)db);
+}
+
+void ll_dirty_page_discard_warn(struct inode *inode, int ioret)
+{
+	struct dname_buf *db;
+	char *path = NULL;
 	struct dentry *dentry = NULL;
-	struct inode *inode = page->mapping->host;
 
 	/* this can be called inside spin lock so use GFP_ATOMIC. */
-	buf = (char *)__get_free_page(GFP_ATOMIC);
-	if (buf) {
+	db = (struct dname_buf *)__get_free_page(GFP_ATOMIC);
+	if (db) {
 		dentry = d_find_alias(inode);
 		if (dentry)
-			path = dentry_path_raw(dentry, buf, PAGE_SIZE);
+			path = dentry_path_raw(dentry, db->buf,
+					       sizeof(db->buf));
 	}
 
 	/* The below message is checked in recovery-small.sh test_24b */
@@ -3356,11 +3374,20 @@ void ll_dirty_page_discard_warn(struct page *page, int ioret)
 	       PFID(ll_inode2fid(inode)),
 	       (path && !IS_ERR(path)) ? path : "", ioret);
 
-	if (dentry)
-		dput(dentry);
-
-	if (buf)
-		free_page((unsigned long)buf);
+	if (dentry) {
+		/* We cannot dput here since if we happen to be the last holder
+		 * then we can end up waiting for page evictions that
+		 * in turn wait for RPCs that need this instance of ptlrpcd
+		 * (calling brw_interpret->*page_completion*->vmpage_error->here)
+		 * LU-15340
+		 */
+		INIT_WORK(&db->db_work, ll_dput_later);
+		db->db_dentry = dentry;
+		schedule_work(&db->db_work);
+	} else {
+		if (db)
+			free_page((unsigned long)db);
+	}
 }
 
 ssize_t ll_copy_user_md(const struct lov_user_md __user *md,
diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c
index ae51ba0..1e95ede 100644
--- a/fs/lustre/llite/vvp_page.c
+++ b/fs/lustre/llite/vvp_page.c
@@ -244,7 +244,7 @@ static void vvp_vmpage_error(struct inode *inode, struct page *vmpage,
 		if ((ioret == -ESHUTDOWN || ioret == -EINTR ||
 		     ioret == -EIO) && obj->vob_discard_page_warned == 0) {
 			obj->vob_discard_page_warned = 1;
-			ll_dirty_page_discard_warn(vmpage, ioret);
+			ll_dirty_page_discard_warn(inode, ioret);
 		}
 	}
 }
-- 
1.8.3.1


* [lustre-devel] [PATCH 35/50] lnet: libcfs: Use FAIL_CHECK_QUIET for fake i/o
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (33 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 34/50] lustre: llite: Delay dput in ll_dirty_page_discard_warn James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 36/50] lnet: Avoid peer NI recovery for local interface James Simmons
                   ` (14 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

Logging to the console is relatively expensive and doing it
for fake i/o is very expensive in terms of CPU time.

If we use FAIL_CHECK_QUIET, a debug message is logged only once
to the console, and the rest at D_INFO level (probably not at all).

This should hugely reduce the CPU cost of the debugging.
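
A minimal userspace sketch of the "loud once, quiet afterwards" behaviour
described above. The counters and the `failed_before` flag are
hypothetical stand-ins for console output, D_INFO debug output and the
CFS_FAILED bit; this is not the libcfs implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counters standing in for console and debug-log output. */
static unsigned int console_msgs;
static unsigned int debug_msgs;
static bool failed_before;	/* plays the role of the CFS_FAILED bit */

/* Report a triggered fail point; returns 1 to signal a hit. */
static int fail_check(unsigned int id, bool quiet)
{
	bool failed_once = failed_before;	/* sampled first */

	assert(id != 0);			/* 0 would mean "no fail point" */
	failed_before = true;
	if (quiet && failed_once) {
		debug_msgs++;			/* cheap path: debug log only */
	} else {
		console_msgs++;			/* expensive path: console */
		fprintf(stderr, "*** fail_loc=%x ***\n", id);
	}
	return 1;
}
```

With quiet set, a thousand hits on the same fail point produce a single
console message in this sketch; the remaining 999 go to the cheap path.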

WC-bug-id: https://jira.whamcloud.com/browse/LU-14935
Lustre-commit: 890466a32d3e8683a ("LU-14935 tests: Use FAIL_CHECK_QUIET for fake i/o")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44651
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_support.h    | 1 +
 include/linux/libcfs/libcfs_fail.h | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index d57c25c..0732fe9a 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -524,6 +524,7 @@
 /* Assign references to moved code to reduce code changes */
 #define OBD_FAIL_PRECHECK(id)			CFS_FAIL_PRECHECK(id)
 #define OBD_FAIL_CHECK(id)			CFS_FAIL_CHECK(id)
+#define OBD_FAIL_CHECK_QUIET(id)		CFS_FAIL_CHECK_QUIET(id)
 #define OBD_FAIL_CHECK_VALUE(id, value)		CFS_FAIL_CHECK_VALUE(id, value)
 #define OBD_FAIL_CHECK_ORSET(id, value)		CFS_FAIL_CHECK_ORSET(id, value)
 #define OBD_FAIL_CHECK_RESET(id, value)		CFS_FAIL_CHECK_RESET(id, value)
diff --git a/include/linux/libcfs/libcfs_fail.h b/include/linux/libcfs/libcfs_fail.h
index 731401b..552aad6 100644
--- a/include/linux/libcfs/libcfs_fail.h
+++ b/include/linux/libcfs/libcfs_fail.h
@@ -87,15 +87,15 @@ static inline bool CFS_FAIL_PRECHECK(u32 id)
 		 (cfs_fail_loc & id & CFS_FAULT));
 }
 
-static inline int cfs_fail_check_set(u32 id, u32 value,
-				     int set, int quiet)
+static inline int cfs_fail_check_set(u32 id, u32 value, int set, int quiet)
 {
+	unsigned long failed_once = cfs_fail_loc & CFS_FAILED; /* ok if racy */
 	int ret = 0;
 
 	if (unlikely(CFS_FAIL_PRECHECK(id))) {
 		ret = __cfs_fail_check_set(id, value, set);
 		if (ret) {
-			if (quiet) {
+			if (quiet && failed_once) {
 				CDEBUG(D_INFO, "*** cfs_fail_loc=%x, val=%u***\n",
 				       id, value);
 			} else {
-- 
1.8.3.1


* [lustre-devel] [PATCH 36/50] lnet: Avoid peer NI recovery for local interface
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (34 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 35/50] lnet: libcfs: Use FAIL_CHECK_QUIET for fake i/o James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 37/50] lustre: osc: add OBD_IOC_GETATTR support for osc James Simmons
                   ` (13 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

If an MR peer has an MR peer entry for itself (this can happen if the
entry is created manually or discovery is run on itself for some
reason), then it can put its own interfaces into peer recovery.
Problems with local interfaces should be handled via local NI recovery.

HPE-bug-id: LUS-10661
WC-bug-id: https://jira.whamcloud.com/browse/LU-15398
Lustre-commit: fb5d7036ec356c825 ("LU-15398 lnet: Avoid peer NI recovery for local interface")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/45933
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-msg.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 88f017b..f476975 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -877,6 +877,12 @@
 			if (!lnet_isrouter(lpni))
 				handle_remote_health = false;
 		}
+		/* Do not put my interfaces into peer NI recovery. They should
+		 * be handled with local NI recovery.
+		 */
+		if (handle_remote_health && lpni &&
+		    lnet_nid_to_ni_locked(&lpni->lpni_nid, 0))
+			handle_remote_health = false;
 		lnet_net_unlock(0);
 	}
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 37/50] lustre: osc: add OBD_IOC_GETATTR support for osc
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (35 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 36/50] lnet: Avoid peer NI recovery for local interface James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 38/50] lustre: sec: present .fscrypt in subdir mount James Simmons
                   ` (12 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: "John L. Hammond" <jhammond@whamcloud.com>

Add a supporting OBD_IOC_GETATTR case to osc_iocontrol().

WC-bug-id: https://jira.whamcloud.com/browse/LU-15452
Lustre-commit: 4143c3bdec2a87319 ("LU-15452 utils: support lctl getattr for osc")
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46131
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_request.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index c442819..43cd6c5 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -3222,18 +3222,21 @@ static int osc_iocontrol(unsigned int cmd, struct obd_export *exp, int len,
 					   data->ioc_inlbuf1, 0);
 		if (rc > 0)
 			rc = 0;
-		goto out;
+		break;
+	case OBD_IOC_GETATTR:
+		rc = obd_getattr(NULL, exp, &data->ioc_obdo1);
+		break;
 	case IOC_OSC_SET_ACTIVE:
 		rc = ptlrpc_set_import_active(obd->u.cli.cl_import,
 					      data->ioc_offset);
-		goto out;
+		break;
 	default:
 		CDEBUG(D_INODE, "%s: unrecognised ioctl %#x by %s\n",
 		       obd->obd_name, cmd, current->comm);
 		rc = -ENOTTY;
-		goto out;
+		break;
 	}
-out:
+
 	module_put(THIS_MODULE);
 	return rc;
 }
-- 
1.8.3.1


* [lustre-devel] [PATCH 38/50] lustre: sec: present .fscrypt in subdir mount
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (36 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 37/50] lustre: osc: add OBD_IOC_GETATTR support for osc James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 39/50] lnet: improve hash distribution across CPTs James Simmons
                   ` (11 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

The fscrypt userspace tool works with a .fscrypt directory at the root
of the file system. In the case of a subdirectory mount, we virtually
present this .fscrypt directory at the root of the mount point so that
fscrypt can be used. This even makes it possible to do a subdirectory
mount of an encrypted directory, so that clients access encrypted
content only. Internally, the .fscrypt directory is always stored at
the root of Lustre.
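
The redirection described above can be sketched as follows. This is a
simplified illustration, not the llite code: `struct fid`, `FS_ROOT_FID`
(with made-up values) and `lookup_parent()` are hypothetical, and only
the length-delimited name comparison mirrors what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DOT_FSCRYPT_NAME ".fscrypt"

/* Hypothetical simplified identifier; real Lustre FIDs have more fields. */
struct fid {
	unsigned long long seq;
	unsigned int oid;
};

static const struct fid FS_ROOT_FID = { 1, 1 };	/* made-up values */

static bool fid_eq(const struct fid *a, const struct fid *b)
{
	return a->seq == b->seq && a->oid == b->oid;
}

/* True when a length-delimited name is exactly ".fscrypt". */
static bool is_dot_fscrypt(const char *name, size_t namelen)
{
	return name && namelen == strlen(DOT_FSCRYPT_NAME) &&
	       strncmp(name, DOT_FSCRYPT_NAME, namelen) == 0;
}

/* Pick the parent FID to send for a lookup of "name" under "parent". */
static struct fid lookup_parent(const struct fid *parent,
				bool parent_is_mount_root,
				const char *name, size_t namelen)
{
	assert(parent);
	/* Subdir mount: the mount root is not the file system root. */
	if (parent_is_mount_root && !fid_eq(parent, &FS_ROOT_FID) &&
	    is_dot_fscrypt(name, namelen))
		return FS_ROOT_FID;
	return *parent;
}
```

Checking the length before strncmp() matters because the name is not
NUL-terminated at namelen; without it, a prefix such as ".fscryp" would
also match.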

WC-bug-id: https://jira.whamcloud.com/browse/LU-15176
Lustre-commit: c12378fba7f0aa2f2 ("LU-15176 sec: present .fscrypt in subdir mount")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/46167
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/crypto.c                |  9 +++++++--
 fs/lustre/llite/llite_lib.c             | 11 ++++++++++-
 fs/lustre/llite/namei.c                 |  9 +++++++++
 include/uapi/linux/lustre/lustre_user.h |  1 +
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/llite/crypto.c b/fs/lustre/llite/crypto.c
index f310d4c..b0e4f76 100644
--- a/fs/lustre/llite/crypto.c
+++ b/fs/lustre/llite/crypto.c
@@ -233,8 +233,13 @@ int ll_setup_filename(struct inode *dir, const struct qstr *iname,
 	}
 	rc = fscrypt_setup_filename(dir, &dname, lookup, fname);
 	if (rc == -ENOENT && lookup &&
-	    !fscrypt_has_encryption_key(dir) &&
-	    unlikely(filename_is_volatile(iname->name, iname->len, NULL))) {
+	    ((is_root_inode(dir) && iname->len == strlen(dot_fscrypt_name) &&
+	      strncmp(iname->name, dot_fscrypt_name, iname->len) == 0) ||
+	     (!fscrypt_has_encryption_key(dir) &&
+	      unlikely(filename_is_volatile(iname->name, iname->len, NULL))))) {
+		/* In case of subdir mount of an encrypted directory, we allow
+		 * lookup of /.fscrypt directory.
+		 */
 		/* For purpose of migration or mirroring without enc key, we
 		 * allow lookup of volatile file without enc context.
 		 */
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 423c531..8c0a3e0 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -3157,7 +3157,16 @@ struct md_op_data *ll_prep_md_op_data(struct md_op_data *op_data,
 		return ERR_PTR(-ENOMEM);
 
 	ll_i2gids(op_data->op_suppgids, i1, i2);
-	op_data->op_fid1 = *ll_inode2fid(i1);
+	/* If the client is using a subdir mount and looks at what it sees as
+	 * /.fscrypt, interpret it as the .fscrypt dir at the root of the fs.
+	 */
+	if (unlikely(i1->i_sb && i1->i_sb->s_root && is_root_inode(i1) &&
+		     !fid_is_root(ll_inode2fid(i1)) &&
+		     name && namelen == strlen(dot_fscrypt_name) &&
+		     strncmp(name, dot_fscrypt_name, namelen) == 0))
+		lu_root_fid(&op_data->op_fid1);
+	else
+		op_data->op_fid1 = *ll_inode2fid(i1);
 
 	if (S_ISDIR(i1->i_mode)) {
 		down_read_non_owner(&ll_i2info(i1)->lli_lsm_sem);
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 1e3a4fd..c05e3bf 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -2068,6 +2068,15 @@ static int ll_rename(struct inode *src, struct dentry *src_dchild,
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
+	/* If the client is using a subdir mount and does a rename to what it
+	 * sees as /.fscrypt, interpret it as the .fscrypt dir at fs root.
+	 */
+	if (unlikely(is_root_inode(tgt) && !fid_is_root(ll_inode2fid(tgt)) &&
+		     tgt_dchild->d_name.len == strlen(dot_fscrypt_name) &&
+		     strncmp(tgt_dchild->d_name.name, dot_fscrypt_name,
+			     tgt_dchild->d_name.len) == 0))
+		lu_root_fid(&op_data->op_fid2);
+
 	if (src_dchild->d_inode)
 		op_data->op_fid3 = *ll_inode2fid(src_dchild->d_inode);
 	if (tgt_dchild->d_inode)
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 2fdd7df..96efdf0 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -1856,6 +1856,7 @@ enum ioc_data_version_flags {
 #endif
 
 #define dot_lustre_name ".lustre"
+#define dot_fscrypt_name ".fscrypt"
 
 /********* HSM **********/
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 39/50] lnet: improve hash distribution across CPTs
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (37 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 38/50] lustre: sec: present .fscrypt in subdir mount James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 40/50] lustre: osc: osc_extent_wait() deadlock James Simmons
                   ` (10 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

Change the nid-to-cpt allocation function to use
(sum-by-multiplication of nid bytes) mod (number of CPTs)
to match a nid to a CPT. This patch only addresses IPv4 nids.

Make the matching change for the nid-to-cpt function
used by the 'lnetctl cpt-of-nid' utility.
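
The byte-sum trick can be tried in userspace with the sketch below. The
arithmetic follows the patch, but the function names are hypothetical
and the loop version is added here only as a cross-check.

```c
#include <assert.h>
#include <stdint.h>

/* Sum the eight bytes of a 64-bit NID using the multiply trick. */
static unsigned int nid_byte_sum(uint64_t key)
{
	const uint64_t pair_bits = 0x0001000100010001ULL;
	const uint64_t mask = pair_bits * 0xFF;	/* 0x00FF00FF00FF00FF */
	uint64_t pair_sum;

	/* Add each odd byte to its even neighbour: four 16-bit lanes. */
	pair_sum = (key & mask) + ((key >> 8) & mask);
	/* The multiply accumulates all four lanes into the top lane. */
	pair_sum = (pair_sum * pair_bits) >> 48;
	return (unsigned int)pair_sum;
}

/* Reference version: plain loop over the bytes, for comparison. */
static unsigned int nid_byte_sum_loop(uint64_t key)
{
	unsigned int sum = 0;
	int i;

	for (i = 0; i < 8; i++)
		sum += (unsigned int)((key >> (8 * i)) & 0xFF);
	return sum;
}

/* Map a NID to one of "number" partitions. */
static unsigned int nid_cpt_hash(uint64_t nid, unsigned int number)
{
	assert(number >= 1);
	return nid_byte_sum(nid) % number;
}
```

Each 16-bit lane holds at most 510 after the first step and the total is
at most 2040, so no lane overflows into its neighbour. For the arbitrary
value 0x00020000C0A8010A the bytes sum to 373, so with 4 partitions the
result is 1.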

WC-bug-id: https://jira.whamcloud.com/browse/LU-14676
Lustre-commit: 9b6e27755507b9bb4 ("LU-14676 lnet: improve hash distribution across CPTs")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46233
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/api-ni.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 7a05752..1978905 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -1447,19 +1447,20 @@ struct lnet_net *
 lnet_nid4_cpt_hash(lnet_nid_t nid, unsigned int number)
 {
 	u64 key = nid;
-	unsigned int val;
-
-	LASSERT(number >= 1 && number <= LNET_CPT_NUMBER);
+	u64 pair_bits = 0x0001000100010001LLU;
+	u64 mask = pair_bits * 0xFF;
+	u64 pair_sum;
 
-	if (number == 1)
-		return 0;
+	/* Use (sum-by-multiplication of nid bytes) mod (number of CPTs)
+	 * to match nid to a CPT.
+	 */
+	pair_sum = (key & mask) + ((key >> 8) & mask);
+	pair_sum = (pair_sum * pair_bits) >> 48;
 
-	val = hash_long(key, LNET_CPT_BITS);
-	/* NB: LNET_CP_NUMBER doesn't have to be PO2 */
-	if (val < number)
-		return val;
+	CDEBUG(D_NET, "Match nid %s to cpt %u\n",
+	       libcfs_nid2str(nid), (unsigned int)(pair_sum) % number);
 
-	return (unsigned int)(key + val + (val >> 1)) % number;
+	return (unsigned int)(pair_sum) % number;
 }
 
 unsigned int
-- 
1.8.3.1


* [lustre-devel] [PATCH 40/50] lustre: osc: osc_extent_wait() deadlock
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (38 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 39/50] lnet: improve hash distribution across CPTs James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 41/50] lustre: quota: delete unused quota ID James Simmons
                   ` (9 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Andriy Skulysh, Lustre Development List

From: Andriy Skulysh <andriy.skulysh@hpe.com>

Thread 1:
vvp_io_write_commit
osc_io_commit_async
osc_page_cache_add
osc_extent_find
osc_extent_wait

Thread 2:
ptlrpcd_check
ptlrpc_check_set
brw_queue_work
osc_extent_make_ready
vvp_page_make_ready_start
__lock_page

We must not hold a page lock while we do osc_extent_find().
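
The rule can be illustrated with a small single-threaded model. All
names here are hypothetical: `page_batch` stands in for the write commit
pagevec, `batch_flush()` for marking pages dirty and unlocking them, and
`may_block()` for any step that can wait on another thread.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 15

/* Hypothetical batch of locked pages, standing in for the pagevec. */
struct page_batch {
	int locked[BATCH_MAX];	/* ids of pages whose lock we still hold */
	size_t count;
};

static size_t pages_unlocked;	/* total pages released so far */

/* Mark dirty and unlock every page in the batch, then empty it. */
static void batch_flush(struct page_batch *b)
{
	pages_unlocked += b->count;
	b->count = 0;
}

/* A step that may wait on another thread; must be entered lock-free. */
static int may_block(const struct page_batch *b)
{
	assert(b->count == 0);	/* holding a page lock here risks deadlock */
	return 0;
}

/* Queue one page: flush unconditionally before any blocking step. */
static int queue_page(struct page_batch *b, int page_id, int have_grant)
{
	int rc = 0;

	if (b->count)
		batch_flush(b);
	if (!have_grant)
		rc = may_block(b);	/* e.g. waiting for cache space */
	if (rc == 0)
		rc = may_block(b);	/* e.g. waiting on an extent */
	if (rc == 0)
		b->locked[b->count++] = page_id;
	return rc;
}
```

If the flush were done only in the no-grant branch, the second
may_block() call could be reached with pages still locked, which is the
situation the two stacks above show.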

HPE-bug-id: LUS-10414
Fixes: f8a5fb036a ("lustre: vvp: dirty pages with pagevec")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15477
Lustre-commit: 821a8d7b481d34a54 ("LU-15477 osc: osc_extent_wait() deadlock")
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-on: https://review.whamcloud.com/46281
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_cache.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index 7bd28c5..072cfac 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -2487,16 +2487,17 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io,
 		LASSERT(ergo(grants > 0, grants >= tmp));
 
 		rc = 0;
+
+		/* We must not hold a page lock while we do osc_enter_cache()
+		 * or osc_extent_find(), so we must mark dirty & unlock
+		 * any pages in the write commit pagevec.
+		 */
+		if (pagevec_count(pvec)) {
+			cb(env, io, pvec);
+			pagevec_reinit(pvec);
+		}
+
 		if (grants == 0) {
-			/* We haven't allocated grant for this page, and we
-			 * must not hold a page lock while we do enter_cache,
-			 * so we must mark dirty & unlock any pages in the
-			 * write commit pagevec.
-			 */
-			if (pagevec_count(pvec)) {
-				cb(env, io, pvec);
-				pagevec_reinit(pvec);
-			}
 			rc = osc_enter_cache(env, cli, oap, tmp);
 			if (rc == 0)
 				grants = tmp;
-- 
1.8.3.1


* [lustre-devel] [PATCH 41/50] lustre: quota: delete unused quota ID
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (39 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 40/50] lustre: osc: osc_extent_wait() deadlock James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 42/50] lnet: Check LNET_NID_IS_ANY in LNET_NID_NET James Simmons
                   ` (8 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Hongchao Zhang, Lustre Development List

From: Hongchao Zhang <hongchao@whamcloud.com>

Add the ability for userland to delete an unused quota ID.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15218
Lustre-commit: 78be823f333968197 ("LU-15218 quota: delete unused quota ID")
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45548
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c                   | 1 +
 include/uapi/linux/lustre/lustre_user.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index d3df4d0..4165726 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -1145,6 +1145,7 @@ int quotactl_ioctl(struct super_block *sb, struct if_quotactl *qctl)
 	case LUSTRE_Q_SETQUOTAPOOL:
 	case LUSTRE_Q_SETINFOPOOL:
 	case LUSTRE_Q_SETDEFAULT_POOL:
+	case LUSTRE_Q_DELETEQID:
 		if (!capable(CAP_SYS_ADMIN))
 			return -EPERM;
 
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 96efdf0..9892fc5 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -992,6 +992,7 @@ static inline void obd_uuid2fsname(char *buf, char *uuid, int buflen)
 #define LUSTRE_Q_SETINFOPOOL	0x800012	/* set pool quota info */
 #define LUSTRE_Q_GETDEFAULT_POOL	0x800013 /* get default pool quota*/
 #define LUSTRE_Q_SETDEFAULT_POOL	0x800014 /* set default pool quota */
+#define LUSTRE_Q_DELETEQID	0x800015  /* delete quota ID */
 /* In the current Lustre implementation, the grace time is either the time
  * or the timestamp to be used after some quota ID exceeds the soft limit,
  * 48 bits should be enough, its high 16 bits can be used as quota flags.
@@ -1011,6 +1012,7 @@ static inline void obd_uuid2fsname(char *buf, char *uuid, int buflen)
  * and high 16 bits will contain this flag (see above comment).
  */
 #define LQUOTA_FLAG_DEFAULT	0x0001
+#define LQUOTA_FLAG_DELETED	0x0002
 
 #define LUSTRE_Q_CMD_IS_POOL(cmd)		\
 	(cmd == LUSTRE_Q_GETQUOTAPOOL ||	\
-- 
1.8.3.1


* [lustre-devel] [PATCH 42/50] lnet: Check LNET_NID_IS_ANY in LNET_NID_NET
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (40 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 41/50] lustre: quota: delete unused quota ID James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 43/50] lustre: llite: clear async errors on write commit sync James Simmons
                   ` (7 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

If LNET_NID_NET is passed the wildcard NID (LNET_ANY_NID) then we
should return the wildcard net (LNET_NET_ANY). This also allows NULL
to be used as an argument to LNET_NID_NET.
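
A simplified sketch of the check. The struct layout, the 0xFF wildcard
marker and the macro names below are hypothetical stand-ins rather than
the real LNet definitions, and byte-order conversion is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the LNet definitions. */
struct nid {
	uint8_t size;	/* 0xFF marks the wildcard in this sketch */
	uint8_t type;
	uint16_t num;
};

#define NET_ANY			0xffffffffU
#define MKNET(type, num)	(((uint32_t)(type) << 16) | (uint32_t)(num))

/* In this sketch the network type sits in the high 16 bits. */
static_assert(MKNET(2, 5) == 0x20005U, "type in high 16 bits");

/* NULL counts as the wildcard, so callers may pass it directly. */
static int nid_is_any(const struct nid *nid)
{
	return !nid || nid->size == 0xFF;
}

static uint32_t nid_net(const struct nid *nid)
{
	if (nid_is_any(nid))
		return NET_ANY;
	return MKNET(nid->type, nid->num);
}
```

Doing the wildcard test first is what makes a NULL argument safe: the
fields are only dereferenced once the pointer is known to be non-NULL.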

Fixes: 6935d7108f ("lnet: Change lnet_send() to take large-addr nids")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15478
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/46292
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lnet/lnet-types.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h
index eacc401..c5fca5c 100644
--- a/include/uapi/linux/lnet/lnet-types.h
+++ b/include/uapi/linux/lnet/lnet-types.h
@@ -125,7 +125,10 @@ static inline int nid_is_nid4(const struct lnet_nid *nid)
 
 static inline __u32 LNET_NID_NET(const struct lnet_nid *nid)
 {
-	return LNET_MKNET(nid->nid_type, __be16_to_cpu(nid->nid_num));
+	if (LNET_NID_IS_ANY(nid))
+		return LNET_NET_ANY;
+	else
+		return LNET_MKNET(nid->nid_type, __be16_to_cpu(nid->nid_num));
 }
 
 static inline void lnet_nid4_to_nid(lnet_nid_t nid4, struct lnet_nid *nid)
-- 
1.8.3.1


* [lustre-devel] [PATCH 43/50] lustre: llite: clear async errors on write commit sync
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (41 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 42/50] lnet: Check LNET_NID_IS_ANY in LNET_NID_NET James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 44/50] lnet: lnet_peer_data_present() memory leak James Simmons
                   ` (6 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Vladimir Saveliev, Lustre Development List

From: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>

Async errors should be cleared after vvp_io_commit_sync(). Otherwise,
they are only cleared in ll_flush(), called from
linux/fs/open.c:filp_close(), and close(2) fails. ll_flush() also
replaces any error code with EIO, which is confusing.
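
A toy model of the behaviour, assuming the failed async path leaves a
sticky error behind. `file_state` and the functions below are
hypothetical; only the read-and-clear idea and the EIO substitution on
close come from the description above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-file state: a sticky error left by async writeback. */
struct file_state {
	int async_rc;
};

/* Read the sticky error and reset it, so it is reported only once. */
static int read_and_clear_async_rc(struct file_state *fs)
{
	int rc;

	assert(fs);
	rc = fs->async_rc;
	fs->async_rc = 0;
	return rc;
}

/* Async commit hits quota; the sync fallback is assumed to succeed. */
static int write_commit(struct file_state *fs, int clear_after_sync)
{
	fs->async_rc = -EDQUOT;	/* assumed: recorded by the async path */
	/* ... the synchronous write would happen here and succeed ... */
	if (clear_after_sync)
		read_and_clear_async_rc(fs);
	return 0;
}

/* close(): any leftover async error is reported as -EIO. */
static int flush_on_close(struct file_state *fs)
{
	return read_and_clear_async_rc(fs) ? -EIO : 0;
}
```

Without the clear, the write itself returns success but the later close
reports -EIO; with it, both succeed.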

HPE-bug-id: LUS-7529
WC-bug-id: https://jira.whamcloud.com/browse/LU-15459
Lustre-commit: 73d5ee7033d0bd7dc ("LU-15459 llite: clear async errors on write commit sync")
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/46178
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/vvp_io.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index 40047f8..77e54ce 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -1109,6 +1109,8 @@ int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 
 	/* out of quota, try sync write */
 	if (rc == -EDQUOT && !cl_io_is_mkwrite(io)) {
+		struct ll_inode_info *lli = ll_i2info(inode);
+
 		rc = vvp_io_commit_sync(env, io, queue,
 					vio->u.readwrite.vui_from,
 					vio->u.readwrite.vui_to);
@@ -1116,6 +1118,9 @@ int vvp_io_write_commit(const struct lu_env *env, struct cl_io *io)
 			vio->u.readwrite.vui_written += rc;
 			rc = 0;
 		}
+		if (lli->lli_clob)
+			lov_read_and_clear_async_rc(lli->lli_clob);
+		lli->lli_async_rc = 0;
 	}
 
 	/* update inode size */
-- 
1.8.3.1


* [lustre-devel] [PATCH 44/50] lnet: lnet_peer_data_present() memory leak
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (42 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 43/50] lustre: llite: clear async errors on write commit sync James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:30 ` [lustre-devel] [PATCH 45/50] lnet: Don't use pref NI for reserved portal James Simmons
                   ` (5 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

If the ping buffer has nnis <= 1 then the ref on the ping buffer does
not get dropped. This causes a memory leak.
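
A minimal sketch of the pattern, assuming a caller that hands over one
reference which every exit path must drop. struct ex_pbuf and the ex_*
functions are hypothetical and only model the early-exit case, not the
real peer.c logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted buffer; a stand-in for the ping buffer. */
struct ex_pbuf {
	int refcount;
	int nnis;	/* number of interfaces listed in the buffer */
};

static void ex_pbuf_decref(struct ex_pbuf *pbuf)
{
	assert(pbuf != NULL && pbuf->refcount > 0);
	pbuf->refcount--;
}

/* The caller transfers one reference; every exit path has to drop it. */
static int ex_data_present(struct ex_pbuf *pbuf)
{
	int rc = 0;

	if (pbuf->nnis <= 1) {
		/* nothing to process, but the reference is still ours */
		ex_pbuf_decref(pbuf);
		goto out;
	}

	rc = pbuf->nnis - 1;	/* pretend to process the remaining NIs */
	ex_pbuf_decref(pbuf);
out:
	return rc;
}
```

Without the decref in the early-exit branch, a buffer with nnis <= 1
would leave the function with its refcount still at 1.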

WC-bug-id: https://jira.whamcloud.com/browse/LU-15440
Lustre-commit: 56384a4fc39ff99c8 ("LU-15440 lnet: lnet_peer_data_present() memory leak")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/46052
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index f70ceb5..16a694c 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -3344,8 +3344,10 @@ static int lnet_peer_data_present(struct lnet_peer *lp)
 	 * primary NID to the correct value here. Moreover, this peer
 	 * can show up with only the loopback NID in the ping buffer.
 	 */
-	if (pbuf->pb_info.pi_nnis <= 1)
+	if (pbuf->pb_info.pi_nnis <= 1) {
+		lnet_ping_buffer_decref(pbuf);
 		goto out;
+	}
 	nid = pbuf->pb_info.pi_ni[1].ns_nid;
 	if (nid_is_lo0(&lp->lp_primary_nid)) {
 		rc = lnet_peer_set_primary_nid(lp, nid, flags);
-- 
1.8.3.1


* [lustre-devel] [PATCH 45/50] lnet: Don't use pref NI for reserved portal
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (43 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 44/50] lnet: lnet_peer_data_present() memory leak James Simmons
@ 2022-03-20 13:30 ` James Simmons
  2022-03-20 13:31 ` [lustre-devel] [PATCH 46/50] lnet: o2iblnd: avoid memory copy for short msg James Simmons
                   ` (4 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:30 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

Don't use the preferred NI when sending traffic on the LNet reserved
portal. This allows local recovery pings to utilize any local NI as
source in the case where we do not have a multi-rail peer entry for
the local host. This is typically the case when MR is not being
configured statically (i.e. when discovery is being used for MR
configuration).

lnet_get_best_ni() was modified to include health values of the NIs
being compared in its debug output.
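
The selection order described above can be sketched roughly as follows.
The types and helpers are hypothetical, and ranking NIs by a single
health value is a simplifying assumption; the real comparison also
weighs credits, distance and priorities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical local interface: just an id and a health value. */
struct ex_ni {
	int id;
	int healthv;
};

/* Pick the healthiest NI; ties go to the first one seen. */
static const struct ex_ni *ex_best_ni(const struct ex_ni *nis, size_t n)
{
	const struct ex_ni *best = NULL;
	size_t i;

	assert(nis != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!best || nis[i].healthv > best->healthv)
			best = &nis[i];
	return best;
}

/* Use the peer's preferred NI unless the message targets the reserved
 * portal; fall back to the best available NI. NULL means "no path".
 */
static const struct ex_ni *
ex_select_ni(const struct ex_ni *nis, size_t n,
	     const struct ex_ni *preferred, bool reserved_portal)
{
	const struct ex_ni *best = NULL;

	if (!reserved_portal)
		best = preferred;
	if (!best)
		best = ex_best_ni(nis, n);
	return best;
}
```

With a preferred NI set, ordinary traffic keeps using it, while
reserved-portal traffic is free to pick whichever NI currently ranks
best.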

HPE-bug-id: LUS-10658
WC-bug-id: https://jira.whamcloud.com/browse/LU-15446
Lustre-commit: a2815441381cb6cee ("LU-15446 lnet: Don't use pref NI for reserved portal")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/46078
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 net/lnet/lnet/lib-move.c | 61 +++++++++++++++++++++++++++---------------------
 1 file changed, 34 insertions(+), 27 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 8a90822..3ad13d0 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1516,13 +1516,13 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 
 		if (best_ni)
 			CDEBUG(D_NET,
-			       "compare ni %s [c:%d, d:%d, s:%d, p:%u, g:%u] with best_ni %s [c:%d, d:%d, s:%d, p:%u, g:%u]\n",
+			       "compare ni %s [c:%d, d:%d, s:%d, p:%u, g:%u, h:%d] with best_ni %s [c:%d, d:%d, s:%d, p:%u, g:%u, h:%d]\n",
 			       libcfs_nidstr(&ni->ni_nid), ni_credits, distance,
-			       ni->ni_seq, ni_sel_prio, ni_dev_prio,
+			       ni->ni_seq, ni_sel_prio, ni_dev_prio, ni_healthv,
 			       (best_ni) ? libcfs_nidstr(&best_ni->ni_nid)
 			       : "not selected", best_credits, shortest_distance,
 			       (best_ni) ? best_ni->ni_seq : 0,
-			       best_sel_prio, best_dev_prio);
+			       best_sel_prio, best_dev_prio, best_healthv);
 		else
 			goto select_ni;
 
@@ -1569,6 +1569,19 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	return best_ni;
 }
 
+static bool
+lnet_reserved_msg(struct lnet_msg *msg)
+{
+	if (msg->msg_type == LNET_MSG_PUT) {
+		if (msg->msg_hdr.msg.put.ptl_index == LNET_RESERVED_PORTAL)
+			return true;
+	} else if (msg->msg_type == LNET_MSG_GET) {
+		if (msg->msg_hdr.msg.get.ptl_index == LNET_RESERVED_PORTAL)
+			return true;
+	}
+	return false;
+}
+
 /*
  * Traffic to the LNET_RESERVED_PORTAL may not trigger peer discovery,
  * because such traffic is required to perform discovery. We therefore
@@ -1580,14 +1593,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 static bool
 lnet_msg_discovery(struct lnet_msg *msg)
 {
-	if (msg->msg_type == LNET_MSG_PUT) {
-		if (msg->msg_hdr.msg.put.ptl_index != LNET_RESERVED_PORTAL)
-			return true;
-	} else if (msg->msg_type == LNET_MSG_GET) {
-		if (msg->msg_hdr.msg.get.ptl_index != LNET_RESERVED_PORTAL)
-			return true;
-	}
-	return false;
+	return !(lnet_reserved_msg(msg) || lnet_msg_is_response(msg));
 }
 
 #define SRC_SPEC	0x0001
@@ -2334,7 +2340,6 @@ struct lnet_ni *
 lnet_select_preferred_best_ni(struct lnet_send_data *sd)
 {
 	struct lnet_ni *best_ni = NULL;
-	struct lnet_peer_ni *best_lpni = sd->sd_best_lpni;
 
 	/* We must use a consistent source address when sending to a
 	 * non-MR peer. However, a non-MR peer can have multiple NIDs
@@ -2344,25 +2349,27 @@ struct lnet_ni *
 	 *
 	 * So we need to pick the NI the peer prefers for this
 	 * particular network.
+	 *
+	 * An exception is traffic on LNET_RESERVED_PORTAL. Internal LNet
+	 * traffic doesn't care which source NI is used, and we don't actually
+	 * want to restrict local recovery pings to a single source NI.
 	 */
+	if (!lnet_reserved_msg(sd->sd_msg))
+		best_ni = lnet_find_existing_preferred_best_ni(sd->sd_best_lpni,
+							       sd->sd_cpt);
 
-	best_ni = lnet_find_existing_preferred_best_ni(sd->sd_best_lpni,
-						       sd->sd_cpt);
+	if (!best_ni)
+		best_ni = lnet_find_best_ni_on_spec_net(NULL, sd->sd_peer,
+							sd->sd_best_lpni->lpni_peer_net,
+							sd->sd_msg,
+							sd->sd_md_cpt);
 
-	/* if best_ni is still not set just pick one */
+	/* If there is no best_ni we don't have a route */
 	if (!best_ni) {
-		best_ni =
-			lnet_find_best_ni_on_spec_net(NULL, sd->sd_peer,
-						      sd->sd_best_lpni->lpni_peer_net,
-						      sd->sd_msg,
-						      sd->sd_md_cpt);
-		/* If there is no best_ni we don't have a route */
-		if (!best_ni) {
-			CERROR("no path to %s from net %s\n",
-			       libcfs_nidstr(&best_lpni->lpni_nid),
-			       libcfs_net2str(best_lpni->lpni_net->net_id));
-			return -EHOSTUNREACH;
-		}
+		CERROR("no path to %s from net %s\n",
+		       libcfs_nidstr(&sd->sd_best_lpni->lpni_nid),
+		       libcfs_net2str(sd->sd_best_lpni->lpni_net->net_id));
+		return -EHOSTUNREACH;
 	}
 
 	sd->sd_best_ni = best_ni;
-- 
1.8.3.1


* [lustre-devel] [PATCH 46/50] lnet: o2iblnd: avoid memory copy for short msg
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (44 preceding siblings ...)
  2022-03-20 13:30 ` [lustre-devel] [PATCH 45/50] lnet: Don't use pref NI for reserved portal James Simmons
@ 2022-03-20 13:31 ` James Simmons
  2022-03-20 13:31 ` [lustre-devel] [PATCH 47/50] lustre: llite: set default LMV hash type with 2.12 MDS James Simmons
                   ` (3 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:31 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexey Lyashkov, Lustre Development List

From: Alexey Lyashkov <alexey.lyashkov@hpe.com>

Modern cards can send kernel memory directly, without mapping it or
copying it into the preallocated buffer.
This reduces lnet selftest CPU consumption by 3% for messages
smaller than 4k.
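
The idea can be sketched as building a gather list that points at the
payload fragments instead of copying them behind the header. The
struct ex_sge and struct ex_frag types and ex_build_sge_chain() are
hypothetical; this is not the o2iblnd or verbs API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical gather entry and payload fragment descriptors. */
struct ex_sge {
	const void *addr;
	size_t len;
};

struct ex_frag {
	const void *addr;
	size_t len;
};

/* Entry 0 describes the message header; each payload fragment gets its
 * own entry after it, so no payload bytes are copied.
 * Returns the number of entries used, or -1 if they do not fit.
 */
static int ex_build_sge_chain(struct ex_sge *sge, size_t max_sge,
			      const void *hdr, size_t hdr_len,
			      const struct ex_frag *frags, size_t nfrags)
{
	size_t n = 0;
	size_t i;

	assert(sge != NULL && hdr != NULL);
	if (nfrags + 1 > max_sge)
		return -1;

	sge[n++] = (struct ex_sge){ hdr, hdr_len };
	for (i = 0; i < nfrags; i++)
		sge[n++] = (struct ex_sge){ frags[i].addr, frags[i].len };
	return (int)n;
}
```

The trade-off in this sketch is one extra gather entry per fragment in
exchange for skipping the copy into the bounce buffer.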

HPE-bug-id: LUS-1796
WC-bug-id: https://jira.whamcloud.com/browse/LU-14008
Lustre-commit: bebd87cc6c9acc577 ("LU-14008 o2iblnd: avoid memory copy for short msg")
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/40262
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c    |  3 +-
 net/lnet/klnds/o2iblnd/o2iblnd.h    |  3 ++
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 63 ++++++++++++++++++++++++++++---------
 3 files changed, 52 insertions(+), 17 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 9ce6082..8dce4179 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -628,10 +628,9 @@ static unsigned int kiblnd_send_wrs(struct kib_conn *conn)
 	 */
 	int ret;
 	int multiplier = 1 + conn->ibc_max_frags;
-	enum kib_dev_caps dev_caps = conn->ibc_hdev->ibh_dev->ibd_dev_caps;
 
 	/* FastReg needs two extra WRs for map and invalidate */
-	if (dev_caps & IBLND_DEV_CAPS_FASTREG_ENABLED)
+	if (IS_FAST_REG_DEV(conn->ibc_hdev->ibh_dev))
 		multiplier += 2;
 
 	/* account for a maximum of ibc_queue_depth in-flight transfers */
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 5a4b4f8..e798695 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -149,6 +149,9 @@ enum kib_dev_caps {
 	IBLND_DEV_CAPS_FASTREG_GAPS_SUPPORT	= BIT(1),
 };
 
+#define IS_FAST_REG_DEV(dev) \
+	((dev)->ibd_dev_caps & IBLND_DEV_CAPS_FASTREG_ENABLED)
+
 struct kib_dev {
 	struct list_head	ibd_list;	/* chain on kib_devs */
 	struct list_head	ibd_fail_list;	/* chain on kib_failed_devs */
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 983599f..a88939e7 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -42,8 +42,11 @@
 static void kiblnd_peer_alive(struct kib_peer_ni *peer_ni);
 static void kiblnd_peer_connect_failed(struct kib_peer_ni *peer_ni, int active,
 				       int error);
-static void kiblnd_init_tx_msg(struct lnet_ni *ni, struct kib_tx *tx,
-			       int type, int body_nob);
+static struct ib_rdma_wr *
+kiblnd_init_tx_msg_payload(struct lnet_ni *ni, struct kib_tx *tx,
+			   int type, int body_nob, int payload_nob);
+#define kiblnd_init_tx_msg(ni, tx, type, body) \
+	kiblnd_init_tx_msg_payload(ni, tx, type, body, 0)
 static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 			    int resid, struct kib_rdma_desc *dstrd,
 			    u64 dstcookie);
@@ -572,7 +575,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 	 * in trying to map the memory, because it'll just fail. So
 	 * preemptively fail with an appropriate message
 	 */
-	if ((dev->ibd_dev_caps & IBLND_DEV_CAPS_FASTREG_ENABLED) &&
+	if (IS_FAST_REG_DEV(dev) &&
 	    !(dev->ibd_dev_caps & IBLND_DEV_CAPS_FASTREG_GAPS_SUPPORT) &&
 	    tx->tx_gaps) {
 		CERROR("Using FastReg with no GAPS support, but tx has gaps\n");
@@ -1021,9 +1024,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	tx->tx_nsge++;
 }
 
-static void
-kiblnd_init_tx_msg(struct lnet_ni *ni, struct kib_tx *tx, int type,
-		   int body_nob)
+static struct ib_rdma_wr *
+kiblnd_init_tx_msg_payload(struct lnet_ni *ni, struct kib_tx *tx, int type,
+			   int body_nob, int payload)
 {
 	struct ib_rdma_wr *wrq = &tx->tx_wrq[tx->tx_nwrq];
 	int nob = offsetof(struct kib_msg, ibm_u) + body_nob;
@@ -1032,7 +1035,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	LASSERT(tx->tx_nwrq < IBLND_MAX_RDMA_FRAGS + 1);
 	LASSERT(nob <= IBLND_MSG_SIZE);
 
-	kiblnd_init_msg(tx->tx_msg, type, body_nob);
+	kiblnd_init_msg(tx->tx_msg, type, body_nob + payload);
 
 	*wrq = (struct ib_rdma_wr) {
 		.wr = {
@@ -1047,6 +1050,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	kiblnd_init_tx_sge(tx, tx->tx_msgaddr, nob);
 
 	tx->tx_nwrq++;
+	return wrq;
 }
 
 static int
@@ -1654,15 +1658,44 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	ibmsg = tx->tx_msg;
 	lnet_hdr_to_nid4(hdr, &ibmsg->ibm_u.immediate.ibim_hdr);
 
-	rc = copy_from_iter(&ibmsg->ibm_u.immediate.ibim_payload, payload_nob,
-			    &from);
-	if (rc != payload_nob) {
-		kiblnd_pool_free_node(&tx->tx_pool->tpo_pool, &tx->tx_list);
-		return -EFAULT;
-	}
+	if (payload_nob) {
+		struct ib_rdma_wr *wrq;
+		int i;
+
+		nob = offsetof(struct kib_immediate_msg, ibim_payload[0]);
+		wrq = kiblnd_init_tx_msg_payload(ni, tx, IBLND_MSG_IMMEDIATE,
+						 nob, payload_nob);
+
+		rd = tx->tx_rd;
+		rc = kiblnd_setup_rd_kiov(ni, tx, rd,
+					  payload_niov, payload_kiov,
+					  payload_offset, payload_nob);
+		if (rc != 0) {
+			CERROR("Can't setup IMMEDIATE src for %s: %d\n",
+			       libcfs_nidstr(&target->nid), rc);
+			kiblnd_tx_done(tx);
+			return -EIO;
+		}
 
-	nob = offsetof(struct kib_immediate_msg, ibim_payload[payload_nob]);
-	kiblnd_init_tx_msg(ni, tx, IBLND_MSG_IMMEDIATE, nob);
+		/* lets generate a SGE chain */
+		for (i = 0; i < rd->rd_nfrags; i++) {
+			kiblnd_init_tx_sge(tx, rd->rd_frags[i].rf_addr,
+					   rd->rd_frags[i].rf_nob);
+			wrq->wr.num_sge++;
+		}
+	} else {
+		rc = copy_from_iter(&ibmsg->ibm_u.immediate.ibim_payload,
+				    payload_nob, &from);
+		if (rc != payload_nob) {
+			kiblnd_pool_free_node(&tx->tx_pool->tpo_pool,
+					      &tx->tx_list);
+			return -EFAULT;
+		}
+
+		nob = offsetof(struct kib_immediate_msg,
+			       ibim_payload[payload_nob]);
+		kiblnd_init_tx_msg(ni, tx, IBLND_MSG_IMMEDIATE, nob);
+	}
 
 	/* finalise lntmsg on completion */
 	tx->tx_lntmsg[0] = lntmsg;
-- 
1.8.3.1


* [lustre-devel] [PATCH 47/50] lustre: llite: set default LMV hash type with 2.12 MDS
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (45 preceding siblings ...)
  2022-03-20 13:31 ` [lustre-devel] [PATCH 46/50] lnet: o2iblnd: avoid memory copy for short msg James Simmons
@ 2022-03-20 13:31 ` James Simmons
  2022-03-20 13:31 ` [lustre-devel] [PATCH 48/50] lnet: Stop discovery on deleted peer NI James Simmons
                   ` (2 subsequent siblings)
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:31 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

If the default LMV hash type is CRUSH, or unset, it should be converted
to fnv_1a_64, because a 2.12 MDS understands neither CRUSH nor an unset
hash type.

Fix LMV_HASH_FLAG_KNOWN to match actual known flags.
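
The conversion keeps the flag bits and swaps only the type bits. A
minimal sketch, assuming the type occupies the low 16 bits of the hash
word; the EX_* names and values are illustrative assumptions, not copied
from lustre_user.h.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout: hash type in the low bits, flags in the high bits. */
#define EX_HASH_TYPE_MASK	0x0000ffffu
#define EX_HASH_TYPE_UNKNOWN	0u
#define EX_HASH_TYPE_ALL_CHARS	1u
#define EX_HASH_TYPE_FNV_1A_64	2u
#define EX_HASH_TYPE_CRUSH	3u

static_assert((EX_HASH_TYPE_FNV_1A_64 & ~EX_HASH_TYPE_MASK) == 0,
	      "replacement type must fit inside the type mask");

/* Downgrade CRUSH (or any newer type) and "unknown" to fnv_1a_64. */
static uint32_t ex_downgrade_hash(uint32_t hash)
{
	uint32_t type = hash & EX_HASH_TYPE_MASK;

	if (type >= EX_HASH_TYPE_CRUSH || type == EX_HASH_TYPE_UNKNOWN)
		/* hash ^ type zeroes the type bits and keeps the flags */
		hash = (hash ^ type) | EX_HASH_TYPE_FNV_1A_64;
	return hash;
}
```

Because type is exactly the masked low bits of hash, the XOR clears
them without touching any flag, and the OR then installs the new type.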

Fixes: d956d88c463f ("lustre: dne: introduce new directory hash type: "crush"")
Fixes: 7d33e94b9575 ("lustre: lmv: change default hash type to crush")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15502
Lustre-commit: 51c6d596d4f3e82de ("LU-15502 llite: set default LMV hash type with 2.12 MDS")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46378
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c                   | 28 +++++++++++++++++++++++-----
 include/uapi/linux/lustre/lustre_user.h |  2 +-
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 4165726..cfd8184 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -469,7 +469,7 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump,
 		enum lmv_hash_type type = lump->lum_hash_type &
 					  LMV_HASH_TYPE_MASK;
 
-		if (type == LMV_HASH_TYPE_CRUSH ||
+		if (type >= LMV_HASH_TYPE_CRUSH ||
 		    type == LMV_HASH_TYPE_UNKNOWN)
 			lump->lum_hash_type = (lump->lum_hash_type ^ type) |
 					      LMV_HASH_TYPE_FNV_1A_64;
@@ -590,11 +590,29 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 		case LOV_USER_MAGIC_COMP_V1:
 			lum_size = ((struct lov_comp_md_v1 *)lump)->lcm_size;
 			break;
-		case LMV_USER_MAGIC:
-			if (lump->lmm_magic != cpu_to_le32(LMV_USER_MAGIC))
-				lustre_swab_lmv_user_md((struct lmv_user_md *)lump);
-			lum_size = sizeof(struct lmv_user_md);
+		case LMV_USER_MAGIC: {
+			struct lmv_user_md *lmv = (struct lmv_user_md *)lump;
+
+			/* MDS < 2.14 doesn't support 'crush' hash type, and
+			 * cannot handle unknown hash if client doesn't set a
+			 * valid one. switch to fnv_1a_64.
+			 */
+			if (!(exp_connect_flags2(sbi->ll_md_exp) &
+			      OBD_CONNECT2_CRUSH)) {
+				enum lmv_hash_type type = lmv->lum_hash_type &
+							  LMV_HASH_TYPE_MASK;
+
+				if (type >= LMV_HASH_TYPE_CRUSH ||
+				    type == LMV_HASH_TYPE_UNKNOWN)
+					lmv->lum_hash_type =
+						(lmv->lum_hash_type ^ type) |
+						LMV_HASH_TYPE_FNV_1A_64;
+			}
+			if (lmv->lum_magic != cpu_to_le32(LMV_USER_MAGIC))
+				lustre_swab_lmv_user_md(lmv);
+			lum_size = sizeof(*lmv);
 			break;
+		}
 		case LOV_USER_MAGIC_SPECIFIC: {
 			struct lov_user_md_v3 *v3 =
 				(struct lov_user_md_v3 *)lump;
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 9892fc5..3017148 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -741,7 +741,7 @@ static inline bool lmv_is_known_hash_type(__u32 type)
 #define LMV_HASH_FLAG_LAYOUT_CHANGE	\
 	(LMV_HASH_FLAG_MIGRATION | LMV_HASH_FLAG_SPLIT | LMV_HASH_FLAG_MERGE)
 
-#define LMV_HASH_FLAG_KNOWN		0xfe000000
+#define LMV_HASH_FLAG_KNOWN		0xbe000000
 
 /* both SPLIT and MIGRATION are set for directory split */
 static inline bool lmv_hash_is_splitting(__u32 hash)
-- 
1.8.3.1


* [lustre-devel] [PATCH 48/50] lnet: Stop discovery on deleted peer NI
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (46 preceding siblings ...)
  2022-03-20 13:31 ` [lustre-devel] [PATCH 47/50] lustre: llite: set default LMV hash type with 2.12 MDS James Simmons
@ 2022-03-20 13:31 ` James Simmons
  2022-03-20 13:31 ` [lustre-devel] [PATCH 49/50] lustre: sec: fix DIO for encrypted files James Simmons
  2022-03-20 13:31 ` [lustre-devel] [PATCH 50/50] lustre: ptlrpc: Use after free of 'conn' in rhashtable retry James Simmons
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:31 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

lnet_discover_peer_locked() needs to check whether the peer NI that is
undergoing discovery has been deleted (i.e. its associated peer has
LNET_PEER_MARK_DELETED state). Otherwise, we may enter an infinite
loop because this peer will never be considered up to date.
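
The loop condition can be illustrated with a small model. The ex_*
names and the max_attempts bound are hypothetical; the bound exists only
so the sketch terminates when the deleted check is switched off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_PEER_UPTODATE	(1u << 0)
#define EX_PEER_MARK_DELETED	(1u << 1)

/* Hypothetical peer: only the state bits matter for this sketch. */
struct ex_peer {
	unsigned int state;
};

/* Returns how many discovery attempts were queued before the loop
 * exited. A deleted peer never becomes up to date, so without the
 * deleted check the loop only stops at max_attempts.
 */
static int ex_discover(const struct ex_peer *lp, int max_attempts,
		       bool check_deleted)
{
	int count = 0;

	assert(lp != NULL);
	while (count < max_attempts) {
		if (lp->state & EX_PEER_UPTODATE)
			break;
		if (check_deleted && (lp->state & EX_PEER_MARK_DELETED))
			break;
		count++;	/* stands in for queueing the peer again */
	}
	return count;
}
```

For a deleted peer the model queues zero attempts with the check
enabled, and runs until the artificial bound with it disabled.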

Fixes: 4f69acf8aa ("lnet: Prevent discovery on deleted peer")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15512
Lustre-commit: 94f4e1f517d71ffd6 ("LU-15512 lnet: Stop discovery on deleted peer NI")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/46429
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 16a694c..98f71dd 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2578,6 +2578,8 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
 			break;
 		if (lnet_peer_is_uptodate(lp))
 			break;
+		if (lp->lp_state & LNET_PEER_MARK_DELETED)
+			break;
 		lnet_peer_queue_for_discovery(lp);
 		count++;
 		CDEBUG(D_NET, "Discovery attempt # %d\n", count);
@@ -2620,7 +2622,9 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
 		rc = lp->lp_dc_error;
 	else if (!block)
 		CDEBUG(D_NET, "non-blocking discovery\n");
-	else if (!lnet_peer_is_uptodate(lp) && !lnet_is_discovery_disabled(lp))
+	else if (!lnet_peer_is_uptodate(lp) &&
+		 !(lnet_is_discovery_disabled(lp) ||
+		   (lp->lp_state & LNET_PEER_MARK_DELETED)))
 		goto again;
 
 	CDEBUG(D_NET, "peer %s NID %s: %d. %s\n",
-- 
1.8.3.1


* [lustre-devel] [PATCH 49/50] lustre: sec: fix DIO for encrypted files
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (47 preceding siblings ...)
  2022-03-20 13:31 ` [lustre-devel] [PATCH 48/50] lnet: Stop discovery on deleted peer NI James Simmons
@ 2022-03-20 13:31 ` James Simmons
  2022-03-20 13:31 ` [lustre-devel] [PATCH 50/50] lustre: ptlrpc: Use after free of 'conn' in rhashtable retry James Simmons
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:31 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

With Direct IO, we do not have proper page cache pages. So we need to
retrieve the page mapping and the page index of the page to be
encrypted/decrypted ourselves.

For the index, we need to use the offset of the page within the file,
and not the object.
So we rename cl_page's cp_osc_index to cp_page_index for that purpose.
cp_osc_index is redundant with osc_async_page's oap_obj_off and only
used by osc_index(), so we also adapt this function.
cp_page_index is initialized in cl_page_alloc(), and accessed in
the OSC layer where the llcrypt primitives are called.

For the mapping, the problem is that page->mapping is not set to NULL
on page allocation, so it cannot safely be used to tell whether a page
is a direct I/O page.
Use cl_page for direct I/O and page->mapping for buffered
I/O.  (clpage->cp_inode is only set for direct I/O and
cannot easily be always set.)
Without this, we sometimes get panics when page2inode is
used in the OSC layer.  (Note the remaining use in dom is
safe because ll_dom_readpage is a page cache helper and
will never see DIO pages.)
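
The save/override/restore step for direct I/O pages can be sketched as
below. struct ex_page and the ex_* helpers are hypothetical stand-ins;
the assumption that the crypto helper needs a valid mapping and derives
its per-page input from the page index is part of the model, not a
statement about the llcrypt API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the two fields of interest. */
struct ex_page {
	const void *mapping;
	unsigned long index;
};

/* Stand-in for the crypto helper: assumed to need a mapping and to
 * derive its per-page input from the page index.
 */
static unsigned long ex_encrypt_tweak(const struct ex_page *pg)
{
	assert(pg != NULL && pg->mapping != NULL);
	return pg->index;
}

/* For a direct I/O page, temporarily install the file's mapping and the
 * page's index within the whole file, call the helper, then put the
 * original values back.
 */
static unsigned long ex_encrypt_dio_page(struct ex_page *pg,
					 const void *file_mapping,
					 unsigned long file_page_index)
{
	const void *map_orig = pg->mapping;
	unsigned long index_orig = pg->index;
	unsigned long tweak;

	pg->mapping = file_mapping;
	pg->index = file_page_index;
	tweak = ex_encrypt_tweak(pg);
	pg->mapping = map_orig;
	pg->index = index_orig;
	return tweak;
}
```

The point of the model is that the helper sees the file-relative index
while the page itself comes back unchanged.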

Fixes: 2de53869ee ("lustre: sec: get rid of bad rss-counter state messages")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15608
Lustre-commit: 966ca46e4aa2eb39c ("LU-15608 sec: fix DIO for encrypted files")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46664
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h  |  3 +-
 fs/lustre/include/lustre_osc.h |  2 +-
 fs/lustre/include/obd.h        |  3 ++
 fs/lustre/llite/file.c         |  3 ++
 fs/lustre/obdclass/cl_page.c   |  9 +++-
 fs/lustre/osc/osc_page.c       |  1 -
 fs/lustre/osc/osc_request.c    | 97 +++++++++++++++++++-----------------------
 7 files changed, 60 insertions(+), 58 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index af708cc..ab7f0f2 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -735,7 +735,8 @@ struct cl_page {
 	refcount_t			 cp_ref;
 	/** layout_entry + stripe index, composed using lov_comp_index() */
 	unsigned int			 cp_lov_index;
-	pgoff_t				 cp_osc_index;
+	/** page->index of the page within the whole file */
+	pgoff_t				 cp_page_index;
 	/** An object this page is a part of. Immutable after creation. */
 	struct cl_object		*cp_obj;
 	/** vmpage */
diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index 4c5eb1f..7551390 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -855,7 +855,7 @@ static inline struct osc_page *oap2osc(struct osc_async_page *oap)
 
 static inline pgoff_t osc_index(struct osc_page *opg)
 {
-	return opg->ops_cl.cpl_page->cp_osc_index;
+	return opg->ops_oap.oap_obj_off >> PAGE_SHIFT;
 }
 
 static inline struct cl_page *oap2cl_page(struct osc_async_page *oap)
diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index ecee321..c5e2a24 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -1213,6 +1213,9 @@ static inline void client_adjust_max_dirty(struct client_obd *cli)
 					   1 << (20 - PAGE_SHIFT));
 }
 
+/* Must be used for page cache pages only,
+ * not safe otherwise (e.g. direct IO pages)
+ */
 static inline struct inode *page2inode(struct page *page)
 {
 	if (page->mapping) {
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 373efdd..4855156 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -440,6 +440,9 @@ int ll_file_release(struct inode *inode, struct file *file)
 
 static inline int ll_dom_readpage(void *data, struct page *page)
 {
+	/* since ll_dom_readpage is a page cache helper, it is safe to assume
+	 * mapping and host pointers are set here
+	 */
 	struct inode *inode = page2inode(page);
 	struct niobuf_local *lnb = data;
 	void *kaddr;
diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c
index 4bfa1c5..9326743 100644
--- a/fs/lustre/obdclass/cl_page.c
+++ b/fs/lustre/obdclass/cl_page.c
@@ -235,9 +235,16 @@ struct cl_page *cl_page_alloc(const struct lu_env *env, struct cl_object *o,
 		cl_page->cp_vmpage = vmpage;
 		cl_page->cp_state = CPS_CACHED;
 		cl_page->cp_type = type;
-		cl_page->cp_inode = NULL;
+		if (type == CPT_TRANSIENT)
+			/* ref to correct inode will be added
+			 * in ll_direct_rw_pages
+			 */
+			cl_page->cp_inode = NULL;
+		else
+			cl_page->cp_inode = page2inode(vmpage);
 		INIT_LIST_HEAD(&cl_page->cp_batch);
 		lu_ref_init(&cl_page->cp_reference);
+		cl_page->cp_page_index = ind;
 		cl_object_for_each(o2, o) {
 			if (o2->co_ops->coo_page_init) {
 				result = o2->co_ops->coo_page_init(env, o2,
diff --git a/fs/lustre/osc/osc_page.c b/fs/lustre/osc/osc_page.c
index cba5d02..f46b4e7 100644
--- a/fs/lustre/osc/osc_page.c
+++ b/fs/lustre/osc/osc_page.c
@@ -266,7 +266,6 @@ int osc_page_init(const struct lu_env *env, struct cl_object *obj,
 
 	opg->ops_srvlock = osc_io_srvlock(oio);
 	cl_page_slice_add(cl_page, &opg->ops_cl, obj, &osc_page_ops);
-	cl_page->cp_osc_index = index;
 
 	/* reserve an LRU space for this cl_page */
 	if (cl_page->cp_type == CPT_CACHEABLE) {
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 43cd6c5..124d3c57 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1417,21 +1417,13 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 	struct inode *inode = NULL;
 	bool directio = false;
 	bool enable_checksum = true;
+	struct cl_page *clpage;
 
 	if (pga[0]->pg) {
-		inode = page2inode(pga[0]->pg);
-		if (!inode) {
-			/* Try to get reference to inode from cl_page if we are
-			 * dealing with direct IO, as handled pages are not
-			 * actual page cache pages.
-			 */
-			struct osc_async_page *oap = brw_page2oap(pga[0]);
-			struct cl_page *clpage = oap2cl_page(oap);
-
-			inode = clpage->cp_inode;
-			if (inode)
-				directio = true;
-		}
+		clpage = oap2cl_page(brw_page2oap(pga[0]));
+		inode = clpage->cp_inode;
+		if (clpage->cp_type == CPT_TRANSIENT)
+			directio = true;
 	}
 	if (OBD_FAIL_CHECK(OBD_FAIL_OSC_BRW_PREP_REQ))
 		return -ENOMEM; /* Recoverable */
@@ -1453,11 +1445,11 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 	if (opc == OST_WRITE && inode && IS_ENCRYPTED(inode) &&
 	    fscrypt_has_encryption_key(inode)) {
 		for (i = 0; i < page_count; i++) {
-			struct brw_page *pg = pga[i];
+			struct brw_page *brwpg = pga[i];
 			struct page *data_page = NULL;
 			bool retried = false;
 			bool lockedbymyself;
-			u32 nunits = (pg->off & ~PAGE_MASK) + pg->count;
+			u32 nunits = (brwpg->off & ~PAGE_MASK) + brwpg->count;
 			struct address_space *map_orig = NULL;
 			pgoff_t index_orig;
 
@@ -1472,23 +1464,24 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 			 * end in vvp_page_completion_write/cl_page_completion,
 			 * which means only once the page is fully processed.
 			 */
-			lockedbymyself = trylock_page(pg->pg);
+			lockedbymyself = trylock_page(brwpg->pg);
 			if (directio) {
-				map_orig = pg->pg->mapping;
-				pg->pg->mapping = inode->i_mapping;
-				index_orig = pg->pg->index;
-				pg->pg->index = pg->off >> PAGE_SHIFT;
+				map_orig = brwpg->pg->mapping;
+				brwpg->pg->mapping = inode->i_mapping;
+				index_orig = brwpg->pg->index;
+				clpage = oap2cl_page(brw_page2oap(brwpg));
+				brwpg->pg->index = clpage->cp_page_index;
 			}
 			data_page =
-				fscrypt_encrypt_pagecache_blocks(pg->pg,
+				fscrypt_encrypt_pagecache_blocks(brwpg->pg,
 								 nunits, 0,
 								 GFP_NOFS);
 			if (directio) {
-				pg->pg->mapping = map_orig;
-				pg->pg->index = index_orig;
+				brwpg->pg->mapping = map_orig;
+				brwpg->pg->index = index_orig;
 			}
 			if (lockedbymyself)
-				unlock_page(pg->pg);
+				unlock_page(brwpg->pg);
 			if (IS_ERR(data_page)) {
 				rc = PTR_ERR(data_page);
 				if (rc == -ENOMEM && !retried) {
@@ -1503,10 +1496,11 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 			 * disambiguation in osc_release_bounce_pages().
 			 */
 			SetPageChecked(data_page);
-			pg->pg = data_page;
+			brwpg->pg = data_page;
 			/* there should be no gap in the middle of page array */
 			if (i == page_count - 1) {
-				struct osc_async_page *oap = brw_page2oap(pg);
+				struct osc_async_page *oap =
+					brw_page2oap(brwpg);
 
 				oa->o_size = oap->oap_count +
 					     oap->oap_obj_off +
@@ -1515,10 +1509,10 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 			/* len is forced to nunits, and relative offset to 0
 			 * so store the old, clear text info
 			 */
-			pg->bp_count_diff = nunits - pg->count;
-			pg->count = nunits;
-			pg->bp_off_diff = pg->off & ~PAGE_MASK;
-			pg->off = pg->off & PAGE_MASK;
+			brwpg->bp_count_diff = nunits - brwpg->count;
+			brwpg->count = nunits;
+			brwpg->bp_off_diff = brwpg->off & ~PAGE_MASK;
+			brwpg->off = brwpg->off & PAGE_MASK;
 		}
 	} else if (opc == OST_WRITE && inode && IS_ENCRYPTED(inode)) {
 		struct osc_async_page *oap = brw_page2oap(pga[0]);
@@ -1991,8 +1985,9 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 	const char *obd_name = cli->cl_import->imp_obd->obd_name;
 	struct ost_body *body;
 	u32 client_cksum = 0;
-	struct inode *inode;
+	struct inode *inode = NULL;
 	unsigned int blockbits = 0, blocksize = 0;
+	struct cl_page *clpage;
 
 	if (rc < 0 && rc != -EDQUOT) {
 		DEBUG_REQ(D_INFO, req, "Failed request: rc = %d", rc);
@@ -2186,19 +2181,12 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 		rc = 0;
 	}
 
-	inode = page2inode(aa->aa_ppga[0]->pg);
-	if (!inode) {
-		/* Try to get reference to inode from cl_page if we are
-		 * dealing with direct IO, as handled pages are not
-		 * actual page cache pages.
-		 */
-		struct osc_async_page *oap = brw_page2oap(aa->aa_ppga[0]);
-
-		inode = oap2cl_page(oap)->cp_inode;
-		if (inode) {
-			blockbits = inode->i_blkbits;
-			blocksize = 1 << blockbits;
-		}
+	/* get the inode from the first cl_page */
+	clpage = oap2cl_page(brw_page2oap(aa->aa_ppga[0]));
+	inode = clpage->cp_inode;
+	if (clpage->cp_type == CPT_TRANSIENT && inode) {
+		blockbits = inode->i_blkbits;
+		blocksize = 1 << blockbits;
 	}
 	if (inode && IS_ENCRYPTED(inode)) {
 		int idx;
@@ -2208,19 +2196,19 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 			goto out;
 		}
 		for (idx = 0; idx < aa->aa_page_count; idx++) {
-			struct brw_page *pg = aa->aa_ppga[idx];
+			struct brw_page *brwpg = aa->aa_ppga[idx];
 			unsigned int offs = 0;
 
 			while (offs < PAGE_SIZE) {
 				/* do not decrypt if page is all 0s */
-				if (memchr_inv(page_address(pg->pg) + offs, 0,
-					       LUSTRE_ENCRYPTION_UNIT_SIZE) == NULL) {
+				if (!memchr_inv(page_address(brwpg->pg) + offs, 0,
+					        LUSTRE_ENCRYPTION_UNIT_SIZE)) {
 					/* if page is empty forward info to
 					 * upper layers (ll_io_zero_page) by
 					 * clearing PagePrivate2
 					 */
 					if (!offs)
-						ClearPagePrivate2(pg->pg);
+						ClearPagePrivate2(brwpg->pg);
 					break;
 				}
 
@@ -2230,24 +2218,25 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 					 * input parameter. Page does not need
 					 * to be locked.
 					 */
-					u64 lblk_num =
-						((u64)(pg->off >> PAGE_SHIFT) <<
-						      (PAGE_SHIFT - blockbits)) +
-						      (offs >> blockbits);
+					u64 lblk_num;
 					unsigned int i;
 
+					clpage = oap2cl_page(brw_page2oap(brwpg));
+					lblk_num = ((u64)(brwpg->off >> PAGE_SHIFT) <<
+							 (PAGE_SHIFT - blockbits)) +
+							 (offs >> blockbits);
 					for (i = offs;
 					     i < offs + LUSTRE_ENCRYPTION_UNIT_SIZE;
 					     i += blocksize, lblk_num++) {
 						rc = fscrypt_decrypt_block_inplace(inode,
-										   pg->pg,
+										   brwpg->pg,
 										   blocksize, i,
 										   lblk_num);
 						if (rc)
 							break;
 					}
 				} else {
-					rc = fscrypt_decrypt_pagecache_blocks(pg->pg,
+					rc = fscrypt_decrypt_pagecache_blocks(brwpg->pg,
 									      LUSTRE_ENCRYPTION_UNIT_SIZE,
 									      offs);
 				}
-- 
1.8.3.1
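
For readers following the direct IO decryption loop in osc_brw_fini_request()
above: the starting logical block number handed to
fscrypt_decrypt_block_inplace() is derived from the byte offset of the page in
the file plus the offset of the encryption unit inside that page. The sketch
below reproduces only that arithmetic in user space. SKETCH_PAGE_SHIFT,
sketch_lblk_num() and the sample values are assumptions made for illustration
and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. The kernel's PAGE_SHIFT
 * depends on the architecture and configuration.
 */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical helper mirroring the lblk_num expression in the diff:
 * file_off  - byte offset of the page in the file (brwpg->off in the patch)
 * offs      - byte offset of the encryption unit inside the page
 * blockbits - log2 of the block size (inode->i_blkbits in the patch)
 */
static uint64_t sketch_lblk_num(uint64_t file_off, unsigned int offs,
				unsigned int blockbits)
{
	assert(blockbits <= SKETCH_PAGE_SHIFT);
	assert(offs < (1u << SKETCH_PAGE_SHIFT));

	/* page index converted to blocks, plus blocks into the page */
	return ((file_off >> SKETCH_PAGE_SHIFT) <<
		(SKETCH_PAGE_SHIFT - blockbits)) + (offs >> blockbits);
}
```

Under these assumptions, a page at byte offset 0x5000 with 4 KiB blocks
(blockbits = 12) starts at block 5, and with 512-byte blocks (blockbits = 9)
the unit at offset 1024 inside that page starts at block 42.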


* [lustre-devel] [PATCH 50/50] lustre: ptlrpc: Use after free of 'conn' in rhashtable retry
  2022-03-20 13:30 [lustre-devel] [PATCH 00/50] lustre: update to OpenSFS tree as of March 20, 2022 James Simmons
                   ` (48 preceding siblings ...)
  2022-03-20 13:31 ` [lustre-devel] [PATCH 49/50] lustre: sec: fix DIO for encrypted files James Simmons
@ 2022-03-20 13:31 ` James Simmons
  49 siblings, 0 replies; 51+ messages in thread
From: James Simmons @ 2022-03-20 13:31 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Shaun Tancheff, Lustre Development List

From: Shaun Tancheff <shaun.tancheff@hpe.com>

Fix a use-after-free of 'conn' in the uncommon case where
rhashtable_lookup_get_insert_fast() fails with -EBUSY or -ENOMEM
and the insert is retried.

Move kfree(conn) below the retry and set conn2 to NULL on error,
so that NULL propagates to conn and is returned to the caller.

HPE-bug-id: LUS-10776
Fixes: ac2370ac2b ("staging: lustre: ptlrpc: convert conn_hash to rhashtable")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15634
Lustre-commit: 9dcbf8b3d44f9bb2b ("LU-15634 ptlrpc: Use after free of 'conn' in rhashtable retry")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/46763
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/connection.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
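
The ordering problem described in the commit message can be sketched in user
space as follows. This is a minimal illustration, not the ptlrpc code: the
sketch_* names, the single-slot "table" and the bounded retry count are all
assumptions, and sketch_insert() is only a stand-in for the behaviour of
rhashtable_lookup_get_insert_fast() (insert, return the existing entry, or
fail transiently). The point it shows is that the candidate object must stay
allocated across the retry and be freed only once the outcome is final.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

enum { SKETCH_MAX_RETRIES = 3 };	/* assumption: the patch retries unbounded */

struct sketch_conn {
	int id;
	int refcount;
};

struct sketch_table {
	struct sketch_conn *slot;	/* single-entry stand-in for the hash table */
	int busy_failures;		/* upcoming inserts that fail with -EBUSY */
};

/* Hypothetical insert: 0 with *existing == NULL means 'conn' was inserted,
 * 0 with *existing set means an entry was already present, -EBUSY means
 * a transient failure that the caller may retry.
 */
static int sketch_insert(struct sketch_table *t, struct sketch_conn *conn,
			 struct sketch_conn **existing)
{
	assert(t && conn && existing);
	*existing = NULL;
	if (t->busy_failures > 0) {
		t->busy_failures--;
		return -EBUSY;
	}
	if (t->slot)
		*existing = t->slot;
	else
		t->slot = conn;
	return 0;
}

static struct sketch_conn *sketch_conn_get(struct sketch_table *t, int id)
{
	struct sketch_conn *conn, *conn2;
	int retries = SKETCH_MAX_RETRIES;
	int rc;

	conn = calloc(1, sizeof(*conn));
	if (!conn)
		return NULL;
	conn->id = id;
	conn->refcount = 1;

try_again:
	rc = sketch_insert(t, conn, &conn2);
	if (rc == -EBUSY && retries-- > 0)
		goto try_again;	/* 'conn' is reused here, so it must not be freed yet */

	if (rc < 0) {
		/* gave up: free our candidate and report failure */
		free(conn);
		return NULL;
	}
	if (conn2) {
		/* an entry already existed: drop ours only now, take a reference */
		free(conn);
		conn2->refcount++;
		return conn2;
	}
	return conn;
}
```

Freeing the candidate before the retry check would hand a dangling pointer to
the next sketch_insert() call, which is the shape of the bug the patch fixes.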

diff --git a/fs/lustre/ptlrpc/connection.c b/fs/lustre/ptlrpc/connection.c
index 8dbaea40..d1f53c6 100644
--- a/fs/lustre/ptlrpc/connection.c
+++ b/fs/lustre/ptlrpc/connection.c
@@ -119,10 +119,12 @@ struct ptlrpc_connection *
 				msleep(20);
 				goto try_again;
 			}
-			return NULL;
+			conn2 = NULL;
 		}
+		kfree(conn);
 		conn = conn2;
-		ptlrpc_connection_addref(conn);
+		if (conn)
+			ptlrpc_connection_addref(conn);
 	}
 out:
 	CDEBUG(D_INFO, "conn=%p refcount %d to %s\n",
-- 
1.8.3.1


