lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023
@ 2023-01-23 23:00 James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 01/42] lustre: osc: pack osc_async_page better James Simmons
                   ` (41 more replies)
  0 siblings, 42 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Merge the latest work from OpenSFS tree into the native Linux
client.

Alexander Boyko (1):
  lustre: ptlrpc: don't panic during reconnection

Alexander Zarochentsev (1):
  lnet: libcfs: cfs_hash_for_each_empty optimization

Alexey Lyashkov (1):
  lnet: selftest: lst read-outside of allocation

Andreas Dilger (4):
  lustre: misc: fix stats snapshot_time to use wallclock
  lustre: misc: rename lprocfs_stats functions
  lustre: ptlrpc: NUL terminate long jobid strings
  lustre: ldlm: remove obsolete LDLM_FL_SERVER_LOCK

Aurelien Degremont (2):
  lustre: llite: remove false outdated comment
  lnet: socklnd: clarify error message on timeout

Bobi Jam (1):
  lustre: llite: revert: "llite: clear stale page's uptodate bit"

Chris Horn (1):
  lnet: Drop LNet message if deadline exceeded

Cyril Bordage (2):
  lnet: handles unregister/register events
  lnet: increase transaction timeout

Etienne AUJAMES (2):
  lustre: pools: force creation of a component without a pool
  lustre: llite: replace selinux_is_enabled()

Frank Sehr (1):
  lnet: Allow IP specification

Gian-Carlo DeFazio (1):
  lnet: asym route inconsistency warning

James Simmons (5):
  lnet: change lnet_find_best_lpni to handle large NIDs
  lnet: selftest: migrate LNet selftest group handling to Netlink
  lnet: use Netlink to support LNet ping commands
  lnet: validate data sent from user land properly
  lnet: modify lnet_inetdev to work with large NIDS

Lai Siyao (3):
  lustre: llite: wake_up after cl_object_kill
  lustre: uapi: remove _GNU_SOURCE dependency in lustre_user.h
  lustre: llite: always enable remote subdir mount

Lei Feng (2):
  lustre: ldebugfs: add histogram to stats counter
  lustre: ldebugfs: make job_stats and rename_stats valid YAML

Li Dongyang (2):
  lustre: obdclass: fix T10PI prototypes
  lustre: obdclass: prefer T10 checksum if the target supports it

Mr NeilBrown (2):
  lnet: lnet_peer_merge_data to understand large addr
  lnet: router_discover - handle large addrs in ping

Oleg Drokin (1):
  lustre: update version to 2.15.53

Patrick Farrell (2):
  lustre: osc: pack osc_async_page better
  lustre: osc: Fix possible null pointer

Qian Yingjin (2):
  lustre: pcc: use two bits to indicate pcc type for attach
  lustre: llite: update statx size/ctime for fallocate

Sebastien Buisson (2):
  lustre: sec: reserve flag for fid2path for encrypted files
  lustre: enc: S_ENCRYPTED flag on OST objects for enc files

Serguei Smirnov (1):
  lnet: o2iblnd: reset hiw proportionally

Shaun Tancheff (3):
  lustre: ptlrpc: fiemap flexible array
  lustre: ptlrpc: Add LCME_FL_PARITY to wirecheck
  lustre: move to kobj_type default_groups

 fs/lustre/include/cl_object.h           |  15 +-
 fs/lustre/include/lprocfs_status.h      |  15 +-
 fs/lustre/include/lustre_dlm_flags.h    |   6 -
 fs/lustre/include/lustre_osc.h          |  41 +-
 fs/lustre/include/obd.h                 |   3 +-
 fs/lustre/include/obd_cksum.h           |  15 +-
 fs/lustre/include/obd_class.h           |  43 ++
 fs/lustre/include/obd_support.h         |   1 +
 fs/lustre/ldlm/ldlm_lib.c               |   1 +
 fs/lustre/ldlm/ldlm_pool.c              |   8 +-
 fs/lustre/ldlm/ldlm_resource.c          |   8 +-
 fs/lustre/llite/dir.c                   |  22 +-
 fs/lustre/llite/file.c                  |  10 +-
 fs/lustre/llite/lcommon_cl.c            |   5 +
 fs/lustre/llite/llite_internal.h        |  46 +-
 fs/lustre/llite/llite_lib.c             |  16 +
 fs/lustre/llite/lproc_llite.c           |  36 +-
 fs/lustre/llite/namei.c                 |  93 ++--
 fs/lustre/llite/rw.c                    |  10 +-
 fs/lustre/llite/vvp_io.c                | 136 +-----
 fs/lustre/llite/vvp_page.c              |   5 -
 fs/lustre/llite/xattr.c                 |  10 +-
 fs/lustre/llite/xattr_cache.c           |   6 +-
 fs/lustre/llite/xattr_security.c        | 193 +++++++--
 fs/lustre/lmv/lproc_lmv.c               |   4 +-
 fs/lustre/lov/lproc_lov.c               |   4 +-
 fs/lustre/mdc/lproc_mdc.c               |  14 +-
 fs/lustre/mdc/mdc_dev.c                 |  13 +-
 fs/lustre/obdclass/cl_page.c            |  37 +-
 fs/lustre/obdclass/genops.c             |   4 +-
 fs/lustre/obdclass/integrity.c          |  12 +-
 fs/lustre/obdclass/lprocfs_counters.c   |  13 +
 fs/lustre/obdclass/lprocfs_status.c     |  82 +++-
 fs/lustre/obdclass/lu_object.c          |   5 +-
 fs/lustre/obdclass/obd_config.c         |  15 +-
 fs/lustre/osc/lproc_osc.c               |  18 +-
 fs/lustre/osc/osc_cache.c               |  10 +-
 fs/lustre/osc/osc_dev.c                 |  17 +-
 fs/lustre/osc/osc_io.c                  |   5 +-
 fs/lustre/osc/osc_page.c                |   4 +-
 fs/lustre/osc/osc_request.c             |  14 +-
 fs/lustre/ptlrpc/lproc_ptlrpc.c         |  10 +-
 fs/lustre/ptlrpc/niobuf.c               |  19 +-
 fs/lustre/ptlrpc/pack_generic.c         |   6 +
 fs/lustre/ptlrpc/sec.c                  |  17 +-
 fs/lustre/ptlrpc/wiretest.c             |   6 +-
 include/linux/lnet/lib-lnet.h           |   6 +-
 include/linux/lnet/lib-types.h          |  46 ++
 include/uapi/linux/lnet/libcfs_ioctl.h  |   2 +-
 include/uapi/linux/lnet/lnet-dlc.h      |   2 +
 include/uapi/linux/lnet/lnet-types.h    |  27 +-
 include/uapi/linux/lnet/lnetst.h        |   2 +
 include/uapi/linux/lustre/lustre_idl.h  |   1 +
 include/uapi/linux/lustre/lustre_user.h |  55 ++-
 include/uapi/linux/lustre/lustre_ver.h  |   4 +-
 net/lnet/klnds/o2iblnd/o2iblnd.c        |  22 +-
 net/lnet/klnds/o2iblnd/o2iblnd.h        |  30 +-
 net/lnet/klnds/socklnd/socklnd.c        |  87 ++--
 net/lnet/klnds/socklnd/socklnd_cb.c     |  10 +-
 net/lnet/libcfs/hash.c                  |  19 +-
 net/lnet/lnet/api-ni.c                  | 540 ++++++++++++++++++++----
 net/lnet/lnet/config.c                  |  58 ++-
 net/lnet/lnet/lib-move.c                |  73 ++--
 net/lnet/lnet/lib-msg.c                 |   2 +-
 net/lnet/lnet/nidstrings.c              |  24 ++
 net/lnet/lnet/peer.c                    |  84 ++--
 net/lnet/lnet/router.c                  |  19 +-
 net/lnet/selftest/conctl.c              | 421 ++++++++++++++----
 net/lnet/selftest/conrpc.c              |  22 +-
 net/lnet/selftest/console.c             |  27 +-
 net/lnet/selftest/console.h             |   4 +-
 net/lnet/selftest/selftest.h            |  60 ++-
 72 files changed, 1896 insertions(+), 824 deletions(-)

-- 
2.27.0

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 01/42] lustre: osc: pack osc_async_page better
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 02/42] lnet: lnet_peer_merge_data to understand large addr James Simmons
                   ` (40 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

The oap_cmd field was used to store a number of other flags, but
those were redundant with oap_brw_page.flag, and never used.
That allows shrinking oap_cmd down to 2 bits.

Modern GCC allows specifying a bitfield for an enum, so the size
can be explicitly set.

The oap_page_off always holds < PAGE_SIZE, so it can safely fit
into PAGE_SHIFT bits, similar to ops_from. However, since this
field is used in math operations and we don't need the space,
always allocate it as an aligned 16-bit field.

This allows packing oap_async_flags, oap_cmd, and oap_page_off
into a 32-bit space.  This avoids having holes in the struct. The
explicit oap_padding fields are needed so that "packed" does not
cause the fields to be misaligned, but still allows packing with
the following 4-byte field in osc_page.

Also move oap_brw_page to the end of the struct, since the
bp_padding field therein is useless and can be removed. This
allows better packing with the bitfields in struct osc_page.

    brw_page       old size:  32, holes: 0, padding: 4
    brw_page       new size:  28, holes: 0, padding: 0
    osc_async_page old size: 104, holes: 8, padding: 4
    osc_async_page new size:  92, holes: 0, bit holes: 10
    osc_page       old size: 144, holes: 8, bit holes:  4
    osc_page       new size: 128, holes: 0, bit holes:  4

Together this saves 16 bytes *per page* in cache and fits osc_page
into a nicely sized 128-byte allocation. With 4KiB pages, that is
512MiB saved on a system with 128GiB of cache.
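
The layout and the savings figure above can be sketched in user-space
C. This is only an illustration, not the Lustre definition: the names
demo_oap_head, DEMO_*_BITS, and demo_cache_savings are hypothetical,
the packed attribute assumes GCC or Clang, and the savings helper
assumes the caller supplies the page size (4KiB in the figure above).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical widths mirroring the commit message: 2 bits of command,
 * 4 bits of async flags, and explicit padding up to 16 bits.
 */
#define DEMO_CMD_BITS	2
#define DEMO_ASYNC_BITS	4
#define DEMO_PAD_BITS	(16 - DEMO_CMD_BITS - DEMO_ASYNC_BITS)

/* Illustrative stand-in for the first 32 bits of osc_async_page. */
struct demo_oap_head {
	uint16_t	page_off;		/* always < PAGE_SIZE */
	unsigned int	cmd:DEMO_CMD_BITS;
	unsigned int	async_flags:DEMO_ASYNC_BITS;
	unsigned int	padding:DEMO_PAD_BITS;	/* unused */
} __attribute__((packed));

/* The 16-bit offset plus the three bit-fields fill exactly 32 bits. */
_Static_assert(sizeof(struct demo_oap_head) == 4, "head must be 32 bits");

/* Bytes saved across a page cache when per-page metadata shrinks. */
static inline uint64_t demo_cache_savings(uint64_t cache_bytes,
					  uint64_t page_size,
					  uint64_t saved_per_page)
{
	return cache_bytes / page_size * saved_per_page;
}
```

Calling demo_cache_savings(128ULL << 30, 4096, 16) reproduces the
512MiB figure: 2^25 pages times 16 bytes each.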

WC-bug-id: https://jira.whamcloud.com/browse/LU-15619
Lustre-commit: 0bfc8eca5c3d26235 ("LU-15619 osc: pack osc_async_page better")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46721
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_osc.h | 25 +++++++++++++++++--------
 fs/lustre/include/obd.h        |  3 +--
 fs/lustre/osc/osc_io.c         |  5 ++---
 fs/lustre/osc/osc_page.c       |  4 +---
 4 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index d15f46b4a34a..526093ebff18 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -60,7 +60,7 @@ struct osc_quota_info {
 	struct rcu_head		rcu;
 };
 
-enum async_flags {
+enum oap_async_flags {
 	ASYNC_READY		= 0x1,	/* osc_make_ready will not be called
 					 * before this page is added to an rpc
 					 */
@@ -71,24 +71,32 @@ enum async_flags {
 					 * to give the caller a chance to update
 					 * or cancel the size of the io
 					 */
-	ASYNC_HP = 0x10,
+	ASYNC_HP		= 0x8,
+	OAP_ASYNC_MAX,
+	OAP_ASYNC_BITS = 4
 };
 
+/* add explicit padding to keep fields aligned despite "packed",
+ * which is needed to pack with following field in osc_page
+ */
+#define OAP_PAD_BITS (16 - OBD_BRW_WRITE - OAP_ASYNC_BITS)
 struct osc_async_page {
-	unsigned short		oap_cmd;
+	unsigned short		oap_page_off;	/* :PAGE_SHIFT */
+	unsigned int		oap_cmd:OBD_BRW_WRITE;
+	enum oap_async_flags    oap_async_flags:OAP_ASYNC_BITS;
+	unsigned int		oap_padding1:OAP_PAD_BITS;	/* unused */
+	unsigned int		oap_padding2;			/* unused */
 
 	struct list_head        oap_pending_item;
 	struct list_head        oap_rpc_item;
 
 	u64			oap_obj_off;
-	unsigned int		oap_page_off;
-	enum async_flags	oap_async_flags;
-
-	struct brw_page		oap_brw_page;
 
 	struct ptlrpc_request	*oap_request;
 	struct osc_object	*oap_obj;
-};
+
+	struct brw_page         oap_brw_page;
+} __packed;
 
 #define oap_page	oap_brw_page.pg
 #define oap_count	oap_brw_page.count
@@ -96,6 +104,7 @@ struct osc_async_page {
 
 static inline struct osc_async_page *brw_page2oap(struct brw_page *pga)
 {
+	BUILD_BUG_ON(OAP_ASYNC_MAX - 1 >= (1 << OAP_ASYNC_BITS));
 	return container_of(pga, struct osc_async_page, oap_brw_page);
 }
 
diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index 56e56414fd72..e9752a306294 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -123,8 +123,7 @@ struct brw_page {
 	u16			bp_off_diff;
 	/* used for encryption: difference with count in clear text page */
 	u16			bp_count_diff;
-	u32			bp_padding;
-};
+} __packed;
 
 struct timeout_item {
 	enum timeout_event	ti_event;
diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c
index b9362d96b78d..c9a317575993 100644
--- a/fs/lustre/osc/osc_io.c
+++ b/fs/lustre/osc/osc_io.c
@@ -514,9 +514,8 @@ static bool trunc_check_cb(const struct lu_env *env, struct cl_io *io,
 				      start, current->comm);
 
 		if (PageLocked(page->cp_vmpage))
-			CDEBUG(D_CACHE, "page %p index %lu locked for %d.\n",
-			       ops, osc_index(ops),
-			       oap->oap_cmd & OBD_BRW_RWMASK);
+			CDEBUG(D_CACHE, "page %p index %lu locked for cmd=%d\n",
+			       ops, osc_index(ops), oap->oap_cmd);
 	}
 	return true;
 }
diff --git a/fs/lustre/osc/osc_page.c b/fs/lustre/osc/osc_page.c
index 667825a90442..feec99fe0ca2 100644
--- a/fs/lustre/osc/osc_page.c
+++ b/fs/lustre/osc/osc_page.c
@@ -296,10 +296,8 @@ void osc_page_submit(const struct lu_env *env, struct osc_page *opg,
 	oap->oap_count = opg->ops_to - opg->ops_from + 1;
 	oap->oap_brw_flags = OBD_BRW_SYNC | brw_flags;
 
-	if (oio->oi_cap_sys_resource) {
+	if (oio->oi_cap_sys_resource)
 		oap->oap_brw_flags |= OBD_BRW_SYS_RESOURCE;
-		oap->oap_cmd |= OBD_BRW_SYS_RESOURCE;
-	}
 
 	osc_page_transfer_get(opg, "transfer\0imm");
 	osc_page_transfer_add(env, opg, crt);
-- 
2.27.0


* [lustre-devel] [PATCH 02/42] lnet: lnet_peer_merge_data to understand large addr
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 01/42] lustre: osc: pack osc_async_page better James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 03/42] lnet: router_discover - handle large addrs in ping James Simmons
                   ` (39 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

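lnet_peer_merge_data() now understands large addresses. It walks
the ping buffer with ping_iter_first()/ping_iter_next() rather than
indexing pi_ni[] directly, and tracks NIDs as struct lnet_nid
instead of lnet_nid_t.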
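
The counting helper added by this patch walks the buffer with a
first/next iterator instead of trusting pi_nnis. A minimal user-space
sketch of that pattern follows. The record layout used here (one
length byte, the address bytes, then a 32-bit status) and every
demo_ name are assumptions made for illustration; they do not
describe the actual LNet ping buffer format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over a buffer of variable-size records. */
struct demo_iter {
	const uint8_t	*buf;
	size_t		pos;
	size_t		end;
};

/* Read the record at the cursor and copy its status into *status.
 * Returns 1 if a record was read, 0 at end of buffer or on truncation.
 */
static int demo_iter_next(struct demo_iter *it, uint32_t *status)
{
	size_t addr_len;

	if (it->pos >= it->end)
		return 0;
	addr_len = it->buf[it->pos];
	if (it->end - it->pos < 1 + addr_len + sizeof(*status))
		return 0;
	memcpy(status, it->buf + it->pos + 1 + addr_len, sizeof(*status));
	it->pos += 1 + addr_len + sizeof(*status);
	return 1;
}

/* Reset the cursor to the start of buf and read the first record. */
static int demo_iter_first(struct demo_iter *it, const uint8_t *buf,
			   size_t len, uint32_t *status)
{
	it->buf = buf;
	it->pos = 0;
	it->end = len;
	return demo_iter_next(it, status);
}

/* Count entries by walking them, in the style of
 * ping_info_count_entries().
 */
static int demo_count_entries(const uint8_t *buf, size_t len)
{
	struct demo_iter it;
	uint32_t status;
	int n = 0;
	int ok;

	for (ok = demo_iter_first(&it, buf, len, &status); ok;
	     ok = demo_iter_next(&it, &status))
		n += 1;

	return n;
}
```

Because each record carries its own length, a 4-byte and a 16-byte
address can sit side by side and the caller never needs to know the
entry size in advance.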

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 57bcd6aa7f5f347de ("LU-10391 lnet: lnet_peer_merge_data to understand large addr")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44630
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 84 ++++++++++++++++++++++++++++++--------------
 1 file changed, 57 insertions(+), 27 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 8c603c903521..a9759860a5cd 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2917,6 +2917,8 @@ u32 *ping_iter_first(struct lnet_ping_iter *pi,
 	 */
 	if (nid)
 		lnet_nid4_to_nid(pbuf->pb_info.pi_ni[0].ns_nid, nid);
+
+	pi->pos += sizeof(struct lnet_ni_status);
 	return &pbuf->pb_info.pi_ni[0].ns_status;
 }
 
@@ -2953,6 +2955,19 @@ u32 *ping_iter_next(struct lnet_ping_iter *pi, struct lnet_nid *nid)
 	return NULL;
 }
 
+static int ping_info_count_entries(struct lnet_ping_buffer *pbuf)
+{
+	struct lnet_ping_iter pi;
+	u32 *st;
+	int nnis = 0;
+
+	for (st = ping_iter_first(&pi, pbuf, NULL); st;
+	     st = ping_iter_next(&pi, NULL))
+		nnis += 1;
+
+	return nnis;
+}
+
 /*
  * Build a peer from incoming data.
  *
@@ -2976,10 +2991,14 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 {
 	struct lnet_peer_net *lpn;
 	struct lnet_peer_ni *lpni;
-	lnet_nid_t *curnis = NULL;
-	struct lnet_ni_status *addnis = NULL;
-	lnet_nid_t *delnis = NULL;
+	struct lnet_nid *curnis = NULL;
+	struct lnet_ni_large_status *addnis = NULL;
+	struct lnet_nid *delnis = NULL;
+	struct lnet_ping_iter pi;
 	struct lnet_nid nid;
+	u32 *stp;
+	struct lnet_nid primary = {};
+	bool want_large_primary;
 	unsigned int flags;
 	int ncurnis;
 	int naddnis;
@@ -3003,7 +3022,8 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 		lp->lp_state &= ~LNET_PEER_ROUTER_ENABLED;
 	spin_unlock(&lp->lp_lock);
 
-	nnis = max_t(int, lp->lp_nnis, pbuf->pb_info.pi_nnis);
+	nnis = ping_info_count_entries(pbuf);
+	nnis = max_t(int, lp->lp_nnis, nnis);
 	curnis = kmalloc_array(nnis, sizeof(*curnis), GFP_NOFS);
 	addnis = kmalloc_array(nnis, sizeof(*addnis), GFP_NOFS);
 	delnis = kmalloc_array(nnis, sizeof(*delnis), GFP_NOFS);
@@ -3018,18 +3038,31 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 	/* Construct the list of NIDs present in peer. */
 	lpni = NULL;
 	while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL)
-		curnis[ncurnis++] = lnet_nid_to_nid4(&lpni->lpni_nid);
+		curnis[ncurnis++] = lpni->lpni_nid;
 
-	/*
-	 * Check for NIDs in pbuf not present in curnis[].
-	 * The loop starts at 1 to skip the loopback NID.
+	/* Check for NIDs in pbuf not present in curnis[].
+	 * Skip the first, which is loop-back.  Take second as
+	 * primary, unless a large primary is found.
 	 */
-	for (i = 1; i < pbuf->pb_info.pi_nnis; i++) {
+	ping_iter_first(&pi, pbuf, NULL);
+	stp = ping_iter_next(&pi, &nid);
+	if (stp)
+		primary = nid;
+	want_large_primary = (pbuf->pb_info.pi_features &
+			      LNET_PING_FEAT_PRIMARY_LARGE);
+	for (; stp; stp = ping_iter_next(&pi, &nid)) {
 		for (j = 0; j < ncurnis; j++)
-			if (pbuf->pb_info.pi_ni[i].ns_nid == curnis[j])
+			if (nid_same(&nid, &curnis[j]))
 				break;
-		if (j == ncurnis)
-			addnis[naddnis++] = pbuf->pb_info.pi_ni[i];
+		if (j == ncurnis) {
+			addnis[naddnis].ns_nid = nid;
+			addnis[naddnis].ns_status = *stp;
+			naddnis += 1;
+		}
+		if (want_large_primary && nid.nid_size) {
+			primary = nid;
+			want_large_primary = false;
+		}
 	}
 	/*
 	 * Check for NIDs in curnis[] not present in pbuf.
@@ -3039,25 +3072,24 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 	 * present in curnis[] then this peer is for this node.
 	 */
 	for (i = 0; i < ncurnis; i++) {
-		if (curnis[i] == LNET_NID_LO_0)
+		if (nid_is_lo0(&curnis[i]))
 			continue;
-		for (j = 1; j < pbuf->pb_info.pi_nnis; j++) {
-			if (curnis[i] == pbuf->pb_info.pi_ni[j].ns_nid) {
+		ping_iter_first(&pi, pbuf, NULL);
+		while ((stp = ping_iter_next(&pi, &nid)) != NULL) {
+			if (nid_same(&curnis[i], &nid)) {
 				/* update the information we cache for the
 				 * peer with the latest information we
 				 * received
 				 */
-				lnet_nid4_to_nid(curnis[i], &nid);
-				lpni = lnet_peer_ni_find_locked(&nid);
+				lpni = lnet_peer_ni_find_locked(&curnis[i]);
 				if (lpni) {
-					lpni->lpni_ns_status =
-						pbuf->pb_info.pi_ni[j].ns_status;
+					lpni->lpni_ns_status = *stp;
 					lnet_peer_ni_decref_locked(lpni);
 				}
 				break;
 			}
 		}
-		if (j == pbuf->pb_info.pi_nnis)
+		if (!stp)
 			delnis[ndelnis++] = curnis[i];
 	}
 
@@ -3070,16 +3102,15 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 		goto out;
 
 	for (i = 0; i < naddnis; i++) {
-		lnet_nid4_to_nid(addnis[i].ns_nid, &nid);
-		rc = lnet_peer_add_nid(lp, &nid, flags);
+		rc = lnet_peer_add_nid(lp, &addnis[i].ns_nid, flags);
 		if (rc) {
 			CERROR("Error adding NID %s to peer %s: %d\n",
-			       libcfs_nid2str(addnis[i].ns_nid),
+			       libcfs_nidstr(&addnis[i].ns_nid),
 			       libcfs_nidstr(&lp->lp_primary_nid), rc);
 			if (rc == -ENOMEM)
 				goto out;
 		}
-		lpni = lnet_peer_ni_find_locked(&nid);
+		lpni = lnet_peer_ni_find_locked(&addnis[i].ns_nid);
 		if (lpni) {
 			lpni->lpni_ns_status = addnis[i].ns_status;
 			lnet_peer_ni_decref_locked(lpni);
@@ -3092,13 +3123,12 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 		 * being told that the router changed its primary_nid
 		 * then it's okay to delete it.
 		 */
-		lnet_nid4_to_nid(delnis[i], &nid);
 		if (lp->lp_rtr_refcount > 0)
 			flags |= LNET_PEER_RTR_NI_FORCE_DEL;
-		rc = lnet_peer_del_nid(lp, &nid, flags);
+		rc = lnet_peer_del_nid(lp, &delnis[i], flags);
 		if (rc) {
 			CERROR("Error deleting NID %s from peer %s: %d\n",
-			       libcfs_nid2str(delnis[i]),
+			       libcfs_nidstr(&delnis[i]),
 			       libcfs_nidstr(&lp->lp_primary_nid), rc);
 			if (rc == -ENOMEM)
 				goto out;
-- 
2.27.0


* [lustre-devel] [PATCH 03/42] lnet: router_discover - handle large addrs in ping
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 01/42] lustre: osc: pack osc_async_page better James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 02/42] lnet: lnet_peer_merge_data to understand large addr James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 04/42] lnet: Drop LNet message if deadline exceeded James Simmons
                   ` (38 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

lnet_router_discovery_ping_reply() now considers the large
NIDs in the ping message.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 2d916eec68e8a7d35 ("LU-10391 lnet: router_discover - handle large addrs in ping")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44631
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index 358c3f1fcb1c..88a5b69e1f2e 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -388,12 +388,13 @@ lnet_router_discovery_ping_reply(struct lnet_peer *lp,
 				 struct lnet_ping_buffer *pbuf)
 __must_hold(&the_lnet.ln_api_mutex)
 {
+	struct lnet_ping_iter piter;
 	struct lnet_peer_net *llpn;
 	struct lnet_route *route;
+	struct lnet_nid nid;
 	bool single_hop = false;
 	bool net_up = false;
-	u32 net;
-	int i;
+	u32 *stp;
 
 	if (pbuf->pb_info.pi_features & LNET_PING_FEAT_RTE_DISABLED) {
 		CERROR("Peer %s is being used as a gateway but routing feature is not turned on\n",
@@ -427,13 +428,12 @@ __must_hold(&the_lnet.ln_api_mutex)
 
 		single_hop = false;
 		net_up = false;
-		for (i = 1; i < pbuf->pb_info.pi_nnis; i++) {
-			net = LNET_NIDNET(pbuf->pb_info.pi_ni[i].ns_nid);
-
-			if (route->lr_net == net) {
+		for (stp = ping_iter_first(&piter, pbuf, &nid);
+		     stp;
+		     stp = ping_iter_next(&piter, &nid)) {
+			if (route->lr_net == LNET_NID_NET(&nid)) {
 				single_hop = true;
-				if (pbuf->pb_info.pi_ni[i].ns_status ==
-				    LNET_NI_STATUS_UP) {
+				if (*stp == LNET_NI_STATUS_UP) {
 					net_up = true;
 					break;
 				}
-- 
2.27.0


* [lustre-devel] [PATCH 04/42] lnet: Drop LNet message if deadline exceeded
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (2 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 03/42] lnet: router_discover - handle large addrs in ping James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 05/42] lnet: change lnet_find_best_lpni to handle large NIDs James Simmons
                   ` (37 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

The LNet message deadline is set when a message is committed for
sending. A message can be queued while waiting for send credit(s)
after it has been committed. Thus, it is possible for a message
deadline to be exceeded while on the queue. We should check for this
when posting messages to the LND layer.
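
The decision order this introduces can be sketched in user-space C.
This is a simplified illustration under stated assumptions: struct
demo_msg and demo_check_message_drop are hypothetical names, times
are plain integer ticks supplied by the caller, and the several
exemptions in the real function (resends, recovery messages,
responses, and so on) are collapsed into a single "routed" flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a queued message. */
struct demo_msg {
	int64_t	deadline;	/* set when the message is committed */
	bool	routed;		/* only routed messages get the alive check */
};

/* Returns -ETIMEDOUT if the deadline has passed, -EHOSTUNREACH if the
 * next hop has been silent for peer_timeout ticks or more, else 0.
 */
static int demo_check_message_drop(const struct demo_msg *msg, int64_t now,
				   int64_t last_alive, int64_t peer_timeout)
{
	/* Strictly after the deadline, like ktime_after(). */
	if (now > msg->deadline)
		return -ETIMEDOUT;

	if (!msg->routed)
		return 0;

	if (now >= last_alive + peer_timeout)
		return -EHOSTUNREACH;

	return 0;
}
```

Checking the deadline first means an expired message is reported as
a timeout even when the peer would also have failed the aliveness
test, so the caller can set a distinct health status for each case.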

HPE-bug-id: LUS-11333
WC-bug-id: https://jira.whamcloud.com/browse/LU-16303
Lustre-commit: 52db11cdceef0851b ("LU-16303 lnet: Drop LNet message if deadline exceeded")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49078
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 57 +++++++++++++++++++++++++++-------------
 net/lnet/lnet/lib-msg.c  |  2 +-
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 225accaf5d08..f602492ee75f 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -572,41 +572,52 @@ lnet_ni_eager_recv(struct lnet_ni *ni, struct lnet_msg *msg)
 	return rc;
 }
 
-/* returns true if this message should be dropped */
-static bool
+/* Returns:
+ *  -ETIMEDOUT if the message deadline has been exceeded
+ *  -EHOSTUNREACH if the peer is down
+ *  0 if this message should not be dropped
+ */
+static int
 lnet_check_message_drop(struct lnet_ni *ni, struct lnet_peer_ni *lpni,
 			struct lnet_msg *msg)
 {
+	/* Drop message if we've exceeded the message deadline */
+	if (ktime_after(ktime_get(), msg->msg_deadline))
+		return -ETIMEDOUT;
+
 	if (msg->msg_target.pid & LNET_PID_USERFLAG)
-		return false;
+		return 0;
 
 	if (!lnet_peer_aliveness_enabled(lpni))
-		return false;
+		return 0;
 
 	/* If we're resending a message, let's attempt to send it even if
 	 * the peer is down to fulfill our resend quota on the message
 	 */
 	if (msg->msg_retry_count > 0)
-		return false;
+		return 0;
 
-	/* try and send recovery messages irregardless */
+	/* try and send recovery messages regardless */
 	if (msg->msg_recovery)
-		return false;
+		return 0;
 
 	/* always send any responses */
 	if (lnet_msg_is_response(msg))
-		return false;
+		return 0;
 
 	/* always send non-routed messages */
 	if (!msg->msg_routing)
-		return false;
+		return 0;
 
 	/* assume peer_ni is alive as long as we're within the configured
 	 * peer timeout
 	 */
-	return ktime_get_seconds() >=
-		(lpni->lpni_last_alive +
-		 lpni->lpni_net->net_tunables.lct_peer_timeout);
+	if (ktime_get_seconds() >=
+	    (lpni->lpni_last_alive +
+	     lpni->lpni_net->net_tunables.lct_peer_timeout))
+		return -EHOSTUNREACH;
+
+	return 0;
 }
 
 /**
@@ -628,6 +639,7 @@ lnet_post_send_locked(struct lnet_msg *msg, int do_send)
 	struct lnet_ni *ni = msg->msg_txni;
 	int cpt = msg->msg_tx_cpt;
 	struct lnet_tx_queue *tq = ni->ni_tx_queues[cpt];
+	int rc;
 
 	/* non-lnet_send() callers have checked before */
 	LASSERT(!do_send || msg->msg_tx_delayed);
@@ -639,7 +651,8 @@ lnet_post_send_locked(struct lnet_msg *msg, int do_send)
 		LASSERT(!nid_same(&lp->lpni_nid, &the_lnet.ln_loni->ni_nid));
 
 	/* NB 'lp' is always the next hop */
-	if (lnet_check_message_drop(ni, lp, msg)) {
+	rc = lnet_check_message_drop(ni, lp, msg);
+	if (rc) {
 		the_lnet.ln_counters[cpt]->lct_common.lcc_drop_count++;
 		the_lnet.ln_counters[cpt]->lct_common.lcc_drop_length +=
 			msg->msg_len;
@@ -653,14 +666,22 @@ lnet_post_send_locked(struct lnet_msg *msg, int do_send)
 					msg->msg_type,
 					LNET_STATS_TYPE_DROP);
 
-		CNETERR("Dropping message for %s: peer not alive\n",
-			libcfs_idstr(&msg->msg_target));
-		msg->msg_health_status = LNET_MSG_STATUS_REMOTE_DROPPED;
+		if (rc == -EHOSTUNREACH) {
+			CNETERR("Dropping message for %s: peer not alive\n",
+				libcfs_idstr(&msg->msg_target));
+			msg->msg_health_status = LNET_MSG_STATUS_REMOTE_DROPPED;
+		} else {
+			CNETERR("Dropping message for %s: exceeded message deadline\n",
+				libcfs_idstr(&msg->msg_target));
+			msg->msg_health_status =
+				LNET_MSG_STATUS_NETWORK_TIMEOUT;
+		}
+
 		if (do_send)
-			lnet_finalize(msg, -EHOSTUNREACH);
+			lnet_finalize(msg, rc);
 
 		lnet_net_lock(cpt);
-		return -EHOSTUNREACH;
+		return rc;
 	}
 
 	if (msg->msg_md &&
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 898d8670aedf..82d117dc6b61 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -779,7 +779,7 @@ lnet_health_check(struct lnet_msg *msg)
 		lo = true;
 
 	if (hstatus != LNET_MSG_STATUS_OK &&
-	    ktime_compare(ktime_get(), msg->msg_deadline) >= 0)
+	    ktime_after(ktime_get(), msg->msg_deadline))
 		return -1;
 
 	/* always prefer txni/txpeer if they message is committed for both
-- 
2.27.0


* [lustre-devel] [PATCH 05/42] lnet: change lnet_find_best_lpni to handle large NIDs
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (3 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 04/42] lnet: Drop LNet message if deadline exceeded James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 06/42] lustre: ldebugfs: add histogram to stats counter James Simmons
                   ` (36 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

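Currently lnet_find_best_lpni() only handles small NID addresses
for the dst_nid. Change it, and lnet_select_peer_ni(), to take a
large NID (struct lnet_nid *) so that IPv6 and other large-address
protocols can be supported. Callers that previously passed
LNET_NID_ANY now pass NULL.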
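
As a rough illustration of the interface change, the sketch below
passes a destination by pointer and lets NULL mean that the caller
has no particular destination in mind. The struct demo_nid layout
and both helpers are hypothetical and are not the kernel's
struct lnet_nid or its accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical large-address NID; the real struct lnet_nid differs. */
struct demo_nid {
	uint32_t	net;		/* network identifier */
	uint8_t		addr_len;	/* e.g. 4 for IPv4, 16 for IPv6 */
	uint8_t		addr[16];
};

/* Two NIDs match when net, length, and the used address bytes match. */
static bool demo_nid_same(const struct demo_nid *a, const struct demo_nid *b)
{
	return a->net == b->net && a->addr_len == b->addr_len &&
	       memcmp(a->addr, b->addr, a->addr_len) == 0;
}

/* A NULL dst stands in for the old 64-bit "any" value: with no
 * requested destination, no candidate is treated as preferred.
 */
static bool demo_is_preferred(const struct demo_nid *dst,
			      const struct demo_nid *candidate)
{
	if (!dst)
		return false;
	return demo_nid_same(dst, candidate);
}
```

A 64-bit value cannot hold a 16-byte address, which is why the
argument becomes a pointer to a structure rather than a wider
integer.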

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 4c0c01e29c891d100 ("LU-10391 lnet: change lnet_find_best_lpni to handle large NIDs")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49181
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 net/lnet/lnet/lib-move.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index f602492ee75f..95abe4f15c48 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1114,7 +1114,7 @@ lnet_return_rx_credits_locked(struct lnet_msg *msg)
 }
 
 static struct lnet_peer_ni *
-lnet_select_peer_ni(struct lnet_ni *best_ni, lnet_nid_t dst_nid,
+lnet_select_peer_ni(struct lnet_ni *best_ni, struct lnet_nid *dst_nid,
 		    struct lnet_peer *peer,
 		    struct lnet_peer_ni *best_lpni,
 		    struct lnet_peer_net *peer_net)
@@ -1220,7 +1220,7 @@ lnet_select_peer_ni(struct lnet_ni *best_ni, lnet_nid_t dst_nid,
 	/* if we still can't find a peer ni then we can't reach it */
 	if (!best_lpni) {
 		u32 net_id = (peer_net) ? peer_net->lpn_net_id :
-			     LNET_NIDNET(dst_nid);
+			     LNET_NID_NET(dst_nid);
 		CDEBUG(D_NET, "no peer_ni found on peer net %s\n",
 		       libcfs_net2str(net_id));
 		return NULL;
@@ -1242,7 +1242,7 @@ lnet_select_peer_ni(struct lnet_ni *best_ni, lnet_nid_t dst_nid,
  * ones.
  */
 static inline struct lnet_peer_ni *
-lnet_find_best_lpni(struct lnet_ni *lni, lnet_nid_t dst_nid,
+lnet_find_best_lpni(struct lnet_ni *lni, struct lnet_nid *dst_nid,
 		    struct lnet_peer *peer, u32 net_id)
 {
 	struct lnet_peer_net *peer_net;
@@ -1350,7 +1350,7 @@ lnet_find_route_locked(struct lnet_remotenet *rnet, u32 src_net,
 			 * src_net provided. If the src_net is LNET_NID_ANY,
 			 * then select the best interface available.
 			 */
-			lpni = lnet_find_best_lpni(NULL, LNET_NID_ANY,
+			lpni = lnet_find_best_lpni(NULL, NULL,
 						   route->lr_gateway,
 						   src_net);
 			if (!lpni) {
@@ -1395,8 +1395,7 @@ lnet_find_route_locked(struct lnet_remotenet *rnet, u32 src_net,
 		 * src_net provided. If the src_net is LNET_NID_ANY,
 		 * then select the best interface available.
 		 */
-		lpni = lnet_find_best_lpni(NULL, LNET_NID_ANY,
-					   route->lr_gateway,
+		lpni = lnet_find_best_lpni(NULL, NULL, route->lr_gateway,
 					   src_net);
 		if (!lpni) {
 			CDEBUG(D_NET,
@@ -2108,7 +2107,7 @@ lnet_handle_find_routed_path(struct lnet_send_data *sd,
 			       libcfs_net2str(best_lpn->lpn_net_id));
 
 			sd->sd_best_lpni = lnet_find_best_lpni(sd->sd_best_ni,
-							       lnet_nid_to_nid4(&sd->sd_dst_nid),
+							       &sd->sd_dst_nid,
 							       lp,
 							       best_lpn->lpn_net_id);
 			if (!sd->sd_best_lpni) {
@@ -2540,8 +2539,7 @@ lnet_handle_any_mr_dsta(struct lnet_send_data *sd)
 					lnet_msg_discovery(sd->sd_msg));
 	if (sd->sd_best_ni) {
 		sd->sd_best_lpni =
-		  lnet_find_best_lpni(sd->sd_best_ni,
-				      lnet_nid_to_nid4(&sd->sd_dst_nid),
+		  lnet_find_best_lpni(sd->sd_best_ni, &sd->sd_dst_nid,
 				      sd->sd_peer,
 				      sd->sd_best_ni->ni_net->net_id);
 
-- 
2.27.0


* [lustre-devel] [PATCH 06/42] lustre: ldebugfs: add histogram to stats counter
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (4 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 05/42] lnet: change lnet_find_best_lpni to handle large NIDs James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 07/42] lustre: llite: wake_up after cl_object_kill James Simmons
                   ` (35 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lei Feng, Lustre Development List

From: Lei Feng <flei@whamcloud.com>

Add histogram to stats counter.
Example of enabling histogram for read/write_bytes in mdt/obdfilter
job stats.

Sample job_stats:
- job_id:          md5sum.0
snapshot_time   : 3143196.864165417 secs.nsecs
start_time      : 3143196.707206168 secs.nsecs
elapsed_time    : 0.156959249 secs.nsecs
  read_bytes:      { samples: 2, ..., hist: { 32K: 1, 1M: 1 } }
  write_bytes:     { samples: 1, ..., hist: { 1K: 1 } }
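
The bucket selection added to lprocfs_counter_add() can be tried out in
user space. This is a minimal sketch rather than the kernel code:
demo_fls() is a stand-in for the kernel's fls(), and DEMO_HIST_MAX is an
assumed bucket count chosen for illustration in place of OBD_HIST_MAX.

```c
/* Assumed bucket count for this sketch only. */
#define DEMO_HIST_MAX 32

/* Stand-in for fls(): 1-based position of the most significant set bit,
 * or 0 when no bit is set.
 */
static unsigned int demo_fls(unsigned int x)
{
	unsigned int bit = 0;

	while (x != 0) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Map a sample to a power-of-two bucket. Bucket N (N > 0) collects
 * samples in the range (2^(N-1), 2^N]; bucket 0 collects 0 and 1.
 * Oversized samples are clamped into the last bucket.
 */
static unsigned int demo_hist_bucket(unsigned int amount)
{
	unsigned int val = 0;

	if (amount != 0) {
		val = demo_fls(amount - 1);
		if (val > DEMO_HIST_MAX - 1)
			val = DEMO_HIST_MAX - 1;
	}
	return val;
}
```

Under these assumptions a 1024-byte sample lands in bucket 10 and a
32768-byte sample in bucket 15, which would match the 1K and 32K labels
in the sample output if each bucket is labelled by its upper bound.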

WC-bug-id: https://jira.whamcloud.com/browse/LU-16087
Lustre-commit: fde40ce32c91c804c ("LU-16087 lprocfs: add histogram to stats counter")
Signed-off-by: Lei Feng <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48278
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lprocfs_status.h    |  7 +++++++
 fs/lustre/obdclass/lprocfs_counters.c | 13 +++++++++++++
 fs/lustre/obdclass/lprocfs_status.c   | 21 +++++++++++++++++++--
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h
index 5cea77d1c246..f53125c96683 100644
--- a/fs/lustre/include/lprocfs_status.h
+++ b/fs/lustre/include/lprocfs_status.h
@@ -124,12 +124,16 @@ struct rename_stats {
  * squares (for multi-valued counter samples only). This allows
  * external computation of standard deviation, but involves a 64-bit
  * multiply per counter increment.
+ *
+ * LPROCFS_CNTR_HISTOGRAM indicates that the counter should track an
+ * exponential histogram.
  */
 
 enum lprocfs_counter_config {
 	LPROCFS_CNTR_EXTERNALLOCK	= 0x0001,
 	LPROCFS_CNTR_AVGMINMAX		= 0x0002,
 	LPROCFS_CNTR_STDDEV		= 0x0004,
+	LPROCFS_CNTR_HISTOGRAM		= 0x0008,
 
 	/* counter unit type */
 	LPROCFS_TYPE_REQS		= 0x0000, /* default if config = 0 */
@@ -147,6 +151,8 @@ enum lprocfs_counter_config {
 	LPROCFS_TYPE_BYTES_FULL		= LPROCFS_TYPE_BYTES |
 					  LPROCFS_CNTR_AVGMINMAX |
 					  LPROCFS_CNTR_STDDEV,
+	LPROCFS_TYPE_BYTES_FULL_HISTOGRAM	= LPROCFS_TYPE_BYTES_FULL |
+						  LPROCFS_CNTR_HISTOGRAM,
 };
 
 #define LC_MIN_INIT ((~(u64)0) >> 1)
@@ -155,6 +161,7 @@ struct lprocfs_counter_header {
 	enum lprocfs_counter_config	lc_config;
 	const char			*lc_name;   /* must be static */
 	const char			*lc_units;  /* must be static */
+	struct obd_histogram		*lc_hist;
 };
 
 struct lprocfs_counter {
diff --git a/fs/lustre/obdclass/lprocfs_counters.c b/fs/lustre/obdclass/lprocfs_counters.c
index 55cb12cc1399..4112dc355e94 100644
--- a/fs/lustre/obdclass/lprocfs_counters.c
+++ b/fs/lustre/obdclass/lprocfs_counters.c
@@ -48,6 +48,7 @@ void lprocfs_counter_add(struct lprocfs_stats *stats, int idx, long amount)
 	struct lprocfs_counter_header *header;
 	int smp_id;
 	unsigned long flags = 0;
+	struct obd_histogram *hist;
 
 	if (!stats)
 		return;
@@ -87,6 +88,18 @@ void lprocfs_counter_add(struct lprocfs_stats *stats, int idx, long amount)
 		if (amount > percpu_cntr->lc_max)
 			percpu_cntr->lc_max = amount;
 	}
+	/* no counter in interrupt has histogram for now */
+	hist = stats->ls_cnt_header[idx].lc_hist;
+	if (hist != NULL) {
+		unsigned int val = 0;
+
+		if (likely(amount != 0))
+			val = min(fls(amount - 1), OBD_HIST_MAX - 1);
+		spin_lock(&hist->oh_lock);
+		hist->oh_buckets[val]++;
+		spin_unlock(&hist->oh_lock);
+	}
+
 	lprocfs_stats_unlock(stats, LPROCFS_GET_SMP_ID, &flags);
 }
 EXPORT_SYMBOL(lprocfs_counter_add);
diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index 64d7cc48cd0c..01b8132087d6 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -1311,13 +1311,20 @@ EXPORT_SYMBOL(lprocfs_stats_collector);
 void lprocfs_clear_stats(struct lprocfs_stats *stats)
 {
 	struct lprocfs_counter *percpu_cntr;
-	int i;
-	int j;
+	int i, j;
 	unsigned int num_entry;
 	unsigned long flags = 0;
 
 	num_entry = lprocfs_stats_lock(stats, LPROCFS_GET_NUM_CPU, &flags);
 
+	/* clear histogram if exists */
+	for (j = 0; j < stats->ls_num; j++) {
+		struct obd_histogram *hist = stats->ls_cnt_header[j].lc_hist;
+
+		if (hist != NULL)
+			lprocfs_oh_clear(hist);
+	}
+
 	for (i = 0; i < num_entry; i++) {
 		if (!stats->ls_percpu[i])
 			continue;
@@ -1497,6 +1504,16 @@ void lprocfs_counter_init_units(struct lprocfs_stats *stats, int index,
 	header->lc_name = name;
 	header->lc_units = units;
 
+	if (config & LPROCFS_CNTR_HISTOGRAM) {
+		stats->ls_cnt_header[index].lc_hist =
+			kzalloc(sizeof(*stats->ls_cnt_header[index].lc_hist),
+				GFP_NOFS);
+		if (stats->ls_cnt_header[index].lc_hist == NULL)
+			CERROR("LprocFS: Failed to allocate histogram:[%d]%s/%s\n",
+			       index, name, units);
+		else
+			spin_lock_init(&stats->ls_cnt_header[index].lc_hist->oh_lock);
+	}
 	num_cpu = lprocfs_stats_lock(stats, LPROCFS_GET_NUM_CPU, &flags);
 	for (i = 0; i < num_cpu; ++i) {
 		if (!stats->ls_percpu[i])
-- 
2.27.0


* [lustre-devel] [PATCH 07/42] lustre: llite: wake_up after cl_object_kill
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (5 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 06/42] lustre: ldebugfs: add histogram to stats counter James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 08/42] lustre: pcc: use two bits to indicate pcc type for attach James Simmons
                   ` (34 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

cl_inode_fini() calls cl_object_kill() to set LU_OBJECT_HEARD_BANSHEE,
and then calls cl_object_put_last() to wait for the object refcount to
become one. It should call wake_up() in between, in case someone is
waiting on the flag.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16308
Lustre-commit: 77107d8e78ffd952a ("LU-16308 llite: wake_up after cl_object_kill")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49130
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/lcommon_cl.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/lustre/llite/lcommon_cl.c b/fs/lustre/llite/lcommon_cl.c
index f0d8f78fc82b..2735d5c164f8 100644
--- a/fs/lustre/llite/lcommon_cl.c
+++ b/fs/lustre/llite/lcommon_cl.c
@@ -229,6 +229,11 @@ static void cl_object_put_last(struct lu_env *env, struct cl_object *obj)
 
 		wq = lu_site_wq_from_fid(site, &header->loh_fid);
 
+		/* LU_OBJECT_HEARD_BANSHEE is set in cl_object_kill(), in case
+		 * someone is waiting on this, wake up and then wait for object
+		 * refcount becomes one.
+		 */
+		wake_up(wq);
 		wait_event(*wq, atomic_read(&header->loh_ref) == 1);
 	}
 
-- 
2.27.0


* [lustre-devel] [PATCH 08/42] lustre: pcc: use two bits to indicate pcc type for attach
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (6 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 07/42] lustre: llite: wake_up after cl_object_kill James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 09/42] lustre: ldebugfs: make job_stats and rename_stats valid YAML James Simmons
                   ` (33 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Qian Yingjin <qian@ddn.com>

PCC currently supports two types: readwrite and readonly.
The attach data structure @lu_pcc_attach uses a 32-bit value to
indicate the PCC type:
    struct lu_pcc_attach {
        __u32 pcca_type;
        __u32 pcca_id;
    };

This patch changes it to use 2 bits to represent the PCC type.
The remaining bits in @pcca_type can be used as flags for attach,
such as a flag to request asynchronous attach via the command
"lfs pcc attach -A" for PCCRO.
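
How the type bits and the remaining flag bits can share one 32-bit field
is sketched below. The DEMO_PCC_* values mirror the enum in this patch,
but DEMO_PCC_FLAG_ASYNC and the helper names are hypothetical: this patch
does not define any attach flag, so the bit chosen here is only an
assumption for illustration.

```c
#include <stdint.h>

/* Type values mirror the enum changed by this patch. */
enum demo_pcc_type {
	DEMO_PCC_NONE		= 0x0,
	DEMO_PCC_READWRITE	= 0x01,
	DEMO_PCC_READONLY	= 0x02,
	DEMO_PCC_TYPE_MASK	= DEMO_PCC_READWRITE | DEMO_PCC_READONLY,
};

/* Hypothetical attach flag stored above the two type bits. */
#define DEMO_PCC_FLAG_ASYNC	0x04u

/* Strip any flag bits and return only the PCC type. */
static uint32_t demo_pcca_type(uint32_t pcca_type)
{
	return pcca_type & DEMO_PCC_TYPE_MASK;
}

/* Test the hypothetical asynchronous-attach flag. */
static int demo_pcca_is_async(uint32_t pcca_type)
{
	return (pcca_type & DEMO_PCC_FLAG_ASYNC) != 0;
}

/* Same shape as pcc_type2string(): mask first, then switch on the type. */
static const char *demo_pcc_type2string(uint32_t pcca_type)
{
	switch (demo_pcca_type(pcca_type)) {
	case DEMO_PCC_NONE:
		return "none";
	case DEMO_PCC_READWRITE:
		return "readwrite";
	case DEMO_PCC_READONLY:
		return "readonly";
	default:
		return "fault";
	}
}
```

Masking before the switch means a value that carries extra flag bits
still maps to its type name, while a value with both type bits set falls
through to "fault".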

WC-bug-id: https://jira.whamcloud.com/browse/LU-16313
Lustre-commit: 6e90974b1f4ac24c5 ("LU-16313 pcc: use two bits to indicate pcc type for attach")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49160
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_user.h | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index e97ccb0635ba..c2096ba1cdbe 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -2324,18 +2324,22 @@ struct lu_heat {
 };
 
 enum lu_pcc_type {
-	LU_PCC_NONE = 0,
-	LU_PCC_READWRITE,
+	LU_PCC_NONE		= 0x0,
+	LU_PCC_READWRITE	= 0x01,
+	LU_PCC_READONLY		= 0x02,
+	LU_PCC_TYPE_MASK	= LU_PCC_READWRITE | LU_PCC_READONLY,
 	LU_PCC_MAX
 };
 
 static inline const char *pcc_type2string(enum lu_pcc_type type)
 {
-	switch (type) {
+	switch (type & LU_PCC_TYPE_MASK) {
 	case LU_PCC_NONE:
 		return "none";
 	case LU_PCC_READWRITE:
 		return "readwrite";
+	case LU_PCC_READONLY:
+		return "readonly";
 	default:
 		return "fault";
 	}
-- 
2.27.0


* [lustre-devel] [PATCH 09/42] lustre: ldebugfs: make job_stats and rename_stats valid YAML
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (7 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 08/42] lustre: pcc: use two bits to indicate pcc type for attach James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 10/42] lustre: misc: fix stats snapshot_time to use wallclock James Simmons
                   ` (32 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lei Feng, Lustre Development List

From: Lei Feng <flei@whamcloud.com>

Adjust the format of job_stats and rename_stats to make
them valid YAML.  This fixes the output to correctly indent
the items to follow YAML formatting rules.
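
The header formatting this patch moves to can be reproduced in user
space. This is a sketch under assumptions: demo_stats_line() is a
hypothetical helper that writes into a buffer with snprintf() instead of
calling seq_printf(), and it always prints the units. It uses the same
format string as the patched lprocfs_stats_header(), where the colon is
folded into the key before padding and each line starts with the prefix.

```c
#include <stdio.h>

/* Format one header line as "<prefix><key padded to width> <sec>.<nsec>".
 * When colon is non-zero, ":" is appended to the key before padding, so
 * it stays attached to the key instead of following the padding.
 * Returns the snprintf() result for the formatted line.
 */
static int demo_stats_line(char *buf, size_t len, const char *prefix,
			   int width, const char *name, int colon,
			   unsigned long long sec, unsigned long nsec)
{
	char field[64];

	snprintf(field, sizeof(field), "%s%s", name, colon ? ":" : "");
	return snprintf(buf, len, "%s%-*s %llu.%09lu secs.nsecs\n",
			prefix, width, field, sec, nsec);
}
```

A caller that prints the header inside an indented block passes the
indent as the prefix; callers that print it at the top level pass "",
as the updated call sites in this patch do.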

Add a test case to verify the format of these params is valid
YAML to avoid other errors being introduced in the future.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16110
Lustre-commit: e96cb6ff1fea7a2bc ("LU-16110 lprocfs: make job_stats and rename_stats valid YAML")
Signed-off-by: Lei Feng <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48417
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lprocfs_status.h  |  2 +-
 fs/lustre/llite/lproc_llite.c       |  8 ++++---
 fs/lustre/mdc/lproc_mdc.c           |  3 ++-
 fs/lustre/obdclass/genops.c         |  2 +-
 fs/lustre/obdclass/lprocfs_status.c | 36 ++++++++++++++++++++++-------
 fs/lustre/osc/lproc_osc.c           |  6 +++--
 6 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h
index f53125c96683..ef923ae4ca37 100644
--- a/fs/lustre/include/lprocfs_status.h
+++ b/fs/lustre/include/lprocfs_status.h
@@ -461,7 +461,7 @@ int lprocfs_obd_setup(struct obd_device *obd, bool uuid_only);
 int lprocfs_obd_cleanup(struct obd_device *obd);
 void lprocfs_stats_header(struct seq_file *seq, ktime_t now,
 			  ktime_t ts_init, int width, const char *colon,
-			  bool show_units);
+			  bool show_units, const char *prefix);
 
 /* Generic callbacks */
 int ldebugfs_uint(struct seq_file *m, void *data);
diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index a57d7bb0a848..ef7ca4c67681 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -2020,7 +2020,8 @@ static int ll_rw_extents_stats_pp_seq_show(struct seq_file *seq, void *v)
 	}
 
 	spin_lock(&sbi->ll_pp_extent_lock);
-	lprocfs_stats_header(seq, ktime_get(), rw_extents->pp_init, 25, ":", 1);
+	lprocfs_stats_header(seq, ktime_get(), rw_extents->pp_init, 25, ":",
+			     true, "");
 	seq_printf(seq, "%15s %19s       | %20s\n", " ", "read", "write");
 	seq_printf(seq, "%13s   %14s %4s %4s  | %14s %4s %4s\n",
 		   "extents", "calls", "%", "cum%", "calls", "%", "cum%");
@@ -2157,7 +2158,8 @@ static int ll_rw_extents_stats_seq_show(struct seq_file *seq, void *v)
 	}
 
 	spin_lock(&sbi->ll_lock);
-	lprocfs_stats_header(seq, ktime_get(), rw_extents->pp_init, 25, ":", 1);
+	lprocfs_stats_header(seq, ktime_get(), rw_extents->pp_init, 25, ":",
+			     true, "");
 
 	seq_printf(seq, "%15s %19s       | %20s\n", " ", "read", "write");
 	seq_printf(seq, "%13s   %14s %4s %4s  | %14s %4s %4s\n",
@@ -2341,7 +2343,7 @@ static int ll_rw_offset_stats_seq_show(struct seq_file *seq, void *v)
 	}
 	spin_lock(&sbi->ll_process_lock);
 	lprocfs_stats_header(seq, ktime_get(), sbi->ll_process_stats_init, 25,
-			     ":", true);
+			     ":", true, "");
 	seq_printf(seq, "%3s %10s %14s %14s %17s %17s %14s\n",
 		   "R/W", "PID", "RANGE START", "RANGE END",
 		   "SMALLEST EXTENT", "LARGEST EXTENT", "OFFSET");
diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c
index cb3744ad9f93..81397a388681 100644
--- a/fs/lustre/mdc/lproc_mdc.c
+++ b/fs/lustre/mdc/lproc_mdc.c
@@ -513,7 +513,8 @@ static int mdc_stats_seq_show(struct seq_file *seq, void *v)
 	struct obd_device *obd = seq->private;
 	struct osc_stats *stats = &obd2osc_dev(obd)->od_stats;
 
-	lprocfs_stats_header(seq, ktime_get(), stats->os_init, 25, ":", true);
+	lprocfs_stats_header(seq, ktime_get(), stats->os_init, 25, ":", true,
+			     "");
 	seq_printf(seq, "lockless_write_bytes\t\t%llu\n",
 		   stats->os_lockless_writes);
 	seq_printf(seq, "lockless_read_bytes\t\t%llu\n",
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index 6e4d2402efee..a20b119f9c37 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -1448,7 +1448,7 @@ int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq)
 
 	spin_lock_irq(&cli->cl_mod_rpcs_waitq.lock);
 	lprocfs_stats_header(seq, ktime_get(), cli->cl_mod_rpcs_init, 25,
-			     ":", true);
+			     ":", true, "");
 	seq_printf(seq, "modify_RPCs_in_flight:  %hu\n",
 		   cli->cl_mod_rpcs_in_flight);
 
diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index 01b8132087d6..edc576d5f957 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -1373,21 +1373,40 @@ static void *lprocfs_stats_seq_next(struct seq_file *p, void *v, loff_t *pos)
 	return lprocfs_stats_seq_start(p, pos);
 }
 
+/**
+ * print header of stats including snapshot_time, start_time and elapsed_time.
+ *
+ * @seq		the file to print content to
+ * @now		end time to calculate elapsed_time
+ * @ts_init		start time to calculate elapsed_time
+ * @width		the width of key to align them well
+ * @colon		"" or ":"
+ * @show_units		show units or not
+ * @prefix		prefix (indent) before printing each line of header
+ *			to align them with other content
+ */
 void lprocfs_stats_header(struct seq_file *seq, ktime_t now, ktime_t ts_init,
-			  int width, const char *colon, bool show_units)
+			  int width, const char *colon, bool show_units,
+			  const char *prefix)
 {
 	const char *units = show_units ? " secs.nsecs" : "";
 	struct timespec64 ts;
+	const char *field;
 
+	field = (colon && colon[0]) ? "snapshot_time:" : "snapshot_time";
 	ts = ktime_to_timespec64(now);
-	seq_printf(seq, "%-*s%s %llu.%09lu%s\n", width,
-		   "snapshot_time", colon, (s64)ts.tv_sec, ts.tv_nsec, units);
+	seq_printf(seq, "%s%-*s %llu.%09lu%s\n", prefix, width, field,
+		   (s64)ts.tv_sec, ts.tv_nsec, units);
+
+	field = (colon && colon[0]) ? "start_time:" : "start_time";
 	ts = ktime_to_timespec64(ts_init);
-	seq_printf(seq, "%-*s%s %llu.%09lu%s\n", width,
-		   "start_time", colon, (s64)ts.tv_sec, ts.tv_nsec, units);
+	seq_printf(seq, "%s%-*s %llu.%09lu%s\n", prefix, width, field,
+		   (s64)ts.tv_sec, ts.tv_nsec, units);
+
+	field = (colon && colon[0]) ? "elapsed_time:" : "elapsed_time";
 	ts = ktime_to_timespec64(ktime_sub(now, ts_init));
-	seq_printf(seq, "%-*s%s %llu.%09lu%s\n", width,
-		   "elapsed_time", colon, (s64)ts.tv_sec, ts.tv_nsec, units);
+	seq_printf(seq, "%s%-*s %llu.%09lu%s\n", prefix, width, field,
+		   (s64)ts.tv_sec, ts.tv_nsec, units);
 }
 EXPORT_SYMBOL(lprocfs_stats_header);
 
@@ -1400,7 +1419,8 @@ static int lprocfs_stats_seq_show(struct seq_file *p, void *v)
 	int idx = *(loff_t *)v;
 
 	if (idx == 0)
-		lprocfs_stats_header(p, ktime_get(), stats->ls_init, 25, "", 1);
+		lprocfs_stats_header(p, ktime_get(), stats->ls_init, 25, "",
+				     true, "");
 
 	hdr = &stats->ls_cnt_header[idx];
 	lprocfs_stats_collect(stats, idx, &ctr);
diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c
index 54b86d17eb78..c4b1e3e4e2cc 100644
--- a/fs/lustre/osc/lproc_osc.c
+++ b/fs/lustre/osc/lproc_osc.c
@@ -702,7 +702,8 @@ static int osc_rpc_stats_seq_show(struct seq_file *seq, void *v)
 
 	spin_lock(&cli->cl_loi_list_lock);
 
-	lprocfs_stats_header(seq, ktime_get(), cli->cl_stats_init, 25, ":", 1);
+	lprocfs_stats_header(seq, ktime_get(), cli->cl_stats_init, 25, ":",
+			     true, "");
 	seq_printf(seq, "read RPCs in flight:  %d\n",
 		   cli->cl_r_in_flight);
 	seq_printf(seq, "write RPCs in flight: %d\n",
@@ -814,7 +815,8 @@ static int osc_stats_seq_show(struct seq_file *seq, void *v)
 	struct obd_device *obd = seq->private;
 	struct osc_stats *stats = &obd2osc_dev(obd)->od_stats;
 
-	lprocfs_stats_header(seq, ktime_get(), stats->os_init, 25, ":", true);
+	lprocfs_stats_header(seq, ktime_get(), stats->os_init, 25, ":", true,
+			     "");
 	seq_printf(seq, "lockless_write_bytes\t\t%llu\n",
 		   stats->os_lockless_writes);
 	seq_printf(seq, "lockless_read_bytes\t\t%llu\n",
-- 
2.27.0


* [lustre-devel] [PATCH 10/42] lustre: misc: fix stats snapshot_time to use wallclock
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (8 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 09/42] lustre: ldebugfs: make job_stats and rename_stats valid YAML James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 11/42] lustre: pools: force creation of a component without a pool James Simmons
                   ` (31 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

The timestamps reported during stats collection inadvertently changed
from being POSIX epoch timestamps to elapsed-from-boot timestamps.

While some collection tools ignore these timestamps, or only use the
delta between successive reads, having uniform timestamps in stats
files simplifies stats correlation between different servers.

Revert the snapshot_time back to showing wallclock time.
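
The difference between the two clocks has a direct user-space analogue.
In the kernel, this patch replaces ktime_get() with ktime_get_real(); the
sketch below uses the POSIX clocks CLOCK_MONOTONIC and CLOCK_REALTIME as
stand-ins, and the demo_* helper names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Wall-clock time: seconds since the POSIX epoch, comparable between
 * hosts whose clocks are synchronized.
 */
static struct timespec demo_wallclock(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	return ts;
}

/* Monotonic time: counts from an arbitrary starting point (typically
 * boot), so values from different hosts cannot be compared directly.
 */
static struct timespec demo_monotonic(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts;
}

/* now - start, with a borrow from the seconds field when needed. */
static struct timespec demo_elapsed(struct timespec start, struct timespec now)
{
	struct timespec d;

	d.tv_sec = now.tv_sec - start.tv_sec;
	d.tv_nsec = now.tv_nsec - start.tv_nsec;
	if (d.tv_nsec < 0) {
		d.tv_sec--;
		d.tv_nsec += 1000000000L;
	}
	return d;
}
```

The start time and the snapshot time have to come from the same clock
for the elapsed time to be meaningful, which is why the patch switches
both the "init" assignments and the lprocfs_stats_header() callers.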

Some "init" times were not initialized when stats were allocated or
cleared. Do this for all stats shown by lprocfs_stats_header().

Rename struct osc_device fields from od_ to osc_ to avoid confusion
with struct osd_device. Having two od_stats was especially confusing.

Fixes: 653198e691 ("lustre: obdclass: add start time to stats files")
WC-bug-id: https://jira.whamcloud.com/browse/LU-16231
Lustre-commit: e42efe35eec7b9725 ("LU-16231 misc: fix stats snapshot_time to use wallclock")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48821
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Ellis Wilson <elliswilson@microsoft.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_osc.h      | 16 ++++++++--------
 fs/lustre/ldlm/ldlm_lib.c           |  1 +
 fs/lustre/llite/lproc_llite.c       | 20 +++++++++++---------
 fs/lustre/mdc/lproc_mdc.c           | 11 ++++++-----
 fs/lustre/mdc/mdc_dev.c             | 13 +++++++------
 fs/lustre/obdclass/genops.c         |  2 +-
 fs/lustre/obdclass/lprocfs_status.c |  6 ++++--
 fs/lustre/osc/lproc_osc.c           | 16 ++++++++--------
 fs/lustre/osc/osc_cache.c           |  2 +-
 fs/lustre/osc/osc_dev.c             | 17 +++++++++--------
 10 files changed, 56 insertions(+), 48 deletions(-)

diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index 526093ebff18..3b936c7dc1a1 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -114,18 +114,18 @@ static inline void osc_wake_cache_waiters(struct client_obd *cli)
 }
 
 struct osc_device {
-	struct cl_device	od_cl;
-	struct obd_export	*od_exp;
+	struct cl_device	osc_cl;
+	struct obd_export	*osc_exp;
 
 	/* Write stats is actually protected by client_obd's lock. */
 	struct osc_stats {
 		ktime_t		os_init;
 		u64		os_lockless_writes;	/* by bytes */
 		u64		os_lockless_reads;	/* by bytes */
-	} od_stats;
+	} osc_stats;
 
 	/* configuration item(s) */
-	time64_t		od_contention_time;
+	time64_t		osc_contention_time;
 };
 
 /* \defgroup osc osc
@@ -772,12 +772,12 @@ static inline struct osc_io *osc_env_io(const struct lu_env *env)
 
 static inline struct osc_device *lu2osc_dev(const struct lu_device *d)
 {
-	return container_of(d, struct osc_device, od_cl.cd_lu_dev);
+	return container_of(d, struct osc_device, osc_cl.cd_lu_dev);
 }
 
 static inline struct obd_export *osc_export(const struct osc_object *obj)
 {
-	return lu2osc_dev(obj->oo_cl.co_lu.lo_dev)->od_exp;
+	return lu2osc_dev(obj->oo_cl.co_lu.lo_dev)->osc_exp;
 }
 
 static inline struct client_obd *osc_cli(const struct osc_object *obj)
@@ -798,12 +798,12 @@ static inline struct cl_object *osc2cl(const struct osc_object *obj)
 static inline struct osc_device *obd2osc_dev(const struct obd_device *obd)
 {
 	return container_of_safe(obd->obd_lu_dev, struct osc_device,
-				 od_cl.cd_lu_dev);
+				 osc_cl.cd_lu_dev);
 }
 
 static inline struct lu_device *osc2lu_dev(struct osc_device *osc)
 {
-	return &osc->od_cl.cd_lu_dev;
+	return &osc->osc_cl.cd_lu_dev;
 }
 
 static inline struct lu_object *osc2lu(struct osc_object *osc)
diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c
index e4262c360950..ddedaadacf31 100644
--- a/fs/lustre/ldlm/ldlm_lib.c
+++ b/fs/lustre/ldlm/ldlm_lib.c
@@ -449,6 +449,7 @@ int client_obd_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 	cli->cl_mod_rpcs_in_flight = 0;
 	cli->cl_close_rpcs_in_flight = 0;
 	init_waitqueue_head(&cli->cl_mod_rpcs_waitq);
+	cli->cl_mod_rpcs_init = ktime_get_real();
 	cli->cl_mod_tag_bitmap = NULL;
 
 	INIT_LIST_HEAD(&cli->cl_chg_dev_linkage);
diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index ef7ca4c67681..7157886c31cc 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -2020,8 +2020,8 @@ static int ll_rw_extents_stats_pp_seq_show(struct seq_file *seq, void *v)
 	}
 
 	spin_lock(&sbi->ll_pp_extent_lock);
-	lprocfs_stats_header(seq, ktime_get(), rw_extents->pp_init, 25, ":",
-			     true, "");
+	lprocfs_stats_header(seq, ktime_get_real(), rw_extents->pp_init, 25,
+			     ":", true, "");
 	seq_printf(seq, "%15s %19s       | %20s\n", " ", "read", "write");
 	seq_printf(seq, "%13s   %14s %4s %4s  | %14s %4s %4s\n",
 		   "extents", "calls", "%", "cum%", "calls", "%", "cum%");
@@ -2052,6 +2052,7 @@ static int alloc_rw_stats_info(struct ll_sb_info *sbi)
 		spin_lock_init(&rw_extents->pp_extents[i].pp_r_hist.oh_lock);
 		spin_lock_init(&rw_extents->pp_extents[i].pp_w_hist.oh_lock);
 	}
+	rw_extents->pp_init = ktime_get_real();
 
 	spin_lock(&sbi->ll_pp_extent_lock);
 	if (!sbi->ll_rw_extents_info)
@@ -2080,6 +2081,7 @@ static int alloc_rw_stats_info(struct ll_sb_info *sbi)
 	if (!sbi->ll_rw_offset_info)
 		sbi->ll_rw_offset_info = offset;
 	spin_unlock(&sbi->ll_process_lock);
+	sbi->ll_process_stats_init = ktime_get_real();
 
 	/* another writer allocated the structs before we got the lock */
 	if (sbi->ll_rw_offset_info != offset)
@@ -2133,7 +2135,7 @@ static ssize_t ll_rw_extents_stats_pp_seq_write(struct file *file,
 	spin_lock(&sbi->ll_pp_extent_lock);
 	rw_extents = sbi->ll_rw_extents_info;
 	if (rw_extents) {
-		rw_extents->pp_init = ktime_get();
+		rw_extents->pp_init = ktime_get_real();
 		for (i = 0; i < LL_PROCESS_HIST_MAX; i++) {
 			rw_extents->pp_extents[i].pid = 0;
 			lprocfs_oh_clear(&rw_extents->pp_extents[i].pp_r_hist);
@@ -2158,8 +2160,8 @@ static int ll_rw_extents_stats_seq_show(struct seq_file *seq, void *v)
 	}
 
 	spin_lock(&sbi->ll_lock);
-	lprocfs_stats_header(seq, ktime_get(), rw_extents->pp_init, 25, ":",
-			     true, "");
+	lprocfs_stats_header(seq, ktime_get_real(), rw_extents->pp_init, 25,
+			     ":", true, "");
 
 	seq_printf(seq, "%15s %19s       | %20s\n", " ", "read", "write");
 	seq_printf(seq, "%13s   %14s %4s %4s  | %14s %4s %4s\n",
@@ -2202,7 +2204,7 @@ static ssize_t ll_rw_extents_stats_seq_write(struct file *file,
 	spin_lock(&sbi->ll_pp_extent_lock);
 	rw_extents = sbi->ll_rw_extents_info;
 	if (rw_extents) {
-		rw_extents->pp_init = ktime_get();
+		rw_extents->pp_init = ktime_get_real();
 		for (i = 0; i <= LL_PROCESS_HIST_MAX; i++) {
 			rw_extents->pp_extents[i].pid = 0;
 			lprocfs_oh_clear(&rw_extents->pp_extents[i].pp_r_hist);
@@ -2342,8 +2344,8 @@ static int ll_rw_offset_stats_seq_show(struct seq_file *seq, void *v)
 		return 0;
 	}
 	spin_lock(&sbi->ll_process_lock);
-	lprocfs_stats_header(seq, ktime_get(), sbi->ll_process_stats_init, 25,
-			     ":", true, "");
+	lprocfs_stats_header(seq, ktime_get_real(), sbi->ll_process_stats_init,
+			     25, ":", true, "");
 	seq_printf(seq, "%3s %10s %14s %14s %17s %17s %14s\n",
 		   "R/W", "PID", "RANGE START", "RANGE END",
 		   "SMALLEST EXTENT", "LARGEST EXTENT", "OFFSET");
@@ -2410,7 +2412,7 @@ static ssize_t ll_rw_offset_stats_seq_write(struct file *file,
 	spin_lock(&sbi->ll_process_lock);
 	sbi->ll_offset_process_count = 0;
 	sbi->ll_rw_offset_entry_count = 0;
-	sbi->ll_process_stats_init = ktime_get();
+	sbi->ll_process_stats_init = ktime_get_real();
 	if (sbi->ll_rw_process_info)
 		memset(sbi->ll_rw_process_info, 0,
 		       sizeof(struct ll_rw_process_info) * LL_PROCESS_HIST_MAX);
diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c
index 81397a388681..b59bba3595e3 100644
--- a/fs/lustre/mdc/lproc_mdc.c
+++ b/fs/lustre/mdc/lproc_mdc.c
@@ -410,6 +410,7 @@ static ssize_t mdc_rpc_stats_seq_write(struct file *file,
 	lprocfs_oh_clear(&cli->cl_write_page_hist);
 	lprocfs_oh_clear(&cli->cl_read_offset_hist);
 	lprocfs_oh_clear(&cli->cl_write_offset_hist);
+	cli->cl_mod_rpcs_init = ktime_get_real();
 
 	return len;
 }
@@ -511,10 +512,10 @@ LDEBUGFS_SEQ_FOPS(mdc_rpc_stats);
 static int mdc_stats_seq_show(struct seq_file *seq, void *v)
 {
 	struct obd_device *obd = seq->private;
-	struct osc_stats *stats = &obd2osc_dev(obd)->od_stats;
+	struct osc_stats *stats = &obd2osc_dev(obd)->osc_stats;
 
-	lprocfs_stats_header(seq, ktime_get(), stats->os_init, 25, ":", true,
-			     "");
+	lprocfs_stats_header(seq, ktime_get_real(), stats->os_init, 25, ":",
+			     true, "");
 	seq_printf(seq, "lockless_write_bytes\t\t%llu\n",
 		   stats->os_lockless_writes);
 	seq_printf(seq, "lockless_read_bytes\t\t%llu\n",
@@ -528,10 +529,10 @@ static ssize_t mdc_stats_seq_write(struct file *file,
 {
 	struct seq_file *seq = file->private_data;
 	struct obd_device *obd = seq->private;
-	struct osc_stats *stats = &obd2osc_dev(obd)->od_stats;
+	struct osc_stats *stats = &obd2osc_dev(obd)->osc_stats;
 
 	memset(stats, 0, sizeof(*stats));
-	stats->os_init = ktime_get();
+	stats->os_init = ktime_get_real();
 
 	return len;
 }
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index e0f5b457b0fb..984d1a8cc697 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -1554,16 +1554,16 @@ static struct lu_device *mdc_device_alloc(const struct lu_env *env,
 					  struct lustre_cfg *cfg)
 {
 	struct lu_device *d;
-	struct osc_device *od;
+	struct osc_device *oc;
 	struct obd_device *obd;
 	int rc;
 
-	od = kzalloc(sizeof(*od), GFP_NOFS);
-	if (!od)
+	oc = kzalloc(sizeof(*oc), GFP_NOFS);
+	if (!oc)
 		return ERR_PTR(-ENOMEM);
 
-	cl_device_init(&od->od_cl, t);
-	d = osc2lu_dev(od);
+	cl_device_init(&oc->osc_cl, t);
+	d = osc2lu_dev(oc);
 	d->ld_ops = &mdc_lu_ops;
 
 	/* Setup MDC OBD */
@@ -1576,7 +1576,8 @@ static struct lu_device *mdc_device_alloc(const struct lu_env *env,
 		osc_device_free(env, d);
 		return ERR_PTR(rc);
 	}
-	od->od_exp = obd->obd_self_export;
+	oc->osc_exp = obd->obd_self_export;
+	oc->osc_stats.os_init = ktime_get_real();
 	return d;
 }
 
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index a20b119f9c37..b6bde00ab389 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -1447,7 +1447,7 @@ int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq)
 	int i;
 
 	spin_lock_irq(&cli->cl_mod_rpcs_waitq.lock);
-	lprocfs_stats_header(seq, ktime_get(), cli->cl_mod_rpcs_init, 25,
+	lprocfs_stats_header(seq, ktime_get_real(), cli->cl_mod_rpcs_init, 25,
 			     ":", true, "");
 	seq_printf(seq, "modify_RPCs_in_flight:  %hu\n",
 		   cli->cl_mod_rpcs_in_flight);
diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index edc576d5f957..5089e7cfd377 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -1229,6 +1229,7 @@ struct lprocfs_stats *lprocfs_alloc_stats(unsigned int num,
 
 	stats->ls_num = num;
 	stats->ls_flags = flags;
+	stats->ls_init = ktime_get_real();
 	spin_lock_init(&stats->ls_lock);
 
 	/* alloc num of counter headers */
@@ -1339,6 +1340,7 @@ void lprocfs_clear_stats(struct lprocfs_stats *stats)
 				percpu_cntr->lc_sum_irq	= 0;
 		}
 	}
+	stats->ls_init = ktime_get_real();
 
 	lprocfs_stats_unlock(stats, LPROCFS_GET_NUM_CPU, &flags);
 }
@@ -1419,8 +1421,8 @@ static int lprocfs_stats_seq_show(struct seq_file *p, void *v)
 	int idx = *(loff_t *)v;
 
 	if (idx == 0)
-		lprocfs_stats_header(p, ktime_get(), stats->ls_init, 25, "",
-				     true, "");
+		lprocfs_stats_header(p, ktime_get_real(), stats->ls_init, 25,
+				     "", true, "");
 
 	hdr = &stats->ls_cnt_header[idx];
 	lprocfs_stats_collect(stats, idx, &ctr);
diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c
index c4b1e3e4e2cc..b458a867c31f 100644
--- a/fs/lustre/osc/lproc_osc.c
+++ b/fs/lustre/osc/lproc_osc.c
@@ -702,8 +702,8 @@ static int osc_rpc_stats_seq_show(struct seq_file *seq, void *v)
 
 	spin_lock(&cli->cl_loi_list_lock);
 
-	lprocfs_stats_header(seq, ktime_get(), cli->cl_stats_init, 25, ":",
-			     true, "");
+	lprocfs_stats_header(seq, ktime_get_real(), cli->cl_stats_init, 25,
+			     ":", true, "");
 	seq_printf(seq, "read RPCs in flight:  %d\n",
 		   cli->cl_r_in_flight);
 	seq_printf(seq, "write RPCs in flight: %d\n",
@@ -803,7 +803,7 @@ static ssize_t osc_rpc_stats_seq_write(struct file *file,
 	lprocfs_oh_clear(&cli->cl_write_page_hist);
 	lprocfs_oh_clear(&cli->cl_read_offset_hist);
 	lprocfs_oh_clear(&cli->cl_write_offset_hist);
-	cli->cl_stats_init = ktime_get();
+	cli->cl_stats_init = ktime_get_real();
 
 	return len;
 }
@@ -813,10 +813,10 @@ LDEBUGFS_SEQ_FOPS(osc_rpc_stats);
 static int osc_stats_seq_show(struct seq_file *seq, void *v)
 {
 	struct obd_device *obd = seq->private;
-	struct osc_stats *stats = &obd2osc_dev(obd)->od_stats;
+	struct osc_stats *stats = &obd2osc_dev(obd)->osc_stats;
 
-	lprocfs_stats_header(seq, ktime_get(), stats->os_init, 25, ":", true,
-			     "");
+	lprocfs_stats_header(seq, ktime_get_real(), stats->os_init, 25, ":",
+			     true, "");
 	seq_printf(seq, "lockless_write_bytes\t\t%llu\n",
 		   stats->os_lockless_writes);
 	seq_printf(seq, "lockless_read_bytes\t\t%llu\n",
@@ -830,10 +830,10 @@ static ssize_t osc_stats_seq_write(struct file *file,
 {
 	struct seq_file *seq = file->private_data;
 	struct obd_device *obd = seq->private;
-	struct osc_stats *stats = &obd2osc_dev(obd)->od_stats;
+	struct osc_stats *stats = &obd2osc_dev(obd)->osc_stats;
 
 	memset(stats, 0, sizeof(*stats));
-	stats->os_init = ktime_get();
+	stats->os_init = ktime_get_real();
 
 	return len;
 }
diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index b5776a127643..a9dc985bfa18 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -1365,7 +1365,7 @@ static int osc_completion(const struct lu_env *env, struct osc_async_page *oap,
 	/* statistic */
 	if (rc == 0 && srvlock) {
 		struct lu_device *ld = osc_page_object(opg)->oo_cl.co_lu.lo_dev;
-		struct osc_stats *stats = &lu2osc_dev(ld)->od_stats;
+		struct osc_stats *stats = &lu2osc_dev(ld)->osc_stats;
 		size_t bytes = oap->oap_count;
 
 		if (crt == CRT_READ)
diff --git a/fs/lustre/osc/osc_dev.c b/fs/lustre/osc/osc_dev.c
index 621beb6b6351..c3f9caaf9607 100644
--- a/fs/lustre/osc/osc_dev.c
+++ b/fs/lustre/osc/osc_dev.c
@@ -190,10 +190,10 @@ EXPORT_SYMBOL(osc_device_fini);
 struct lu_device *osc_device_free(const struct lu_env *env,
 				  struct lu_device *d)
 {
-	struct osc_device *od = lu2osc_dev(d);
+	struct osc_device *oc = lu2osc_dev(d);
 
 	cl_device_fini(lu2cl_dev(d));
-	kfree(od);
+	kfree(oc);
 	return NULL;
 }
 EXPORT_SYMBOL(osc_device_free);
@@ -203,16 +203,16 @@ static struct lu_device *osc_device_alloc(const struct lu_env *env,
 					  struct lustre_cfg *cfg)
 {
 	struct lu_device *d;
-	struct osc_device *od;
+	struct osc_device *osc;
 	struct obd_device *obd;
 	int rc;
 
-	od = kzalloc(sizeof(*od), GFP_NOFS);
-	if (!od)
+	osc = kzalloc(sizeof(*osc), GFP_NOFS);
+	if (!osc)
 		return ERR_PTR(-ENOMEM);
 
-	cl_device_init(&od->od_cl, t);
-	d = osc2lu_dev(od);
+	cl_device_init(&osc->osc_cl, t);
+	d = osc2lu_dev(osc);
 	d->ld_ops = &osc_lu_ops;
 
 	/* Setup OSC OBD */
@@ -223,7 +223,8 @@ static struct lu_device *osc_device_alloc(const struct lu_env *env,
 		osc_device_free(env, d);
 		return ERR_PTR(rc);
 	}
-	od->od_exp = obd->obd_self_export;
+	osc->osc_exp = obd->obd_self_export;
+	osc->osc_stats.os_init = ktime_get_real();
 	return d;
 }
 
-- 
2.27.0

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 11/42] lustre: pools: force creation of a component without a pool
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (9 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 10/42] lustre: misc: fix stats snapshot_time to use wallclock James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 12/42] lustre: sec: reserve flag for fid2path for encrypted files James Simmons
                   ` (30 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Etienne AUJAMES <eaujames@ddn.com>

This patch adds the pool name "ignore" to force the creation of a
component without a pool set by inheritance (from parent or root).

The poorly named "none" keyword, which indicates that the pool name
should be inherited from the root or parent directory layout, will
eventually be replaced by the new "inherit" keyword.
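
As a rough userspace sketch of how the reserved names can be told apart
from ordinary pool names: the three lov_pool_is_*() helpers and the
LOV_* macros below mirror the header change in this patch, while
pool_name_is_valid() is a hypothetical wrapper added only for
illustration (it restates the check this patch adds to obd_pool_new()).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LOV_MAXPOOLNAME 15
#define LOV_POOL_IGNORE "ignore"
#define LOV_POOL_INHERIT "inherit"
#define LOV_POOL_NONE "none"	/* deprecated alias of "inherit" */

/* "ignore": create the component without any pool */
static inline bool lov_pool_is_ignored(const char *pool)
{
	return pool && strncmp(pool, LOV_POOL_IGNORE, LOV_MAXPOOLNAME) == 0;
}

/* "inherit" or "none": take the pool from the parent or root layout */
static inline bool lov_pool_is_inherited(const char *pool)
{
	return pool && (strncmp(pool, LOV_POOL_INHERIT, LOV_MAXPOOLNAME) == 0 ||
			strncmp(pool, LOV_POOL_NONE, LOV_MAXPOOLNAME) == 0);
}

static inline bool lov_pool_is_reserved(const char *pool)
{
	return lov_pool_is_ignored(pool) || lov_pool_is_inherited(pool);
}

/* Hypothetical wrapper: a name usable for a new pool must be non-empty
 * and must not collide with one of the reserved keywords.
 */
static inline bool pool_name_is_valid(const char *pool)
{
	return pool && pool[0] != '\0' && !lov_pool_is_reserved(pool);
}
```

With these definitions a name such as "flash" is accepted for a new
pool, while "ignore", "inherit", "none", the empty string and NULL are
all rejected.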

WC-bug-id: https://jira.whamcloud.com/browse/LU-15707
Lustre-commit: 6b69d22e4cb738f4f ("LU-15707 lod: force creation of a component without a pool")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46955
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_class.h           |  4 +++
 include/uapi/linux/lustre/lustre_user.h | 39 ++++++++++++++++++++-----
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index 80ff4e8aa267..9edd93cbacc5 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -817,6 +817,10 @@ static inline int obd_pool_new(struct obd_device *obd, char *poolname)
 		return -EOPNOTSUPP;
 	}
 
+	/* Check poolname validity */
+	if (!poolname || poolname[0] == '\0' || lov_pool_is_reserved(poolname))
+		return -EINVAL;
+
 	rc = OBP(obd, pool_new)(obd, poolname);
 	return rc;
 }
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index c2096ba1cdbe..a2f46e7f4257 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -435,6 +435,17 @@ struct ll_ioc_lease_id {
 #define LOV_PATTERN_F_HOLE	0x40000000 /* there is hole in LOV EA */
 #define LOV_PATTERN_F_RELEASED	0x80000000 /* HSM released file */
 
+#define LOV_OFFSET_DEFAULT      ((__u16)-1)
+#define LMV_OFFSET_DEFAULT      ((__u32)-1)
+
+static inline bool lov_pattern_supported(__u32 pattern)
+{
+	return (pattern & ~LOV_PATTERN_F_RELEASED) == LOV_PATTERN_RAID0 ||
+	       (pattern & ~LOV_PATTERN_F_RELEASED) ==
+			(LOV_PATTERN_RAID0 | LOV_PATTERN_OVERSTRIPING) ||
+	       (pattern & ~LOV_PATTERN_F_RELEASED) == LOV_PATTERN_MDT;
+}
+
 /* RELEASED and MDT patterns are not valid in many places, so rather than
  * having many extra checks on lov_pattern_supported, we have this separate
  * check for non-released, non-DOM components
@@ -448,15 +459,29 @@ static inline bool lov_pattern_supported_normal_comp(__u32 pattern)
 
 #define LOV_MAXPOOLNAME 15
 #define LOV_POOLNAMEF "%.15s"
-#define LOV_OFFSET_DEFAULT      ((__u16)-1)
-#define LMV_OFFSET_DEFAULT      ((__u32)-1)
+/* The poolname "ignore" is used to force a component creation without pool */
+#define LOV_POOL_IGNORE "ignore"
+/* The poolname "inherit" is used to force a component to inherit the pool from
+ * parent or root directory
+ */
+#define LOV_POOL_INHERIT "inherit"
+/* The poolname "none" is deprecated in 2.15 (same behavior as "inherit") */
+#define LOV_POOL_NONE "none"
 
-static inline bool lov_pattern_supported(__u32 pattern)
+static inline bool lov_pool_is_ignored(const char *pool)
 {
-	return (pattern & ~LOV_PATTERN_F_RELEASED) == LOV_PATTERN_RAID0 ||
-	       (pattern & ~LOV_PATTERN_F_RELEASED) ==
-			(LOV_PATTERN_RAID0 | LOV_PATTERN_OVERSTRIPING) ||
-	       (pattern & ~LOV_PATTERN_F_RELEASED) == LOV_PATTERN_MDT;
+	return pool && strncmp(pool, LOV_POOL_IGNORE, LOV_MAXPOOLNAME) == 0;
+}
+
+static inline bool lov_pool_is_inherited(const char *pool)
+{
+	return pool && (strncmp(pool, LOV_POOL_INHERIT, LOV_MAXPOOLNAME) == 0 ||
+			strncmp(pool, LOV_POOL_NONE, LOV_MAXPOOLNAME) == 0);
+}
+
+static inline bool lov_pool_is_reserved(const char *pool)
+{
+	return lov_pool_is_ignored(pool) || lov_pool_is_inherited(pool);
 }
 
 #define LOV_MIN_STRIPE_BITS	16	/* maximum PAGE_SIZE (ia64), power of 2 */
-- 
2.27.0

* [lustre-devel] [PATCH 12/42] lustre: sec: reserve flag for fid2path for encrypted files
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (10 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 11/42] lustre: pools: force creation of a component without a pool James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 13/42] lustre: llite: update statx size/ctime for fallocate James Simmons
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

Reserve OBD_CONNECT2_ENCRYPT_FID2PATH connection flag for fid2path
support for encrypted files.
This connection flag is required so that newer servers continue to
return -ENODATA to older clients.
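
A minimal sketch of how such a connection flag could gate behaviour,
assuming the server sees the client's connect flags as a 64-bit mask.
Only the flag name and value come from this patch;
fid2path_rc_for_client() and its arguments are hypothetical and are not
the server implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define OBD_CONNECT2_ENCRYPT_FID2PATH 0x40000000ULL /* value reserved here */

/* Hypothetical helper: returns 0 if fid2path may proceed for this
 * client, or -ENODATA if the file is encrypted and the client did not
 * advertise fid2path support for encrypted files.
 */
static inline int fid2path_rc_for_client(uint64_t client_flags2,
					 bool file_encrypted)
{
	if (file_encrypted &&
	    !(client_flags2 & OBD_CONNECT2_ENCRYPT_FID2PATH))
		return -ENODATA;

	return 0;
}
```

Under this assumption, an older client that never sets the flag keeps
getting -ENODATA for encrypted files, and unencrypted files are
unaffected.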

WC-bug-id: https://jira.whamcloud.com/browse/LU-16205
Lustre-commit: 6f74bb60ff6c58f4a ("LU-16205 sec: reserve flag for fid2path for encrypted files")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49028
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdclass/lprocfs_status.c    | 3 +++
 fs/lustre/ptlrpc/wiretest.c            | 2 ++
 include/uapi/linux/lustre/lustre_idl.h | 1 +
 3 files changed, 6 insertions(+)

diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index 5089e7cfd377..5e4ad7c227b1 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -137,6 +137,9 @@ static const char *const obd_connect_names[] = {
 	"lock_contend",			/* 0x2000000 */
 	"atomic_open_lock",		/* 0x4000000 */
 	"name_encryption",		/* 0x8000000 */
+	"mkdir_replay",			/* 0x10000000 */
+	"dmv_inherit",			/* 0x20000000 */
+	"encryption_fid2path",		/* 0x40000000 */
 	NULL
 };
 
diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index 66c7c1763959..daa11e5d76dc 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -1243,6 +1243,8 @@ void lustre_assert_wire_constants(void)
 		 OBD_CONNECT2_ATOMIC_OPEN_LOCK);
 	LASSERTF(OBD_CONNECT2_ENCRYPT_NAME == 0x8000000ULL, "found 0x%.16llxULL\n",
 		 OBD_CONNECT2_ENCRYPT_NAME);
+	LASSERTF(OBD_CONNECT2_ENCRYPT_FID2PATH == 0x40000000ULL, "found 0x%.16llxULL\n",
+		 OBD_CONNECT2_ENCRYPT_FID2PATH);
 	LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n",
 		 (unsigned int)OBD_CKSUM_CRC32);
 	LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n",
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index 475151ce72fb..8cf9323c3579 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -783,6 +783,7 @@ struct ptlrpc_body_v2 {
 #define OBD_CONNECT2_LOCK_CONTENTION	  0x2000000ULL /* contention detect */
 #define OBD_CONNECT2_ATOMIC_OPEN_LOCK	  0x4000000ULL /* lock on first open */
 #define OBD_CONNECT2_ENCRYPT_NAME	  0x8000000ULL /* name encrypt */
+#define OBD_CONNECT2_ENCRYPT_FID2PATH	 0x40000000ULL /* fid2path enc file */
 /* XXX README XXX README XXX README XXX README XXX README XXX README XXX
  * Please DO NOT add OBD_CONNECT flags before first ensuring that this value
  * is not in use by some other branch/patch.  Email adilger@whamcloud.com
-- 
2.27.0

* [lustre-devel] [PATCH 13/42] lustre: llite: update statx size/ctime for fallocate
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (11 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 12/42] lustre: sec: reserve flag for fid2path for encrypted files James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 14/42] lustre: ptlrpc: fiemap flexible array James Simmons
                   ` (28 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Qian Yingjin <qian@ddn.com>

The VFS ->fallocate() method should update the i_size and i_ctime
returned by statx() when the file size grows.

The fallocate() call does not update the attributes on the MDT.
We use statx() in cached-always mode to verify this, since that mode
does not send glimpse lock RPCs to the OSTs to obtain the file size
and instead uses the attributes (size) cached on the client side as
much as possible.
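
The size rule described above can be sketched as a pure function. This
is an illustration only: demo_size_after_fallocate() and
DEMO_FALLOC_FL_KEEP_SIZE are hypothetical names (the constant uses the
same value as FALLOC_FL_KEEP_SIZE in <linux/falloc.h>), and "end"
stands for the size requested by the fallocate() call.

```c
#include <stdint.h>

/* Same value as FALLOC_FL_KEEP_SIZE; redefined so the sketch builds
 * without kernel headers.
 */
#define DEMO_FALLOC_FL_KEEP_SIZE 0x01

/* Hypothetical helper: the cached file size after fallocate().
 * The size only grows, and only when KEEP_SIZE was not requested.
 */
static inline int64_t demo_size_after_fallocate(int64_t cur_size,
						int64_t end, int mode)
{
	if (!(mode & DEMO_FALLOC_FL_KEEP_SIZE) && end > cur_size)
		return end;

	return cur_size;
}
```

For example, extending a 4096-byte file to 8192 bytes yields 8192 in
the default mode and leaves 4096 with the keep-size flag set. The ctime
update in the patch is unconditional and is not modelled here.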

WC-bug-id: https://jira.whamcloud.com/browse/LU-16334
Lustre-commit: 51851705e936b2dbc ("LU-16334 llite: update statx size/ctime for fallocate")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49221
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/vvp_io.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index be6f17f5f072..317704172080 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -754,15 +754,25 @@ static void vvp_io_setattr_end(const struct lu_env *env,
 	struct cl_io *io = ios->cis_io;
 	struct inode *inode = vvp_object_inode(io->ci_obj);
 	struct ll_inode_info *lli = ll_i2info(inode);
+	loff_t size = io->u.ci_setattr.sa_attr.lvb_size;
 
 	if (cl_io_is_trunc(io)) {
 		/* Truncate in memory pages - they must be clean pages
 		 * because osc has already notified to destroy osc_extents.
 		 */
-		vvp_do_vmtruncate(inode, io->u.ci_setattr.sa_attr.lvb_size);
+		vvp_do_vmtruncate(inode, size);
 		mutex_unlock(&lli->lli_setattr_mutex);
 		trunc_sem_up_write(&lli->lli_trunc_sem);
 	} else if (cl_io_is_fallocate(io)) {
+		int mode = io->u.ci_setattr.sa_falloc_mode;
+
+		if (!(mode & FALLOC_FL_KEEP_SIZE) &&
+		    size > i_size_read(inode)) {
+			ll_inode_size_lock(inode);
+			i_size_write(inode, size);
+			ll_inode_size_unlock(inode);
+		}
+		inode->i_ctime = current_time(inode);
 		mutex_unlock(&lli->lli_setattr_mutex);
 		trunc_sem_up_write(&lli->lli_trunc_sem);
 	} else {
-- 
2.27.0

* [lustre-devel] [PATCH 14/42] lustre: ptlrpc: fiemap flexible array
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (12 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 13/42] lustre: llite: update statx size/ctime for fallocate James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 15/42] lustre: ptlrpc: Add LCME_FL_PARITY to wirecheck James Simmons
                   ` (27 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Shaun Tancheff, Lustre Development List

From: Shaun Tancheff <shaun.tancheff@hpe.com>

Linux commit v5.19-rc2-1-g94dfc73e7cf4 ("treewide: uapi: Replace
zero-length arrays with flexible-array members") turned fm_extents
into a flexible array member.
Adjust wiretest to handle the flexible array, for which
sizeof(fiemap->fm_extents) is undefined.
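
To illustrate the replacement check, here is a small sketch with a
made-up structure. struct demo_map and struct demo_extent are
hypothetical; they only imitate a fixed header followed by a flexible
array at offset 32. Because sizeof() cannot be applied to a flexible
array member, the layout is asserted through offsetof() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type for the trailing array */
struct demo_extent {
	uint64_t de_logical;
	uint64_t de_length;
};

/* Hypothetical header: 32 bytes of fixed fields, then the array */
struct demo_map {
	uint64_t dm_start;
	uint64_t dm_length;
	uint32_t dm_flags;
	uint32_t dm_mapped_extents;
	uint32_t dm_extent_count;
	uint32_t dm_reserved;
	struct demo_extent dm_extents[];	/* flexible array member */
};

/* sizeof(((struct demo_map *)0)->dm_extents) would not compile, since
 * the member has incomplete type. Checking that the array starts
 * exactly where the fixed part ends expresses the same "adds no size"
 * property.
 */
static_assert(offsetof(struct demo_map, dm_extents) == 32,
	      "array must start right after the fixed header");
static_assert(offsetof(struct demo_map, dm_extents) ==
	      sizeof(struct demo_map),
	      "flexible array must not add to the struct size");
```

Both assertions are evaluated at compile time, in the same way as the
BUILD_BUG_ON() used in the patch.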

HPE-bug-id: LUS-11388
WC-bug-id: https://jira.whamcloud.com/browse/LU-16363
Lustre-commit: fedf1e8bd70ccb2aa ("LU-16363 build: fiemap flexible array")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49305
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/wiretest.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index daa11e5d76dc..bf12435262d4 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -4223,8 +4223,7 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)sizeof(((struct fiemap *)0)->fm_reserved));
 	LASSERTF((int)offsetof(struct fiemap, fm_extents) == 32, "found %lld\n",
 		 (long long)(int)offsetof(struct fiemap, fm_extents));
-	LASSERTF((int)sizeof(((struct fiemap *)0)->fm_extents) == 0, "found %lld\n",
-		 (long long)(int)sizeof(((struct fiemap *)0)->fm_extents));
+	BUILD_BUG_ON(offsetof(struct fiemap, fm_extents) != sizeof(struct fiemap));
 	BUILD_BUG_ON(FIEMAP_FLAG_SYNC != 0x00000001);
 	BUILD_BUG_ON(FIEMAP_FLAG_XATTR != 0x00000002);
 	BUILD_BUG_ON(FIEMAP_FLAG_DEVICE_ORDER != 0x40000000);
-- 
2.27.0

* [lustre-devel] [PATCH 15/42] lustre: ptlrpc: Add LCME_FL_PARITY to wirecheck
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (13 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 14/42] lustre: ptlrpc: fiemap flexible array James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 16/42] lnet: selftest: lst read-outside of allocation James Simmons
                   ` (26 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Shaun Tancheff, Lustre Development List

From: Shaun Tancheff <shaun.tancheff@hpe.com>

- LCME_FL_PARITY should be included in wiretest

Fixes: 9eafb96659 ("lustre: ec: add necessary structure member for EC file")
WC-bug-id: https://jira.whamcloud.com/browse/LU-16366
Lustre-commit: dce487f53a6f78355 ("LU-16366 build: Add LCME_FL_PARITY to wirecheck")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49311
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/wiretest.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index bf12435262d4..372dc1014878 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -1678,6 +1678,7 @@ void lustre_assert_wire_constants(void)
 	BUILD_BUG_ON(LCME_FL_INIT != 0x00000010);
 	BUILD_BUG_ON(LCME_FL_NOSYNC != 0x00000020);
 	BUILD_BUG_ON(LCME_FL_EXTENSION != 0x00000040);
+	BUILD_BUG_ON(LCME_FL_PARITY != 0x00000080);
 	BUILD_BUG_ON(LCME_FL_NEG != 0x80000000);
 
 	/* Checks for struct lov_comp_md_v1 */
-- 
2.27.0

* [lustre-devel] [PATCH 16/42] lnet: selftest: lst read-outside of allocation
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (14 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 15/42] lustre: ptlrpc: Add LCME_FL_PARITY to wirecheck James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 17/42] lustre: misc: rename lprocfs_stats functions James Simmons
                   ` (25 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexey Lyashkov, Lustre Development List

From: Alexey Lyashkov <alexey.lyashkov@hpe.com>

lnet_selftest expects some parameters from userspace, but they are
never sent. This causes a read outside of the allocation, such as:
  BUG: KASAN: slab-out-of-bounds in lstcon_testrpc_prep+0x19e7/0x1bb0
  Read of size 4 at addr ffff8888bbaa866c by task lt-lst/6371
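
The shape of the fix can be sketched in plain C as follows. All names
here (struct demo_ping_param, struct demo_ping_req,
demo_ping_param_get(), demo_pingrpc_prep()) are hypothetical stand-ins
for the selftest types; the point is only that a zero parameter length
must map to "no parameters" rather than to a pointer past the
allocation.

```c
#include <stddef.h>

/* Hypothetical stand-in for the parameters sent by userspace */
struct demo_ping_param {
	int png_size;
	int png_flags;
};

/* Hypothetical stand-in for the request being prepared */
struct demo_ping_req {
	int png_size;
	int png_flags;
};

/* Only treat the buffer as parameters if userspace actually sent some */
static inline const struct demo_ping_param *
demo_ping_param_get(const void *buf, size_t paramlen)
{
	return paramlen ? (const struct demo_ping_param *)buf : NULL;
}

/* Fall back to zeroed fields when no parameters were supplied */
static inline void demo_pingrpc_prep(const struct demo_ping_param *param,
				     struct demo_ping_req *req)
{
	if (param) {
		req->png_size = param->png_size;
		req->png_flags = param->png_flags;
	} else {
		req->png_size = 0;
		req->png_flags = 0;
	}
}
```

A caller would pass the result of demo_ping_param_get() straight into
demo_pingrpc_prep(), so the buffer is never dereferenced when its
length is zero.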

WC-bug-id: https://jira.whamcloud.com/browse/LU-16157
Lustre-commit: 222fbed52e02122c7 ("LU-16157 lnet: lst read-outside of allocation")
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48547
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/selftest/conrpc.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/net/lnet/selftest/conrpc.c b/net/lnet/selftest/conrpc.c
index 8096c467041a..4f427dd85265 100644
--- a/net/lnet/selftest/conrpc.c
+++ b/net/lnet/selftest/conrpc.c
@@ -780,8 +780,13 @@ lstcon_pingrpc_prep(struct lst_test_ping_param *param, struct srpc_test_reqst *r
 {
 	struct test_ping_req *prq = &req->tsr_u.ping;
 
-	prq->png_size = param->png_size;
-	prq->png_flags = param->png_flags;
+	if (param) {
+		prq->png_size = param->png_size;
+		prq->png_flags = param->png_flags;
+	} else {
+		prq->png_size = 0;
+		prq->png_flags = 0;
+	}
 	/* TODO dest */
 	return 0;
 }
@@ -896,12 +901,17 @@ lstcon_testrpc_prep(struct lstcon_node *nd, int transop, unsigned int feats,
 	trq->tsr_stop_onerr = !!test->tes_stop_onerr;
 
 	switch (test->tes_type) {
-	case LST_TEST_PING:
+	case LST_TEST_PING: {
+		struct lst_test_ping_param *data = NULL;
+
 		trq->tsr_service = SRPC_SERVICE_PING;
-		rc = lstcon_pingrpc_prep((struct lst_test_ping_param *)
-					 &test->tes_param[0], trq);
-		break;
+		if (test->tes_paramlen)
+			data = ((struct lst_test_ping_param *)
+				&test->tes_param[0]);
 
+		rc = lstcon_pingrpc_prep(data, trq);
+		break;
+	}
 	case LST_TEST_BULK:
 		trq->tsr_service = SRPC_SERVICE_BRW;
 		if (!(feats & LST_FEAT_BULK_LEN)) {
-- 
2.27.0

* [lustre-devel] [PATCH 17/42] lustre: misc: rename lprocfs_stats functions
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (15 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 16/42] lnet: selftest: lst read-outside of allocation James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 18/42] lustre: osc: Fix possible null pointer James Simmons
                   ` (24 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Rename lprocfs_{alloc,register,clear,free}_stats() to be
lprocfs_stats_*() so these functions can be found more easily
in relation to struct lprocfs_stats.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16231
Lustre-commit: 6293a8723d850d72e ("LU-16231 misc: rename lprocfs_stats functions")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48847
Reviewed-by: Ellis Wilson <elliswilson@microsoft.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lprocfs_status.h  |  6 +++---
 fs/lustre/ldlm/ldlm_pool.c          |  4 ++--
 fs/lustre/ldlm/ldlm_resource.c      |  4 ++--
 fs/lustre/llite/lproc_llite.c       | 14 +++++++-------
 fs/lustre/obdclass/lprocfs_status.c | 22 +++++++++++-----------
 fs/lustre/obdclass/lu_object.c      |  5 +++--
 fs/lustre/ptlrpc/lproc_ptlrpc.c     |  6 +++---
 7 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/fs/lustre/include/lprocfs_status.h b/fs/lustre/include/lprocfs_status.h
index ef923ae4ca37..c7a6dd71a931 100644
--- a/fs/lustre/include/lprocfs_status.h
+++ b/fs/lustre/include/lprocfs_status.h
@@ -439,9 +439,9 @@ u64 lprocfs_stats_collector(struct lprocfs_stats *stats, int idx,
 			    enum lprocfs_fields_flags field);
 
 extern struct lprocfs_stats *
-lprocfs_alloc_stats(unsigned int num, enum lprocfs_stats_flags flags);
-void lprocfs_clear_stats(struct lprocfs_stats *stats);
-void lprocfs_free_stats(struct lprocfs_stats **stats);
+lprocfs_stats_alloc(unsigned int num, enum lprocfs_stats_flags flags);
+void lprocfs_stats_clear(struct lprocfs_stats *stats);
+void lprocfs_stats_free(struct lprocfs_stats **stats);
 int ldebugfs_alloc_md_stats(struct obd_device *obd,
 			    unsigned int num_private_stats);
 void ldebugfs_free_md_stats(struct obd_device *obd);
diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c
index 8c6491d97f73..f0e629c5dd24 100644
--- a/fs/lustre/ldlm/ldlm_pool.c
+++ b/fs/lustre/ldlm/ldlm_pool.c
@@ -597,7 +597,7 @@ static int ldlm_pool_debugfs_init(struct ldlm_pool *pl)
 	ldlm_add_var(&pool_vars[0], pl->pl_debugfs_entry, "state", pl,
 		     &lprocfs_pool_state_fops);
 
-	pl->pl_stats = lprocfs_alloc_stats(LDLM_POOL_LAST_STAT -
+	pl->pl_stats = lprocfs_stats_alloc(LDLM_POOL_LAST_STAT -
 					   LDLM_POOL_FIRST_STAT, 0);
 	if (!pl->pl_stats) {
 		rc = -ENOMEM;
@@ -652,7 +652,7 @@ static void ldlm_pool_sysfs_fini(struct ldlm_pool *pl)
 static void ldlm_pool_debugfs_fini(struct ldlm_pool *pl)
 {
 	if (pl->pl_stats) {
-		lprocfs_free_stats(&pl->pl_stats);
+		lprocfs_stats_free(&pl->pl_stats);
 		pl->pl_stats = NULL;
 	}
 	debugfs_remove_recursive(pl->pl_debugfs_entry);
diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c
index 866f31dfdfa2..98c3e4fb4466 100644
--- a/fs/lustre/ldlm/ldlm_resource.c
+++ b/fs/lustre/ldlm/ldlm_resource.c
@@ -430,7 +430,7 @@ static void ldlm_namespace_debugfs_unregister(struct ldlm_namespace *ns)
 	debugfs_remove_recursive(ns->ns_debugfs_entry);
 
 	if (ns->ns_stats)
-		lprocfs_free_stats(&ns->ns_stats);
+		lprocfs_stats_free(&ns->ns_stats);
 }
 
 static void ldlm_namespace_sysfs_unregister(struct ldlm_namespace *ns)
@@ -448,7 +448,7 @@ static int ldlm_namespace_sysfs_register(struct ldlm_namespace *ns)
 	err = kobject_init_and_add(&ns->ns_kobj, &ldlm_ns_ktype, NULL,
 				   "%s", ldlm_ns_name(ns));
 
-	ns->ns_stats = lprocfs_alloc_stats(LDLM_NSS_LAST, 0);
+	ns->ns_stats = lprocfs_stats_alloc(LDLM_NSS_LAST, 0);
 	if (!ns->ns_stats) {
 		kobject_put(&ns->ns_kobj);
 		return -ENOMEM;
diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index 7157886c31cc..c4a514aa1e4f 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -680,7 +680,7 @@ static ssize_t ll_wr_track_id(struct kobject *kobj, const char *buffer,
 		sbi->ll_stats_track_type = STATS_TRACK_ALL;
 	else
 		sbi->ll_stats_track_type = type;
-	lprocfs_clear_stats(sbi->ll_stats);
+	lprocfs_stats_clear(sbi->ll_stats);
 	return count;
 }
 
@@ -1885,7 +1885,7 @@ int ll_debugfs_register_super(struct super_block *sb, const char *name)
 			    &ll_rw_offset_stats_fops);
 
 	/* File operations stats */
-	sbi->ll_stats = lprocfs_alloc_stats(LPROC_LL_FILE_OPCODES,
+	sbi->ll_stats = lprocfs_stats_alloc(LPROC_LL_FILE_OPCODES,
 					    LPROCFS_STATS_FLAG_NONE);
 	if (!sbi->ll_stats) {
 		err = -ENOMEM;
@@ -1902,7 +1902,7 @@ int ll_debugfs_register_super(struct super_block *sb, const char *name)
 	debugfs_create_file("stats", 0644, sbi->ll_debugfs_entry, sbi->ll_stats,
 			    &lprocfs_stats_seq_fops);
 
-	sbi->ll_ra_stats = lprocfs_alloc_stats(ARRAY_SIZE(ra_stat_string),
+	sbi->ll_ra_stats = lprocfs_stats_alloc(ARRAY_SIZE(ra_stat_string),
 					       LPROCFS_STATS_FLAG_NONE);
 	if (!sbi->ll_ra_stats) {
 		err = -ENOMEM;
@@ -1933,9 +1933,9 @@ int ll_debugfs_register_super(struct super_block *sb, const char *name)
 	return 0;
 
 out_ra_stats:
-	lprocfs_free_stats(&sbi->ll_ra_stats);
+	lprocfs_stats_free(&sbi->ll_ra_stats);
 out_stats:
-	lprocfs_free_stats(&sbi->ll_stats);
+	lprocfs_stats_free(&sbi->ll_stats);
 out_debugfs:
 	debugfs_remove_recursive(sbi->ll_debugfs_entry);
 
@@ -1962,8 +1962,8 @@ void ll_debugfs_unregister_super(struct super_block *sb)
 	kset_unregister(&sbi->ll_kset);
 	wait_for_completion(&sbi->ll_kobj_unregister);
 
-	lprocfs_free_stats(&sbi->ll_ra_stats);
-	lprocfs_free_stats(&sbi->ll_stats);
+	lprocfs_stats_free(&sbi->ll_ra_stats);
+	lprocfs_stats_free(&sbi->ll_stats);
 }
 
 static void ll_display_extents_info(struct ll_rw_extents_info *rw_extents,
diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index 5e4ad7c227b1..0d669f4dde15 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -1205,7 +1205,7 @@ int lprocfs_stats_alloc_one(struct lprocfs_stats *stats, unsigned int cpuid)
 	return rc;
 }
 
-struct lprocfs_stats *lprocfs_alloc_stats(unsigned int num,
+struct lprocfs_stats *lprocfs_stats_alloc(unsigned int num,
 					  enum lprocfs_stats_flags flags)
 {
 	struct lprocfs_stats *stats;
@@ -1259,12 +1259,12 @@ struct lprocfs_stats *lprocfs_alloc_stats(unsigned int num,
 	return stats;
 
 fail:
-	lprocfs_free_stats(&stats);
+	lprocfs_stats_free(&stats);
 	return NULL;
 }
-EXPORT_SYMBOL(lprocfs_alloc_stats);
+EXPORT_SYMBOL(lprocfs_stats_alloc);
 
-void lprocfs_free_stats(struct lprocfs_stats **statsh)
+void lprocfs_stats_free(struct lprocfs_stats **statsh)
 {
 	struct lprocfs_stats *stats = *statsh;
 	unsigned int num_entry;
@@ -1286,7 +1286,7 @@ void lprocfs_free_stats(struct lprocfs_stats **statsh)
 	kvfree(stats->ls_cnt_header);
 	kvfree(stats);
 }
-EXPORT_SYMBOL(lprocfs_free_stats);
+EXPORT_SYMBOL(lprocfs_stats_free);
 
 u64 lprocfs_stats_collector(struct lprocfs_stats *stats, int idx,
 			      enum lprocfs_fields_flags field)
@@ -1312,12 +1312,12 @@ u64 lprocfs_stats_collector(struct lprocfs_stats *stats, int idx,
 }
 EXPORT_SYMBOL(lprocfs_stats_collector);
 
-void lprocfs_clear_stats(struct lprocfs_stats *stats)
+void lprocfs_stats_clear(struct lprocfs_stats *stats)
 {
 	struct lprocfs_counter *percpu_cntr;
-	int i, j;
 	unsigned int num_entry;
 	unsigned long flags = 0;
+	int i, j;
 
 	num_entry = lprocfs_stats_lock(stats, LPROCFS_GET_NUM_CPU, &flags);
 
@@ -1347,7 +1347,7 @@ void lprocfs_clear_stats(struct lprocfs_stats *stats)
 
 	lprocfs_stats_unlock(stats, LPROCFS_GET_NUM_CPU, &flags);
 }
-EXPORT_SYMBOL(lprocfs_clear_stats);
+EXPORT_SYMBOL(lprocfs_stats_clear);
 
 static ssize_t lprocfs_stats_seq_write(struct file *file,
 				       const char __user *buf,
@@ -1356,7 +1356,7 @@ static ssize_t lprocfs_stats_seq_write(struct file *file,
 	struct seq_file *seq = file->private_data;
 	struct lprocfs_stats *stats = seq->private;
 
-	lprocfs_clear_stats(stats);
+	lprocfs_stats_clear(stats);
 
 	return len;
 }
@@ -1601,7 +1601,7 @@ int ldebugfs_alloc_md_stats(struct obd_device *obd,
 		return -EINVAL;
 
 	num_stats = ARRAY_SIZE(mps_stats) + num_private_stats;
-	stats = lprocfs_alloc_stats(num_stats, 0);
+	stats = lprocfs_stats_alloc(num_stats, 0);
 	if (!stats)
 		return -ENOMEM;
 
@@ -1630,7 +1630,7 @@ void ldebugfs_free_md_stats(struct obd_device *obd)
 		return;
 
 	obd->obd_md_stats = NULL;
-	lprocfs_free_stats(&stats);
+	lprocfs_stats_free(&stats);
 }
 EXPORT_SYMBOL(ldebugfs_free_md_stats);
 
diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c
index 7ecd0c4faee7..468dd43b6151 100644
--- a/fs/lustre/obdclass/lu_object.c
+++ b/fs/lustre/obdclass/lu_object.c
@@ -1091,7 +1091,7 @@ int lu_site_init(struct lu_site *s, struct lu_device *top)
 		init_waitqueue_head(&bkt->lsb_waitq);
 	}
 
-	s->ls_stats = lprocfs_alloc_stats(LU_SS_LAST_STAT, 0);
+	s->ls_stats = lprocfs_stats_alloc(LU_SS_LAST_STAT, 0);
 	if (!s->ls_stats) {
 		kvfree(s->ls_bkts);
 		s->ls_bkts = NULL;
@@ -1147,7 +1147,7 @@ void lu_site_fini(struct lu_site *s)
 	}
 
 	if (s->ls_stats)
-		lprocfs_free_stats(&s->ls_stats);
+		lprocfs_stats_free(&s->ls_stats);
 }
 EXPORT_SYMBOL(lu_site_fini);
 
@@ -1163,6 +1163,7 @@ int lu_site_init_finish(struct lu_site *s)
 	if (result == 0)
 		list_add(&s->ls_linkage, &lu_sites);
 	up_write(&lu_sites_guard);
+
 	return result;
 }
 EXPORT_SYMBOL(lu_site_init_finish);
diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c
index 603259dcd3eb..e0b85bd9f74e 100644
--- a/fs/lustre/ptlrpc/lproc_ptlrpc.c
+++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c
@@ -198,7 +198,7 @@ ptlrpc_ldebugfs_register(struct dentry *root, char *dir,
 	LASSERT(!*debugfs_root_ret);
 	LASSERT(!*stats_ret);
 
-	svc_stats = lprocfs_alloc_stats(EXTRA_MAX_OPCODES + LUSTRE_MAX_OPCODES,
+	svc_stats = lprocfs_stats_alloc(EXTRA_MAX_OPCODES + LUSTRE_MAX_OPCODES,
 					0);
 	if (!svc_stats)
 		return;
@@ -1257,7 +1257,7 @@ void ptlrpc_lprocfs_unregister_service(struct ptlrpc_service *svc)
 	debugfs_remove_recursive(svc->srv_debugfs_entry);
 
 	if (svc->srv_stats)
-		lprocfs_free_stats(&svc->srv_stats);
+		lprocfs_stats_free(&svc->srv_stats);
 }
 
 void ptlrpc_lprocfs_unregister_obd(struct obd_device *obd)
@@ -1269,7 +1269,7 @@ void ptlrpc_lprocfs_unregister_obd(struct obd_device *obd)
 	debugfs_remove_recursive(obd->obd_svc_debugfs_entry);
 
 	if (obd->obd_svc_stats)
-		lprocfs_free_stats(&obd->obd_svc_stats);
+		lprocfs_stats_free(&obd->obd_svc_stats);
 }
 EXPORT_SYMBOL(ptlrpc_lprocfs_unregister_obd);
 
-- 
2.27.0

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 18/42] lustre: osc: Fix possible null pointer
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (16 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 17/42] lustre: misc: rename lprocfs_stats functions James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 19/42] lustre: ptlrpc: NUL terminate long jobid strings James Simmons
                   ` (23 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

Change init to fix possible null pointer access.
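
As a minimal sketch of the pattern being fixed (the demo_* names are
hypothetical stand-ins, not the Lustre structures), the pointer is read
only after the NULL check, instead of in a declaration initializer that
runs before it:

```c
#include <stddef.h>

/* Hypothetical stand-ins for cl_page and osc_async_page. */
struct demo_page {
	void *vmpage;
};

struct demo_oap {
	void *oap_page;
};

/* Returns 0 on success, -1 if no page was supplied. */
static int demo_prep(struct demo_oap *oap, struct demo_page *page)
{
	/* No "void *vmpage = page->vmpage;" initializer here: it would
	 * dereference page before the check below has run.
	 */
	if (!page)
		return -1;

	oap->oap_page = page->vmpage;
	return 0;
}
```

The -1 return is an arbitrary choice for this sketch; the value the real
function returns for a NULL page is shown in the diff itself.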

WC-bug-id: https://jira.whamcloud.com/browse/LU-15014
Lustre-commit: 20b56835b82c5d21c ("LU-15014 osc: Fix possible null pointer")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44975
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_cache.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index a9dc985bfa18..b339aeff0828 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -2300,14 +2300,12 @@ int osc_prep_async_page(struct osc_object *osc, struct osc_page *ops,
 			struct cl_page *page, loff_t offset)
 {
 	struct osc_async_page *oap = &ops->ops_oap;
-	struct page *vmpage = page->cp_vmpage;
 
 	if (!page)
-		return -EIO;
+		return cfs_size_round(sizeof(*oap));
 
 	oap->oap_obj = osc;
-
-	oap->oap_page = vmpage;
+	oap->oap_page = page->cp_vmpage;
 	oap->oap_obj_off = offset;
 	LASSERT(!(offset & ~PAGE_MASK));
 
@@ -2323,7 +2321,7 @@ int osc_prep_async_page(struct osc_object *osc, struct osc_page *ops,
 	INIT_LIST_HEAD(&oap->oap_rpc_item);
 
 	CDEBUG(D_INFO, "oap %p vmpage %p obj off %llu\n",
-	       oap, vmpage, oap->oap_obj_off);
+	       oap, oap->oap_page, oap->oap_obj_off);
 	return 0;
 }
 EXPORT_SYMBOL(osc_prep_async_page);
-- 
2.27.0

* [lustre-devel] [PATCH 19/42] lustre: ptlrpc: NUL terminate long jobid strings
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (17 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 18/42] lustre: osc: Fix possible null pointer James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 20/42] lustre: uapi: remove _GNU_SOURCE dependency in lustre_user.h James Simmons
                   ` (22 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

It appears that some jobid names can be sent that are using the full
32-byte size, rather than containing an embedded NUL terminator. This
caused errors in lprocfs_job_stats_log() server side when it overflowed.

If lustre_msg_get_jobid() finds no NUL terminator within the buffer,
add one, so that the rest of the code doesn't have to deal with
unterminated strings.

This potentially exposes a larger issue that other places may not be
handling the unterminated string properly either, which needs to be
addressed separately on both the client and server.  Terminating the
jobid to 31 chars only on the client does not totally solve the issue,
since there will still be older clients that are not doing this, so
the server needs to handle this in any case.
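
A minimal userspace sketch of the termination step, assuming a 32-byte
fixed-size field as described above (DEMO_JOBID_SIZE and
demo_get_jobid() are hypothetical names, not the Lustre ones):

```c
#include <string.h>

#define DEMO_JOBID_SIZE 32	/* assumed field size, per the text above */

/* Force the last byte of the fixed-size field to NUL so that callers
 * can always treat the result as a C string of at most 31 characters.
 */
static char *demo_get_jobid(char jobid[DEMO_JOBID_SIZE])
{
	if (jobid[DEMO_JOBID_SIZE - 1] != '\0')
		jobid[DEMO_JOBID_SIZE - 1] = '\0';

	return jobid;
}
```

A field filled with 32 non-NUL bytes comes back truncated to 31
characters, while a shorter, already terminated jobid is unchanged.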

WC-bug-id: https://jira.whamcloud.com/browse/LU-16376
Lustre-commit: 9eba5d57297f807fd ("LU-16376 obdclass: NUL terminate long jobid strings")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49351
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/pack_generic.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c
index 9a0341c62702..3499611a0740 100644
--- a/fs/lustre/ptlrpc/pack_generic.c
+++ b/fs/lustre/ptlrpc/pack_generic.c
@@ -1198,6 +1198,12 @@ char *lustre_msg_get_jobid(struct lustre_msg *msg)
 		if (!pb)
 			return NULL;
 
+		/* If clients send unterminated jobids, terminate them here
+		 * so that there is no chance of string overflow later.
+		 */
+		if (unlikely(pb->pb_jobid[LUSTRE_JOBID_SIZE - 1] != '\0'))
+			pb->pb_jobid[LUSTRE_JOBID_SIZE - 1] = '\0';
+
 		return pb->pb_jobid;
 	}
 	default:
-- 
2.27.0

* [lustre-devel] [PATCH 20/42] lustre: uapi: remove _GNU_SOURCE dependency in lustre_user.h
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (18 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 19/42] lustre: ptlrpc: NUL terminate long jobid strings James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 21/42] lnet: handles unregister/register events James Simmons
                   ` (21 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

The lustre_user.h header uses the non-standard strchrnul() function
in userspace, which is only declared when _GNU_SOURCE is defined.

Open-code the string scan instead, so that lustre_user.h has no such
external dependency.
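
A minimal sketch of the open-coded scan, assuming a buffer that holds
two NUL-terminated names back to back (demo_second_name() is a
hypothetical name used only for illustration):

```c
/* Walk to the NUL that ends the first name and return the byte after
 * it, i.e. the start of the second name. Uses no libc string helpers.
 */
static inline char *demo_second_name(char *names)
{
	char *str = names;

	while (*str != '\0')
		str++;

	return str + 1;
}
```

For a buffer initialized from "target\0source", the function returns a
pointer to "source".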

Fixes: 90c2d33680 ("lustre: uapi: avoid gcc-11 -Werror=stringop-overread warning")
WC-bug-id: https://jira.whamcloud.com/browse/LU-16335
Lustre-commit: efc5c8d4de60d3943 ("LU-16335 build: remove _GNU_SOURCE dependency in lustre_user.h")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49328
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_user.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index a2f46e7f4257..9bbb1c9ab0b9 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -1715,9 +1715,11 @@ static inline __kernel_size_t changelog_rec_snamelen(struct changelog_rec *rec)
 
 static inline char *changelog_rec_sname(struct changelog_rec *rec)
 {
-	char *cr_name = changelog_rec_name(rec);
+	char *str = changelog_rec_name(rec);
 
-	return cr_name + strlen(cr_name) + 1;
+	while (*str != '\0')
+		str++;
+	return str + 1;
 }
 
 /**
-- 
2.27.0

* [lustre-devel] [PATCH 21/42] lnet: handles unregister/register events
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (19 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 20/42] lustre: uapi: remove _GNU_SOURCE dependency in lustre_user.h James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 22/42] lustre: update version to 2.15.53 James Simmons
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Cyril Bordage, Lustre Development List

From: Cyril Bordage <cbordage@whamcloud.com>

When the network is restarted, devices are unregistered and then
registered again. If a device registers with an index that is
different from the one it had before the restart, LNet ignores it.
Consequently, the link for this device stays in the fatal state.

To fix that, we catch unregistering events to clear the saved index
value, and when a registering event comes, we save the new value.
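
The index tracking can be sketched in userspace as a small state
machine. This is only an illustration under stated assumptions: the
demo_* types and the return convention are hypothetical, and -1 is used
as the "no saved index" marker as in the diff.

```c
#include <string.h>

enum demo_reg_state {
	DEMO_REGISTERED,
	DEMO_UNREGISTERING,
};

struct demo_iface {
	char name[16];
	int index;	/* saved interface index, -1 when cleared */
};

/* Returns 1 when the event matches the saved interface and should go
 * on to normal link-state handling, 0 when it was ignored or only
 * updated the saved index.
 */
static int demo_track_index(struct demo_iface *ifc, const char *dev_name,
			    int ifindex, enum demo_reg_state state)
{
	if (strcmp(ifc->name, dev_name))
		return 0;

	if (ifc->index == -1) {
		/* A registration just happened: save the new index. */
		if (state == DEMO_REGISTERED)
			ifc->index = ifindex;
		return 0;
	}

	if (ifc->index != ifindex)
		return 0;

	if (state == DEMO_UNREGISTERING) {
		/* The index may differ once the device comes back. */
		ifc->index = -1;
		return 0;
	}

	return 1;
}
```

Matching on the name first is what lets a device that returns with a
new index be picked up again after its old index was cleared.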

WC-bug-id: https://jira.whamcloud.com/browse/LU-16378
Lustre-commit: 3c9282a67d73799a0 ("LU-16378 lnet: handles unregister/register events")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49375
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/socklnd/socklnd.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index d8d1071d40f4..07e056845b24 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -2010,10 +2010,30 @@ ksocknal_handle_link_state_change(struct net_device *dev,
 		sa = (void *)&ksi->ksni_addr;
 		found_ip = false;
 
-		if (ksi->ksni_index != ifindex ||
-		    strcmp(ksi->ksni_name, dev->name))
+		if (strcmp(ksi->ksni_name, dev->name))
+			continue;
+
+		if (ksi->ksni_index == -1) {
+			if (dev->reg_state != NETREG_REGISTERED)
+				continue;
+			/* A registration just happened: save the new index for
+			 * the device
+			 */
+			ksi->ksni_index = ifindex;
+			goto out;
+		}
+
+		if (ksi->ksni_index != ifindex)
 			continue;
 
+		if (dev->reg_state == NETREG_UNREGISTERING) {
+			/* Device is being unregistered; we need to clear the
+			 * index, as it can change when the device comes back
+			 */
+			ksi->ksni_index = -1;
+			goto out;
+		}
+
 		ni = net->ksnn_ni;
 
 		in_dev = __in_dev_get_rtnl(dev);
@@ -2108,6 +2128,8 @@ static int ksocknal_device_event(struct notifier_block *unused,
 	case NETDEV_UP:
 	case NETDEV_DOWN:
 	case NETDEV_CHANGE:
+	case NETDEV_REGISTER:
+	case NETDEV_UNREGISTER:
 		ksocknal_handle_link_state_change(dev, operstate);
 		break;
 	}
-- 
2.27.0

* [lustre-devel] [PATCH 22/42] lustre: update version to 2.15.53
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (20 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 21/42] lnet: handles unregister/register events James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 23/42] lustre: ptlrpc: don't panic during reconnection James Simmons
                   ` (19 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

New tag 2.15.53

Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h
index d9a684738be5..962674281298 100644
--- a/include/uapi/linux/lustre/lustre_ver.h
+++ b/include/uapi/linux/lustre/lustre_ver.h
@@ -3,9 +3,9 @@
 
 #define LUSTRE_MAJOR 2
 #define LUSTRE_MINOR 15
-#define LUSTRE_PATCH 52
+#define LUSTRE_PATCH 53
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.15.52"
+#define LUSTRE_VERSION_STRING "2.15.53"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
2.27.0

* [lustre-devel] [PATCH 23/42] lustre: ptlrpc: don't panic during reconnection
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (21 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 22/42] lustre: update version to 2.15.53 James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 24/42] lustre: move to kobj_type default_groups James Simmons
                   ` (18 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexander Boyko, Lustre Development List

From: Alexander Boyko <alexander.boyko@hpe.com>

ptl_send_rpc() can race with ptlrpc_connect_import_locked() in the
middle of the assertion check, which leads to a spurious panic.
The assertion first checks

(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||

then a reconnect changes the import state and flags, and the second
part is evaluated against the new values:

(imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
!(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT)))

MSGHDR_AT_SUPPORT is disabled during client reconnection.
Taking a lock on this hot path is undesirable, so the fix changes
the assertion to a report.
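
A minimal userspace sketch of turning such an assertion into a report,
with hypothetical demo_* names and plain booleans standing in for the
import state and flag bits:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the import fields used by the check. */
struct demo_import {
	bool full;		/* import is in the FULL state */
	bool at_support;	/* AT support flag set in the msg header */
	bool connect_at;	/* AT negotiated at connect time */
};

/* Same shape as the asserted condition: true when state is consistent. */
static bool demo_at_state_ok(bool at_off, const struct demo_import *imp)
{
	return at_off || !imp->full || imp->at_support || !imp->connect_at;
}

/* Report an inconsistent state instead of aborting.
 * Returns 1 if a report was emitted, 0 otherwise.
 */
static int demo_check_import(bool at_off, const struct demo_import *imp)
{
	if (!demo_at_state_ok(at_off, imp)) {
		fprintf(stderr,
			"wrong import state: AT_OFF=%d full=%d at=%d conn=%d\n",
			at_off, imp->full, imp->at_support, imp->connect_at);
		return 1;
	}

	return 0;
}
```

The caller keeps going after the report, which is acceptable here
because the inconsistent view is a transient effect of the race rather
than real corruption.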

HPE-bug-id: LUS-10985
WC-bug-id: https://jira.whamcloud.com/browse/LU-16297
Lustre-commit: df31c4c0b39b88459 ("LU-16297 ptlrpc: don't panic during reconnection")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49029
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/niobuf.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c
index 670bfb0de02f..09f68157b883 100644
--- a/fs/lustre/ptlrpc/niobuf.c
+++ b/fs/lustre/ptlrpc/niobuf.c
@@ -579,13 +579,20 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 
 	/**
 	 * For enabled AT all request should have AT_SUPPORT in the
-	 * FULL import state when OBD_CONNECT_AT is set
+	 * FULL import state when OBD_CONNECT_AT is set.
+	 * This check has a race with ptlrpc_connect_import_locked()
+	 * with low chance, don't panic, only report.
 	 */
-	LASSERT(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
-		(imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
-		!(imp->imp_connect_data.ocd_connect_flags &
-		OBD_CONNECT_AT));
-
+	if (!(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
+	    (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
+	    !(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT))) {
+		DEBUG_REQ(D_HA, request,
+			  "Wrong state of import detected, AT=%d, imp=%d, msghdr=%d, conn=%d\n",
+			  AT_OFF, imp->imp_state != LUSTRE_IMP_FULL,
+			  (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT),
+			  !(imp->imp_connect_data.ocd_connect_flags &
+			    OBD_CONNECT_AT));
+	}
 	if (request->rq_resend)
 		lustre_msg_add_flags(request->rq_reqmsg, MSG_RESENT);
 
-- 
2.27.0

* [lustre-devel] [PATCH 24/42] lustre: move to kobj_type default_groups
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (22 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 23/42] lustre: ptlrpc: don't panic during reconnection James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 25/42] lnet: increase transaction timeout James Simmons
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Shaun Tancheff, Lustre Development List

From: Shaun Tancheff <shaun.tancheff@hpe.com>

Linux commit v5.1-rc3-29-gaa30f47cf666
  kobject: Add support for default attribute groups to kobj_type

Linux commit v5.18-rc1-2-gcdb4f26a63c3
  kobject: kobj_type: remove default_attrs

Switch to using kobj_type default_groups.
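
With default_groups, an attribute lookup that used to walk one flat
array has to walk every group. A minimal userspace sketch of that
nested search, with hypothetical demo_* types standing in for
struct attribute and struct attribute_group:

```c
#include <stddef.h>
#include <string.h>

struct demo_attr {
	const char *name;
};

struct demo_group {
	struct demo_attr **attrs;	/* NULL-terminated */
};

/* Search every NULL-terminated group for the first attribute whose
 * name matches the first keylen bytes of key.
 */
static struct demo_attr *demo_find_attr(const struct demo_group **groups,
					const char *key, size_t keylen)
{
	size_t i, k;

	for (i = 0; groups[i]; i++) {
		struct demo_attr **attrs = groups[i]->attrs;

		for (k = 0; attrs[k]; k++) {
			if (!strncmp(attrs[k]->name, key, keylen))
				return attrs[k];
		}
	}

	return NULL;
}
```

Because the comparison is limited to keylen bytes, a key such as
"checksums=1" with keylen 9 matches the attribute named "checksums".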

HPE-bug-id: LUS-11196
WC-bug-id: https://jira.whamcloud.com/browse/LU-16120
Lustre-commit: 62e9d055d9516ec6a ("LU-16120 build: Add support for kobj_type default_groups")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48365
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_class.h   | 39 +++++++++++++++++++++++++++++++++
 fs/lustre/ldlm/ldlm_pool.c      |  4 +++-
 fs/lustre/ldlm/ldlm_resource.c  |  4 +++-
 fs/lustre/llite/lproc_llite.c   |  4 +++-
 fs/lustre/lmv/lproc_lmv.c       |  4 +++-
 fs/lustre/lov/lproc_lov.c       |  4 +++-
 fs/lustre/mdc/lproc_mdc.c       |  4 +++-
 fs/lustre/obdclass/obd_config.c | 15 +++----------
 fs/lustre/osc/lproc_osc.c       |  4 +++-
 fs/lustre/ptlrpc/lproc_ptlrpc.c |  4 +++-
 10 files changed, 66 insertions(+), 20 deletions(-)

diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index 9edd93cbacc5..81ef59e01956 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -1750,4 +1750,43 @@ extern u64 obd_heat_get(struct obd_heat_instance *instance,
 			unsigned int period_second);
 extern void obd_heat_clear(struct obd_heat_instance *instance, int count);
 
+/* struct kobj_type */
+#define KOBJ_ATTR_GROUPS(_name)		_name##_groups
+
+static inline
+struct attribute *_get_attr_matches(const struct kobj_type *typ,
+				    const char *key, size_t keylen,
+				    int (*is_match)(const char *, const char *,
+						    size_t))
+{
+	int i;
+
+	for (i = 0; typ->default_groups[i]; i++) {
+		struct attribute **attrs;
+		int k;
+
+		attrs = (struct attribute **)typ->default_groups[i]->attrs;
+		for (k = 0; attrs[k]; k++) {
+			if (is_match(attrs[k]->name, key, keylen))
+				return (struct attribute *)attrs[k];
+		}
+	}
+
+	return NULL;
+}
+
+static inline
+int _attr_name_starts_with(const char *attr_name, const char *name, size_t len)
+{
+	return !strncmp(attr_name, name, len);
+}
+
+static inline
+struct attribute *get_attr_starts_with(const struct kobj_type *typ,
+				       const char *name,
+				       size_t len)
+{
+	return _get_attr_matches(typ, name, len, _attr_name_starts_with);
+}
+
 #endif /* __LINUX_OBD_CLASS_H */
diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c
index f0e629c5dd24..6fce509f7f31 100644
--- a/fs/lustre/ldlm/ldlm_pool.c
+++ b/fs/lustre/ldlm/ldlm_pool.c
@@ -549,6 +549,8 @@ static struct attribute *ldlm_pl_attrs[] = {
 	NULL,
 };
 
+ATTRIBUTE_GROUPS(ldlm_pl);
+
 static void ldlm_pl_release(struct kobject *kobj)
 {
 	struct ldlm_pool *pl = container_of(kobj, struct ldlm_pool,
@@ -557,7 +559,7 @@ static void ldlm_pl_release(struct kobject *kobj)
 }
 
 static struct kobj_type ldlm_pl_ktype = {
-	.default_attrs	= ldlm_pl_attrs,
+	.default_groups = KOBJ_ATTR_GROUPS(ldlm_pl),
 	.sysfs_ops	= &lustre_sysfs_ops,
 	.release	= ldlm_pl_release,
 };
diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c
index 98c3e4fb4466..9a269cb326a0 100644
--- a/fs/lustre/ldlm/ldlm_resource.c
+++ b/fs/lustre/ldlm/ldlm_resource.c
@@ -419,8 +419,10 @@ static void ldlm_ns_release(struct kobject *kobj)
 	complete(&ns->ns_kobj_unregister);
 }
 
+ATTRIBUTE_GROUPS(ldlm_ns);
+
 static struct kobj_type ldlm_ns_ktype = {
-	.default_attrs	= ldlm_ns_attrs,
+	.default_groups = KOBJ_ATTR_GROUPS(ldlm_ns),
 	.sysfs_ops	= &lustre_sysfs_ops,
 	.release	= ldlm_ns_release,
 };
diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index c4a514aa1e4f..3d64a936331a 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -1759,6 +1759,8 @@ static struct attribute *llite_attrs[] = {
 	NULL,
 };
 
+ATTRIBUTE_GROUPS(llite); /* creates llite_groups */
+
 static void sbi_kobj_release(struct kobject *kobj)
 {
 	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
@@ -1767,7 +1769,7 @@ static void sbi_kobj_release(struct kobject *kobj)
 }
 
 static struct kobj_type sbi_ktype = {
-	.default_attrs	= llite_attrs,
+	.default_groups	= KOBJ_ATTR_GROUPS(llite),
 	.sysfs_ops	= &lustre_sysfs_ops,
 	.release	= sbi_kobj_release,
 };
diff --git a/fs/lustre/lmv/lproc_lmv.c b/fs/lustre/lmv/lproc_lmv.c
index 6d4e8d986f8b..3b8181f51080 100644
--- a/fs/lustre/lmv/lproc_lmv.c
+++ b/fs/lustre/lmv/lproc_lmv.c
@@ -290,11 +290,13 @@ static struct attribute *lmv_attrs[] = {
 	NULL,
 };
 
+ATTRIBUTE_GROUPS(lmv); /* creates lmv_groups */
+
 int lmv_tunables_init(struct obd_device *obd)
 {
 	int rc;
 
-	obd->obd_ktype.default_attrs = lmv_attrs;
+	obd->obd_ktype.default_groups = KOBJ_ATTR_GROUPS(lmv);
 	rc = lprocfs_obd_setup(obd, true);
 	if (rc)
 		return rc;
diff --git a/fs/lustre/lov/lproc_lov.c b/fs/lustre/lov/lproc_lov.c
index 95fb4ef91928..05cbde6e6e99 100644
--- a/fs/lustre/lov/lproc_lov.c
+++ b/fs/lustre/lov/lproc_lov.c
@@ -277,12 +277,14 @@ static struct attribute *lov_attrs[] = {
 	NULL,
 };
 
+ATTRIBUTE_GROUPS(lov); /* creates lov_groups */
+
 int lov_tunables_init(struct obd_device *obd)
 {
 	struct lov_obd *lov = &obd->u.lov;
 	int rc;
 
-	obd->obd_ktype.default_attrs = lov_attrs;
+	obd->obd_ktype.default_groups = KOBJ_ATTR_GROUPS(lov);
 	rc = lprocfs_obd_setup(obd, false);
 	if (rc)
 		return rc;
diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c
index b59bba3595e3..fa799c525f46 100644
--- a/fs/lustre/mdc/lproc_mdc.c
+++ b/fs/lustre/mdc/lproc_mdc.c
@@ -754,11 +754,13 @@ static struct attribute *mdc_attrs[] = {
 	NULL,
 };
 
+ATTRIBUTE_GROUPS(mdc); /* creates mdc_groups */
+
 int mdc_tunables_init(struct obd_device *obd)
 {
 	int rc;
 
-	obd->obd_ktype.default_attrs = mdc_attrs;
+	obd->obd_ktype.default_groups = KOBJ_ATTR_GROUPS(mdc);
 	obd->obd_vars = lprocfs_mdc_obd_vars;
 
 	rc = lprocfs_obd_setup(obd, false);
diff --git a/fs/lustre/obdclass/obd_config.c b/fs/lustre/obdclass/obd_config.c
index 75fc6a632cda..953f544b410d 100644
--- a/fs/lustre/obdclass/obd_config.c
+++ b/fs/lustre/obdclass/obd_config.c
@@ -1080,7 +1080,7 @@ ssize_t class_modify_config(struct lustre_cfg *lcfg, const char *prefix,
 	}
 
 	typ = get_ktype(kobj);
-	if (!typ || !typ->default_attrs)
+	if (!typ || !typ->default_groups)
 		return -ENODEV;
 
 	print_lustre_cfg(lcfg);
@@ -1091,11 +1091,10 @@ ssize_t class_modify_config(struct lustre_cfg *lcfg, const char *prefix,
 	 * or   lctl conf_param lustre-OST0000.osc.max_dirty_mb=36
 	 */
 	for (i = 1; i < lcfg->lcfg_bufcount; i++) {
-		struct attribute *attr;
+		struct attribute *attr = NULL;
 		size_t keylen;
 		char *value;
 		char *key;
-		int j;
 
 		key = lustre_cfg_buf(lcfg, i);
 		/* Strip off prefix */
@@ -1116,15 +1115,7 @@ ssize_t class_modify_config(struct lustre_cfg *lcfg, const char *prefix,
 		keylen = value - key;
 		value++;
 
-		attr = NULL;
-		for (j = 0; typ->default_attrs[j]; j++) {
-			if (!strncmp(typ->default_attrs[j]->name, key,
-				     keylen)) {
-				attr = typ->default_attrs[j];
-				break;
-			}
-		}
-
+		attr = get_attr_starts_with(typ, key, keylen);
 		if (!attr) {
 			char *envp[4], *param, *path;
 
diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c
index b458a867c31f..d3131ee7eef1 100644
--- a/fs/lustre/osc/lproc_osc.c
+++ b/fs/lustre/osc/lproc_osc.c
@@ -872,12 +872,14 @@ static struct attribute *osc_attrs[] = {
 	NULL,
 };
 
+ATTRIBUTE_GROUPS(osc); /* creates osc_groups */
+
 int osc_tunables_init(struct obd_device *obd)
 {
 	int rc;
 
 	obd->obd_vars = lprocfs_osc_obd_vars;
-	obd->obd_ktype.default_attrs = osc_attrs;
+	obd->obd_ktype.default_groups = KOBJ_ATTR_GROUPS(osc);
 	rc = lprocfs_obd_setup(obd, false);
 	if (rc)
 		return rc;
diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c
index e0b85bd9f74e..f3f8a7115ade 100644
--- a/fs/lustre/ptlrpc/lproc_ptlrpc.c
+++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c
@@ -1125,6 +1125,8 @@ static struct attribute *ptlrpc_svc_attrs[] = {
 	NULL,
 };
 
+ATTRIBUTE_GROUPS(ptlrpc_svc); /* creates ptlrpc_svc_groups */
+
 static void ptlrpc_sysfs_svc_release(struct kobject *kobj)
 {
 	struct ptlrpc_service *svc = container_of(kobj, struct ptlrpc_service,
@@ -1134,7 +1136,7 @@ static void ptlrpc_sysfs_svc_release(struct kobject *kobj)
 }
 
 static struct kobj_type ptlrpc_svc_ktype = {
-	.default_attrs	= ptlrpc_svc_attrs,
+	.default_groups = KOBJ_ATTR_GROUPS(ptlrpc_svc),
 	.sysfs_ops	= &lustre_sysfs_ops,
 	.release	= ptlrpc_sysfs_svc_release,
 };
-- 
2.27.0

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [lustre-devel] [PATCH 25/42] lnet: increase transaction timeout
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (23 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 24/42] lustre: move to kobj_type default_groups James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 26/42] lnet: Allow IP specification James Simmons
                   ` (16 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Cyril Bordage, Lustre Development List

From: Cyril Bordage <cbordage@whamcloud.com>

LU-13145 decided to increase the default transaction timeout
(LNET_TRANSACTION_TIMEOUT_DEFAULT) to 150s, but the associated
patch set it to 50s. Set it to 150s as intended. This change also
raises lnd_timeout (from 16 to 49).
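
The lnd_timeout figures quoted above can be reproduced with a small
sketch. It assumes lnd_timeout is derived by integer division as
(transaction_timeout - 1) / (retry_count + 1) with a retry count of 2.
The helper name, the formula and the retry count are assumptions made
for illustration, not something stated in this patch.

```c
/* Hypothetical helper, not part of this patch: derive the per-LND
 * timeout from the transaction timeout. The formula and the retry
 * count passed by the caller are assumptions for illustration.
 */
static unsigned int lnd_timeout_from(unsigned int transaction_timeout,
				     unsigned int retry_count)
{
	/* integer division: the remainder is discarded */
	return (transaction_timeout - 1) / (retry_count + 1);
}
```

Under those assumptions, lnd_timeout_from(50, 2) gives 16 and
lnd_timeout_from(150, 2) gives 49, matching the values in the commit
message.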

WC-bug-id: https://jira.whamcloud.com/browse/LU-15288
Lustre-commit: 18b4e28f18d55291f ("LU-15288 lnet: increase transaction timeout")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45780
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/api-ni.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index e400de72545c..18dc3e7cccc6 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -163,7 +163,7 @@ module_param(lnet_drop_asym_route, drop_asym_route, 0644);
 MODULE_PARM_DESC(lnet_drop_asym_route,
 		 "Set to 1 to drop asymmetrical route messages.");
 
-#define LNET_TRANSACTION_TIMEOUT_DEFAULT 50
+#define LNET_TRANSACTION_TIMEOUT_DEFAULT 150
 unsigned int lnet_transaction_timeout = LNET_TRANSACTION_TIMEOUT_DEFAULT;
 static int transaction_to_set(const char *val, const struct kernel_param *kp);
 static struct kernel_param_ops param_ops_transaction_timeout = {
-- 
2.27.0


* [lustre-devel] [PATCH 26/42] lnet: Allow IP specification
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (24 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 25/42] lnet: increase transaction timeout James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 27/42] lustre: obdclass: fix T10PI prototypes James Simmons
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Frank Sehr, Lustre Development List

From: Frank Sehr <fsehr@whamcloud.com>

Allow selecting an interface by specifying an IP address in the NID.
All combinations of interface and IP address are handled:

1. No interface and no IP address specified: select the first interface.
2. Interface but no IP: select the interface's main IP address.
3. IP but no interface: select the first interface that has the IP
   address.
4. Interface and IP specified: verify that interface and IP match.
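
The four cases above can be sketched in plain userspace C. This is a
simplified illustration only: struct iface and iface_select() are
hypothetical names, only IPv4 is modelled, and an IP value of 0 stands
for "no IP given". It is not the lnet_inet_select() added by this
patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an enumerated interface
 * (IPv4 only); not the real struct lnet_inetdev.
 */
struct iface {
	const char *name;
	uint32_t ipaddr;
};

/* Return the index of the selected interface, or -1 if none matches.
 * want_name may be NULL (no interface given); want_ip may be 0 (no
 * IP given).
 */
static int iface_select(const struct iface *ifaces, int n,
			const char *want_name, uint32_t want_ip)
{
	int i;

	/* case 1: nothing specified, take the first interface */
	if (!want_name && !want_ip)
		return n > 0 ? 0 : -1;

	for (i = 0; i < n; i++) {
		/* skip entries that are not the requested interface */
		if (want_name && strcmp(want_name, ifaces[i].name) != 0)
			continue;

		/* case 2: interface only, use its first address */
		if (!want_ip)
			return i;

		/* cases 3 and 4: the address has to match as well */
		if (ifaces[i].ipaddr == want_ip)
			return i;
	}

	return -1;
}
```

With entries { "eth0", 10.0.0.1 }, { "eth1", 10.0.0.2 } and
{ "eth1", 10.0.0.3 }, asking for "eth1" alone picks index 1, asking
for 10.0.0.3 alone picks index 2, and asking for "eth0" together with
10.0.0.2 fails because interface and IP do not match.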

The change does not affect current configurations; it takes effect
once the corresponding changes in lnetctl, YAML or the module
parameter are applied.
This patch affects only the socklnd component. A macro is defined in
lnet-types.h to check whether an IP address is set (IPv4 or IPv6).
Further IPv6 changes are not integrated.

For further reference please read

IP specification in LNet
https://wiki.whamcloud.com/display/LNet/IP+specification+in+LNet

WC-bug-id: https://jira.whamcloud.com/browse/LU-13642
Lustre-commit: 14cdcd61985aa0209 ("LU-13642 lnet: Allow IP specification")
Signed-off-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47660
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h        |  3 ++
 include/uapi/linux/lnet/lnet-types.h | 37 ++++++++++++-------
 net/lnet/klnds/socklnd/socklnd.c     | 53 +++++++++++++---------------
 net/lnet/lnet/config.c               | 50 ++++++++++++++++++++++++++
 4 files changed, 101 insertions(+), 42 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index ba68d50e677d..25289f5bba39 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -838,6 +838,9 @@ struct lnet_inetdev {
 
 int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns,
 			bool v6);
+int lnet_inet_select(struct lnet_ni *ni, struct lnet_inetdev *ifaces,
+		     int num_ifaces);
+
 void lnet_sock_setbuf(struct socket *socket, int txbufsize, int rxbufsize);
 void lnet_sock_getbuf(struct socket *socket, int *txbufsize, int *rxbufsize);
 int lnet_sock_getaddr(struct socket *socket, bool remote,
diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h
index 304add9957ef..8a1d2d749b4b 100644
--- a/include/uapi/linux/lnet/lnet-types.h
+++ b/include/uapi/linux/lnet/lnet-types.h
@@ -90,11 +90,6 @@ static inline __u32 LNET_NIDNET(lnet_nid_t nid)
 	return (nid >> 32) & 0xffffffff;
 }
 
-static inline lnet_nid_t LNET_MKNID(__u32 net, __u32 addr)
-{
-	return (((__u64)net) << 32) | addr;
-}
-
 static inline __u32 LNET_NETNUM(__u32 net)
 {
 	return net & 0xffff;
@@ -110,25 +105,41 @@ static inline __u32 LNET_MKNET(__u32 type, __u32 num)
 	return (type << 16) | num;
 }
 
+static inline lnet_nid_t LNET_MKNID(__u32 net, __u32 addr)
+{
+	return (((__u64)net) << 32) | addr;
+}
+
 /** The lolnd NID (i.e. myself) */
 #define LNET_NID_LO_0 LNET_MKNID(LNET_MKNET(LOLND, 0), 0)
 
 #define LNET_NET_ANY LNET_NIDNET(LNET_NID_ANY)
 
-/* check for address set */
-static inline bool nid_addr_is_set(const struct lnet_nid *nid)
+static inline bool nid_is_nid4(const struct lnet_nid *nid)
 {
-	int sum = 0, i;
+	return NID_ADDR_BYTES(nid) == 4;
+}
 
-	for (i = 0; i < NID_ADDR_BYTES(nid); i++)
-		sum |= nid->nid_addr[i];
+static inline bool nid_is_ipv4(const struct lnet_nid *nid)
+{
+	return NID_ADDR_BYTES(nid) == 4;
+}
 
-	return sum ? true : false;
+static inline bool nid_is_ipv6(const struct lnet_nid *nid)
+{
+	return NID_ADDR_BYTES(nid) == 16;
 }
 
-static inline int nid_is_nid4(const struct lnet_nid *nid)
+/* check for address set */
+static inline bool nid_addr_is_set(const struct lnet_nid *nid)
 {
-	return NID_ADDR_BYTES(nid) == 4;
+	int i;
+
+	for (i = 0; i < NID_ADDR_BYTES(nid); i++)
+		if (nid->nid_addr[i])
+			return true;
+
+	return false;
 }
 
 /* LOLND may not be defined yet, so we cannot use an inline */
diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index 07e056845b24..0a4fb966f498 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -2542,8 +2542,7 @@ ksocknal_startup(struct lnet_ni *ni)
 	struct ksock_net *net;
 	struct ksock_interface *ksi = NULL;
 	struct lnet_inetdev *ifaces = NULL;
-	int i = 0;
-	int rc;
+	int rc, if_idx;
 
 	LASSERT(ni->ni_net->net_lnd == &the_ksocklnd);
 
@@ -2555,7 +2554,7 @@ ksocknal_startup(struct lnet_ni *ni)
 
 	net = kzalloc(sizeof(*net), GFP_NOFS);
 	if (!net)
-		goto fail_0;
+		goto out_base;
 
 	net->ksnn_incarnation = ktime_get_real_ns();
 	ni->ni_data = net;
@@ -2564,55 +2563,51 @@ ksocknal_startup(struct lnet_ni *ni)
 
 	rc = lnet_inet_enumerate(&ifaces, ni->ni_net_ns, true);
 	if (rc < 0)
-		goto fail_1;
+		goto out_net;
 
 	ksi = &net->ksnn_interface;
-	/* Use the first discovered interface or look in the list */
-	if (ni->ni_interface) {
-		for (i = 0; i < rc; i++) {
-			if (strcmp(ifaces[i].li_name, ni->ni_interface) == 0)
-				break;
-		}
-		/* ni_interface doesn't contain the interface we want */
-		if (i == rc) {
-			CERROR("ksocklnd: failed to find interface %s\n",
-			       ni->ni_interface);
-			goto fail_1;
-		}
-	} else {
-		rc = lnet_ni_add_interface(ni, ifaces[i].li_name);
+
+	/* Interface and/or IP address is specified otherwise default to
+	 * the first Interface
+	 */
+	if_idx = lnet_inet_select(ni, ifaces, rc);
+	if (if_idx < 0)
+		goto out_net;
+
+	if (!ni->ni_interface) {
+		rc = lnet_ni_add_interface(ni, ifaces[if_idx].li_name);
 		if (rc < 0)
 			CWARN("ksocklnd failed to allocate ni_interface\n");
 	}
 
-	ni->ni_dev_cpt = ifaces[i].li_cpt;
-	ksi->ksni_index = ifaces[i].li_index;
-	if (ifaces[i].li_ipv6) {
+	ni->ni_dev_cpt = ifaces[if_idx].li_cpt;
+	ksi->ksni_index = ifaces[if_idx].li_index;
+	if (ifaces[if_idx].li_ipv6) {
 		struct sockaddr_in6 *sa;
 		sa = (void *)&ksi->ksni_addr;
 		memset(sa, 0, sizeof(*sa));
 		sa->sin6_family = AF_INET6;
-		memcpy(&sa->sin6_addr, ifaces[i].li_ipv6addr,
+		memcpy(&sa->sin6_addr, ifaces[if_idx].li_ipv6addr,
 		       sizeof(struct in6_addr));
 		ni->ni_nid.nid_size = sizeof(struct in6_addr) - 4;
-		memcpy(&ni->ni_nid.nid_addr, ifaces[i].li_ipv6addr,
+		memcpy(&ni->ni_nid.nid_addr, ifaces[if_idx].li_ipv6addr,
 		       sizeof(struct in6_addr));
 	} else {
 		struct sockaddr_in *sa;
 		sa = (void *)&ksi->ksni_addr;
 		memset(sa, 0, sizeof(*sa));
 		sa->sin_family = AF_INET;
-		sa->sin_addr.s_addr = htonl(ifaces[i].li_ipaddr);
-		ksi->ksni_netmask = ifaces[i].li_netmask;
+		sa->sin_addr.s_addr = htonl(ifaces[if_idx].li_ipaddr);
+		ksi->ksni_netmask = ifaces[if_idx].li_netmask;
 		ni->ni_nid.nid_size = 4 - 4;
 		ni->ni_nid.nid_addr[0] = sa->sin_addr.s_addr;
 	}
-	strlcpy(ksi->ksni_name, ifaces[i].li_name, sizeof(ksi->ksni_name));
+	strlcpy(ksi->ksni_name, ifaces[if_idx].li_name, sizeof(ksi->ksni_name));
 
 	/* call it before add it to ksocknal_data.ksnd_nets */
 	rc = ksocknal_net_start_threads(net, ni->ni_cpts, ni->ni_ncpts);
 	if (rc)
-		goto fail_1;
+		goto out_net;
 
 	list_add(&net->ksnn_list, &ksocknal_data.ksnd_nets);
 	net->ksnn_ni = ni;
@@ -2620,9 +2615,9 @@ ksocknal_startup(struct lnet_ni *ni)
 
 	return 0;
 
-fail_1:
+out_net:
 	kfree(net);
-fail_0:
+out_base:
 	if (!ksocknal_data.ksnd_nnets)
 		ksocknal_base_shutdown();
 
diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c
index 5bfae4e46910..0c4405f0f13b 100644
--- a/net/lnet/lnet/config.c
+++ b/net/lnet/lnet/config.c
@@ -1615,6 +1615,56 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns, bool v6)
 }
 EXPORT_SYMBOL(lnet_inet_enumerate);
 
+int lnet_inet_select(struct lnet_ni *ni,
+		     struct lnet_inetdev *ifaces,
+		     int num_ifaces)
+{
+	bool addr_set = nid_addr_is_set(&ni->ni_nid);
+	int if_idx;
+
+	/* default to first interface if both interface and NID unspecified */
+	if (!ni->ni_interface && !addr_set)
+		return 0;
+
+	for (if_idx = 0; if_idx < num_ifaces; if_idx++) {
+		if (ni->ni_interface &&
+		    strcmp(ni->ni_interface, ifaces[if_idx].li_name) != 0)
+			/* not the specified interface */
+			continue;
+
+		if (!addr_set)
+			/* IP unspecified, use IP of first matching interface */
+			break;
+
+		if (ifaces[if_idx].li_ipv6 &&
+		    nid_is_ipv6(&ni->ni_nid)) {
+			if (memcmp(ni->ni_nid.nid_addr,
+				   ifaces[if_idx].li_ipv6addr,
+				   sizeof(struct in6_addr)) == 0)
+				break;
+		} else if (!ifaces[if_idx].li_ipv6 &&
+			   nid_is_ipv4(&ni->ni_nid)) {
+			if (ni->ni_nid.nid_addr[0] ==
+			    htonl(ifaces[if_idx].li_ipaddr))
+				break;
+		}
+	}
+
+	if (if_idx < num_ifaces)
+		return if_idx;
+
+	if (ni->ni_interface)
+		CERROR("ksocklnd: failed to find interface %s%s%s\n",
+		       ni->ni_interface, addr_set ? "@" : "",
+		       addr_set ? libcfs_nidstr(&ni->ni_nid) : "");
+	else
+		CERROR("ksocklnd: failed to find IP address %s\n",
+		       libcfs_nidstr(&ni->ni_nid));
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(lnet_inet_select);
+
 int
 lnet_parse_ip2nets(const char **networksp, const char *ip2nets)
 {
-- 
2.27.0


* [lustre-devel] [PATCH 27/42] lustre: obdclass: fix T10PI prototypes
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (25 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 26/42] lnet: Allow IP specification James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 28/42] lustre: obdclass: prefer T10 checksum if the target supports it James Simmons
                   ` (14 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Li Dongyang, Lustre Development List

From: Li Dongyang <dongyangli@ddn.com>

Update the custom generate and verify functions to sync
with the upstream versions.
 - Use __be16 instead of __u16 for guard tags.
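
As a rough userspace illustration of what a big-endian guard tag looks
like in memory, the sketch below builds one byte by byte. be16_t and
to_be16() are hypothetical stand-ins for the kernel's __be16 and
cpu_to_be16(); they are assumptions for this example and not code from
the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's __be16 */
typedef uint16_t be16_t;

/* Store a host-order 16-bit value most significant byte first,
 * regardless of the host's endianness.
 */
static be16_t to_be16(uint16_t v)
{
	unsigned char b[2] = {
		(unsigned char)(v >> 8),	/* high byte first */
		(unsigned char)(v & 0xff),	/* then the low byte */
	};
	be16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}
```

Giving such values a dedicated type documents that they must not be
mixed with host-order integers without a conversion.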

WC-bug-id: https://jira.whamcloud.com/browse/LU-16413
Lustre-commit: 4f0273b3bc7d2159d ("LU-16413 osd-ldiskfs: fix T10PI for CentOS 8.x")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49441
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_cksum.h  |  8 ++++----
 fs/lustre/obdclass/integrity.c | 12 ++++++------
 fs/lustre/osc/osc_request.c    |  4 ++--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/lustre/include/obd_cksum.h b/fs/lustre/include/obd_cksum.h
index 1189bc2c73e9..36443fd5a156 100644
--- a/fs/lustre/include/obd_cksum.h
+++ b/fs/lustre/include/obd_cksum.h
@@ -129,13 +129,13 @@ enum cksum_types obd_cksum_type_select(const char *obd_name,
 #define DECLARE_CKSUM_NAME const char *const cksum_name[] = {"crc32", "adler", \
 	"crc32c", "reserved", "t10ip512", "t10ip4K", "t10crc512", "t10crc4K"}
 
-typedef u16 (obd_dif_csum_fn) (void *, unsigned int);
+typedef __be16 (obd_dif_csum_fn) (void *, unsigned int);
 
-u16 obd_dif_crc_fn(void *data, unsigned int len);
-u16 obd_dif_ip_fn(void *data, unsigned int len);
+__be16 obd_dif_crc_fn(void *data, unsigned int len);
+__be16 obd_dif_ip_fn(void *data, unsigned int len);
 int obd_page_dif_generate_buffer(const char *obd_name, struct page *page,
 				 u32 offset, u32 length,
-				 u16 *guard_start, int guard_number,
+				 __be16 *guard_start, int guard_number,
 				 int *used_number, int sector_size,
 				 obd_dif_csum_fn *fn);
 /*
diff --git a/fs/lustre/obdclass/integrity.c b/fs/lustre/obdclass/integrity.c
index 7a95a11f112f..e6069cb30213 100644
--- a/fs/lustre/obdclass/integrity.c
+++ b/fs/lustre/obdclass/integrity.c
@@ -32,13 +32,13 @@
 #include <obd_class.h>
 #include <obd_cksum.h>
 
-u16 obd_dif_crc_fn(void *data, unsigned int len)
+__be16 obd_dif_crc_fn(void *data, unsigned int len)
 {
 	return cpu_to_be16(crc_t10dif(data, len));
 }
 EXPORT_SYMBOL(obd_dif_crc_fn);
 
-u16 obd_dif_ip_fn(void *data, unsigned int len)
+__be16 obd_dif_ip_fn(void *data, unsigned int len)
 {
 	return ip_compute_csum(data, len);
 }
@@ -46,14 +46,14 @@ EXPORT_SYMBOL(obd_dif_ip_fn);
 
 int obd_page_dif_generate_buffer(const char *obd_name, struct page *page,
 				 u32 offset, u32 length,
-				 u16 *guard_start, int guard_number,
+				 __be16 *guard_start, int guard_number,
 				 int *used_number, int sector_size,
 				 obd_dif_csum_fn *fn)
 {
 	unsigned int i = offset;
 	unsigned int end = offset + length;
 	char *data_buf;
-	u16 *guard_buf = guard_start;
+	__be16 *guard_buf = guard_start;
 	unsigned int data_size;
 	int used = 0;
 
@@ -90,7 +90,7 @@ static int __obd_t10_performance_test(const char *obd_name,
 	unsigned int bufsize;
 	unsigned char *buffer;
 	struct page *__page;
-	u16 *guard_start;
+	__be16 *guard_start;
 	int guard_number;
 	int used_number = 0;
 	int sector_size = 0;
@@ -117,7 +117,7 @@ static int __obd_t10_performance_test(const char *obd_name,
 	}
 
 	buffer = kmap(__page);
-	guard_start = (u16 *)buffer;
+	guard_start = (__be16 *)buffer;
 	guard_number = PAGE_SIZE / sizeof(*guard_start);
 	for (i = 0; i < repeat_number; i++) {
 		/*
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 5a3f418615ca..fb0facc8ddd1 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1196,7 +1196,7 @@ static int osc_checksum_bulk_t10pi(const char *obd_name, int nob,
 	unsigned char cfs_alg = cksum_obd2cfs(OBD_CKSUM_T10_TOP);
 	struct page *__page;
 	unsigned char *buffer;
-	u16 *guard_start;
+	__be16 *guard_start;
 	unsigned int bufsize;
 	int guard_number;
 	int used_number = 0;
@@ -1220,7 +1220,7 @@ static int osc_checksum_bulk_t10pi(const char *obd_name, int nob,
 	}
 
 	buffer = kmap(__page);
-	guard_start = (u16 *)buffer;
+	guard_start = (__be16 *)buffer;
 	guard_number = PAGE_SIZE / sizeof(*guard_start);
 	CDEBUG(D_PAGE | (resend ? D_HA : 0),
 	       "GRD tags per page=%u, resend=%u, bytes=%u, pages=%zu\n",
-- 
2.27.0


* [lustre-devel] [PATCH 28/42] lustre: obdclass: prefer T10 checksum if the target supports it
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (26 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 27/42] lustre: obdclass: fix T10PI prototypes James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 29/42] lustre: llite: remove false outdated comment James Simmons
                   ` (13 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Li Dongyang, Lustre Development List

From: Li Dongyang <dongyangli@ddn.com>

If the target actually has T10PI support, prefer that T10 checksum
even if it is not the fastest on the client, provided checksum_type
is not explicitly set.
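
The "exactly one T10 type offered" test can be sketched in userspace C
as follows. The CKSUM_* bit values and pick_single_t10() are
illustrative assumptions; the real flags and the selection function
live in the Lustre headers.

```c
#include <stdint.h>

/* Illustrative flag values only; the real definitions are in the
 * Lustre headers.
 */
#define CKSUM_CRC32		0x01u
#define CKSUM_T10IP512		0x10u
#define CKSUM_T10IP4K		0x20u
#define CKSUM_T10CRC512		0x40u
#define CKSUM_T10CRC4K		0x80u
#define CKSUM_T10_ALL		(CKSUM_T10IP512 | CKSUM_T10IP4K | \
				 CKSUM_T10CRC512 | CKSUM_T10CRC4K)

/* Return the T10 type if the server offers exactly one, else 0. */
static uint32_t pick_single_t10(uint32_t cksum_types)
{
	uint32_t t10 = cksum_types & CKSUM_T10_ALL;

	/* exactly one bit set: nonzero and a power of two */
	if (t10 && !(t10 & (t10 - 1)))
		return t10;

	return 0;
}
```

A return value of 0 means the caller falls through to the usual
"fastest supported checksum" selection.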

WC-bug-id: https://jira.whamcloud.com/browse/LU-14912
Lustre-commit: 5e9059e08aec6fb36 ("LU-14912 obdclass: prefer T10 checksum if the target supports it")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/44657
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_cksum.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/lustre/include/obd_cksum.h b/fs/lustre/include/obd_cksum.h
index 36443fd5a156..052635d5d5b0 100644
--- a/fs/lustre/include/obd_cksum.h
+++ b/fs/lustre/include/obd_cksum.h
@@ -118,6 +118,13 @@ enum cksum_types obd_cksum_type_select(const char *obd_name,
 	if (preferred & cksum_types)
 		return preferred;
 
+	/*
+	 * Server reporting a single T10 checksum type
+	 * means the target actually supports T10-PI.
+	 */
+	if (hweight32(cksum_types & OBD_CKSUM_T10_ALL) == 1)
+		return cksum_types & OBD_CKSUM_T10_ALL;
+
 	flag = obd_cksum_type_pack(obd_name, cksum_types);
 
 	return obd_cksum_type_unpack(flag);
-- 
2.27.0


* [lustre-devel] [PATCH 29/42] lustre: llite: remove false outdated comment
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (27 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 28/42] lustre: obdclass: prefer T10 checksum if the target supports it James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 30/42] lnet: socklnd: clarify error message on timeout James Simmons
                   ` (12 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Aurelien Degremont <degremoa@amazon.com>

An old commit, from before Lustre was submitted upstream, changed
the behavior of ll_i2gids() without updating the function
documentation accordingly. Fix the comment, as it is confusing.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16438
Lustre-commit: 557bb0004d7ea1d87 ("LU-16438 llite: remove false outdated comment")
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49539
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/namei.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 93abec81896f..f23d3ae49fae 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -518,11 +518,7 @@ u32 ll_i2suppgid(struct inode *i)
 		return (u32)(-1);
 }
 
-/* Pack the required supplementary groups into the supplied groups array.
- * If we don't need to use the groups from the target inode(s) then we
- * instead pack one or more groups from the user's supplementary group
- * array in case it might be useful.  Not needed if doing an MDS-side upcall.
- */
+/* Pack the required supplementary groups into the supplied groups array. */
 void ll_i2gids(u32 *suppgids, struct inode *i1, struct inode *i2)
 {
 	LASSERT(i1);
-- 
2.27.0


* [lustre-devel] [PATCH 30/42] lnet: socklnd: clarify error message on timeout
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (28 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 29/42] lustre: llite: remove false outdated comment James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 31/42] lustre: llite: replace selinux_is_enabled() James Simmons
                   ` (11 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Aurelien Degremont <degremoa@amazon.com>

When the local peer times out while writing to another peer,
print an explicit error message rather than a generic one.
This makes it clearer for admins and easier to debug.

Add the port to help determine whether it is always
the same one or not.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16439
Lustre-commit: 5b06ba9d46e19b9d7 ("LU-16439 socklnd: clarify error message on timeout")
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49540
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/socklnd/socklnd_cb.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index 15fba9d7f0ac..17ea0cca9255 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -533,11 +533,15 @@ ksocknal_process_transmit(struct ksock_conn *conn, struct ksock_tx *tx)
 	if (!conn->ksnc_closing) {
 		switch (rc) {
 		case -ECONNRESET:
-			LCONSOLE_WARN("Host %pISc reset our connection while we were sending data; it may have rebooted.\n",
-				      &conn->ksnc_peeraddr);
+			LCONSOLE_WARN("Host %pISc reset our connection while we were sending data; it may have rebooted: rc = %d\n",
+				      &conn->ksnc_peeraddr, rc);
+			break;
+		case -ETIMEDOUT:
+			LCONSOLE_WARN("Timeout error while writing to %pISp. Closing socket: rc = %d\n",
+				      &conn->ksnc_peeraddr, rc);
 			break;
 		default:
-			LCONSOLE_WARN("There was an unexpected network error while writing to %pISc: %d.\n",
+			LCONSOLE_WARN("There was an unexpected network error while writing to %pISc: rc = %d\n",
 				      &conn->ksnc_peeraddr, rc);
 			break;
 		}
-- 
2.27.0


* [lustre-devel] [PATCH 31/42] lustre: llite: replace selinux_is_enabled()
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (29 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 30/42] lnet: socklnd: clarify error message on timeout James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 32/42] lustre: enc: S_ENCRYPTED flag on OST objects for enc files James Simmons
                   ` (10 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Etienne AUJAMES, Lustre Development List

From: Etienne AUJAMES <etienne.aujames@cea.fr>

selinux_is_enabled() was removed in kernel 5.1.
Commit 7037720 added support for such kernels by assuming SELinux is
enabled if the function selinux_is_enabled() does not exist.

This has a performance impact: previously, getxattr RPCs were not
sent for "security.selinux" if SELinux was disabled. Utilities like
"ls -l" always try to get "security.selinux".
See LU-549 for more information.

This patch uses security_inode_listsecurity() when mounting the
client to find out whether an LSM module (SELinux) requires an xattr
to store file contexts. If an xattr is returned, we store it and use
it for the in-request security context.

For getxattr/setxattr we use the stored LSM xattr to filter
security-context xattrs like security.selinux. If the xattr does not
match the stored xattr name, we return -EOPNOTSUPP to userspace.
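
A minimal userspace sketch of this filtering idea is shown below.
secctx_name_filter() and its parameters are hypothetical, and the
sketch is not the ll_security_secctx_name_filter() added by the patch;
it only assumes that security.selinux and security.SMACK64 are the
security-label xattrs of interest.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical filter. stored_name is the LSM xattr name cached at
 * mount time, or NULL if no LSM asked for one. Returns 0 if the
 * request may proceed, -EOPNOTSUPP otherwise.
 */
static int secctx_name_filter(const char *stored_name,
			      const char *xattr_name)
{
	/* only security-label xattrs are subject to filtering */
	if (strcmp(xattr_name, "security.selinux") != 0 &&
	    strcmp(xattr_name, "security.SMACK64") != 0)
		return 0;

	/* the label xattr used by the active LSM is allowed */
	if (stored_name && strcmp(stored_name, xattr_name) == 0)
		return 0;

	/* any other label xattr is refused without sending an RPC */
	return -EOPNOTSUPP;
}
```

With no stored name, a request for "security.selinux" is refused
locally, which is what avoids the extra getxattr round trip when
SELinux is disabled.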

It also adds the s_security check for security_inode_notifysecctx() to
avoid calling this function if SELinux is disabled (as in
nfs_setsecurity()).

For the "Enforcing SELinux Policy Check" functionality, the SELinux
check has been moved into l_getsepol: -ENODEV is returned if SELinux
is disabled.

*Note:*
This patch detects that SELinux is disabled even when it has not been
explicitly disabled on the kernel cmdline.

*Performance:*
Tests with "strace -c ls -l" with 100000 files on root in a multi-VM
environment (on Rocky 9). The FS is remounted for each test (the cache
is cleaned) and SELinux is disabled.

 __________________ ___________ _________
| Total time %     | lgetxattr | statx   |
|__________________|___________|_________|
|Without the patch:|    29%    |   51%   |
|__________________|___________|_________|
|With the patch:   |    0%     |   87%   |
|__________________|___________|_________|
"ls -l" uses lgetxattr to get "security.selinux".

Linux-commit: 3d252529480c68bfd6a6774652df7c8968b28e41

Fixes: 7037720 ("lustre: remove use of selinux_is_enabled().")
WC-bug-id: https://jira.whamcloud.com/browse/LU-16210
Lustre-commit: 1d8faaf6caf4acaf0 ("LU-16210 llite: replace selinux_is_enabled()")
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48875
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c            |  22 ++--
 fs/lustre/llite/llite_internal.h |  46 +++++++-
 fs/lustre/llite/llite_lib.c      |  11 ++
 fs/lustre/llite/namei.c          |  87 +++++---------
 fs/lustre/llite/xattr.c          |  10 +-
 fs/lustre/llite/xattr_cache.c    |   6 +-
 fs/lustre/llite/xattr_security.c | 193 ++++++++++++++++++++++++-------
 fs/lustre/ptlrpc/sec.c           |  17 +--
 8 files changed, 260 insertions(+), 132 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index abbba964103f..7dca0fc461c0 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -432,6 +432,7 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump,
 			.hash = full_name_hash(dparent, dirname,
 					       strlen(dirname)),
 		},
+		.d_sb = dparent->d_sb,
 	};
 	bool encrypt = false;
 	int hash_flags;
@@ -511,14 +512,13 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump,
 	}
 
 	if (test_bit(LL_SBI_FILE_SECCTX, sbi->ll_flags)) {
-		/*
-		 * selinux_dentry_init_security() uses dentry->d_parent and name
+		/* selinux_dentry_init_security() uses dentry->d_parent and name
 		 * to determine the security context for the file. So our fake
 		 * dentry should be real enough for this purpose.
 		 */
-		err = ll_dentry_init_security(parent,
-					      &dentry, mode, &dentry.d_name,
+		err = ll_dentry_init_security(&dentry, mode, &dentry.d_name,
 					      &op_data->op_file_secctx_name,
+					      &op_data->op_file_secctx_name_size,
 					      &op_data->op_file_secctx,
 					      &op_data->op_file_secctx_size);
 		if (err < 0)
@@ -550,17 +550,11 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump,
 
 	dentry.d_inode = inode;
 
-	if (test_bit(LL_SBI_FILE_SECCTX, sbi->ll_flags)) {
-		/* no need to protect selinux_inode_setsecurity() by
-		 * inode_lock. Taking it would lead to a client deadlock
-		 * LU-13617
-		 */
-		err = security_inode_notifysecctx(inode,
-						  op_data->op_file_secctx,
-						  op_data->op_file_secctx_size);
-	} else {
+	if (test_bit(LL_SBI_FILE_SECCTX, sbi->ll_flags))
+		err = ll_inode_notifysecctx(inode, op_data->op_file_secctx,
+					    op_data->op_file_secctx_size);
+	else
 		err = ll_inode_init_security(&dentry, inode, parent);
-	}
 
 	if (encrypt)
 		err = ll_set_encflags(inode, op_data->op_file_encctx,
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 998eed83738e..c42330e54874 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -447,15 +447,45 @@ static inline void obd_connect_set_secctx(struct obd_connect_data *data)
 #endif
 }
 
-int ll_dentry_init_security(struct inode *parent, struct dentry *dentry,
-			    int mode, struct qstr *name,
-			    const char **secctx_name, void **secctx,
-			    u32 *secctx_size);
+/* Only smack and selinux are known to use security contexts */
+static inline bool ll_xattr_is_seclabel(const char *name)
+{
+	return !strcmp(name, XATTR_NAME_SELINUX) ||
+		!strcmp(name, XATTR_NAME_SMACK);
+}
+
+static inline bool ll_xattr_suffix_is_seclabel(const char *suffix)
+{
+	return !strcmp(suffix, XATTR_SELINUX_SUFFIX) ||
+		!strcmp(suffix, XATTR_SMACK_SUFFIX);
+}
+
+int ll_dentry_init_security(struct dentry *dentry, int mode, struct qstr *name,
+			    const char **secctx_name, u32 *secctx_name_size,
+			    void **secctx, u32 *secctx_size);
 int ll_inode_init_security(struct dentry *dentry, struct inode *inode,
 			   struct inode *dir);
 
-int ll_listsecurity(struct inode *inode, char *secctx_name,
-		    size_t secctx_name_size);
+int ll_inode_notifysecctx(struct inode *inode,
+			  void *secctx, u32 secctxlen);
+
+void ll_secctx_name_free(struct ll_sb_info *sbi);
+
+int ll_secctx_name_store(struct inode *in);
+
+u32 ll_secctx_name_get(struct ll_sb_info *sbi, const char **secctx_name);
+
+int ll_security_secctx_name_filter(struct ll_sb_info *sbi, int xattr_type,
+				   const char *suffix);
+
+static inline bool ll_security_xattr_wanted(struct inode *in)
+{
+#ifdef CONFIG_SECURITY
+	return in->i_security && in->i_sb->s_security;
+#else
+	return false;
+#endif
+}
 
 static inline bool obd_connect_has_enc(struct obd_connect_data *data)
 {
@@ -804,6 +834,10 @@ struct ll_sb_info {
 	struct ll_foreign_symlink_upcall_item *ll_foreign_symlink_upcall_items;
 	/* foreign symlink path upcall nb infos */
 	unsigned int		ll_foreign_symlink_upcall_nb_items;
+
+	/* cached file security context xattr name. e.g: security.selinux */
+	char			*ll_secctx_name;
+	u32			ll_secctx_name_size;
 };
 
 #define SBI_DEFAULT_HEAT_DECAY_WEIGHT	((80 * 256 + 50) / 100)
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 176e61b5874e..4bc91dddb6a7 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -144,6 +144,9 @@ static struct ll_sb_info *ll_init_sbi(void)
 	 * not enabled by default
 	 */
 
+	sbi->ll_secctx_name = NULL;
+	sbi->ll_secctx_name_size = 0;
+
 	sbi->ll_ra_info.ra_max_pages =
 		min(pages / 32, SBI_DEFAULT_READ_AHEAD_MAX);
 	sbi->ll_ra_info.ra_max_pages_per_file =
@@ -236,6 +239,8 @@ static void ll_free_sbi(struct super_block *sb)
 		kvfree(items);
 		sbi->ll_foreign_symlink_upcall_items = NULL;
 	}
+	ll_secctx_name_free(sbi);
+
 	ll_free_rw_stats_info(sbi);
 	pcc_super_fini(&sbi->ll_pcc_super);
 	kfree(sbi);
@@ -710,6 +715,12 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 		goto out_root;
 	}
 
+	err = ll_secctx_name_store(root);
+	if (err < 0 && ll_security_xattr_wanted(root))
+		CWARN("%s: file security contexts not supported: rc = %d\n",
+		      sbi->ll_fsname, err);
+
+	err = 0;
 	if (encctxlen) {
 		CDEBUG(D_SEC,
 		       "server returned encryption ctx for root inode "DFID"\n",
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index f23d3ae49fae..9314a1704146 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -712,20 +712,8 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 				       PFID(ll_inode2fid(inode)));
 		}
 
-		if (secctx && secctxlen != 0) {
-			/* no need to protect selinux_inode_setsecurity() by
-			 * inode_lock. Taking it would lead to a client deadlock
-			 * LU-13617
-			 */
-			rc = security_inode_notifysecctx(inode, secctx,
-							 secctxlen);
-			if (rc)
-				CWARN("%s: cannot set security context for " DFID ": rc = %d\n",
-				      ll_i2sbi(inode)->ll_fsname,
-				      PFID(ll_inode2fid(inode)),
-				      rc);
-		}
-
+		/* resume normally on error */
+		ll_inode_notifysecctx(inode, secctx, secctxlen);
 	}
 
 	alias = ll_splice_alias(inode, *de);
@@ -798,26 +786,26 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 }
 
 static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry,
-				   struct lookup_intent *it, void **secctx,
-				   u32 *secctxlen,
+				   struct lookup_intent *it,
+				   void **secctx, u32 *secctxlen,
 				   struct pcc_create_attach *pca,
 				   bool encrypt,
 				   void **encctx, u32 *encctxlen)
 {
 	ktime_t kstart = ktime_get();
 	struct lookup_intent lookup_it = { .it_op = IT_LOOKUP };
+	struct ll_sb_info *sbi = ll_i2sbi(parent);
 	struct dentry *save = dentry, *retval;
 	struct ptlrpc_request *req = NULL;
 	struct md_op_data *op_data = NULL;
 	struct lov_user_md *lum = NULL;
-	char secctx_name[XATTR_NAME_MAX + 1];
 	struct fscrypt_name fname;
 	struct inode *inode;
 	struct lu_fid fid;
 	u32 opc;
 	int rc;
 
-	if (dentry->d_name.len > ll_i2sbi(parent)->ll_namelen)
+	if (dentry->d_name.len > sbi->ll_namelen)
 		return ERR_PTR(-ENAMETOOLONG);
 
 	CDEBUG(D_VFSTRACE, "VFS Op:name=%pd, dir=" DFID "(%p),intent=%s\n",
@@ -831,13 +819,8 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry,
 
 	if (it->it_op == IT_GETATTR && dentry_may_statahead(parent, dentry)) {
 		rc = ll_revalidate_statahead(parent, &dentry, 0);
-		if (rc == 1) {
-			if (dentry == save)
-				retval = NULL;
-			else
-				retval = dentry;
-			goto out;
-		}
+		if (rc == 1)
+			return dentry == save ? NULL : dentry;
 	}
 
 	if (it->it_op & IT_OPEN && it->it_flags & FMODE_WRITE &&
@@ -872,7 +855,6 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry,
 	if (IS_ERR(op_data)) {
 		fscrypt_free_filename(&fname);
 		return ERR_CAST(op_data);
-		goto out;
 	}
 	if (!fid_is_zero(&fid)) {
 		op_data->op_fid2 = fid;
@@ -886,11 +868,11 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry,
 		it->it_create_mode &= ~current_umask();
 
 	if (it->it_op & IT_CREAT &&
-	    test_bit(LL_SBI_FILE_SECCTX, ll_i2sbi(parent)->ll_flags)) {
-		rc = ll_dentry_init_security(parent,
-					     dentry, it->it_create_mode,
+	    test_bit(LL_SBI_FILE_SECCTX, sbi->ll_flags)) {
+		rc = ll_dentry_init_security(dentry, it->it_create_mode,
 					     &dentry->d_name,
 					     &op_data->op_file_secctx_name,
+					     &op_data->op_file_secctx_name_size,
 					     &op_data->op_file_secctx,
 					     &op_data->op_file_secctx_size);
 		if (rc < 0) {
@@ -993,22 +975,12 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry,
 			*encctxlen = 0;
 	}
 
-	/* ask for security context upon intent */
-	if (it->it_op & (IT_LOOKUP | IT_GETATTR | IT_OPEN)) {
-		/* get name of security xattr to request to server */
-		rc = ll_listsecurity(parent, secctx_name,
-				     sizeof(secctx_name));
-		if (rc < 0) {
-			CDEBUG(D_SEC,
-			       "cannot get security xattr name for " DFID ": rc = %d\n",
-			       PFID(ll_inode2fid(parent)), rc);
-		} else if (rc > 0) {
-			op_data->op_file_secctx_name = secctx_name;
-			op_data->op_file_secctx_name_size = rc;
-			CDEBUG(D_SEC, "'%.*s' is security xattr for " DFID "\n",
-			       rc, secctx_name, PFID(ll_inode2fid(parent)));
-		}
-	}
+	/* ask for security context upon intent:
+	 * get name of security xattr to request to server
+	 */
+	if (it->it_op & (IT_LOOKUP | IT_GETATTR | IT_OPEN))
+		op_data->op_file_secctx_name_size =
+			ll_secctx_name_get(sbi, &op_data->op_file_secctx_name);
 
 	if (pca && pca->pca_dataset) {
 		lum = kzalloc(sizeof(*lum), GFP_NOFS);
@@ -1416,20 +1388,13 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry,
 	if (IS_ERR(inode))
 		return PTR_ERR(inode);
 
-	if (test_bit(LL_SBI_FILE_SECCTX, ll_i2sbi(inode)->ll_flags) &&
-	    secctx) {
-		/* must be done before d_instantiate, because it calls
-		 * security_d_instantiate, which means a getxattr if security
-		 * context is not set yet
-		 */
-		/* no need to protect selinux_inode_setsecurity() by
-		 * inode_lock. Taking it would lead to a client deadlock
-		 * LU-13617
-		 */
-		rc = security_inode_notifysecctx(inode, secctx, secctxlen);
-		if (rc)
-			return rc;
-	}
+	/* must be done before d_instantiate, because it calls
+	 * security_d_instantiate, which means a getxattr if security
+	 * context is not set yet
+	 */
+	rc = ll_inode_notifysecctx(inode, secctx, secctxlen);
+	if (rc)
+		return rc;
 
 	d_instantiate(dentry, inode);
 
@@ -1567,9 +1532,9 @@ static int ll_new_node(struct inode *dir, struct dentry *dchild,
 		ll_qos_mkdir_prep(op_data, dir);
 
 	if (test_bit(LL_SBI_FILE_SECCTX, sbi->ll_flags)) {
-		err = ll_dentry_init_security(dir,
-					      dchild, mode, &dchild->d_name,
+		err = ll_dentry_init_security(dchild, mode, &dchild->d_name,
 					      &op_data->op_file_secctx_name,
+					      &op_data->op_file_secctx_name_size,
 					      &op_data->op_file_secctx,
 					      &op_data->op_file_secctx_size);
 		if (err < 0)
diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c
index 11310f9a0a6b..c90f5017c931 100644
--- a/fs/lustre/llite/xattr.c
+++ b/fs/lustre/llite/xattr.c
@@ -123,6 +123,10 @@ static int ll_xattr_set_common(const struct xattr_handler *handler,
 	    (!strcmp(name, "ima") || !strcmp(name, "evm")))
 		return -EOPNOTSUPP;
 
+	rc = ll_security_secctx_name_filter(sbi, handler->flags, name);
+	if (rc)
+		return rc;
+
 	/*
 	 * In user.* namespace, only regular files and directories can have
 	 * extended attributes.
@@ -373,7 +377,7 @@ int ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer,
 	}
 
 	if (sbi->ll_xattr_cache_enabled && type != XATTR_ACL_ACCESS_T &&
-	    (type != XATTR_SECURITY_T || strcmp(name, "security.selinux")) &&
+	    (type != XATTR_SECURITY_T || !ll_xattr_is_seclabel(name)) &&
 	    (type != XATTR_TRUSTED_T || strcmp(name, XATTR_NAME_SOM))) {
 		rc = ll_xattr_cache_get(inode, name, buffer, size, valid);
 		if (rc == -EAGAIN)
@@ -448,6 +452,10 @@ static int ll_xattr_get_common(const struct xattr_handler *handler,
 	if (rc)
 		return rc;
 
+	rc = ll_security_secctx_name_filter(sbi, handler->flags, name);
+	if (rc)
+		return rc;
+
 #ifdef CONFIG_LUSTRE_FS_POSIX_ACL
 	/* posix acl is under protection of LOOKUP lock. when calling to this,
 	 * we just have path resolution to the target inode, so we have great
diff --git a/fs/lustre/llite/xattr_cache.c b/fs/lustre/llite/xattr_cache.c
index ae5980603bce..d8ddb90f2042 100644
--- a/fs/lustre/llite/xattr_cache.c
+++ b/fs/lustre/llite/xattr_cache.c
@@ -461,9 +461,9 @@ static int ll_xattr_cache_refill(struct inode *inode)
 			CDEBUG(D_CACHE, "not caching %s\n",
 			       XATTR_NAME_ACL_ACCESS);
 			rc = 0;
-		} else if (!strcmp(xdata, "security.selinux")) {
-			/* Filter out security.selinux, it is cached in slab */
-			CDEBUG(D_CACHE, "not caching security.selinux\n");
+		} else if (ll_xattr_is_seclabel(xdata)) {
+			/* Filter out security label, it is cached in slab */
+			CDEBUG(D_CACHE, "not caching %s\n", xdata);
 			rc = 0;
 		} else if (!strcmp(xdata, XATTR_NAME_SOM)) {
 			/* Filter out trusted.som, it is not cached on client */
diff --git a/fs/lustre/llite/xattr_security.c b/fs/lustre/llite/xattr_security.c
index 39229d3d0f9e..9910ac6c77fc 100644
--- a/fs/lustre/llite/xattr_security.c
+++ b/fs/lustre/llite/xattr_security.c
@@ -38,19 +38,17 @@
 /*
  * Check for LL_SBI_FILE_SECCTX before calling.
  */
-int ll_dentry_init_security(struct inode *parent, struct dentry *dentry,
-			    int mode, struct qstr *name,
-			    const char **secctx_name, void **secctx,
-			    u32 *secctx_size)
+int ll_dentry_init_security(struct dentry *dentry, int mode, struct qstr *name,
+			    const char **secctx_name, u32 *secctx_name_size,
+			    void **secctx, u32 *secctx_size)
 {
+	struct ll_sb_info *sbi = ll_s2sbi(dentry->d_sb);
 	int rc;
 
 	/*
-	 * security_dentry_init_security() is strange. Like
-	 * security_inode_init_security() it may return a context (provided a
-	 * Linux security module is enabled) but unlike
-	 * security_inode_init_security() it does not return to us the name of
-	 * the extended attribute to store the context under (for example
+	 * Before kernel 5.15-rc1-20-g15bf32398ad4,
+	 * security_dentry_init_security() does not return to us the name of the
+	 * extended attribute to store the context under (for example
 	 * "security.selinux"). So we only call it when we think we know what
 	 * the name of the extended attribute will be. This is OK-ish since
 	 * SELinux is the only module that implements
@@ -59,30 +57,19 @@ int ll_dentry_init_security(struct inode *parent, struct dentry *dentry,
 	 * from SELinux.
 	 */
 
-	/* fetch length of security xattr name */
-	rc = security_inode_listsecurity(parent, NULL, 0);
-	/* xattr name length == 0 means SELinux is disabled */
-	if (rc == 0)
+	*secctx_name_size = ll_secctx_name_get(sbi, secctx_name);
+	/* xattr name length == 0 means no LSM module manages file contexts */
+	if (*secctx_name_size == 0)
 		return 0;
-	/* we support SELinux only */
-	if (rc != strlen(XATTR_NAME_SELINUX) + 1)
-		return -EOPNOTSUPP;
 
 	rc = security_dentry_init_security(dentry, mode, name, secctx,
 					   secctx_size);
-	/* Usually, security_dentry_init_security() returns -EOPNOTSUPP when
-	 * SELinux is disabled.
-	 * But on some kernels (e.g. rhel 8.5) it returns 0 when SELinux is
-	 * disabled, and in this case the security context is empty.
-	 */
-	if (rc == -EOPNOTSUPP || (rc == 0 && *secctx_size == 0))
-		/* do nothing */
+	/* ignore error if the hook is not supported by the LSM module */
+	if (rc == -EOPNOTSUPP)
 		return 0;
 	if (rc < 0)
 		return rc;
 
-	*secctx_name = XATTR_NAME_SELINUX;
-
 	return 0;
 }
 
@@ -139,31 +126,159 @@ int
 ll_inode_init_security(struct dentry *dentry, struct inode *inode,
 		       struct inode *dir)
 {
-	int err;
+	int rc;
 
-	err = security_inode_init_security(inode, dir, NULL,
-					   &ll_initxattrs, dentry);
+	if (!ll_security_xattr_wanted(dir))
+		return 0;
 
-	if (err == -EOPNOTSUPP)
+	rc = security_inode_init_security(inode, dir, NULL,
+					  &ll_initxattrs, dentry);
+	if (rc == -EOPNOTSUPP)
 		return 0;
-	return err;
+
+	return rc;
 }
 
 /**
- * Get security context xattr name used by policy.
+ * Notify security context to the security layer
+ *
+ * Notify security context @secctx of inode @inode to the security layer.
  *
- * \retval >= 0     length of xattr name
- * \retval < 0      failure to get security context xattr name
+ * Return	0 success, or SELinux is disabled or not supported by the fs
+ *		< 0 failure to set the security context
  */
-int
-ll_listsecurity(struct inode *inode, char *secctx_name, size_t secctx_name_size)
+int ll_inode_notifysecctx(struct inode *inode,
+			  void *secctx, u32 secctxlen)
 {
+	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	int rc;
 
-	rc = security_inode_listsecurity(inode, secctx_name, secctx_name_size);
-	if (rc >= secctx_name_size)
+	if (!test_bit(LL_SBI_FILE_SECCTX, sbi->ll_flags) ||
+	    !ll_security_xattr_wanted(inode) ||
+	    !secctx || !secctxlen)
+		return 0;
+
+	/* no need to protect selinux_inode_setsecurity() by
+	 * inode_lock. Taking it would lead to a client deadlock
+	 * LU-13617
+	 */
+	rc = security_inode_notifysecctx(inode, secctx, secctxlen);
+	if (rc)
+		CWARN("%s: cannot set security context for "DFID": rc = %d\n",
+		      sbi->ll_fsname, PFID(ll_inode2fid(inode)), rc);
+
+	return rc;
+}
+
+/**
+ * Free the security context xattr name used by policy
+ */
+void ll_secctx_name_free(struct ll_sb_info *sbi)
+{
+	kfree(sbi->ll_secctx_name);
+	sbi->ll_secctx_name = NULL;
+	sbi->ll_secctx_name_size = 0;
+}
+
+/**
+ * Get security context xattr name used by policy and save it.
+ *
+ * Return > 0	length of xattr name
+ *	  == 0	no LSM module registered supporting security contexts
+ *	  < 0	failure to get xattr name or xattr is not supported
+ */
+int ll_secctx_name_store(struct inode *in)
+{
+	struct ll_sb_info *sbi = ll_i2sbi(in);
+	int rc = 0;
+
+	if (!ll_security_xattr_wanted(in))
+		return 0;
+
+	/* get size of xattr name */
+	rc = security_inode_listsecurity(in, NULL, 0);
+	if (rc <= 0)
+		return rc;
+
+	ll_secctx_name_free(sbi);
+
+	sbi->ll_secctx_name = kzalloc(rc + 1, GFP_NOFS);
+	if (!sbi->ll_secctx_name)
+		return -ENOMEM;
+
+	/* save the xattr name */
+	sbi->ll_secctx_name_size = rc;
+	rc = security_inode_listsecurity(in, sbi->ll_secctx_name,
+					 sbi->ll_secctx_name_size);
+	if (rc <= 0)
+		goto err_free;
+
+	if (rc > sbi->ll_secctx_name_size) {
 		rc = -ERANGE;
-	else if (rc >= 0)
-		secctx_name[rc] = '\0';
+		goto err_free;
+	}
+
+	/* sanity check */
+	sbi->ll_secctx_name[rc] = '\0';
+	if (rc < sizeof(XATTR_SECURITY_PREFIX)) {
+		rc = -EINVAL;
+		goto err_free;
+	}
+	if (strncmp(sbi->ll_secctx_name, XATTR_SECURITY_PREFIX,
+		    sizeof(XATTR_SECURITY_PREFIX) - 1) != 0) {
+		rc = -EOPNOTSUPP;
+		goto err_free;
+	}
+
+	return rc;
+
+err_free:
+	ll_secctx_name_free(sbi);
 	return rc;
 }
+
+/**
+ * Retrieve the stored file security context xattr name.
+ *
+ * Return	security context xattr name size stored.
+ *	  0	no xattr name stored.
+ */
+u32 ll_secctx_name_get(struct ll_sb_info *sbi, const char **secctx_name)
+{
+	if (!sbi->ll_secctx_name || !sbi->ll_secctx_name_size)
+		return 0;
+
+	*secctx_name = sbi->ll_secctx_name;
+
+	return sbi->ll_secctx_name_size;
+}
+
+/**
+ * Filter out xattr file security context if not managed by LSM
+ *
+ * This is done to improve performance for applications that blindly try to
+ * get the file context (like "ls -l" for security.selinux).
+ * See LU-549 for more information.
+ *
+ * Return 0		xattr not filtered
+ *	  -EOPNOTSUPP	no enabled LSM security module supports the xattr
+ */
+int ll_security_secctx_name_filter(struct ll_sb_info *sbi, int xattr_type,
+				   const char *suffix)
+{
+	const char *cached_suffix = NULL;
+
+	if (xattr_type != XATTR_SECURITY_T ||
+	    !ll_xattr_suffix_is_seclabel(suffix))
+		return 0;
+
+	/* is the xattr label used by lsm ? */
+	if (!ll_secctx_name_get(sbi, &cached_suffix))
+		return -EOPNOTSUPP;
+
+	cached_suffix += sizeof(XATTR_SECURITY_PREFIX) - 1;
+	if (strcmp(suffix, cached_suffix) != 0)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
diff --git a/fs/lustre/ptlrpc/sec.c b/fs/lustre/ptlrpc/sec.c
index 976df0bca8a8..7cd09ebe78db 100644
--- a/fs/lustre/ptlrpc/sec.c
+++ b/fs/lustre/ptlrpc/sec.c
@@ -1767,19 +1767,17 @@ static inline int sptlrpc_sepol_needs_check(struct ptlrpc_sec *imp_sec)
 
 int sptlrpc_get_sepol(struct ptlrpc_request *req)
 {
-#ifndef CONFIG_SECURITY_SELINUX
+	struct ptlrpc_sec *imp_sec = req->rq_import->imp_sec;
+	int rc = 0;
+
 	(req->rq_sepol)[0] = '\0';
 
+#ifndef CONFIG_SECURITY_SELINUX
 	if (unlikely(send_sepol != 0))
 		CDEBUG(D_SEC,
 		       "Client cannot report SELinux status, it was not built against libselinux.\n");
 	return 0;
-#else
-	struct ptlrpc_sec *imp_sec = req->rq_import->imp_sec;
-	int rc = 0;
-
-	(req->rq_sepol)[0] = '\0';
-
+#endif
 	if (send_sepol == 0)
 		return 0;
 
@@ -1794,10 +1792,13 @@ int sptlrpc_get_sepol(struct ptlrpc_request *req)
 		memcpy(req->rq_sepol, imp_sec->ps_sepol,
 		       sizeof(req->rq_sepol));
 		spin_unlock(&imp_sec->ps_lock);
+	} else if (rc == -ENODEV) {
+		CDEBUG(D_SEC,
+		       "Client cannot report SELinux status, SELinux is disabled.\n");
+		rc = 0;
 	}
 
 	return rc;
-#endif
 }
 EXPORT_SYMBOL(sptlrpc_get_sepol);
 
-- 
2.27.0

* [lustre-devel] [PATCH 32/42] lustre: enc: S_ENCRYPTED flag on OST objects for enc files
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (30 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 31/42] lustre: llite: replace selinux_is_enabled() James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 33/42] lnet: asym route inconsistency warning James Simmons
                   ` (9 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

Add a dummy encryption context on OST objects being created, when the
LUSTRE_ENCRYPT_FL flag gets set in the LMA, for ldiskfs backend
targets. This leads ldiskfs to internally set the LDISKFS_ENCRYPT_FL
flag on the on-disk inode. It also keeps e2fsprogs happy, because it
then sees an enc ctx for an inode that has the LDISKFS_ENCRYPT_FL flag.

Add a dummy encryption context on OST objects being opened, if there is
not already one, for ldiskfs backend targets. This is done by adding
the LUSTRE_ENCRYPT_FL flag if necessary, at the same time as atime
gets updated. This acts as a live self-check that fixes OST objects
created with an older Lustre version.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16091
Lustre-commit: 348446d6370b3f63f ("LU-16091 enc: S_ENCRYPTED flag on OST objects for enc files")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48198
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_support.h |  1 +
 fs/lustre/llite/file.c          |  5 +++++
 fs/lustre/llite/llite_lib.c     |  5 +++++
 fs/lustre/osc/osc_request.c     | 10 ++++++++++
 4 files changed, 21 insertions(+)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index b58c1df4b538..a2930c800736 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -509,6 +509,7 @@ extern char obd_jobid_var[];
 #define OBD_FAIL_LFSCK_NO_DOUBLESCAN			0x160c
 #define OBD_FAIL_LFSCK_INVALID_PFID			0x1619
 #define OBD_FAIL_LFSCK_BAD_NAME_HASH			0x1628
+#define OBD_FAIL_LFSCK_NO_ENCFLAG			0x1632
 
 /* UPDATE */
 #define OBD_FAIL_UPDATE_OBJ_NET				0x1700
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index dac829fb977e..e343fc83d707 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -5338,6 +5338,11 @@ int ll_getattr_dentry(struct dentry *de, struct kstat *stat, u32 request_mask,
 	stat->attributes_mask |= STATX_ATTR_ENCRYPTED;
 #endif
 	stat->attributes |= ll_inode_to_ext_flags(inode->i_flags);
+	/* if Lustre specific LUSTRE_ENCRYPT_FL flag is set, also set
+	 * ext4 equivalent to please statx
+	 */
+	if (stat->attributes & LUSTRE_ENCRYPT_FL)
+		stat->attributes |= STATX_ATTR_ENCRYPTED;
 	stat->result_mask &= request_mask;
 
 	ll_stats_ops_tally(sbi, LPROC_LL_GETATTR,
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 4bc91dddb6a7..30056a6e26b2 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -2873,6 +2873,11 @@ int ll_iocontrol(struct inode *inode, struct file *file,
 		body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 
 		flags = body->mbo_flags;
+		/* if Lustre specific LUSTRE_ENCRYPT_FL flag is set, also set
+		 * ext4 equivalent to please lsattr and other e2fsprogs tools
+		 */
+		if (flags & LUSTRE_ENCRYPT_FL)
+			flags |= STATX_ATTR_ENCRYPTED;
 
 		ptlrpc_req_finished(req);
 
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index fb0facc8ddd1..bd294c5085d8 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1792,6 +1792,16 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 	else /* short i/o */
 		ioobj_max_brw_set(ioobj, 0);
 
+	if (inode && IS_ENCRYPTED(inode) &&
+	    fscrypt_has_encryption_key(inode) &&
+	    !OBD_FAIL_CHECK(OBD_FAIL_LFSCK_NO_ENCFLAG)) {
+		if ((body->oa.o_valid & OBD_MD_FLFLAGS) == 0) {
+			body->oa.o_valid |= OBD_MD_FLFLAGS;
+			body->oa.o_flags = 0;
+		}
+		body->oa.o_flags |= LUSTRE_ENCRYPT_FL;
+	}
+
 	if (short_io_size != 0) {
 		if ((body->oa.o_valid & OBD_MD_FLFLAGS) == 0) {
 			body->oa.o_valid |= OBD_MD_FLFLAGS;
-- 
2.27.0

* [lustre-devel] [PATCH 33/42] lnet: asym route inconsistency warning
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (31 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 32/42] lustre: enc: S_ENCRYPTED flag on OST objects for enc files James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 34/42] lnet: o2iblnd: reset hiw proportionally James Simmons
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Gian-Carlo DeFazio <defazio1@llnl.gov>

Remove LNET_UNDEFINED_HOPS from lnet_check_route_inconsistency(),
where it was treated as equivalent to 1 for the value of lr_hops.

Due to the changes made in commit 3f2844dc9
"LU-14945 lnet: don't use hops to determine the route state",
LNET_UNDEFINED_HOPS is no longer considered equivalent to 1
for lr_hops in all cases, and it is valid to leave hops undefined
for multi-hop routes.

Therefore, a multi-hop route whose hop count is
LNET_UNDEFINED_HOPS is no longer inconsistent.

Fixes: 546bdd11a7 ("lnet: asym route inconsistency warning")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14555
Lustre-commit: 6aed5df1771c299b5 ("LU-14555 lnet: asym route inconsistency warning")
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49352
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index 88a5b69e1f2e..c73e649fe07b 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -369,8 +369,7 @@ lnet_consolidate_routes_locked(struct lnet_peer *orig_lp,
 static inline void
 lnet_check_route_inconsistency(struct lnet_route *route)
 {
-	if (!route->lr_single_hop &&
-	    (route->lr_hops == 1 || route->lr_hops == LNET_UNDEFINED_HOPS) &&
+	if (!route->lr_single_hop && route->lr_hops == 1 &&
 	    avoid_asym_router_failure) {
 		CWARN("route %s->%s is detected to be multi-hop but hop count is set to %d\n",
 		      libcfs_net2str(route->lr_net),
-- 
2.27.0

* [lustre-devel] [PATCH 34/42] lnet: o2iblnd: reset hiw proportionally
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (32 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 33/42] lnet: asym route inconsistency warning James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 35/42] lnet: libcfs: cfs_hash_for_each_empty optimization James Simmons
                   ` (7 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

As a result of connection negotiation, the queue depth may end up
smaller than the value of the "peer_tx_credits" tunable. Before this
patch, the high-water mark "lnd_peercredits_hiw" would be set at
    min(current hiw, queue depth - 1).

For example, considering that hiw is allowed to only be as low as
half of peer_tx_credits, negotiating queue_depth/peer_credits down
from 32 to 8 would always result in hiw set at 7, i.e. credits would
be released as late as possible.

With this patch, if queue depth is reduced, hiw is set proportionally
relative to the level it was at before:
    hiw = (queue_depth * lnd_peercredits_hiw) / peer_tx_credits

Using the above example with queue depth initially at 32, negotiating
down to 8 would result in hiw set to 4 if "lnd_peercredits_hiw" is
initially at 16, 17, 18, 19; hiw set to 5 if "lnd_peercredits_hiw" is
initially at 20, 21, 22, 23, and so on.
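
The arithmetic above can be checked with a small stand-alone sketch.
The function names are hypothetical and not part of o2iblnd; they only
model the old and the new high-water mark calculation with plain
integers.

```c
#include <assert.h>

/* Old behaviour: min(current hiw, queue depth - 1). */
static int sketch_hiw_before(int hiw, int queue_depth)
{
	return hiw < queue_depth - 1 ? hiw : queue_depth - 1;
}

/*
 * New behaviour: scale hiw by the ratio of the negotiated queue depth
 * to the configured peer_tx_credits, using integer division.
 */
static int sketch_hiw_after(int hiw, int queue_depth, int peer_tx_credits)
{
	assert(peer_tx_credits > 0);	/* guard the division */
	return (queue_depth * hiw) / peer_tx_credits;
}
```

With peer_tx_credits at 32 and a negotiated queue depth of 8,
sketch_hiw_before() yields 7 for any hiw from 16 upward, while
sketch_hiw_after() yields 4 for 16..19 and 5 for 20..23. When the
queue depth is not negotiated down, the new formula leaves hiw
unchanged.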

WC-bug-id: https://jira.whamcloud.com/browse/LU-15828
Lustre-commit: e1944c29793d48942 ("LU-15828 o2iblnd: reset hiw proportionally")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49497
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.h | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index e3c069bd1a7f..5884cda7a707 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -114,13 +114,6 @@ extern struct kib_tunables  kiblnd_tunables;
 /* Max # of peer_ni credits */
 #define IBLND_CREDITS_MAX	  ((typeof(((struct kib_msg *)0)->ibm_credits)) - 1)
 
-/* when eagerly to return credits */
-#define IBLND_CREDITS_HIGHWATER(t, conn)			\
-	(((conn)->ibc_version) == IBLND_MSG_VERSION_1 ?		\
-	 IBLND_CREDIT_HIGHWATER_V1 :				\
-	 min((t)->lnd_peercredits_hiw,				\
-	     (u32)(conn)->ibc_queue_depth - 1))
-
 # define kiblnd_rdma_create_id(ns, cb, dev, ps, qpt) \
 	 rdma_create_id((ns) ? (ns) : &init_net, cb, dev, ps, qpt)
 
@@ -699,17 +692,38 @@ kiblnd_send_keepalive(struct kib_conn *conn)
 			    ktime_add_ns(conn->ibc_last_send, keepalive_ns));
 }
 
+/* when to return credits eagerly */
+static inline int
+kiblnd_credits_highwater(struct lnet_ioctl_config_o2iblnd_tunables *t,
+			 struct lnet_ioctl_config_lnd_cmn_tunables *nt,
+			 struct kib_conn *conn)
+{
+	int credits_hiw = IBLND_CREDIT_HIGHWATER_V1;
+
+	if (conn->ibc_version == IBLND_MSG_VERSION_1)
+		return credits_hiw;
+
+	/* if queue depth is negotiated down, calculate hiw proportionally */
+	credits_hiw = (conn->ibc_queue_depth * t->lnd_peercredits_hiw) /
+		       nt->lct_peer_tx_credits;
+
+	return credits_hiw;
+}
+
 static inline int
 kiblnd_need_noop(struct kib_conn *conn)
 {
 	struct lnet_ioctl_config_o2iblnd_tunables *tunables;
 	struct lnet_ni *ni = conn->ibc_peer->ibp_ni;
+	struct lnet_ioctl_config_lnd_cmn_tunables *net_tunables;
 
 	LASSERT(conn->ibc_state >= IBLND_CONN_ESTABLISHED);
 	tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
+	net_tunables = &ni->ni_net->net_tunables;
+
 
 	if (conn->ibc_outstanding_credits <
-	    IBLND_CREDITS_HIGHWATER(tunables, conn) &&
+	    kiblnd_credits_highwater(tunables, net_tunables, conn) &&
 	    !kiblnd_send_keepalive(conn))
 		return 0; /* No need to send NOOP */
 
-- 
2.27.0

* [lustre-devel] [PATCH 35/42] lnet: libcfs: cfs_hash_for_each_empty optimization
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (33 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 34/42] lnet: o2iblnd: reset hiw proportionally James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 36/42] lustre: llite: always enable remote subdir mount James Simmons
                   ` (6 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexander Zarochentsev, Lustre Development List

From: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>

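Restarting from bucket 0 on every pass in cfs_hash_for_each_empty()
causes excessive CPU consumption, because the leading buckets that are
already empty get rechecked each time. Resume from the bucket where the
previous pass stopped instead.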

HPE-bug-id: LUS-11311
WC-bug-id: https://jira.whamcloud.com/browse/LU-16272
Lustre-commit: 306a9b666e5ea2882 ("LU-16272 libcfs: cfs_hash_for_each_empty optimization")
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48972
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/libcfs/hash.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/lnet/libcfs/hash.c b/net/lnet/libcfs/hash.c
index c9ff92dbfec7..6d5cd6d800b7 100644
--- a/net/lnet/libcfs/hash.c
+++ b/net/lnet/libcfs/hash.c
@@ -1541,7 +1541,7 @@ EXPORT_SYMBOL(cfs_hash_size_get);
  */
 static int
 cfs_hash_for_each_relax(struct cfs_hash *hs, cfs_hash_for_each_cb_t func,
-			void *data, int start)
+			void *data, int *pstart)
 {
 	struct hlist_node *next = NULL;
 	struct hlist_node *hnode;
@@ -1564,7 +1564,7 @@ cfs_hash_for_each_relax(struct cfs_hash *hs, cfs_hash_for_each_cb_t func,
 	cfs_hash_for_each_bucket(hs, &bd, i) {
 		struct hlist_head *hhead;
 
-		if (i < start)
+		if (pstart && i < *pstart)
 			continue;
 		else if (end > 0 && i >= end)
 			break;
@@ -1622,13 +1622,16 @@ cfs_hash_for_each_relax(struct cfs_hash *hs, cfs_hash_for_each_cb_t func,
 		if (rc) /* callback wants to break iteration */
 			break;
 	}
-	if (start > 0 && !rc) {
-		end = start;
-		start = 0;
+
+	if (pstart && *pstart > 0 && rc == 0) {
+		end = *pstart;
+		*pstart = 0;
 		goto again;
 	}
 
 	cfs_hash_unlock(hs, 0);
+	if (pstart)
+		*pstart = i;
 	return count;
 }
 
@@ -1646,7 +1649,7 @@ cfs_hash_for_each_nolock(struct cfs_hash *hs, cfs_hash_for_each_cb_t func,
 		return -EOPNOTSUPP;
 
 	cfs_hash_for_each_enter(hs);
-	cfs_hash_for_each_relax(hs, func, data, start);
+	cfs_hash_for_each_relax(hs, func, data, &start);
 	cfs_hash_for_each_exit(hs);
 
 	return 0;
@@ -1669,6 +1672,7 @@ cfs_hash_for_each_empty(struct cfs_hash *hs, cfs_hash_for_each_cb_t func,
 			void *data)
 {
 	unsigned int i = 0;
+	int start = 0;
 
 	if (cfs_hash_with_no_lock(hs))
 		return -EOPNOTSUPP;
@@ -1678,11 +1682,12 @@ cfs_hash_for_each_empty(struct cfs_hash *hs, cfs_hash_for_each_cb_t func,
 		return -EOPNOTSUPP;
 
 	cfs_hash_for_each_enter(hs);
-	while (cfs_hash_for_each_relax(hs, func, data, 0)) {
+	while (cfs_hash_for_each_relax(hs, func, data, &start)) {
 		CDEBUG(D_INFO, "Try to empty hash: %s, loop: %u\n",
 		       hs->hs_name, i++);
 	}
 	cfs_hash_for_each_exit(hs);
+	LASSERT(atomic_read(&hs->hs_count) == 0);
 	return 0;
 }
 EXPORT_SYMBOL(cfs_hash_for_each_empty);
-- 
2.27.0
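
The resume-from-last-bucket idea in the cfs_hash_for_each_empty() patch above
can be sketched with a toy table. Everything here (struct toy_hash,
toy_drain_pass(), toy_empty(), the per-pass budget) is hypothetical and only
models the scan order; it is not the libcfs implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hash table: each slot holds an item count. */
struct toy_hash {
	int	*th_buckets;
	int	 th_nbuckets;
	long	 th_visited;	/* buckets examined, for comparison only */
};

/*
 * Remove at most @budget items, scanning every bucket once starting at
 * *pstart (or at 0 when @pstart is NULL) and wrapping around. On return
 * *pstart holds the last bucket examined so the next pass can resume there.
 * Returns the number of items removed.
 */
int toy_drain_pass(struct toy_hash *th, int *pstart, int budget)
{
	int start = pstart ? *pstart : 0;
	int n = th->th_nbuckets;
	int removed = 0;
	int i = start;
	int step;

	for (step = 0; step < n && removed < budget; step++) {
		i = (start + step) % n;
		th->th_visited++;
		while (th->th_buckets[i] > 0 && removed < budget) {
			th->th_buckets[i]--;
			removed++;
		}
	}
	if (pstart)
		*pstart = i;
	return removed;
}

/* Drain the table completely; returns how many buckets were examined. */
long toy_empty(struct toy_hash *th, int resume, int budget)
{
	int start = 0;

	th->th_visited = 0;
	while (toy_drain_pass(th, resume ? &start : NULL, budget) > 0)
		;
	return th->th_visited;
}
```

When all remaining items sit in a late bucket, the resuming variant examines
far fewer buckets than the one that restarts at bucket 0 on every pass, which
is the cost the patch is aimed at.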

* [lustre-devel] [PATCH 36/42] lustre: llite: always enable remote subdir mount
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (34 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 35/42] lnet: libcfs: cfs_hash_for_each_empty optimization James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 37/42] lnet: selftest: migrate LNet selftest group handling to Netlink James Simmons
                   ` (5 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

For historical reasons, ROOT is revalidated with IT_LOOKUP in
.permission to ensure its permissions are up to date, because ROOT is
never looked up. But the ROOT FID and layout cannot change; it is the
PERM lock that should be revalidated, i.e., revalidate with
IT_GETATTR instead of IT_LOOKUP.

Since the PERM|UPDATE lock is on the MDT where the object is located,
the client can cache this lock, so a remote subdir mount doesn't need
to look up ROOT on each file access.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16026
Lustre-commit: 6f490275b0e0455a4 ("LU-16026 llite: always enable remote subdir mount")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48535
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index e343fc83d707..aa9c5daadcac 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -5527,11 +5527,10 @@ int ll_inode_permission(struct inode *inode, int mask)
 		return -ECHILD;
 
        /* as root inode are NOT getting validated in lookup operation,
-	* need to do it before permission check.
+	* need to revalidate PERM before permission check.
 	*/
-
 	if (is_root_inode(inode)) {
-		rc = ll_inode_revalidate(inode->i_sb->s_root, IT_LOOKUP);
+		rc = ll_inode_revalidate(inode->i_sb->s_root, IT_GETATTR);
 		if (rc)
 			return rc;
 	}
-- 
2.27.0

* [lustre-devel] [PATCH 37/42] lnet: selftest: migrate LNet selftest group handling to Netlink
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (35 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 36/42] lustre: llite: always enable remote subdir mount James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 38/42] lnet: use Netlink to support LNet ping commands James Simmons
                   ` (4 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Replace the LSTIO_GROUP_LIST and LSTIO_GROUP_INFO ioctls with a Netlink
backend. Make this transition transparent to the user. Be aware that this
newer version of lnet_selftest.ko doesn't support older versions of the
lst tool. While the old interface allows setting up only one group at
a time, the Netlink interface can be used to set up many groups at one
time. Currently we don't change the interface to handle larger NIDs, but
this new interface will allow us to use the new NID format in a follow-on
patch.

WC-bug-id: https://jira.whamcloud.com/browse/LU-8915
Lustre-commit: a6c2d277d09ce9b33 ("LU-8915 lnet: migrate LNet selftest group handling to Netlink")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49314
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 include/uapi/linux/lnet/lnetst.h |   2 +
 net/lnet/selftest/conctl.c       | 421 +++++++++++++++++++++++++------
 net/lnet/selftest/console.c      |  27 +-
 net/lnet/selftest/console.h      |   4 +-
 net/lnet/selftest/selftest.h     |  60 ++++-
 5 files changed, 405 insertions(+), 109 deletions(-)

diff --git a/include/uapi/linux/lnet/lnetst.h b/include/uapi/linux/lnet/lnetst.h
index d04496dd0701..2d5c40250125 100644
--- a/include/uapi/linux/lnet/lnetst.h
+++ b/include/uapi/linux/lnet/lnetst.h
@@ -583,11 +583,13 @@ struct sfw_counters {
  *
  * @LNET_SELFTEST_CMD_UNSPEC:		unspecified command to catch errors
  * @LNET_SELFTEST_CMD_SESSIONS:		command to manage sessions
+ * @LNET_SELFTEST_CMD_GROUPS:		command to manage selftest groups
  */
 enum lnet_selftest_commands {
 	LNET_SELFTEST_CMD_UNSPEC	= 0,
 
 	LNET_SELFTEST_CMD_SESSIONS	= 1,
+	LNET_SELFTEST_CMD_GROUPS	= 2,
 
 	__LNET_SELFTEST_CMD_MAX_PLUS_ONE,
 };
diff --git a/net/lnet/selftest/conctl.c b/net/lnet/selftest/conctl.c
index aa118851d183..ea590b26779b 100644
--- a/net/lnet/selftest/conctl.c
+++ b/net/lnet/selftest/conctl.c
@@ -35,7 +35,7 @@
  *
  * Author: Liang Zhen <liangzhen@clusterfs.com>
  */
-
+#include <linux/generic-radix-tree.h>
 #include <linux/lnet/lib-lnet.h>
 #include "console.h"
 
@@ -247,78 +247,6 @@ lst_nodes_add_ioctl(struct lstio_group_nodes_args *args)
 	return rc;
 }
 
-static int
-lst_group_list_ioctl(struct lstio_group_list_args *args)
-{
-	if (args->lstio_grp_key != console_session.ses_key)
-		return -EACCES;
-
-	if (args->lstio_grp_idx < 0 ||
-	    !args->lstio_grp_namep ||
-	    args->lstio_grp_nmlen <= 0 ||
-	    args->lstio_grp_nmlen > LST_NAME_SIZE)
-		return -EINVAL;
-
-	return lstcon_group_list(args->lstio_grp_idx,
-				 args->lstio_grp_nmlen,
-				 args->lstio_grp_namep);
-}
-
-static int
-lst_group_info_ioctl(struct lstio_group_info_args *args)
-{
-	char name[LST_NAME_SIZE + 1];
-	int ndent;
-	int index;
-	int rc;
-
-	if (args->lstio_grp_key != console_session.ses_key)
-		return -EACCES;
-
-	if (!args->lstio_grp_namep ||
-	    args->lstio_grp_nmlen <= 0 ||
-	    args->lstio_grp_nmlen > LST_NAME_SIZE)
-		return -EINVAL;
-
-	if (!args->lstio_grp_entp &&	/* output: group entry */
-	    !args->lstio_grp_dentsp)	/* output: node entry */
-		return -EINVAL;
-
-	if (args->lstio_grp_dentsp) {		/* have node entry */
-		if (!args->lstio_grp_idxp ||	/* node index */
-		    !args->lstio_grp_ndentp)	/* # of node entry */
-			return -EINVAL;
-
-		if (copy_from_user(&ndent, args->lstio_grp_ndentp,
-				   sizeof(ndent)) ||
-		    copy_from_user(&index, args->lstio_grp_idxp,
-				   sizeof(index)))
-			return -EFAULT;
-
-		if (ndent <= 0 || index < 0)
-			return -EINVAL;
-	}
-
-	if (copy_from_user(name, args->lstio_grp_namep,
-			   args->lstio_grp_nmlen))
-		return -EFAULT;
-
-	name[args->lstio_grp_nmlen] = 0;
-
-	rc = lstcon_group_info(name, args->lstio_grp_entp,
-			       &index, &ndent, args->lstio_grp_dentsp);
-
-	if (rc)
-		return rc;
-
-	if (args->lstio_grp_dentsp &&
-	    (copy_to_user(args->lstio_grp_idxp, &index, sizeof(index)) ||
-	     copy_to_user(args->lstio_grp_ndentp, &ndent, sizeof(ndent))))
-		return -EFAULT;
-
-	return 0;
-}
-
 static int
 lst_batch_add_ioctl(struct lstio_batch_add_args *args)
 {
@@ -690,10 +618,9 @@ lstcon_ioctl_entry(struct notifier_block *nb,
 		rc = lst_nodes_add_ioctl((struct lstio_group_nodes_args *)buf);
 		break;
 	case LSTIO_GROUP_LIST:
-		rc = lst_group_list_ioctl((struct lstio_group_list_args *)buf);
-		break;
+		fallthrough;
 	case LSTIO_GROUP_INFO:
-		rc = lst_group_info_ioctl((struct lstio_group_info_args *)buf);
+		rc = -EOPNOTSUPP;
 		break;
 	case LSTIO_BATCH_ADD:
 		rc = lst_batch_add_ioctl((struct lstio_batch_add_args *)buf);
@@ -982,8 +909,340 @@ static int lst_sessions_cmd(struct sk_buff *skb, struct genl_info *info)
 	return rc;
 }
 
+static char *lst_node_state2str(int state)
+{
+	if (state == LST_NODE_ACTIVE)
+		return "Active";
+	if (state == LST_NODE_BUSY)
+		return "Busy";
+	if (state == LST_NODE_DOWN)
+		return "Down";
+
+	return "Unknown";
+}
+
+int lst_node_str2state(char *str)
+{
+	int state = 0;
+
+	if (strcasecmp(str, "Active") == 0)
+		state = LST_NODE_ACTIVE;
+	else if (strcasecmp(str, "Busy") == 0)
+		state = LST_NODE_BUSY;
+	else if (strcasecmp(str, "Down") == 0)
+		state = LST_NODE_DOWN;
+	else if (strcasecmp(str, "Unknown") == 0)
+		state = LST_NODE_UNKNOWN;
+	else if (strcasecmp(str, "Invalid") == 0)
+		state = LST_NODE_UNKNOWN | LST_NODE_DOWN | LST_NODE_BUSY;
+	return state;
+}
+
+struct lst_genl_group_prop {
+	struct lstcon_group	*lggp_grp;
+	int			lggp_state_filter;
+};
+
+struct lst_genl_group_list {
+	GENRADIX(struct lst_genl_group_prop)	lggl_groups;
+	unsigned int				lggl_count;
+	unsigned int				lggl_index;
+	bool					lggl_verbose;
+};
+
+static inline struct lst_genl_group_list *
+lst_group_dump_ctx(struct netlink_callback *cb)
+{
+	return (struct lst_genl_group_list *)cb->args[0];
+}
+
+static int lst_groups_show_done(struct netlink_callback *cb)
+{
+	struct lst_genl_group_list *glist = lst_group_dump_ctx(cb);
+
+	if (glist) {
+		int i;
+
+		for (i = 0; i < glist->lggl_count; i++) {
+			struct lst_genl_group_prop *prop;
+
+			prop = genradix_ptr(&glist->lggl_groups, i);
+			if (!prop || !prop->lggp_grp)
+				continue;
+			lstcon_group_decref(prop->lggp_grp);
+		}
+		genradix_free(&glist->lggl_groups);
+		kfree(glist);
+	}
+	cb->args[0] = 0;
+
+	return 0;
+}
+
+/* LNet selftest groups ->start() handler for GET requests */
+static int lst_groups_show_start(struct netlink_callback *cb)
+{
+	struct genlmsghdr *gnlh = nlmsg_data(cb->nlh);
+	struct netlink_ext_ack *extack = cb->extack;
+	struct nlattr *params = genlmsg_data(gnlh);
+	struct lst_genl_group_list *glist;
+	int msg_len = genlmsg_len(gnlh);
+	struct lstcon_group *grp;
+	struct nlattr *groups;
+	int rem, rc = 0;
+
+	glist = kzalloc(sizeof(*glist), GFP_KERNEL);
+	if (!glist)
+		return -ENOMEM;
+
+	genradix_init(&glist->lggl_groups);
+	cb->args[0] = (long)glist;
+
+	if (!msg_len) {
+		list_for_each_entry(grp, &console_session.ses_grp_list,
+				    grp_link) {
+			struct lst_genl_group_prop *prop;
+
+			prop = genradix_ptr_alloc(&glist->lggl_groups,
+						  glist->lggl_count++,
+						  GFP_ATOMIC);
+			if (!prop) {
+				NL_SET_ERR_MSG(extack,
+					       "failed to allocate group info");
+				rc = -ENOMEM;
+				goto report_err;
+			}
+			lstcon_group_addref(grp);  /* +1 ref for caller */
+			prop->lggp_grp = grp;
+		}
+
+		if (!glist->lggl_count) {
+			NL_SET_ERR_MSG(extack, "No groups found");
+			rc = -ENOENT;
+		}
+		goto report_err;
+	}
+	glist->lggl_verbose = true;
+
+	nla_for_each_attr(groups, params, msg_len, rem) {
+		struct lst_genl_group_prop *prop = NULL;
+		struct nlattr *group;
+		int rem2;
+
+		if (nla_type(groups) != LN_SCALAR_ATTR_LIST)
+			continue;
+
+		nla_for_each_nested(group, groups, rem2) {
+			if (nla_type(group) == LN_SCALAR_ATTR_VALUE) {
+				char name[LST_NAME_SIZE];
+
+				prop = genradix_ptr_alloc(&glist->lggl_groups,
+							  glist->lggl_count++,
+							  GFP_ATOMIC);
+				if (!prop) {
+					NL_SET_ERR_MSG(extack,
+						       "failed to allocate group info");
+					rc = -ENOMEM;
+					goto report_err;
+				}
+
+				rc = nla_strlcpy(name, group, sizeof(name));
+				if (rc < 0) {
+					NL_SET_ERR_MSG(extack,
+						       "failed to get name");
+					goto report_err;
+				}
+				rc = lstcon_group_find(name, &prop->lggp_grp);
+				if (rc < 0) {
+					/* don't stop reporting groups if one
+					 * doesn't exist.
+					 */
+					CWARN("LNet selftest group %s does not exist\n",
+					      name);
+					rc = 0;
+				}
+			} else if (nla_type(group) == LN_SCALAR_ATTR_LIST) {
+				struct nlattr *attr;
+				int rem3;
+
+				if (!prop) {
+					NL_SET_ERR_MSG(extack,
+						       "missing group information");
+					rc = -EINVAL;
+					goto report_err;
+				}
+
+				nla_for_each_nested(attr, group, rem3) {
+					char tmp[16];
+
+					if (nla_type(attr) != LN_SCALAR_ATTR_VALUE ||
+					    nla_strcmp(attr, "status") != 0)
+						continue;
+
+					attr = nla_next(attr, &rem3);
+					if (nla_type(attr) !=
+					    LN_SCALAR_ATTR_VALUE) {
+						NL_SET_ERR_MSG(extack,
+							       "invalid config param");
+						rc = -EINVAL;
+						goto report_err;
+					}
+
+					rc = nla_strlcpy(tmp, attr, sizeof(tmp));
+					if (rc < 0) {
+						NL_SET_ERR_MSG(extack,
+							       "failed to get prop attr");
+						goto report_err;
+					}
+					rc = 0;
+					prop->lggp_state_filter |=
+						lst_node_str2state(tmp);
+				}
+			}
+		}
+	}
+	if (!glist->lggl_count) {
+		NL_SET_ERR_MSG(extack, "No groups found");
+		rc = -ENOENT;
+	}
+report_err:
+	if (rc < 0)
+		lst_groups_show_done(cb);
+
+	return rc;
+}
+
+static const struct ln_key_list lst_group_keys = {
+	.lkl_maxattr			= LNET_SELFTEST_GROUP_MAX,
+	.lkl_list			= {
+		[LNET_SELFTEST_GROUP_ATTR_HDR]	= {
+			.lkp_value		= "groups",
+			.lkp_key_format		= LNKF_SEQUENCE,
+			.lkp_data_type		= NLA_NUL_STRING,
+		},
+		[LNET_SELFTEST_GROUP_ATTR_NAME]	= {
+			.lkp_data_type		= NLA_STRING,
+		},
+		[LNET_SELFTEST_GROUP_ATTR_NODELIST] = {
+			.lkp_key_format		= LNKF_MAPPING | LNKF_SEQUENCE,
+			.lkp_data_type		= NLA_NESTED,
+		},
+	},
+};
+
+static const struct ln_key_list lst_group_nodelist_keys = {
+	.lkl_maxattr			= LNET_SELFTEST_GROUP_NODELIST_PROP_MAX,
+	.lkl_list			= {
+		[LNET_SELFTEST_GROUP_NODELIST_PROP_ATTR_NID] = {
+			.lkp_value		= "nid",
+			.lkp_data_type		= NLA_STRING,
+		},
+		[LNET_SELFTEST_GROUP_NODELIST_PROP_ATTR_STATUS] = {
+			.lkp_value		= "status",
+			.lkp_data_type		= NLA_STRING,
+		},
+	},
+};
+
+static int lst_groups_show_dump(struct sk_buff *msg,
+				struct netlink_callback *cb)
+{
+	struct lst_genl_group_list *glist = lst_group_dump_ctx(cb);
+	struct netlink_ext_ack *extack = cb->extack;
+	int portid = NETLINK_CB(cb->skb).portid;
+	int seq = cb->nlh->nlmsg_seq;
+	int idx = 0, rc = 0;
+
+	if (!glist->lggl_index) {
+		const struct ln_key_list *all[] = {
+			&lst_group_keys, &lst_group_nodelist_keys, NULL
+		};
+
+		rc = lnet_genl_send_scalar_list(msg, portid, seq, &lst_family,
+						NLM_F_CREATE | NLM_F_MULTI,
+						LNET_SELFTEST_CMD_GROUPS, all);
+		if (rc < 0) {
+			NL_SET_ERR_MSG(extack, "failed to send key table");
+			goto send_error;
+		}
+	}
+
+	for (idx = glist->lggl_index; idx < glist->lggl_count; idx++) {
+		struct lst_genl_group_prop *group;
+		struct lstcon_ndlink *ndl;
+		struct nlattr *nodelist;
+		unsigned int count = 1;
+		void *hdr;
+
+		group = genradix_ptr(&glist->lggl_groups, idx);
+		if (!group)
+			continue;
+
+		hdr = genlmsg_put(msg, portid, seq, &lst_family,
+				  NLM_F_MULTI, LNET_SELFTEST_CMD_GROUPS);
+		if (!hdr) {
+			NL_SET_ERR_MSG(extack, "failed to send values");
+			rc = -EMSGSIZE;
+			goto send_error;
+		}
+
+		if (idx == 0)
+			nla_put_string(msg, LNET_SELFTEST_GROUP_ATTR_HDR, "");
+
+		nla_put_string(msg, LNET_SELFTEST_GROUP_ATTR_NAME,
+			       group->lggp_grp->grp_name);
+
+		if (!glist->lggl_verbose)
+			goto skip_details;
+
+		nodelist = nla_nest_start(msg,
+					  LNET_SELFTEST_GROUP_ATTR_NODELIST);
+		list_for_each_entry(ndl, &group->lggp_grp->grp_ndl_list,
+				    ndl_link) {
+			struct nlattr *node = nla_nest_start(msg, count);
+			char *ndstate;
+
+			if (group->lggp_state_filter &&
+			    !(group->lggp_state_filter & ndl->ndl_node->nd_state))
+				continue;
+
+			nla_put_string(msg,
+				       LNET_SELFTEST_GROUP_NODELIST_PROP_ATTR_NID,
+				       libcfs_id2str(ndl->ndl_node->nd_id));
+
+			ndstate = lst_node_state2str(ndl->ndl_node->nd_state);
+			nla_put_string(msg,
+				       LNET_SELFTEST_GROUP_NODELIST_PROP_ATTR_STATUS,
+				       ndstate);
+			nla_nest_end(msg, node);
+		}
+		nla_nest_end(msg, nodelist);
+skip_details:
+		genlmsg_end(msg, hdr);
+	}
+	glist->lggl_index = idx;
+send_error:
+	return rc;
+}
+
+#ifndef HAVE_NETLINK_CALLBACK_START
+static int lst_old_groups_show_dump(struct sk_buff *msg,
+				    struct netlink_callback *cb)
+{
+	if (!cb->args[0]) {
+		int rc = lst_groups_show_start(cb);
+
+		if (rc < 0)
+			return rc;
+	}
+
+	return lst_groups_show_dump(msg, cb);
+}
+#endif
+
 static const struct genl_multicast_group lst_mcast_grps[] = {
 	{ .name = "sessions",		},
+	{ .name	= "groups",		},
 };
 
 static const struct genl_ops lst_genl_ops[] = {
@@ -992,6 +1251,16 @@ static const struct genl_ops lst_genl_ops[] = {
 		.dumpit		= lst_sessions_show_dump,
 		.doit		= lst_sessions_cmd,
 	},
+	{
+		.cmd		= LNET_SELFTEST_CMD_GROUPS,
+#ifdef HAVE_NETLINK_CALLBACK_START
+		.start		= lst_groups_show_start,
+		.dumpit		= lst_groups_show_dump,
+#else
+		.dumpit		= lst_old_groups_show_dump,
+#endif
+		.done		= lst_groups_show_done,
+	},
 };
 
 static struct genl_family lst_family = {
diff --git a/net/lnet/selftest/console.c b/net/lnet/selftest/console.c
index 1ed619114b2b..b6c98820d0bf 100644
--- a/net/lnet/selftest/console.c
+++ b/net/lnet/selftest/console.c
@@ -224,8 +224,7 @@ lstcon_group_alloc(char *name, struct lstcon_group **grpp)
 	return 0;
 }
 
-static void
-lstcon_group_addref(struct lstcon_group *grp)
+void lstcon_group_addref(struct lstcon_group *grp)
 {
 	grp->grp_ref++;
 }
@@ -245,8 +244,7 @@ lstcon_group_drain(struct lstcon_group *grp, int keep)
 	}
 }
 
-static void
-lstcon_group_decref(struct lstcon_group *grp)
+void lstcon_group_decref(struct lstcon_group *grp)
 {
 	int i;
 
@@ -264,8 +262,7 @@ lstcon_group_decref(struct lstcon_group *grp)
 	kfree(grp);
 }
 
-static int
-lstcon_group_find(const char *name, struct lstcon_group **grpp)
+int lstcon_group_find(const char *name, struct lstcon_group **grpp)
 {
 	struct lstcon_group *grp;
 
@@ -717,24 +714,6 @@ lstcon_group_refresh(char *name, struct list_head __user *result_up)
 	return rc;
 }
 
-int
-lstcon_group_list(int index, int len, char __user *name_up)
-{
-	struct lstcon_group *grp;
-
-	LASSERT(index >= 0);
-	LASSERT(name_up);
-
-	list_for_each_entry(grp, &console_session.ses_grp_list, grp_link) {
-		if (!index--) {
-			return copy_to_user(name_up, grp->grp_name, len) ?
-					    -EFAULT : 0;
-		}
-	}
-
-	return -ENOENT;
-}
-
 static int
 lstcon_nodes_getent(struct list_head *head, int *index_p,
 		    int *count_p, struct lstcon_node_ent __user *dents_up)
diff --git a/net/lnet/selftest/console.h b/net/lnet/selftest/console.h
index dd416dc82f35..40f33e97d7f3 100644
--- a/net/lnet/selftest/console.h
+++ b/net/lnet/selftest/console.h
@@ -206,6 +206,9 @@ int lstcon_nodes_debug(int timeout, int nnd,
 		       struct list_head __user *result_up);
 int lstcon_group_add(char *name);
 int lstcon_group_del(char *name);
+void lstcon_group_addref(struct lstcon_group *grp);
+void lstcon_group_decref(struct lstcon_group *grp);
+int lstcon_group_find(const char *name, struct lstcon_group **grpp);
 int lstcon_group_clean(char *name, int args);
 int lstcon_group_refresh(char *name, struct list_head __user *result_up);
 int lstcon_nodes_add(char *name, int nnd, struct lnet_process_id __user *nds_up,
@@ -216,7 +219,6 @@ int lstcon_nodes_remove(char *name, int nnd,
 int lstcon_group_info(char *name, struct lstcon_ndlist_ent __user *gent_up,
 		      int *index_p, int *ndent_p,
 		      struct lstcon_node_ent __user *ndents_up);
-int lstcon_group_list(int idx, int len, char __user *name_up);
 int lstcon_batch_add(char *name);
 int lstcon_batch_run(char *name, int timeout,
 		     struct list_head __user *result_up);
diff --git a/net/lnet/selftest/selftest.h b/net/lnet/selftest/selftest.h
index 5bffe7394dc2..5d0b47fe7e49 100644
--- a/net/lnet/selftest/selftest.h
+++ b/net/lnet/selftest/selftest.h
@@ -52,19 +52,19 @@
 /* enum lnet_selftest_session_attrs   - LNet selftest session Netlink
  *					attributes
  *
- * @LNET_SELFTEST_SESSION_UNSPEC:	unspecified attribute to catch errors
- * @LNET_SELFTEST_SESSION_PAD:		padding for 64-bit attributes, ignore
+ *  @LNET_SELFTEST_SESSION_UNSPEC:	unspecified attribute to catch errors
+ *  @LNET_SELFTEST_SESSION_PAD:		padding for 64-bit attributes, ignore
  *
- * @LENT_SELFTEST_SESSION_HDR:		Netlink group this data is for
+ *  @LENT_SELFTEST_SESSION_HDR:		Netlink group this data is for
  *					(NLA_NUL_STRING)
- * @LNET_SELFTEST_SESSION_NAME:	name of this session (NLA_STRING)
- * @LNET_SELFTEST_SESSION_KEY:		key used to represent the session
+ *  @LNET_SELFTEST_SESSION_NAME:	name of this session (NLA_STRING)
+ *  @LNET_SELFTEST_SESSION_KEY:		key used to represent the session
  *					(NLA_U32)
- * @LNET_SELFTEST_SESSION_TIMESTAMP:	timestamp when the session was created
+ *  @LNET_SELFTEST_SESSION_TIMESTAMP:	timestamp when the session was created
  *					(NLA_S64)
- * @LNET_SELFTEST_SESSION_NID:		NID of the node selftest ran on
+ *  @LNET_SELFTEST_SESSION_NID:		NID of the node selftest ran on
  *					(NLA_STRING)
- * @LNET_SELFTEST_SESSION_NODE_COUNT:	Number of nodes in use (NLA_U16)
+ *  @LNET_SELFTEST_SESSION_NODE_COUNT:	Number of nodes in use (NLA_U16)
  */
 enum lnet_selftest_session_attrs {
 	LNET_SELFTEST_SESSION_UNSPEC = 0,
@@ -82,6 +82,50 @@ enum lnet_selftest_session_attrs {
 
 #define LNET_SELFTEST_SESSION_MAX	(__LNET_SELFTEST_SESSION_MAX_PLUS_ONE - 1)
 
+/* enum lnet_selftest_group_attrs     - LNet selftest group Netlink attributes
+ *
+ *  @LNET_SELFTEST_GROUP_ATTR_UNSPEC:	unspecified attribute to catch errors
+ *
+ *  @LNET_SELFTEST_GROUP_ATTR_HDR:	Netlink group this data is for
+ *					(NLA_NUL_STRING)
+ *  @LNET_SELFTEST_GROUP_ATTR_NAME:	name of this group (NLA_STRING)
+ *  @LNET_SELFTEST_GROUP_ATTR_NODELIST:	List of nodes belonging to the group
+ *					(NLA_NESTED)
+ */
+enum lnet_selftest_group_attrs {
+	LNET_SELFTEST_GROUP_ATTR_UNSPEC = 0,
+
+	LNET_SELFTEST_GROUP_ATTR_HDR,
+	LNET_SELFTEST_GROUP_ATTR_NAME,
+	LNET_SELFTEST_GROUP_ATTR_NODELIST,
+
+	__LNET_SELFTEST_GROUP_MAX_PLUS_ONE,
+};
+
+#define LNET_SELFTEST_GROUP_MAX			(__LNET_SELFTEST_GROUP_MAX_PLUS_ONE - 1)
+
+/* enum lnet_selftest_group_nodelist_prop_attrs	      - Netlink attributes for
+ *							the properties of the
+ *							nodes that belong to a
+ *							group
+ *
+ *  @LNET_SELFTEST_GROUP_NODELIST_PROP_ATTR_UNSPEC:	unspecified attribute
+ *							to catch errors
+ *
+ *  @LNET_SELFTEST_GROUP_NODELIST_PROP_ATTR_NID:	Node's NID (NLA_STRING)
+ *  @LNET_SELFTEST_GROUP_NODELIST_PROP_ATTR_STATUS:	Status of the node
+ *							(NLA_STRING)
+ */
+enum lnet_selftest_group_nodelist_prop_attrs {
+	LNET_SELFTEST_GROUP_NODELIST_PROP_ATTR_UNSPEC = 0,
+
+	LNET_SELFTEST_GROUP_NODELIST_PROP_ATTR_NID,
+	LNET_SELFTEST_GROUP_NODELIST_PROP_ATTR_STATUS,
+	__LNET_SELFTEST_GROUP_NODELIST_PROP_MAX_PLUS_ONE,
+};
+
+#define LNET_SELFTEST_GROUP_NODELIST_PROP_MAX	(__LNET_SELFTEST_GROUP_NODELIST_PROP_MAX_PLUS_ONE - 1)
+
 #define SWI_STATE_NEWBORN		0
 #define SWI_STATE_REPLY_SUBMITTED	1
 #define SWI_STATE_REPLY_SENT		2
-- 
2.27.0
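
The node-state filter that lst_groups_show_start() builds and
lst_groups_show_dump() applies in the patch above can be sketched as a plain
bitmask test. The TOY_NODE_* values and function names below are hypothetical
stand-ins, not the LST_NODE_* definitions from lnetst.h, and userspace
strcasecmp() from <strings.h> is assumed.

```c
#include <assert.h>
#include <strings.h>

/* Hypothetical state bits; the real LST_NODE_* values may differ. */
enum toy_node_state {
	TOY_NODE_ACTIVE		= 0x1,
	TOY_NODE_BUSY		= 0x2,
	TOY_NODE_DOWN		= 0x4,
	TOY_NODE_UNKNOWN	= 0x8,
};

/* Map a case-insensitive state name to its bit; unknown names map to 0. */
int toy_str2state(const char *str)
{
	if (strcasecmp(str, "Active") == 0)
		return TOY_NODE_ACTIVE;
	if (strcasecmp(str, "Busy") == 0)
		return TOY_NODE_BUSY;
	if (strcasecmp(str, "Down") == 0)
		return TOY_NODE_DOWN;
	if (strcasecmp(str, "Unknown") == 0)
		return TOY_NODE_UNKNOWN;
	return 0;
}

/* An empty filter reports every node; otherwise any shared bit matches. */
int toy_node_matches(int filter, int state)
{
	return filter == 0 || (filter & state) != 0;
}
```

Several "status" values can be OR-ed into one filter, so a request for both
"Busy" and "Down" reports nodes in either state and skips the rest.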

* [lustre-devel] [PATCH 38/42] lnet: use Netlink to support LNet ping commands
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (36 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 37/42] lnet: selftest: migrate LNet selftest group handling to Netlink James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 39/42] lustre: llite: revert: "llite: clear stale page's uptodate bit" James Simmons
                   ` (3 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Completely replace the old pre-MR ping command ioctl with
Netlink, which will also handle large NIDs. We do update
IOC_LIBCFS_PING_PEER, which supports only small NIDs,
so older tools will keep working.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10003
Lustre-commit: d137e9823ca1e97fc ("LU-10003 lnet: use Netlink to support LNet ping commands")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49360
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 include/linux/lnet/lib-lnet.h          |   1 +
 include/linux/lnet/lib-types.h         |  46 +++
 include/uapi/linux/lnet/libcfs_ioctl.h |   2 +-
 include/uapi/linux/lnet/lnet-dlc.h     |   2 +
 net/lnet/lnet/api-ni.c                 | 491 +++++++++++++++++++++----
 net/lnet/lnet/nidstrings.c             |  24 ++
 net/lnet/lnet/peer.c                   |   2 +-
 7 files changed, 495 insertions(+), 73 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 25289f5bba39..ed28af6fe8d5 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -895,6 +895,7 @@ struct lnet_ping_iter {
 u32 *ping_iter_first(struct lnet_ping_iter *pi, struct lnet_ping_buffer *pbuf,
 		     struct lnet_nid *nid);
 u32 *ping_iter_next(struct lnet_ping_iter *pi, struct lnet_nid *nid);
+int ping_info_count_entries(struct lnet_ping_buffer *pbuf);
 
 static inline int lnet_push_target_resize_needed(void)
 {
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 73d962f18e06..eb54e754ad95 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -62,6 +62,7 @@ static inline char *libcfs_nidstr(const struct lnet_nid *nid)
 
 int libcfs_strnid(struct lnet_nid *nid, const char *str);
 char *libcfs_idstr(struct lnet_processid *id);
+int libcfs_strid(struct lnet_processid *id, const char *str);
 
 int cfs_match_nid_net(struct lnet_nid *nid, u32 net,
 		      struct list_head *net_num_list,
@@ -567,6 +568,51 @@ enum lnet_net_local_ni_tunables_attr {
 
 #define LNET_NET_LOCAL_NI_TUNABLES_ATTR_MAX (__LNET_NET_LOCAL_NI_TUNABLES_ATTR_MAX_PLUS_ONE - 1)
 
+/** LNet netlink ping API */
+
+/** enum lnet_ping_attr				      - LNet ping netlink properties
+ *							attributes to describe ping format
+ *							These values are used to piece together
+ *							messages for sending and receiving.
+ *
+ * @LNET_PING_ATTR_UNSPEC:				unspecified attribute to catch errors
+ *
+ * @LNET_PING_ATTR_HDR:					grouping for LNet ping data (NLA_NUL_STRING)
+ * @LNET_PING_ATTR_PRIMARY_NID:				Source NID for ping request (NLA_STRING)
+ * @LNET_PING_ATTR_ERRNO:				error code if we fail to ping (NLA_S16)
+ * @LNET_PING_ATTR_MULTIRAIL:				Report if MR is supported (NLA_FLAG)
+ * @LNET_PING_ATTR_PEER_NI_LIST:			List of peer NI's (NLA_NESTED)
+ */
+enum lnet_ping_attr {
+	LNET_PING_ATTR_UNSPEC = 0,
+
+	LNET_PING_ATTR_HDR,
+	LNET_PING_ATTR_PRIMARY_NID,
+	LNET_PING_ATTR_ERRNO,
+	LNET_PING_ATTR_MULTIRAIL,
+	LNET_PING_ATTR_PEER_NI_LIST,
+	__LNET_PING_ATTR_MAX_PLUS_ONE,
+};
+
+#define LNET_PING_ATTR_MAX (__LNET_PING_ATTR_MAX_PLUS_ONE - 1)
+
+/** enum lnet_ping_peer_ni_attr		      - LNet peer ni information reported by
+ *							ping command. A list of these are
+ *							returned with a ping request.
+ *
+ * @LNET_PING_PEER_NI_ATTR_UNSPEC:			unspecified attribute to catch errors
+ *
+ * @LNET_PING_PEER_NI_ATTR_NID:				NID address of peer NI. (NLA_STRING)
+ */
+enum lnet_ping_peer_ni_attr {
+	LNET_PING_PEER_NI_ATTR_UNSPEC = 0,
+
+	LNET_PING_PEER_NI_ATTR_NID,
+	__LNET_PING_PEER_NI_ATTR_MAX_PLUS_ONE,
+};
+
+#define LNET_PING_PEER_NI_ATTR_MAX (__LNET_PING_PEER_NI_ATTR_MAX_PLUS_ONE - 1)
+
 struct lnet_ni {
 	spinlock_t		ni_lock;
 	/* chain on the lnet_net structure */
diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h
index 89ac0758c1b1..e012532fc88a 100644
--- a/include/uapi/linux/lnet/libcfs_ioctl.h
+++ b/include/uapi/linux/lnet/libcfs_ioctl.h
@@ -102,7 +102,7 @@ struct libcfs_ioctl_data {
 #define IOC_LIBCFS_LNET_DIST		_IOWR('e', 58, IOCTL_LIBCFS_TYPE)
 #define IOC_LIBCFS_CONFIGURE		_IOWR('e', 59, IOCTL_LIBCFS_TYPE)
 #define IOC_LIBCFS_TESTPROTOCOMPAT	_IOWR('e', 60, IOCTL_LIBCFS_TYPE)
-#define IOC_LIBCFS_PING			_IOWR('e', 61, IOCTL_LIBCFS_TYPE)
+/* IOC_LIBCFS_PING obsolete in 2.16, was _IOWR('e', 61, IOCTL_LIBCFS_TYPE) */
 #define IOC_LIBCFS_PING_PEER		_IOWR('e', 62, IOCTL_LIBCFS_TYPE)
 #define IOC_LIBCFS_LNETST		_IOWR('e', 63, IOCTL_LIBCFS_TYPE)
 #define IOC_LIBCFS_LNET_FAULT		_IOWR('e', 64, IOCTL_LIBCFS_TYPE)
diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h
index 58697c134d6a..63578a04a286 100644
--- a/include/uapi/linux/lnet/lnet-dlc.h
+++ b/include/uapi/linux/lnet/lnet-dlc.h
@@ -57,6 +57,7 @@
  *  @LNET_CMD_UNSPEC:		unspecified command to catch errors
  *
  *  @LNET_CMD_NETS:		command to manage the LNet networks
+ *  @LNET_CMD_PING:		command to send pings to LNet connections
  */
 enum lnet_commands {
 	LNET_CMD_UNSPEC		= 0,
@@ -66,6 +67,7 @@ enum lnet_commands {
 	LNET_CMD_PEERS		= 3,
 	LNET_CMD_ROUTES		= 4,
 	LNET_CMD_CONNS		= 5,
+	LNET_CMD_PING		= 6,
 
 	__LNET_CMD_MAX_PLUS_ONE
 };
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 18dc3e7cccc6..2c7f5211bbee 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -31,6 +31,9 @@
  */
 
 #define DEBUG_SUBSYSTEM S_LNET
+
+#include <linux/ctype.h>
+#include <linux/generic-radix-tree.h>
 #include <linux/log2.h>
 #include <linux/ktime.h>
 #include <linux/moduleparam.h>
@@ -223,8 +226,23 @@ static void lnet_set_lnd_timeout(void)
  */
 static atomic_t lnet_dlc_seq_no = ATOMIC_INIT(0);
 
-static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
-		     signed long timeout, struct lnet_process_id __user *ids,
+struct lnet_fail_ping {
+	struct lnet_processid		lfp_id;
+	int				lfp_errno;
+};
+
+struct lnet_genl_ping_list {
+	unsigned int			lgpl_index;
+	unsigned int			lgpl_list_count;
+	unsigned int			lgpl_failed_count;
+	signed long			lgpl_timeout;
+	struct lnet_nid			lgpl_src_nid;
+	GENRADIX(struct lnet_fail_ping)	lgpl_failed;
+	GENRADIX(struct lnet_processid)	lgpl_list;
+};
+
+static int lnet_ping(struct lnet_processid *id, struct lnet_nid *src_nid,
+		     signed long timeout, struct lnet_genl_ping_list *plist,
 		     int n_ids);
 
 static int lnet_discover(struct lnet_process_id id, u32 force,
@@ -4367,35 +4385,15 @@ LNetCtl(unsigned int cmd, void *arg)
 	case IOC_LIBCFS_LNET_FAULT:
 		return lnet_fault_ctl(data->ioc_flags, data);
 
-	case IOC_LIBCFS_PING: {
-		struct lnet_process_id id4;
-		signed long timeout;
-
-		id4.nid = data->ioc_nid;
-		id4.pid = data->ioc_u32[0];
-
-		/* If timeout is negative then set default of 3 minutes */
-		if (((s32)data->ioc_u32[1] <= 0) ||
-		    data->ioc_u32[1] > (DEFAULT_PEER_TIMEOUT * MSEC_PER_SEC))
-			timeout = DEFAULT_PEER_TIMEOUT * HZ;
-		else
-			timeout = msecs_to_jiffies(data->ioc_u32[1]);
-
-		rc = lnet_ping(id4, &LNET_ANY_NID, timeout, data->ioc_pbuf1,
-			       data->ioc_plen1 / sizeof(struct lnet_process_id));
-
-		if (rc < 0)
-			return rc;
-
-		data->ioc_count = rc;
-		return 0;
-	}
-
 	case IOC_LIBCFS_PING_PEER: {
 		struct lnet_ioctl_ping_data *ping = arg;
+		struct lnet_process_id __user *ids = ping->ping_buf;
 		struct lnet_nid src_nid = LNET_ANY_NID;
+		struct lnet_genl_ping_list plist;
+		struct lnet_processid id;
 		struct lnet_peer *lp;
 		signed long timeout;
+		int count, i;
 
 		/* Check if the supplied ping data supports source nid
 		 * NB: This check is sufficient if lnet_ioctl_ping_data has
@@ -4416,15 +4414,30 @@ LNetCtl(unsigned int cmd, void *arg)
 		else
 			timeout = msecs_to_jiffies(ping->op_param);
 
-		rc = lnet_ping(ping->ping_id, &src_nid, timeout,
-			       ping->ping_buf,
+		id.pid = ping->ping_id.pid;
+		lnet_nid4_to_nid(ping->ping_id.nid, &id.nid);
+		rc = lnet_ping(&id, &src_nid, timeout, &plist,
 			       ping->ping_count);
 		if (rc < 0)
-			return rc;
+			goto report_ping_err;
+		count = rc;
+
+		for (i = 0; i < count; i++) {
+			struct lnet_processid *result;
+			struct lnet_process_id tmpid;
+
+			result = genradix_ptr(&plist.lgpl_list, i);
+			memset(&tmpid, 0, sizeof(tmpid));
+			tmpid.pid = result->pid;
+			tmpid.nid = lnet_nid_to_nid4(&result->nid);
+			if (copy_to_user(&ids[i], &tmpid, sizeof(tmpid))) {
+				rc = -EFAULT;
+				goto report_ping_err;
+			}
+		}
 
 		mutex_lock(&the_lnet.ln_api_mutex);
-		lnet_nid4_to_nid(ping->ping_id.nid, &nid);
-		lp = lnet_find_peer(&nid);
+		lp = lnet_find_peer(&id.nid);
 		if (lp) {
 			ping->ping_id.nid =
 				lnet_nid_to_nid4(&lp->lp_primary_nid);
@@ -4433,8 +4446,10 @@ LNetCtl(unsigned int cmd, void *arg)
 		}
 		mutex_unlock(&the_lnet.ln_api_mutex);
 
-		ping->ping_count = rc;
-		return 0;
+		ping->ping_count = count;
+report_ping_err:
+		genradix_free(&plist.lgpl_list);
+		return rc;
 	}
 
 	case IOC_LIBCFS_DISCOVER: {
@@ -5229,9 +5244,339 @@ static int lnet_net_cmd(struct sk_buff *skb, struct genl_info *info)
 	return rc;
 }
 
+static inline struct lnet_genl_ping_list *
+lnet_ping_dump_ctx(struct netlink_callback *cb)
+{
+	return (struct lnet_genl_ping_list *)cb->args[0];
+}
+
+static int lnet_ping_show_done(struct netlink_callback *cb)
+{
+	struct lnet_genl_ping_list *plist = lnet_ping_dump_ctx(cb);
+
+	if (plist) {
+		genradix_free(&plist->lgpl_failed);
+		genradix_free(&plist->lgpl_list);
+		kfree(plist);
+		cb->args[0] = 0;
+	}
+
+	return 0;
+}
+
+/* LNet ping ->start() handler for GET requests */
+static int lnet_ping_show_start(struct netlink_callback *cb)
+{
+	struct genlmsghdr *gnlh = nlmsg_data(cb->nlh);
+	struct netlink_ext_ack *extack = cb->extack;
+	struct lnet_genl_ping_list *plist;
+	int msg_len = genlmsg_len(gnlh);
+	struct nlattr *params, *top;
+	int rem, rc = 0;
+
+	if (the_lnet.ln_refcount == 0) {
+		NL_SET_ERR_MSG(extack, "Network is down");
+		return -ENETDOWN;
+	}
+
+	if (!msg_len) {
+		NL_SET_ERR_MSG(extack, "Ping needs NID targets");
+		return -ENOENT;
+	}
+
+	plist = kzalloc(sizeof(*plist), GFP_KERNEL);
+	if (!plist) {
+		NL_SET_ERR_MSG(extack, "failed to setup ping list");
+		return -ENOMEM;
+	}
+	genradix_init(&plist->lgpl_list);
+	plist->lgpl_timeout = DEFAULT_PEER_TIMEOUT * HZ;
+	plist->lgpl_src_nid = LNET_ANY_NID;
+	plist->lgpl_index = 0;
+	plist->lgpl_list_count = 0;
+	cb->args[0] = (long)plist;
+
+	params = genlmsg_data(gnlh);
+	nla_for_each_attr(top, params, msg_len, rem) {
+		struct nlattr *nids;
+		int rem2;
+
+		switch (nla_type(top)) {
+		case LN_SCALAR_ATTR_VALUE:
+			if (nla_strcmp(top, "timeout") == 0) {
+				s64 timeout;
+
+				top = nla_next(top, &rem);
+				if (nla_type(top) != LN_SCALAR_ATTR_INT_VALUE) {
+					NL_SET_ERR_MSG(extack,
+						       "invalid timeout param");
+					rc = -EINVAL;
+					goto report_err;
+				}
+
+				/* Keep the default of 3 minutes unless the
+				 * timeout is positive and below that limit
+				 */
+				timeout = nla_get_s64(top);
+				if (timeout > 0 &&
+				    timeout < (DEFAULT_PEER_TIMEOUT * MSEC_PER_SEC))
+					plist->lgpl_timeout =
+						nsecs_to_jiffies(timeout * NSEC_PER_MSEC);
+			} else if (nla_strcmp(top, "source") == 0) {
+				char nidstr[LNET_NIDSTR_SIZE + 1];
+
+				top = nla_next(top, &rem);
+				if (nla_type(top) != LN_SCALAR_ATTR_VALUE) {
+					NL_SET_ERR_MSG(extack,
+						       "invalid source param");
+					rc = -EINVAL;
+					goto report_err;
+				}
+
+				rc = nla_strlcpy(nidstr, top, sizeof(nidstr));
+				if (rc < 0) {
+					NL_SET_ERR_MSG(extack,
+						       "failed to parse source nid");
+					goto report_err;
+				}
+
+				rc = libcfs_strnid(&plist->lgpl_src_nid,
+						   strim(nidstr));
+				if (rc < 0) {
+					NL_SET_ERR_MSG(extack,
+						       "invalid source nid");
+					goto report_err;
+				}
+				rc = 0;
+			}
+			break;
+		case LN_SCALAR_ATTR_LIST:
+			nla_for_each_nested(nids, top, rem2) {
+				char nid[LNET_NIDSTR_SIZE + 1];
+				struct lnet_processid *id;
+
+				if (nla_type(nids) != LN_SCALAR_ATTR_VALUE)
+					continue;
+
+				memset(nid, 0, sizeof(nid));
+				rc = nla_strlcpy(nid, nids, sizeof(nid));
+				if (rc < 0) {
+					NL_SET_ERR_MSG(extack,
+						       "failed to get NID");
+					goto report_err;
+				}
+
+				id = genradix_ptr_alloc(&plist->lgpl_list,
+							plist->lgpl_list_count++,
+							GFP_ATOMIC);
+				if (!id) {
+					NL_SET_ERR_MSG(extack,
+						       "failed to allocate NID");
+					rc = -ENOMEM;
+					goto report_err;
+				}
+
+				rc = libcfs_strid(id, strim(nid));
+				if (rc < 0) {
+					NL_SET_ERR_MSG(extack, "invalid NID");
+					goto report_err;
+				}
+				rc = 0;
+			}
+			fallthrough;
+		default:
+			break;
+		}
+	}
+report_err:
+	if (rc < 0)
+		lnet_ping_show_done(cb);
+
+	return rc;
+}
+
+static const struct ln_key_list ping_props_list = {
+	.lkl_maxattr			= LNET_PING_ATTR_MAX,
+	.lkl_list			= {
+		[LNET_PING_ATTR_HDR]            = {
+			.lkp_value              = "ping",
+			.lkp_key_format		= LNKF_SEQUENCE | LNKF_MAPPING,
+			.lkp_data_type		= NLA_NUL_STRING,
+		},
+		[LNET_PING_ATTR_PRIMARY_NID]	= {
+			.lkp_value		= "primary nid",
+			.lkp_data_type          = NLA_STRING
+		},
+		[LNET_PING_ATTR_ERRNO]		= {
+			.lkp_value		= "errno",
+			.lkp_data_type		= NLA_S16
+		},
+		[LNET_PING_ATTR_MULTIRAIL]	= {
+			.lkp_value              = "Multi-Rail",
+			.lkp_data_type          = NLA_FLAG
+		},
+		[LNET_PING_ATTR_PEER_NI_LIST]	= {
+			.lkp_value		= "peer_ni",
+			.lkp_key_format         = LNKF_SEQUENCE | LNKF_MAPPING,
+			.lkp_data_type          = NLA_NESTED
+		},
+	},
+};
+
+static struct ln_key_list ping_peer_ni_list = {
+	.lkl_maxattr			= LNET_PING_PEER_NI_ATTR_MAX,
+	.lkl_list                       = {
+		[LNET_PING_PEER_NI_ATTR_NID]	= {
+			.lkp_value		= "nid",
+			.lkp_data_type		= NLA_STRING
+		},
+	},
+};
+
+static int lnet_ping_show_dump(struct sk_buff *msg,
+			       struct netlink_callback *cb)
+{
+	struct lnet_genl_ping_list *plist = lnet_ping_dump_ctx(cb);
+	struct genlmsghdr *gnlh = nlmsg_data(cb->nlh);
+	struct netlink_ext_ack *extack = cb->extack;
+	int portid = NETLINK_CB(cb->skb).portid;
+	int seq = cb->nlh->nlmsg_seq;
+	int idx = plist->lgpl_index;
+	int rc = 0, i = 0;
+
+	if (!plist->lgpl_index) {
+		const struct ln_key_list *all[] = {
+			&ping_props_list, &ping_peer_ni_list, NULL
+		};
+
+		rc = lnet_genl_send_scalar_list(msg, portid, seq,
+						&lnet_family,
+						NLM_F_CREATE | NLM_F_MULTI,
+						LNET_CMD_PING, all);
+		if (rc < 0) {
+			NL_SET_ERR_MSG(extack, "failed to send key table");
+			goto send_error;
+		}
+
+		genradix_init(&plist->lgpl_failed);
+	}
+
+	while (idx < plist->lgpl_list_count) {
+		struct lnet_nid primary_nid = LNET_ANY_NID;
+		struct lnet_genl_ping_list peers;
+		struct lnet_processid *id;
+		struct nlattr *nid_list;
+		struct lnet_peer *lp;
+		bool mr_flag = false;
+		unsigned int count;
+		void *hdr = NULL;
+
+		id = genradix_ptr(&plist->lgpl_list, idx++);
+		if (nid_is_lo0(&id->nid))
+			continue;
+
+		rc = lnet_ping(id, &plist->lgpl_src_nid, plist->lgpl_timeout,
+			       &peers, lnet_interfaces_max);
+		if (rc < 0) {
+			struct lnet_fail_ping *fail;
+
+			fail = genradix_ptr_alloc(&plist->lgpl_failed,
+						  plist->lgpl_failed_count++,
+						  GFP_ATOMIC);
+			if (!fail) {
+				NL_SET_ERR_MSG(extack,
+					       "failed to allocate failed NID");
+				goto send_error;
+			}
+			fail->lfp_id = *id;
+			fail->lfp_errno = rc;
+			goto cant_reach;
+		}
+
+		mutex_lock(&the_lnet.ln_api_mutex);
+		lp = lnet_find_peer(&id->nid);
+		if (lp) {
+			primary_nid = lp->lp_primary_nid;
+			mr_flag = lnet_peer_is_multi_rail(lp);
+			lnet_peer_decref_locked(lp);
+		}
+		mutex_unlock(&the_lnet.ln_api_mutex);
+
+		hdr = genlmsg_put(msg, portid, seq, &lnet_family,
+				  NLM_F_MULTI, LNET_CMD_PING);
+		if (!hdr) {
+			NL_SET_ERR_MSG(extack, "failed to send values");
+			genlmsg_cancel(msg, hdr);
+			rc = -EMSGSIZE;
+			goto send_error;
+		}
+
+		if (i++ == 0)
+			nla_put_string(msg, LNET_PING_ATTR_HDR, "");
+
+		nla_put_string(msg, LNET_PING_ATTR_PRIMARY_NID,
+			       libcfs_nidstr(&primary_nid));
+		if (mr_flag)
+			nla_put_flag(msg, LNET_PING_ATTR_MULTIRAIL);
+
+		nid_list = nla_nest_start(msg, LNET_PING_ATTR_PEER_NI_LIST);
+		for (count = 0; count < rc; count++) {
+			struct lnet_processid *result;
+			struct nlattr *nid_attr;
+			char *idstr;
+
+			result = genradix_ptr(&peers.lgpl_list, count);
+			if (nid_is_lo0(&result->nid))
+				continue;
+
+			nid_attr = nla_nest_start(msg, count + 1);
+			if (gnlh->version == 1)
+				idstr = libcfs_nidstr(&result->nid);
+			else
+				idstr = libcfs_idstr(result);
+			nla_put_string(msg, LNET_PING_PEER_NI_ATTR_NID, idstr);
+			nla_nest_end(msg, nid_attr);
+		}
+		nla_nest_end(msg, nid_list);
+		genlmsg_end(msg, hdr);
+cant_reach:
+		genradix_free(&peers.lgpl_list);
+	}
+
+	for (i = 0; i < plist->lgpl_failed_count; i++) {
+		struct lnet_fail_ping *fail;
+		void *hdr;
+
+		fail = genradix_ptr(&plist->lgpl_failed, i);
+
+		hdr = genlmsg_put(msg, portid, seq, &lnet_family,
+				  NLM_F_MULTI, LNET_CMD_PING);
+		if (!hdr) {
+			NL_SET_ERR_MSG(extack, "failed to send failed values");
+			genlmsg_cancel(msg, hdr);
+			rc = -EMSGSIZE;
+			goto send_error;
+		}
+
+		if (i == 0)
+			nla_put_string(msg, LNET_PING_ATTR_HDR, "");
+
+		nla_put_string(msg, LNET_PING_ATTR_PRIMARY_NID,
+			       libcfs_nidstr(&fail->lfp_id.nid));
+		nla_put_s16(msg, LNET_PING_ATTR_ERRNO, fail->lfp_errno);
+		genlmsg_end(msg, hdr);
+	}
+	rc = 0; /* don't treat it as an error */
+
+	plist->lgpl_index = idx;
+send_error:
+	return rc;
+}
+
 static const struct genl_multicast_group lnet_mcast_grps[] = {
 	{ .name	=	"ip2net",	},
 	{ .name =	"net",		},
+	{ .name	=	"ping",		},
 };
 
 static const struct genl_ops lnet_genl_ops[] = {
@@ -5242,6 +5587,12 @@ static const struct genl_ops lnet_genl_ops[] = {
 		.done		= lnet_net_show_done,
 		.doit		= lnet_net_cmd,
 	},
+	{
+		.cmd		= LNET_CMD_PING,
+		.start		= lnet_ping_show_start,
+		.dumpit		= lnet_ping_show_dump,
+		.done		= lnet_ping_show_done,
+	},
 };
 
 static struct genl_family lnet_family = {
@@ -5337,29 +5688,26 @@ lnet_ping_event_handler(struct lnet_event *event)
 		complete(&pd->completion);
 }
 
-/* lnet_ping() only works with nid4 nids, so we can calculate
- * size from number of nids
- */
-#define LNET_PING_INFO_SIZE(NNIDS) \
-	offsetof(struct lnet_ping_info, pi_ni[NNIDS])
-
-static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
-		     signed long timeout, struct lnet_process_id __user *ids,
+static int lnet_ping(struct lnet_processid *id, struct lnet_nid *src_nid,
+		     signed long timeout, struct lnet_genl_ping_list *plist,
 		     int n_ids)
 {
+	int id_bytes = sizeof(struct lnet_ni_status); /* For 0@lo */
 	struct lnet_md md = { NULL };
 	struct ping_data pd = { 0 };
 	struct lnet_ping_buffer *pbuf;
-	struct lnet_process_id tmpid;
-	struct lnet_processid id;
-	int id_bytes;
-	int i;
+	struct lnet_processid pid;
+	struct lnet_ping_iter pi;
+	int i = 0;
+	u32 *st;
 	int nob;
 	int rc;
 	int rc2;
 
+	genradix_init(&plist->lgpl_list);
+
 	/* n_ids limit is arbitrary */
-	if (n_ids <= 0 || id4.nid == LNET_NID_ANY)
+	if (n_ids <= 0 || LNET_NID_IS_ANY(&id->nid))
 		return -EINVAL;
 
 	/* if the user buffer has more space than the lnet_interfaces_max
@@ -5368,10 +5716,10 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 	if (n_ids > lnet_interfaces_max)
 		n_ids = lnet_interfaces_max;
 
-	if (id4.pid == LNET_PID_ANY)
-		id4.pid = LNET_PID_LUSTRE;
+	if (id->pid == LNET_PID_ANY)
+		id->pid = LNET_PID_LUSTRE;
 
-	id_bytes = LNET_PING_INFO_SIZE(n_ids);
+	id_bytes += lnet_ping_sts_size(&id->nid) * n_ids;
 	pbuf = lnet_ping_buffer_alloc(id_bytes, GFP_NOFS);
 	if (!pbuf)
 		return -ENOMEM;
@@ -5393,8 +5741,7 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 		goto fail_ping_buffer_decref;
 	}
 
-	lnet_pid4_to_pid(id4, &id);
-	rc = LNetGet(src_nid, pd.mdh, &id, LNET_RESERVED_PORTAL,
+	rc = LNetGet(src_nid, pd.mdh, id, LNET_RESERVED_PORTAL,
 		     LNET_PROTO_PING_MATCHBITS, 0, false);
 	if (rc) {
 		/* Don't CERROR; this could be deliberate! */
@@ -5410,6 +5757,7 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 		LNetMDUnlink(pd.mdh);
 		wait_for_completion(&pd.completion);
 	}
+
 	if (!pd.replied) {
 		rc = pd.rc ?: -EIO;
 		goto fail_ping_buffer_decref;
@@ -5420,9 +5768,9 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 
 	rc = -EPROTO;		/* if I can't parse... */
 
-	if (nob < 8) {
+	if (nob < LNET_PING_INFO_HDR_SIZE) {
 		CERROR("%s: ping info too short %d\n",
-		       libcfs_idstr(&id), nob);
+		       libcfs_idstr(id), nob);
 		goto fail_ping_buffer_decref;
 	}
 
@@ -5430,54 +5778,55 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 		lnet_swap_pinginfo(pbuf);
 	} else if (pbuf->pb_info.pi_magic != LNET_PROTO_PING_MAGIC) {
 		CERROR("%s: Unexpected magic %08x\n",
-		       libcfs_idstr(&id), pbuf->pb_info.pi_magic);
+		       libcfs_idstr(id), pbuf->pb_info.pi_magic);
 		goto fail_ping_buffer_decref;
 	}
 
 	if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_NI_STATUS)) {
 		CERROR("%s: ping w/o NI status: 0x%x\n",
-		       libcfs_idstr(&id), pbuf->pb_info.pi_features);
+		       libcfs_idstr(id), pbuf->pb_info.pi_features);
 		goto fail_ping_buffer_decref;
 	}
 
 	/* Test if smaller than lnet_pinginfo with just one pi_ni status info.
 	 * That one might contain size when large nids are used.
 	 */
-	if (nob < LNET_PING_INFO_SIZE(1)) {
+	if (nob < offsetof(struct lnet_ping_info, pi_ni[1])) {
 		CERROR("%s: Short reply %d(%lu min)\n",
-		       libcfs_idstr(&id), nob, LNET_PING_INFO_SIZE(1));
+		       libcfs_idstr(id), nob,
+		       offsetof(struct lnet_ping_info, pi_ni[1]));
 		goto fail_ping_buffer_decref;
 	}
 
-	if (pbuf->pb_info.pi_nnis < n_ids) {
-		n_ids = pbuf->pb_info.pi_nnis;
+	if (ping_info_count_entries(pbuf) < n_ids) {
+		n_ids = ping_info_count_entries(pbuf);
 		id_bytes = lnet_ping_info_size(&pbuf->pb_info);
 	}
 
 	if (nob < id_bytes) {
 		CERROR("%s: Short reply %d(%d expected)\n",
-		       libcfs_idstr(&id), nob, id_bytes);
+		       libcfs_idstr(id), nob, id_bytes);
 		goto fail_ping_buffer_decref;
 	}
 
-	rc = -EFAULT;		/* if I segv in copy_to_user()... */
-
-	memset(&tmpid, 0, sizeof(tmpid));
-	for (i = 0; i < n_ids; i++) {
-		tmpid.pid = pbuf->pb_info.pi_pid;
-		tmpid.nid = pbuf->pb_info.pi_ni[i].ns_nid;
-		if (copy_to_user(&ids[i], &tmpid, sizeof(tmpid)))
+	for (st = ping_iter_first(&pi, pbuf, &pid.nid);
+	     st;
+	     st = ping_iter_next(&pi, &pid.nid)) {
+		id = genradix_ptr_alloc(&plist->lgpl_list, i++, GFP_ATOMIC);
+		if (!id) {
+			rc = -ENOMEM;
 			goto fail_ping_buffer_decref;
-	}
-	rc = pbuf->pb_info.pi_nnis;
+		}
 
+		id->pid = pbuf->pb_info.pi_pid;
+		id->nid = pid.nid;
+	}
+	rc = i;
 fail_ping_buffer_decref:
 	lnet_ping_buffer_decref(pbuf);
 	return rc;
 }
 
-#undef LNET_PING_INFO_SIZE
-
 static int
 lnet_discover(struct lnet_process_id id4, u32 force,
 	      struct lnet_process_id __user *ids,
diff --git a/net/lnet/lnet/nidstrings.c b/net/lnet/lnet/nidstrings.c
index ac2aa973a412..b5a585507d6a 100644
--- a/net/lnet/lnet/nidstrings.c
+++ b/net/lnet/lnet/nidstrings.c
@@ -1170,6 +1170,30 @@ libcfs_idstr(struct lnet_processid *id)
 }
 EXPORT_SYMBOL(libcfs_idstr);
 
+int
+libcfs_strid(struct lnet_processid *id, const char *str)
+{
+	char *tmp = strchr(str, '-');
+
+	id->pid = LNET_PID_LUSTRE;
+	if (tmp &&
+	    strncmp("LNET_PID_ANY-", str, tmp - str) != 0) {
+		char pid[LNET_NIDSTR_SIZE];
+		int rc;
+
+		strscpy(pid, str, tmp - str);
+		rc = kstrtou32(pid, 10, &id->pid);
+		if (rc < 0)
+			return rc;
+		tmp++;
+	} else {
+		tmp = (char *)str;
+	}
+
+	return libcfs_strnid(&id->nid, tmp);
+}
+EXPORT_SYMBOL(libcfs_strid);
+
 int
 libcfs_str2anynid(lnet_nid_t *nidp, const char *str)
 {
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index a9759860a5cd..da1f8d4d67a3 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2955,7 +2955,7 @@ u32 *ping_iter_next(struct lnet_ping_iter *pi, struct lnet_nid *nid)
 	return NULL;
 }
 
-static int ping_info_count_entries(struct lnet_ping_buffer *pbuf)
+int ping_info_count_entries(struct lnet_ping_buffer *pbuf)
 {
 	struct lnet_ping_iter pi;
 	u32 *st;
-- 
2.27.0


* [lustre-devel] [PATCH 39/42] lustre: llite: revert: "llite: clear stale page's uptodate bit"
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (37 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 38/42] lnet: use Netlink to support LNet ping commands James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 40/42] lnet: validate data sent from user land properly James Simmons
                   ` (2 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Bobi Jam <bobijam@whamcloud.com>

This reverts commit 23c4c1c09cfebccea37a88a27f122646168cbad4,
which caused a race between cl_page_own() and ll_releasepage()
and an assertion failure in cl_pagevec_put().

WC-bug-id: https://jira.whamcloud.com/browse/LU-16160
Lustre-commit: 84c9618190f9e3a52 ("LU-16160 revert: "llite: clear stale page's uptodate bit")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49541
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h |  15 +---
 fs/lustre/llite/rw.c          |  10 +--
 fs/lustre/llite/vvp_io.c      | 124 +++-------------------------------
 fs/lustre/llite/vvp_page.c    |   5 --
 fs/lustre/obdclass/cl_page.c  |  37 +++-------
 5 files changed, 19 insertions(+), 172 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 8be58ffb9f34..41ce0b02e00e 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -768,15 +768,7 @@ struct cl_page {
 	enum cl_page_type		 cp_type:CP_TYPE_BITS;
 	unsigned int			 cp_defer_uptodate:1,
 					 cp_ra_updated:1,
-					 cp_ra_used:1,
-					 /* fault page read grab extra referece */
-					 cp_fault_ref:1,
-					 /**
-					  * if fault page got delete before returned to
-					  * filemap_fault(), defer the vmpage detach/put
-					  * until filemap_fault() has been handled.
-					  */
-					 cp_defer_detach:1;
+					 cp_ra_used:1;
 
 	/* which slab kmem index this memory allocated from */
 	short int			 cp_kmem_index;
@@ -2401,11 +2393,6 @@ int cl_io_lru_reserve(const struct lu_env *env, struct cl_io *io,
 int cl_io_read_ahead(const struct lu_env *env, struct cl_io *io,
 		     pgoff_t start, struct cl_read_ahead *ra);
 
-static inline int cl_io_is_pagefault(const struct cl_io *io)
-{
-	return io->ci_type == CIT_FAULT && !io->u.ci_fault.ft_mkwrite;
-}
-
 /**
  * True, if @io is an O_APPEND write(2).
  */
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 0283af422712..2290b3112380 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -1947,15 +1947,7 @@ int ll_readpage(struct file *file, struct page *vmpage)
 			unlock_page(vmpage);
 			result = 0;
 		}
-		if (cl_io_is_pagefault(io) && result == 0) {
-			/**
-			 * page fault, retain the cl_page reference until
-			 * vvp_io_kernel_fault() release it.
-			 */
-			page->cp_fault_ref = 1;
-		} else {
-			cl_page_put(env, page);
-		}
+		cl_page_put(env, page);
 	} else {
 		unlock_page(vmpage);
 		result = PTR_ERR(page);
diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index 317704172080..eacb35b500e5 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -1302,41 +1302,14 @@ static void vvp_io_rw_end(const struct lu_env *env,
 	trunc_sem_up_read(&lli->lli_trunc_sem);
 }
 
-static void detach_and_deref_page(struct cl_page *clp, struct page *vmpage)
-{
-	if (!clp->cp_defer_detach)
-		return;
-
-	/**
-	 * cl_page_delete0() took a vmpage reference, but not unlink the vmpage
-	 * from its cl_page.
-	 */
-	clp->cp_defer_detach = 0;
-	ClearPagePrivate(vmpage);
-	vmpage->private = 0;
-
-	put_page(vmpage);
-	refcount_dec(&clp->cp_ref);
-}
-
-static int vvp_io_kernel_fault(const struct lu_env *env,
-			       struct vvp_fault_io *cfio)
+static int vvp_io_kernel_fault(struct vvp_fault_io *cfio)
 {
 	struct vm_fault *vmf = cfio->ft_vmf;
-	struct file *vmff = cfio->ft_vma->vm_file;
-	struct address_space *mapping = vmff->f_mapping;
-	struct inode *inode = mapping->host;
-	struct page *vmpage = NULL;
-	struct cl_page *clp = NULL;
-	int rc = 0;
 
-	ll_inode_size_lock(inode);
-retry:
 	cfio->ft_flags = filemap_fault(vmf);
 	cfio->ft_flags_valid = 1;
 
 	if (vmf->page) {
-		/* success, vmpage is locked */
 		CDEBUG(D_PAGE,
 		       "page %p map %p index %lu flags %lx count %u priv %0lx: got addr %p type NOPAGE\n",
 		       vmf->page, vmf->page->mapping, vmf->page->index,
@@ -1348,105 +1321,24 @@ static int vvp_io_kernel_fault(const struct lu_env *env,
 		}
 
 		cfio->ft_vmpage = vmf->page;
-
-		/**
-		 * ll_filemap_fault()->ll_readpage() could get an extra cl_page
-		 * reference. So we have to get the cl_page's to check its
-		 * cp_fault_ref and drop the reference later.
-		 */
-		clp = cl_vmpage_page(vmf->page, NULL);
-
-		goto unlock;
-	}
-
-	/* filemap_fault() fails, vmpage is not locked */
-	if (!clp) {
-		vmpage = find_get_page(mapping, vmf->pgoff);
-		if (vmpage) {
-			lock_page(vmpage);
-			clp = cl_vmpage_page(vmpage, NULL);
-			unlock_page(vmpage);
-		}
+		return 0;
 	}
 
 	if (cfio->ft_flags & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) {
-		pgoff_t max_idx;
-
-		/**
-		 * ll_filemap_fault()->ll_readpage() could fill vmpage
-		 * correctly, and unlock the vmpage, while memory pressure or
-		 * truncate could detach cl_page from vmpage, and kernel
-		 * filemap_fault() will wait_on_page_locked(vmpage) and find
-		 * out that the vmpage has been cleared its uptodate bit,
-		 * so it returns VM_FAULT_SIGBUS.
-		 *
-		 * In this case, we'd retry the filemap_fault()->ll_readpage()
-		 * to rebuild the cl_page and fill vmpage with uptodated data.
-		 */
-		if (likely(vmpage)) {
-			bool need_retry = false;
-
-			if (clp) {
-				if (clp->cp_defer_detach) {
-					detach_and_deref_page(clp, vmpage);
-					/**
-					 * check i_size to make sure it's not
-					 * over EOF, we don't want to call
-					 * filemap_fault() repeatedly since it
-					 * returns VM_FAULT_SIGBUS without even
-					 * trying if vmf->pgoff is over EOF.
-					 */
-					max_idx = DIV_ROUND_UP(i_size_read(inode),
-							       PAGE_SIZE);
-					if (vmf->pgoff < max_idx)
-						need_retry = true;
-				}
-				if (clp->cp_fault_ref) {
-					clp->cp_fault_ref = 0;
-					/* ref not released in ll_readpage() */
-					cl_page_put(env, clp);
-				}
-				if (need_retry)
-					goto retry;
-			}
-		}
-
 		CDEBUG(D_PAGE, "got addr %p - SIGBUS\n", (void *)vmf->address);
-		rc = -EFAULT;
-		goto unlock;
+		return -EFAULT;
 	}
 
 	if (cfio->ft_flags & VM_FAULT_OOM) {
 		CDEBUG(D_PAGE, "got addr %p - OOM\n", (void *)vmf->address);
-		rc = -ENOMEM;
-		goto unlock;
+		return -ENOMEM;
 	}
 
-	if (cfio->ft_flags & VM_FAULT_RETRY) {
-		rc = -EAGAIN;
-		goto unlock;
-	}
+	if (cfio->ft_flags & VM_FAULT_RETRY)
+		return -EAGAIN;
 
 	CERROR("Unknown error in page fault %d!\n", cfio->ft_flags);
-	rc = -EINVAL;
-unlock:
-	ll_inode_size_unlock(inode);
-	if (clp) {
-		if (clp->cp_defer_detach && vmpage)
-			detach_and_deref_page(clp, vmpage);
-
-		/* additional cl_page ref has been taken in ll_readpage() */
-		if (clp->cp_fault_ref) {
-			clp->cp_fault_ref = 0;
-			/* ref not released in ll_readpage() */
-			cl_page_put(env, clp);
-		}
-		/* ref taken in this function */
-		cl_page_put(env, clp);
-	}
-	if (vmpage)
-		put_page(vmpage);
-	return rc;
+	return -EINVAL;
 }
 
 static void mkwrite_commit_callback(const struct lu_env *env, struct cl_io *io,
@@ -1486,7 +1378,7 @@ static int vvp_io_fault_start(const struct lu_env *env,
 		LASSERT(cfio->ft_vmpage);
 		lock_page(cfio->ft_vmpage);
 	} else {
-		result = vvp_io_kernel_fault(env, cfio);
+		result = vvp_io_kernel_fault(cfio);
 		if (result != 0)
 			return result;
 	}
diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c
index 9e8c1588347f..f359596bc32d 100644
--- a/fs/lustre/llite/vvp_page.c
+++ b/fs/lustre/llite/vvp_page.c
@@ -104,11 +104,6 @@ static void vvp_page_completion_read(const struct lu_env *env,
 		ll_ra_count_put(ll_i2sbi(inode), 1);
 
 	if (ioret == 0)  {
-		/**
-		 * cp_defer_uptodate is used for readahead page, and the
-		 * vmpage Uptodate bit is deferred to set in ll_readpage/
-		 * ll_io_read_page.
-		 */
 		if (!cp->cp_defer_uptodate)
 			SetPageUptodate(vmpage);
 	} else if (cp->cp_defer_uptodate) {
diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c
index 3bc1a9b0eb98..7011235a9b3c 100644
--- a/fs/lustre/obdclass/cl_page.c
+++ b/fs/lustre/obdclass/cl_page.c
@@ -725,35 +725,16 @@ static void __cl_page_delete(const struct lu_env *env, struct cl_page *cp)
 		LASSERT(PageLocked(vmpage));
 		LASSERT((struct cl_page *)vmpage->private == cp);
 
-		/**
-		 * clear vmpage uptodate bit, since ll_read_ahead_pages()->
-		 * ll_read_ahead_page() could pick up this stale vmpage and
-		 * take it as uptodated.
-		 */
-		ClearPageUptodate(vmpage);
-		/**
-		 * vvp_io_kernel_fault()->ll_readpage() set cp_fault_ref
-		 * and need it to check cl_page to retry the page fault read.
+		/* Drop the reference count held in vvp_page_init */
+		refcount_dec(&cp->cp_ref);
+		ClearPagePrivate(vmpage);
+		vmpage->private = 0;
+
+		/*
+		 * The reference from vmpage to cl_page is removed,
+		 * but the reference back is still here. It is removed
+		 * later in cl_page_free().
 		 */
-		if (cp->cp_fault_ref) {
-			cp->cp_defer_detach = 1;
-			/**
-			 * get a vmpage reference, so that filemap_fault()
-			 * won't free it from pagecache.
-			 */
-			get_page(vmpage);
-		} else {
-			/* Drop the reference count held in vvp_page_init */
-			refcount_dec(&cp->cp_ref);
-			ClearPagePrivate(vmpage);
-			vmpage->private = 0;
-
-			/*
-			 * The reference from vmpage to cl_page is removed,
-			 * but the reference back is still here. It is removed
-			 * later in cl_page_free().
-			 */
-		}
 	}
 }
 
-- 
2.27.0


* [lustre-devel] [PATCH 40/42] lnet: validate data sent from user land properly
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (38 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 39/42] lustre: llite: revert: "llite: clear stale page's uptodate bit" James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 41/42] lnet: modify lnet_inetdev to work with large NIDS James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 42/42] lustre: ldlm: remove obsolete LDLM_FL_SERVER_LOCK James Simmons
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Testing with improper settings from user land exposed some bugs in
the kernel's handling of these cases. Tunables sent from user land
need proper range checking. An improper cast in the new Netlink
tunables code prevented setting the default LND tunable settings.
Also silently ignore attempts to set LND tunables when that is not
supported; we shouldn't stop NI setup in this case. Set the NI
tunables to -1 when user land doesn't provide any input, which tells
the LND driver to use its default values for the tunables. Resolve a
double free when setting up an NI with a non-existent interface.
Another fix is for net locking in lnet_net_cmd().

For lnetctl, fix the YAML handling when only conns_per_peer is
requested. I previously tested only conns_per_peer and NI tunables
changes together, which missed this case.
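
As a rough illustration of the range checking described above, here is a
minimal user-space sketch, not the kernel code itself. The helper names
clamp_s64() and sanitize_conns_per_peer(), the TUNABLE_UNSET macro and
the choice to pass -1 through untouched are assumptions made for this
example; only the 1..127 range and the "-1 means use the LND default"
convention come from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel: user land gave no value, the LND picks its default. */
#define TUNABLE_UNSET (-1)

/* Bounds taken from the conns_per_peer range used in this patch. */
#define CONNS_PER_PEER_MIN 1
#define CONNS_PER_PEER_MAX 127

/* Return val limited to [lo, hi]. The caller must use the return value;
 * the argument itself is not modified.
 */
int64_t clamp_s64(int64_t val, int64_t lo, int64_t hi)
{
	assert(lo <= hi);
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/* Validate a 64-bit request before it is stored in a narrower field.
 * Passing the sentinel through unchanged is an assumption of this sketch.
 */
int16_t sanitize_conns_per_peer(int64_t requested)
{
	if (requested == TUNABLE_UNSET)
		return TUNABLE_UNSET;

	return (int16_t)clamp_s64(requested, CONNS_PER_PEER_MIN,
				  CONNS_PER_PEER_MAX);
}
```

With these helpers a request of 1000 is stored as 127 and a request of
-5 as 1, while -1 stays -1. The kernel's clamp_t() likewise yields the
clamped value as its result, so that result has to be assigned back.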

Fixes: fafd24988 ("lnet: use Netlink to support old and new NI APIs.")
WC-bug-id: https://jira.whamcloud.com/browse/LU-16460
Lustre-commit: 17a3b5688435ab5f7 ("LU-16460 lnet: validate data sent from user land properly")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49588
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c |  6 +++-
 net/lnet/klnds/socklnd/socklnd.c |  6 +++-
 net/lnet/lnet/api-ni.c           | 47 +++++++++++++++++++-------------
 3 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index cbb3445c7c25..67259569b392 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -533,6 +533,7 @@ static int
 kiblnd_nl_set(int cmd, struct nlattr *attr, int type, void *data)
 {
 	struct lnet_lnd_tunables *tunables = data;
+	s64 num;
 
 	if (cmd != LNET_CMD_NETS)
 		return -EOPNOTSUPP;
@@ -563,7 +564,10 @@ kiblnd_nl_set(int cmd, struct nlattr *attr, int type, void *data)
 		tunables->lnd_tun_u.lnd_o2ib.lnd_ntx = nla_get_s64(attr);
 		break;
 	case LNET_NET_O2IBLND_TUNABLES_ATTR_CONNS_PER_PEER:
-		tunables->lnd_tun_u.lnd_o2ib.lnd_conns_per_peer = nla_get_s64(attr);
+		num = nla_get_s64(attr);
+		num = clamp_t(s64, num, 1, 127);
+		tunables->lnd_tun_u.lnd_o2ib.lnd_conns_per_peer = num;
+		fallthrough;
 	default:
 		break;
 	}
diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index 0a4fb966f498..cc2b7f46c53b 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -39,6 +39,7 @@
 
 #include <linux/ethtool.h>
 #include <linux/inetdevice.h>
+#include <linux/kernel.h>
 #include <linux/sunrpc/addr.h>
 #include <net/addrconf.h>
 #include "socklnd.h"
@@ -854,6 +855,7 @@ static int
 ksocknal_nl_set(int cmd, struct nlattr *attr, int type, void *data)
 {
 	struct lnet_lnd_tunables *tunables = data;
+	s64 num;
 
 	if (cmd != LNET_CMD_NETS)
 		return -EOPNOTSUPP;
@@ -862,7 +864,9 @@ ksocknal_nl_set(int cmd, struct nlattr *attr, int type, void *data)
 	    nla_type(attr) != LN_SCALAR_ATTR_INT_VALUE)
 		return -EINVAL;
 
-	tunables->lnd_tun_u.lnd_sock.lnd_conns_per_peer = nla_get_s64(attr);
+	num = nla_get_s64(attr);
+	num = clamp_t(s64, num, 1, 127);
+	tunables->lnd_tun_u.lnd_sock.lnd_conns_per_peer = num;
 
 	return 0;
 }
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 2c7f5211bbee..a4fb95f26788 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -3626,7 +3626,8 @@ int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf, u32 net_id,
 
 	mutex_unlock(&the_lnet.ln_api_mutex);
 
-	if (rc)
+	/* If the NI already exists, delete this new unused copy */
+	if (rc == -EEXIST)
 		lnet_ni_free(ni);
 
 	return rc;
@@ -4868,16 +4869,20 @@ static int lnet_genl_parse_tunables(struct nlattr *settings,
 		num = nla_get_s64(param);
 		switch (type) {
 		case LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_TIMEOUT:
-			tun->lt_cmn.lct_peer_timeout = num;
+			if (num >= 0)
+				tun->lt_cmn.lct_peer_timeout = num;
 			break;
 		case LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_CREDITS:
-			tun->lt_cmn.lct_peer_tx_credits = num;
+			if (num > 0)
+				tun->lt_cmn.lct_peer_tx_credits = num;
 			break;
 		case LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_BUFFER_CREDITS:
-			tun->lt_cmn.lct_peer_rtr_credits = num;
+			if (num > 0)
+				tun->lt_cmn.lct_peer_rtr_credits = num;
 			break;
 		case LNET_NET_LOCAL_NI_TUNABLES_ATTR_CREDITS:
-			tun->lt_cmn.lct_max_tx_credits = num;
+			if (num > 0)
+				tun->lt_cmn.lct_max_tx_credits = num;
 			break;
 		default:
 			rc = -EINVAL;
@@ -4887,25 +4892,21 @@ static int lnet_genl_parse_tunables(struct nlattr *settings,
 	return rc;
 }
 
-static int
-lnet_genl_parse_lnd_tunables(struct nlattr *settings,
-			     struct lnet_ioctl_config_lnd_tunables *tun,
-			     const struct lnet_lnd *lnd)
+static int lnet_genl_parse_lnd_tunables(struct nlattr *settings,
+					struct lnet_lnd_tunables *tun,
+					const struct lnet_lnd *lnd)
 {
 	const struct ln_key_list *list = lnd->lnd_keys;
 	struct nlattr *param;
 	int rem, rc = 0;
 	int i = 1;
 
-	if (!list)
+	/* silently ignore these settings if the LND driver doesn't
+	 * support any LND tunables
+	 */
+	if (!list || !lnd->lnd_nl_set || !list->lkl_maxattr)
 		return 0;
 
-	if (!lnd->lnd_nl_set)
-		return -EOPNOTSUPP;
-
-	if (!list->lkl_maxattr)
-		return -ERANGE;
-
 	nla_for_each_nested(param, settings, rem) {
 		if (nla_type(param) != LN_SCALAR_ATTR_VALUE)
 			continue;
@@ -5007,7 +5008,7 @@ lnet_genl_parse_local_ni(struct nlattr *entry, struct genl_info *info,
 			}
 
 			rc = lnet_genl_parse_lnd_tunables(settings,
-							  tun, lnd);
+							  &tun->lt_tun, lnd);
 			if (rc < 0) {
 				GENL_SET_ERR_MSG(info,
 						 "failed to parse lnd tunables");
@@ -5151,7 +5152,11 @@ static int lnet_net_cmd(struct sk_buff *skb, struct genl_info *info)
 				struct lnet_ioctl_config_lnd_tunables tun;
 
 				memset(&tun, 0, sizeof(tun));
+				/* Use LND defaults */
 				tun.lt_cmn.lct_peer_timeout = -1;
+				tun.lt_cmn.lct_peer_tx_credits = -1;
+				tun.lt_cmn.lct_peer_rtr_credits = -1;
+				tun.lt_cmn.lct_max_tx_credits = -1;
 				conf.lic_ncpts = 0;
 
 				rc = lnet_genl_parse_local_ni(entry, info,
@@ -5176,6 +5181,7 @@ static int lnet_net_cmd(struct sk_buff *skb, struct genl_info *info)
 					if (!net) {
 						GENL_SET_ERR_MSG(info,
 								 "LNet net doesn't exist");
+						lnet_net_unlock(LNET_LOCK_EX);
 						goto out;
 					}
 					list_for_each_entry(ni, &net->net_ni_list,
@@ -5190,7 +5196,6 @@ static int lnet_net_cmd(struct sk_buff *skb, struct genl_info *info)
 
 						lnet_net_unlock(LNET_LOCK_EX);
 						rc = lnet_dyn_del_ni(&ni->ni_nid);
-						lnet_net_lock(LNET_LOCK_EX);
 						if (rc < 0) {
 							GENL_SET_ERR_MSG(info,
 									 "cannot del LNet NI");
@@ -5199,7 +5204,11 @@ static int lnet_net_cmd(struct sk_buff *skb, struct genl_info *info)
 						break;
 					}
 
-					lnet_net_unlock(LNET_LOCK_EX);
+					if (rc < 0) { /* will be -ENODEV */
+						GENL_SET_ERR_MSG(info,
+								 "interface invalid for deleting LNet NI");
+						lnet_net_unlock(LNET_LOCK_EX);
+					}
 				} else {
 					rc = lnet_dyn_add_ni(&conf, net_id, &tun);
 					switch (rc) {
-- 
2.27.0


* [lustre-devel] [PATCH 41/42] lnet: modify lnet_inetdev to work with large NIDS
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (39 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 40/42] lnet: validate data sent from user land properly James Simmons
@ 2023-01-23 23:00 ` James Simmons
  2023-01-23 23:00 ` [lustre-devel] [PATCH 42/42] lustre: ldlm: remove obsolete LDLM_FL_SERVER_LOCK James Simmons
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Change the li_ipv6 field in struct lnet_inetdev to li_size, which
now represents the size of the NID address. This will work
with the GUID of InfiniBand as well. The second change is
to always store li_ipaddr in network byte order. This allows
direct comparison between li_ipaddr and the nid_addr of
struct lnet_nid. We will ensure AF_IB is also in the
same format as what is stored in struct lnet_nid.
Implement setup with a NID address for the ko2iblnd LND driver.
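
To illustrate why storing the address in network byte order helps, here
is a small userspace sketch, assuming a simplified interface record with
a size field and a 4- or 16-byte address. The type and function names
are hypothetical and only mirror the idea in lnet_inet_select(): once
both sides hold raw network-order bytes, a single memcmp() over the
address size is enough, with no per-family byte swapping.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified interface record (not struct lnet_inetdev). */
struct iface_addr {
	uint32_t size;			/* 4 for IPv4, 16 for IPv6 */
	union {
		uint32_t ipv4;		/* stored in network byte order */
		uint8_t ipv6[16];
	} u;
};

/* Compare an interface address with raw NID address bytes of a given size. */
static bool iface_matches(const struct iface_addr *ifa,
			  const void *nid_addr, uint32_t nid_bytes)
{
	const void *addr;

	/* Different address sizes can never match. */
	if (ifa->size != nid_bytes)
		return false;

	addr = ifa->size == 4 ? (const void *)&ifa->u.ipv4
			      : (const void *)ifa->u.ipv6;

	return memcmp(nid_addr, addr, ifa->size) == 0;
}
```

For example, an interface holding htonl(0xC0A80001) matches the NID
address bytes 192, 168, 0, 1 directly.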

WC-bug-id: https://jira.whamcloud.com/browse/LU-13642
Lustre-commit: 0b406c91d175b6cdb ("LU-13642 lnet: modify lnet_inetdev to work with large NIDS")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49525
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 include/linux/lnet/lib-lnet.h        |  2 +-
 include/uapi/linux/lnet/lnet-types.h | 10 ---------
 net/lnet/klnds/o2iblnd/o2iblnd.c     | 16 ++++++++------
 net/lnet/klnds/socklnd/socklnd.c     |  6 +++---
 net/lnet/lnet/config.c               | 32 ++++++++++++++--------------
 5 files changed, 30 insertions(+), 36 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index ed28af6fe8d5..d03dcf849bd1 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -832,7 +832,7 @@ struct lnet_inetdev {
 	};
 	u32	li_index;
 	bool	li_iff_master;
-	bool	li_ipv6;
+	u32	li_size;
 	char	li_name[IFNAMSIZ];
 };
 
diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h
index 8a1d2d749b4b..6c6a66ebfd44 100644
--- a/include/uapi/linux/lnet/lnet-types.h
+++ b/include/uapi/linux/lnet/lnet-types.h
@@ -120,16 +120,6 @@ static inline bool nid_is_nid4(const struct lnet_nid *nid)
 	return NID_ADDR_BYTES(nid) == 4;
 }
 
-static inline bool nid_is_ipv4(const struct lnet_nid *nid)
-{
-	return NID_ADDR_BYTES(nid) == 4;
-}
-
-static inline bool nid_is_ipv6(const struct lnet_nid *nid)
-{
-	return NID_ADDR_BYTES(nid) == 16;
-}
-
 /* check for address set */
 static inline bool nid_addr_is_set(const struct lnet_nid *nid)
 {
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 67259569b392..c1dfbe58eeb8 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -3202,6 +3202,9 @@ static int kiblnd_startup(struct lnet_ni *ni)
 		ifname = ni->ni_interface;
 	} else {
 		ifname = *kiblnd_tunables.kib_default_ipif;
+		rc = libcfs_strnid(&ni->ni_nid, ifname);
+		if (rc < 0 || ni->ni_nid.nid_type != O2IBLND)
+			memset(&ni->ni_nid, 0, sizeof(ni->ni_nid));
 	}
 
 	if (strlen(ifname) >= sizeof(ibdev->ibd_ifname)) {
@@ -3214,12 +3217,13 @@ static int kiblnd_startup(struct lnet_ni *ni)
 	if (rc < 0)
 		goto failed;
 
-	for (i = 0; i < rc; i++) {
-		if (strcmp(ifname, ifaces[i].li_name) == 0)
-			break;
-	}
+	i = lnet_inet_select(ni, ifaces, rc);
+	if (i < 0)
+		goto failed;
 
-	if (i == rc) {
+	if (nid_addr_is_set(&ni->ni_nid)) {
+		strscpy(ifname, ifaces[i].li_name, sizeof(ifname));
+	} else if (strcmp(ifname, ifaces[i].li_name) != 0) {
 		CERROR("ko2iblnd: No matching interfaces\n");
 		rc = -ENOENT;
 		goto failed;
@@ -3235,7 +3239,7 @@ static int kiblnd_startup(struct lnet_ni *ni)
 			goto failed;
 		}
 
-		ibdev->ibd_ifip = ifaces[i].li_ipaddr;
+		ibdev->ibd_ifip = ntohl(ifaces[i].li_ipaddr);
 		strlcpy(ibdev->ibd_ifname, ifaces[i].li_name,
 			sizeof(ibdev->ibd_ifname));
 		ibdev->ibd_can_failover = ifaces[i].li_iff_master;
diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index cc2b7f46c53b..b8d6e287a093 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -2586,7 +2586,7 @@ ksocknal_startup(struct lnet_ni *ni)
 
 	ni->ni_dev_cpt = ifaces[if_idx].li_cpt;
 	ksi->ksni_index = ifaces[if_idx].li_index;
-	if (ifaces[if_idx].li_ipv6) {
+	if (ifaces[if_idx].li_size == sizeof(struct in6_addr)) {
 		struct sockaddr_in6 *sa;
 		sa = (void *)&ksi->ksni_addr;
 		memset(sa, 0, sizeof(*sa));
@@ -2601,9 +2601,9 @@ ksocknal_startup(struct lnet_ni *ni)
 		sa = (void *)&ksi->ksni_addr;
 		memset(sa, 0, sizeof(*sa));
 		sa->sin_family = AF_INET;
-		sa->sin_addr.s_addr = htonl(ifaces[if_idx].li_ipaddr);
+		sa->sin_addr.s_addr = ifaces[if_idx].li_ipaddr;
 		ksi->ksni_netmask = ifaces[if_idx].li_netmask;
-		ni->ni_nid.nid_size = 4 - 4;
+		ni->ni_nid.nid_size = 0;
 		ni->ni_nid.nid_addr[0] = sa->sin_addr.s_addr;
 	}
 	strlcpy(ksi->ksni_name, ifaces[if_idx].li_name, sizeof(ksi->ksni_name));
diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c
index 0c4405f0f13b..a54e1dbe3e2e 100644
--- a/net/lnet/lnet/config.c
+++ b/net/lnet/lnet/config.c
@@ -1546,9 +1546,9 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns, bool v6)
 
 			ifaces[nip].li_cpt = cpt;
 			ifaces[nip].li_iff_master = !!(flags & IFF_MASTER);
-			ifaces[nip].li_ipv6 = false;
+			ifaces[nip].li_size = sizeof(ifa->ifa_local);
 			ifaces[nip].li_index = dev->ifindex;
-			ifaces[nip].li_ipaddr = ntohl(ifa->ifa_local);
+			ifaces[nip].li_ipaddr = ifa->ifa_local;
 			ifaces[nip].li_netmask = ntohl(ifa->ifa_mask);
 			strlcpy(ifaces[nip].li_name, ifa->ifa_label,
 				sizeof(ifaces[nip].li_name));
@@ -1586,7 +1586,7 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns, bool v6)
 
 			ifaces[nip].li_cpt = cpt;
 			ifaces[nip].li_iff_master = !!(flags & IFF_MASTER);
-			ifaces[nip].li_ipv6 = true;
+			ifaces[nip].li_size = sizeof(struct in6_addr);
 			ifaces[nip].li_index = dev->ifindex;
 			memcpy(ifaces[nip].li_ipv6addr,
 			       &ifa6->addr, sizeof(struct in6_addr));
@@ -1636,16 +1636,14 @@ int lnet_inet_select(struct lnet_ni *ni,
 			/* IP unspecified, use IP of first matching interface */
 			break;
 
-		if (ifaces[if_idx].li_ipv6 &&
-		    nid_is_ipv6(&ni->ni_nid)) {
-			if (memcmp(ni->ni_nid.nid_addr,
-				   ifaces[if_idx].li_ipv6addr,
-				   sizeof(struct in6_addr)) == 0)
-				break;
-		} else if (!ifaces[if_idx].li_ipv6 &&
-			   nid_is_ipv4(&ni->ni_nid)) {
-			if (ni->ni_nid.nid_addr[0] ==
-			    htonl(ifaces[if_idx].li_ipaddr))
+		if (ifaces[if_idx].li_size == NID_ADDR_BYTES(&ni->ni_nid)) {
+			char *addr = (char *)&ifaces[if_idx].li_ipaddr;
+
+			if (ifaces[if_idx].li_size != 4)
+				addr = (char *)ifaces[if_idx].li_ipv6addr;
+
+			if (memcmp(ni->ni_nid.nid_addr, addr,
+				   ifaces[if_idx].li_size) == 0)
 				break;
 		}
 	}
@@ -1654,11 +1652,13 @@ int lnet_inet_select(struct lnet_ni *ni,
 		return if_idx;
 
 	if (ni->ni_interface)
-		CERROR("ksocklnd: failed to find interface %s%s%s\n",
+		CERROR("%s: failed to find interface %s%s%s\n",
+		       libcfs_lnd2modname(ni->ni_nid.nid_type),
 		       ni->ni_interface, addr_set ? "@" : "",
 		       addr_set ? libcfs_nidstr(&ni->ni_nid) : "");
 	else
-		CERROR("ksocklnd: failed to find IP address %s\n",
+		CERROR("%s: failed to find IP address %s\n",
+		       libcfs_lnd2modname(ni->ni_nid.nid_type),
 		       libcfs_nidstr(&ni->ni_nid));
 
 	return -EINVAL;
@@ -1700,7 +1700,7 @@ lnet_parse_ip2nets(const char **networksp, const char *ip2nets)
 	}
 
 	for (i = 0; i < nip; i++)
-		ipaddrs[i] = ifaces[i].li_ipaddr;
+		ipaddrs[i] = ntohl(ifaces[i].li_ipaddr);
 
 	rc = lnet_match_networks(networksp, ip2nets, ipaddrs, nip);
 	if (rc < 0) {
-- 
2.27.0


* [lustre-devel] [PATCH 42/42] lustre: ldlm: remove obsolete LDLM_FL_SERVER_LOCK
  2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
                   ` (40 preceding siblings ...)
  2023-01-23 23:00 ` [lustre-devel] [PATCH 41/42] lnet: modify lnet_inetdev to work with large NIDS James Simmons
@ 2023-01-23 23:00 ` James Simmons
  41 siblings, 0 replies; 43+ messages in thread
From: James Simmons @ 2023-01-23 23:00 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

The LDLM_FL_SERVER_LOCK flag and accompanying accessor macros have
never been used since they were first introduced.  Remove them.
It looks like this may have been duplicated by LDLM_FL_NS_SRV.

WC-bug-id: https://jira.whamcloud.com/browse/LU-2771
Lustre-commit: cb0aa0285b32fb432 ("LU-2771 ldlm: remove obsolete LDLM_FL_SERVER_LOCK")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49563
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_dlm_flags.h | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm_flags.h b/fs/lustre/include/lustre_dlm_flags.h
index 25480ee218e1..d4cd231ce80e 100644
--- a/fs/lustre/include/lustre_dlm_flags.h
+++ b/fs/lustre/include/lustre_dlm_flags.h
@@ -340,12 +340,6 @@
 #define ldlm_set_destroyed(_l)		LDLM_SET_FLAG((_l), 1ULL << 50)
 #define ldlm_clear_destroyed(_l)	LDLM_CLEAR_FLAG((_l), 1ULL << 50)
 
-/** flag whether this is a server namespace lock */
-#define LDLM_FL_SERVER_LOCK		0x0008000000000000ULL /* bit 51 */
-#define ldlm_is_server_lock(_l)		LDLM_TEST_FLAG((_l), 1ULL << 51)
-#define ldlm_set_server_lock(_l)	LDLM_SET_FLAG((_l), 1ULL << 51)
-#define ldlm_clear_server_lock(_l)	LDLM_CLEAR_FLAG((_l), 1ULL << 51)
-
 /**
  * It's set in lock_res_and_lock() and unset in unlock_res_and_lock().
  *
-- 
2.27.0


end of thread, other threads:[~2023-01-23 23:43 UTC | newest]

Thread overview: 43+ messages
2023-01-23 23:00 [lustre-devel] [PATCH 00/42] lustre: sync to OpenSFS tree as of Jan 22 2023 James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 01/42] lustre: osc: pack osc_async_page better James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 02/42] lnet: lnet_peer_merge_data to understand large addr James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 03/42] lnet: router_discover - handle large addrs in ping James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 04/42] lnet: Drop LNet message if deadline exceeded James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 05/42] lnet: change lnet_find_best_lpni to handle large NIDs James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 06/42] lustre: ldebugfs: add histogram to stats counter James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 07/42] lustre: llite: wake_up after cl_object_kill James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 08/42] lustre: pcc: use two bits to indicate pcc type for attach James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 09/42] lustre: ldebugfs: make job_stats and rename_stats valid YAML James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 10/42] lustre: misc: fix stats snapshot_time to use wallclock James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 11/42] lustre: pools: force creation of a component without a pool James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 12/42] lustre: sec: reserve flag for fid2path for encrypted files James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 13/42] lustre: llite: update statx size/ctime for fallocate James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 14/42] lustre: ptlrpc: fiemap flexible array James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 15/42] lustre: ptlrpc: Add LCME_FL_PARITY to wirecheck James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 16/42] lnet: selftest: lst read-outside of allocation James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 17/42] lustre: misc: rename lprocfs_stats functions James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 18/42] lustre: osc: Fix possible null pointer James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 19/42] lustre: ptlrpc: NUL terminate long jobid strings James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 20/42] lustre: uapi: remove _GNU_SOURCE dependency in lustre_user.h James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 21/42] lnet: handles unregister/register events James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 22/42] lustre: update version to 2.15.53 James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 23/42] lustre: ptlrpc: don't panic during reconnection James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 24/42] lustre: move to kobj_type default_groups James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 25/42] lnet: increase transaction timeout James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 26/42] lnet: Allow IP specification James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 27/42] lustre: obdclass: fix T10PI prototypes James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 28/42] lustre: obdclass: prefer T10 checksum if the target supports it James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 29/42] lustre: llite: remove false outdated comment James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 30/42] lnet: socklnd: clarify error message on timeout James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 31/42] lustre: llite: replace selinux_is_enabled() James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 32/42] lustre: enc: S_ENCRYPTED flag on OST objects for enc files James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 33/42] lnet: asym route inconsistency warning James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 34/42] lnet: o2iblnd: reset hiw proportionally James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 35/42] lnet: libcfs: cfs_hash_for_each_empty optimization James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 36/42] lustre: llite: always enable remote subdir mount James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 37/42] lnet: selftest: migrate LNet selftest group handling to Netlink James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 38/42] lnet: use Netlink to support LNet ping commands James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 39/42] lustre: llite: revert: "llite: clear stale page's uptodate bit" James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 40/42] lnet: validate data sent from user land properly James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 41/42] lnet: modify lnet_inetdev to work with large NIDS James Simmons
2023-01-23 23:00 ` [lustre-devel] [PATCH 42/42] lustre: ldlm: remove obsolete LDLM_FL_SERVER_LOCK James Simmons
