lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022
@ 2022-11-20 14:16 James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 01/22] lustre: llite: clear stale page's uptodate bit James Simmons
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Merge the next batch of work from the OpenSFS tree into the Linux
native client. Most of the IPv6 ping work is now merged.

Alex Zhuravlev (1):
  lustre: llite: remove linefeed from LDLM_DEBUG

Bobi Jam (1):
  lustre: llite: clear stale page's uptodate bit

Chris Horn (2):
  lnet: Don't modify uptodate peer with temp NI
  lnet: Signal completion on ping send failure

James Simmons (3):
  lnet: use Netlink to support old and new NI APIs.
  lnet: fix build issue when IPv6 is disabled.
  lnet: selftest: migrate LNet selftest session handling to Netlink

Lei Feng (1):
  lustre: obdclass: fill jobid in a safe way

Mikhail Pershin (1):
  lustre: llog: skip bad records in llog

Mr NeilBrown (7):
  lustre: obdclass: improve precision of wakeups for mod_rpcs
  lnet: allow ping packet to contain large nids
  lnet: extend lnet_is_nid_in_ping_info()
  lnet: find correct primary for peer
  lnet: change lnet_notify() to take struct lnet_nid
  lnet: discard lnet_nid2ni_*()
  lnet: change lnet_debug_peer() to struct lnet_nid

Patrick Farrell (1):
  lustre: osc: Remove oap lock

Serguei Smirnov (2):
  lnet: o2iblnd: add verbose debug prints for rx/tx events
  lnet: fix debug message in lnet_discovery_event_reply

Shaun Tancheff (1):
  lustre: llite: Explicitly support .splice_write

Vitaly Fertman (2):
  lustre: clio: append to non-existent component
  lustre: ldlm: group lock unlock fix

 fs/lustre/include/cl_object.h          |  15 +-
 fs/lustre/include/lustre_dlm.h         |   1 +
 fs/lustre/include/lustre_osc.h         |  17 -
 fs/lustre/include/obd.h                |   1 -
 fs/lustre/ldlm/ldlm_lib.c              |   1 -
 fs/lustre/ldlm/ldlm_lock.c             |  28 +-
 fs/lustre/llite/file.c                 |  79 ++-
 fs/lustre/llite/llite_internal.h       |   3 +
 fs/lustre/llite/llite_lib.c            |   3 +
 fs/lustre/llite/namei.c                |   2 +-
 fs/lustre/llite/rw.c                   |  10 +-
 fs/lustre/llite/vvp_io.c               | 124 ++++-
 fs/lustre/llite/vvp_page.c             |   5 +
 fs/lustre/lov/lov_page.c               |   2 +
 fs/lustre/mdc/mdc_dev.c                |  58 +-
 fs/lustre/obdclass/cl_page.c           |  37 +-
 fs/lustre/obdclass/genops.c            | 158 +++---
 fs/lustre/obdclass/jobid.c             |  13 +-
 fs/lustre/obdclass/llog.c              |  86 +--
 fs/lustre/osc/osc_cache.c              |  11 -
 fs/lustre/osc/osc_io.c                 |   8 +-
 fs/lustre/osc/osc_lock.c               | 157 +-----
 fs/lustre/osc/osc_object.c             |  16 -
 fs/lustre/osc/osc_page.c               |   5 -
 fs/lustre/osc/osc_request.c            |  14 +-
 include/linux/lnet/lib-lnet.h          |  23 +-
 include/linux/lnet/lib-types.h         | 142 +++++
 include/uapi/linux/lnet/libcfs_ioctl.h |   2 +-
 include/uapi/linux/lnet/lnet-dlc.h     |  23 +
 include/uapi/linux/lnet/lnet-idl.h     |  58 +-
 include/uapi/linux/lnet/lnet-types.h   |  15 +
 include/uapi/linux/lnet/lnetst.h       |  21 +-
 net/lnet/klnds/o2iblnd/o2iblnd.c       |  88 ++-
 net/lnet/klnds/o2iblnd/o2iblnd.h       |  94 +++-
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c    | 136 +++--
 net/lnet/klnds/socklnd/socklnd.c       |  39 +-
 net/lnet/klnds/socklnd/socklnd.h       |   9 +
 net/lnet/lnet/api-ni.c                 | 961 ++++++++++++++++++++++++++++-----
 net/lnet/lnet/config.c                 |   6 +-
 net/lnet/lnet/lib-msg.c                |   2 +-
 net/lnet/lnet/module.c                 |  42 +-
 net/lnet/lnet/peer.c                   | 144 ++++-
 net/lnet/lnet/router.c                 |  15 +-
 net/lnet/selftest/conctl.c             | 349 +++++++++---
 net/lnet/selftest/conrpc.c             |  28 +-
 net/lnet/selftest/console.c            |  81 +--
 net/lnet/selftest/console.h            |  68 +--
 net/lnet/selftest/framework.c          |  43 +-
 net/lnet/selftest/selftest.h           |  78 ++-
 49 files changed, 2427 insertions(+), 894 deletions(-)

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [lustre-devel] [PATCH 01/22] lustre: llite: clear stale page's uptodate bit
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 02/22] lustre: osc: Remove oap lock James Simmons
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Bobi Jam <bobijam@whamcloud.com>

In the truncate_inode_page()->do_invalidatepage()->ll_invalidatepage()
call path, before the vmpage is deleted from the page cache it can
be picked up by ll_read_ahead_page()->grab_cache_page_nowait().

If ll_invalidatepage()->cl_page_delete() does not clear the vmpage's
uptodate bit, readahead can pick up this stale page and wrongly
treat it as already uptodate.

In ll_fault()->vvp_io_fault_start()->vvp_io_kernel_fault(),
filemap_fault() calls ll_readpage() to read the vmpage and then
waits for the vmpage to be unlocked. Once ll_readpage() has
successfully read and unlocked the vmpage, memory pressure or
truncate can step in and delete the cl_page, after which
filemap_fault() finds that the vmpage is no longer uptodate and
returns VM_FAULT_SIGBUS. To fix this, this patch makes
vvp_io_kernel_fault() restart filemap_fault() to get an uptodate
vmpage again.
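
A condensed sketch of that retry (not the exact code; the cl_page
lookup, reference drops and the OOM/RETRY paths are omitted, and the
full change is in the vvp_io.c hunk below):

	ll_inode_size_lock(inode);
retry:
	cfio->ft_flags = filemap_fault(vmf);
	if (!vmf->page &&
	    (cfio->ft_flags & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) &&
	    clp && clp->cp_defer_detach) {
		/* ll_readpage() filled the vmpage, but the cl_page was
		 * deleted before filemap_fault() rechecked it: detach the
		 * stale cl_page, and retry only while vmf->pgoff is still
		 * below i_size so repeated SIGBUS past EOF is avoided
		 */
		detach_and_deref_page(clp, vmpage);
		if (vmf->pgoff < DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE))
			goto retry;
	}
	ll_inode_size_unlock(inode);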

WC-bug-id: https://jira.whamcloud.com/browse/LU-16160
Lustre-commit: 5b911e03261c3de6b ("LU-16160 llite: clear stale page's uptodate bit")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48607
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h |  15 ++++-
 fs/lustre/llite/rw.c          |  10 +++-
 fs/lustre/llite/vvp_io.c      | 124 +++++++++++++++++++++++++++++++++++++++---
 fs/lustre/llite/vvp_page.c    |   5 ++
 fs/lustre/obdclass/cl_page.c  |  37 ++++++++++---
 5 files changed, 172 insertions(+), 19 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 41ce0b0..8be58ff 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -768,7 +768,15 @@ struct cl_page {
 	enum cl_page_type		 cp_type:CP_TYPE_BITS;
 	unsigned int			 cp_defer_uptodate:1,
 					 cp_ra_updated:1,
-					 cp_ra_used:1;
+					 cp_ra_used:1,
+					 /* fault page read grabbed an extra reference */
+					 cp_fault_ref:1,
+					 /**
+					  * if the fault page got deleted before being
+					  * returned to filemap_fault(), defer the vmpage
+					  * detach/put until filemap_fault() has finished.
+					  */
+					 cp_defer_detach:1;
 
 	/* which slab kmem index this memory allocated from */
 	short int			 cp_kmem_index;
@@ -2393,6 +2401,11 @@ int cl_io_lru_reserve(const struct lu_env *env, struct cl_io *io,
 int cl_io_read_ahead(const struct lu_env *env, struct cl_io *io,
 		     pgoff_t start, struct cl_read_ahead *ra);
 
+static inline int cl_io_is_pagefault(const struct cl_io *io)
+{
+	return io->ci_type == CIT_FAULT && !io->u.ci_fault.ft_mkwrite;
+}
+
 /**
  * True, if @io is an O_APPEND write(2).
  */
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 2290b31..0283af4 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -1947,7 +1947,15 @@ int ll_readpage(struct file *file, struct page *vmpage)
 			unlock_page(vmpage);
 			result = 0;
 		}
-		cl_page_put(env, page);
+		if (cl_io_is_pagefault(io) && result == 0) {
+			/**
+			 * page fault, retain the cl_page reference until
+			 * vvp_io_kernel_fault() release it.
+			 */
+			page->cp_fault_ref = 1;
+		} else {
+			cl_page_put(env, page);
+		}
 	} else {
 		unlock_page(vmpage);
 		result = PTR_ERR(page);
diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index ef7a3d92..be6f17f 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -1292,14 +1292,41 @@ static void vvp_io_rw_end(const struct lu_env *env,
 	trunc_sem_up_read(&lli->lli_trunc_sem);
 }
 
-static int vvp_io_kernel_fault(struct vvp_fault_io *cfio)
+static void detach_and_deref_page(struct cl_page *clp, struct page *vmpage)
+{
+	if (!clp->cp_defer_detach)
+		return;
+
+	/**
+	 * __cl_page_delete() took a vmpage reference, but did not unlink the
+	 * vmpage from its cl_page.
+	 */
+	clp->cp_defer_detach = 0;
+	ClearPagePrivate(vmpage);
+	vmpage->private = 0;
+
+	put_page(vmpage);
+	refcount_dec(&clp->cp_ref);
+}
+
+static int vvp_io_kernel_fault(const struct lu_env *env,
+			       struct vvp_fault_io *cfio)
 {
 	struct vm_fault *vmf = cfio->ft_vmf;
+	struct file *vmff = cfio->ft_vma->vm_file;
+	struct address_space *mapping = vmff->f_mapping;
+	struct inode *inode = mapping->host;
+	struct page *vmpage = NULL;
+	struct cl_page *clp = NULL;
+	int rc = 0;
 
+	ll_inode_size_lock(inode);
+retry:
 	cfio->ft_flags = filemap_fault(vmf);
 	cfio->ft_flags_valid = 1;
 
 	if (vmf->page) {
+		/* success, vmpage is locked */
 		CDEBUG(D_PAGE,
 		       "page %p map %p index %lu flags %lx count %u priv %0lx: got addr %p type NOPAGE\n",
 		       vmf->page, vmf->page->mapping, vmf->page->index,
@@ -1311,24 +1338,105 @@ static int vvp_io_kernel_fault(struct vvp_fault_io *cfio)
 		}
 
 		cfio->ft_vmpage = vmf->page;
-		return 0;
+
+		/**
+		 * filemap_fault()->ll_readpage() could have taken an extra
+		 * cl_page reference, so look up the cl_page here to check its
+		 * cp_fault_ref and drop that reference later.
+		 */
+		clp = cl_vmpage_page(vmf->page, NULL);
+
+		goto unlock;
+	}
+
+	/* filemap_fault() fails, vmpage is not locked */
+	if (!clp) {
+		vmpage = find_get_page(mapping, vmf->pgoff);
+		if (vmpage) {
+			lock_page(vmpage);
+			clp = cl_vmpage_page(vmpage, NULL);
+			unlock_page(vmpage);
+		}
 	}
 
 	if (cfio->ft_flags & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) {
+		pgoff_t max_idx;
+
+		/**
+		 * filemap_fault()->ll_readpage() could have filled the vmpage
+		 * correctly and unlocked it, but then memory pressure or
+		 * truncate detached the cl_page from the vmpage, so the
+		 * kernel's filemap_fault() does wait_on_page_locked(vmpage),
+		 * finds that the vmpage has had its uptodate bit cleared,
+		 * and returns VM_FAULT_SIGBUS.
+		 *
+		 * In this case, retry the filemap_fault()->ll_readpage()
+		 * to rebuild the cl_page and fill the vmpage with uptodate data.
+		 */
+		if (likely(vmpage)) {
+			bool need_retry = false;
+
+			if (clp) {
+				if (clp->cp_defer_detach) {
+					detach_and_deref_page(clp, vmpage);
+					/**
+					 * check i_size to make sure it's not
+					 * over EOF, we don't want to call
+					 * filemap_fault() repeatedly since it
+					 * returns VM_FAULT_SIGBUS without even
+					 * trying if vmf->pgoff is over EOF.
+					 */
+					max_idx = DIV_ROUND_UP(i_size_read(inode),
+							       PAGE_SIZE);
+					if (vmf->pgoff < max_idx)
+						need_retry = true;
+				}
+				if (clp->cp_fault_ref) {
+					clp->cp_fault_ref = 0;
+					/* ref not released in ll_readpage() */
+					cl_page_put(env, clp);
+				}
+				if (need_retry)
+					goto retry;
+			}
+		}
+
 		CDEBUG(D_PAGE, "got addr %p - SIGBUS\n", (void *)vmf->address);
-		return -EFAULT;
+		rc = -EFAULT;
+		goto unlock;
 	}
 
 	if (cfio->ft_flags & VM_FAULT_OOM) {
 		CDEBUG(D_PAGE, "got addr %p - OOM\n", (void *)vmf->address);
-		return -ENOMEM;
+		rc = -ENOMEM;
+		goto unlock;
 	}
 
-	if (cfio->ft_flags & VM_FAULT_RETRY)
-		return -EAGAIN;
+	if (cfio->ft_flags & VM_FAULT_RETRY) {
+		rc = -EAGAIN;
+		goto unlock;
+	}
 
 	CERROR("Unknown error in page fault %d!\n", cfio->ft_flags);
-	return -EINVAL;
+	rc = -EINVAL;
+unlock:
+	ll_inode_size_unlock(inode);
+	if (clp) {
+		if (clp->cp_defer_detach && vmpage)
+			detach_and_deref_page(clp, vmpage);
+
+		/* additional cl_page ref has been taken in ll_readpage() */
+		if (clp->cp_fault_ref) {
+			clp->cp_fault_ref = 0;
+			/* ref not released in ll_readpage() */
+			cl_page_put(env, clp);
+		}
+		/* ref taken in this function */
+		cl_page_put(env, clp);
+	}
+	if (vmpage)
+		put_page(vmpage);
+	return rc;
 }
 
 static void mkwrite_commit_callback(const struct lu_env *env, struct cl_io *io,
@@ -1368,7 +1476,7 @@ static int vvp_io_fault_start(const struct lu_env *env,
 		LASSERT(cfio->ft_vmpage);
 		lock_page(cfio->ft_vmpage);
 	} else {
-		result = vvp_io_kernel_fault(cfio);
+		result = vvp_io_kernel_fault(env, cfio);
 		if (result != 0)
 			return result;
 	}
diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c
index f359596..9e8c158 100644
--- a/fs/lustre/llite/vvp_page.c
+++ b/fs/lustre/llite/vvp_page.c
@@ -104,6 +104,11 @@ static void vvp_page_completion_read(const struct lu_env *env,
 		ll_ra_count_put(ll_i2sbi(inode), 1);
 
 	if (ioret == 0)  {
+		/**
+		 * cp_defer_uptodate is used for readahead pages; setting the
+		 * vmpage Uptodate bit is deferred to ll_readpage()/
+		 * ll_io_read_page().
+		 */
 		if (!cp->cp_defer_uptodate)
 			SetPageUptodate(vmpage);
 	} else if (cp->cp_defer_uptodate) {
diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c
index 7011235..3bc1a9b 100644
--- a/fs/lustre/obdclass/cl_page.c
+++ b/fs/lustre/obdclass/cl_page.c
@@ -725,16 +725,35 @@ static void __cl_page_delete(const struct lu_env *env, struct cl_page *cp)
 		LASSERT(PageLocked(vmpage));
 		LASSERT((struct cl_page *)vmpage->private == cp);
 
-		/* Drop the reference count held in vvp_page_init */
-		refcount_dec(&cp->cp_ref);
-		ClearPagePrivate(vmpage);
-		vmpage->private = 0;
-
-		/*
-		 * The reference from vmpage to cl_page is removed,
-		 * but the reference back is still here. It is removed
-		 * later in cl_page_free().
+		/**
+		 * Clear the vmpage uptodate bit, since ll_read_ahead_pages()->
+		 * ll_read_ahead_page() could pick up this stale vmpage and
+		 * wrongly treat it as uptodate.
 		 */
+		ClearPageUptodate(vmpage);
+		/**
+		 * vvp_io_kernel_fault()->ll_readpage() sets cp_fault_ref
+		 * and needs this cl_page to retry the page fault read.
+		 */
+		if (cp->cp_fault_ref) {
+			cp->cp_defer_detach = 1;
+			/**
+			 * get a vmpage reference, so that filemap_fault()
+			 * won't free it from pagecache.
+			 */
+			get_page(vmpage);
+		} else {
+			/* Drop the reference count held in vvp_page_init */
+			refcount_dec(&cp->cp_ref);
+			ClearPagePrivate(vmpage);
+			vmpage->private = 0;
+
+			/*
+			 * The reference from vmpage to cl_page is removed,
+			 * but the reference back is still here. It is removed
+			 * later in cl_page_free().
+			 */
+		}
 	}
 }
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [lustre-devel] [PATCH 02/22] lustre: osc: Remove oap lock
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 01/22] lustre: llite: clear stale page's uptodate bit James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 03/22] lnet: Don't modify uptodate peer with temp NI James Simmons
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

The OAP lock is taken around setting the oap flags, but not
around any of the other fields in the oap.  As far as I can tell,
this is just some cargo cult belief about locking - there's no
reason for it.

Remove it entirely.  (From the code, a queued spin lock
appears to be 12 bytes on x86_64.)

WC-bug-id: https://jira.whamcloud.com/browse/LU-15619
Lustre-commit: b2274a716087fad24 ("LU-15619 osc: Remove oap lock")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46719
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_osc.h |  2 --
 fs/lustre/osc/osc_cache.c      | 11 -----------
 fs/lustre/osc/osc_io.c         |  8 ++------
 fs/lustre/osc/osc_page.c       |  5 -----
 4 files changed, 2 insertions(+), 24 deletions(-)

diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index 2e8c184..a0f1afc 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -88,8 +88,6 @@ struct osc_async_page {
 
 	struct ptlrpc_request	*oap_request;
 	struct osc_object	*oap_obj;
-
-	spinlock_t		oap_lock;
 };
 
 #define oap_page	oap_brw_page.pg
diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index e563809..b5776a1 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -1140,9 +1140,7 @@ static int osc_extent_make_ready(const struct lu_env *env,
 		rc = osc_make_ready(env, oap, OBD_BRW_WRITE);
 		switch (rc) {
 		case 0:
-			spin_lock(&oap->oap_lock);
 			oap->oap_async_flags |= ASYNC_READY;
-			spin_unlock(&oap->oap_lock);
 			break;
 		case -EALREADY:
 			LASSERT((oap->oap_async_flags & ASYNC_READY) != 0);
@@ -1165,9 +1163,7 @@ static int osc_extent_make_ready(const struct lu_env *env,
 			 "last_oap_count %d\n", last_oap_count);
 		LASSERT(last->oap_page_off + last_oap_count <= PAGE_SIZE);
 		last->oap_count = last_oap_count;
-		spin_lock(&last->oap_lock);
 		last->oap_async_flags |= ASYNC_COUNT_STABLE;
-		spin_unlock(&last->oap_lock);
 	}
 
 	/* for the rest of pages, we don't need to call osf_refresh_count()
@@ -1176,9 +1172,7 @@ static int osc_extent_make_ready(const struct lu_env *env,
 	list_for_each_entry(oap, &ext->oe_pages, oap_pending_item) {
 		if (!(oap->oap_async_flags & ASYNC_COUNT_STABLE)) {
 			oap->oap_count = PAGE_SIZE - oap->oap_page_off;
-			spin_lock(&last->oap_lock);
 			oap->oap_async_flags |= ASYNC_COUNT_STABLE;
-			spin_unlock(&last->oap_lock);
 		}
 	}
 
@@ -1866,9 +1860,7 @@ static void osc_ap_completion(const struct lu_env *env, struct client_obd *cli,
 	}
 
 	/* As the transfer for this page is being done, clear the flags */
-	spin_lock(&oap->oap_lock);
 	oap->oap_async_flags = 0;
-	spin_unlock(&oap->oap_lock);
 
 	if (oap->oap_cmd & OBD_BRW_WRITE && xid > 0) {
 		spin_lock(&cli->cl_loi_list_lock);
@@ -2330,7 +2322,6 @@ int osc_prep_async_page(struct osc_object *osc, struct osc_page *ops,
 	INIT_LIST_HEAD(&oap->oap_pending_item);
 	INIT_LIST_HEAD(&oap->oap_rpc_item);
 
-	spin_lock_init(&oap->oap_lock);
 	CDEBUG(D_INFO, "oap %p vmpage %p obj off %llu\n",
 	       oap, vmpage, oap->oap_obj_off);
 	return 0;
@@ -2619,9 +2610,7 @@ int osc_flush_async_page(const struct lu_env *env, struct cl_io *io,
 	if (rc)
 		goto out;
 
-	spin_lock(&oap->oap_lock);
 	oap->oap_async_flags |= ASYNC_READY | ASYNC_URGENT;
-	spin_unlock(&oap->oap_lock);
 
 	if (current->flags & PF_MEMALLOC)
 		ext->oe_memalloc = 1;
diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c
index aa8f61d..b9362d9 100644
--- a/fs/lustre/osc/osc_io.c
+++ b/fs/lustre/osc/osc_io.c
@@ -192,12 +192,8 @@ int osc_io_submit(const struct lu_env *env, const struct cl_io_slice *ios,
 			continue;
 		}
 
-		if (page->cp_type != CPT_TRANSIENT) {
-			spin_lock(&oap->oap_lock);
-			oap->oap_async_flags = ASYNC_URGENT | ASYNC_READY;
-			oap->oap_async_flags |= ASYNC_COUNT_STABLE;
-			spin_unlock(&oap->oap_lock);
-		}
+		if (page->cp_type != CPT_TRANSIENT)
+			oap->oap_async_flags = ASYNC_URGENT | ASYNC_READY | ASYNC_COUNT_STABLE;
 
 		osc_page_submit(env, opg, crt, brw_flags);
 		list_add_tail(&oap->oap_pending_item, &list);
diff --git a/fs/lustre/osc/osc_page.c b/fs/lustre/osc/osc_page.c
index ba10ba3..667825a 100644
--- a/fs/lustre/osc/osc_page.c
+++ b/fs/lustre/osc/osc_page.c
@@ -204,12 +204,7 @@ static void osc_page_clip(const struct lu_env *env,
 	opg->ops_from = from;
 	/* argument @to is exclusive, but @ops_to is inclusive */
 	opg->ops_to = to - 1;
-	/* This isn't really necessary for transient pages, but we also don't
-	 * call clip on transient pages often, so it's OK.
-	 */
-	spin_lock(&oap->oap_lock);
 	oap->oap_async_flags |= ASYNC_COUNT_STABLE;
-	spin_unlock(&oap->oap_lock);
 }
 
 static int osc_page_flush(const struct lu_env *env,
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [lustre-devel] [PATCH 03/22] lnet: Don't modify uptodate peer with temp NI
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 01/22] lustre: llite: clear stale page's uptodate bit James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 02/22] lustre: osc: Remove oap lock James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 04/22] lustre: llite: Explicitly support .splice_write James Simmons
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

When processing the config log it is possible that we attempt to
add temp NIs after discovery has completed on a peer. These temp
NIs may not actually exist on the peer. Since discovery has already
completed, the peer is considered up-to-date and we can end up with
incorrect peer entries. We shouldn't add temp NIs to a peer that
is already up-to-date.

HPE-bug-id: LUS-10867
WC-bug-id: https://jira.whamcloud.com/browse/LU-15852
Lustre-commit: 8f718df474e453fbc ("LU-15852 lnet: Don't modify uptodate peer with temp NI")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47322
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index d8d1857..52ad791 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -1855,6 +1855,7 @@ struct lnet_peer_net *
 int
 lnet_add_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, bool mr,
 		 bool temp)
+__must_hold(&the_lnet.ln_api_mutex)
 {
 	struct lnet_peer *lp = NULL;
 	struct lnet_peer_ni *lpni;
@@ -1906,6 +1907,13 @@ struct lnet_peer_net *
 		return -EPERM;
 	}
 
+	if (temp && lnet_peer_is_uptodate(lp)) {
+		CDEBUG(D_NET,
+		       "Don't add temporary peer NI for uptodate peer %s\n",
+		       libcfs_nidstr(&lp->lp_primary_nid));
+		return -EINVAL;
+	}
+
 	return lnet_peer_add_nid(lp, nid, flags);
 }
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [lustre-devel] [PATCH 04/22] lustre: llite: Explicitly support .splice_write
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (2 preceding siblings ...)
  2022-11-20 14:16 ` [lustre-devel] [PATCH 03/22] lnet: Don't modify uptodate peer with temp NI James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 05/22] lnet: o2iblnd: add verbose debug prints for rx/tx events James Simmons
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Shaun Tancheff, Lustre Development List

From: Shaun Tancheff <shaun.tancheff@hpe.com>

Linux commit v5.9-rc1-6-g36e2c7421f02
  fs: don't allow splice read/write without explicit ops

Lustre supports splice_write and previously provided handlers
for splice_read.
Explicitly use iter_file_splice_write, if it exists.

HPE-bug-id: LUS-11259
WC-bug-id: https://jira.whamcloud.com/browse/LU-16258
Lustre-commit: c619b6d6a54235cc0 ("LU-16258 llite: Explicitly support .splice_write")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48928
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 350d5df..34a449e 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -5564,6 +5564,7 @@ int ll_inode_permission(struct inode *inode, int mask)
 	.mmap			= ll_file_mmap,
 	.llseek			= ll_file_seek,
 	.splice_read		= generic_file_splice_read,
+	.splice_write		= iter_file_splice_write,
 	.fsync			= ll_fsync,
 	.flush			= ll_flush,
 	.fallocate		= ll_fallocate,
@@ -5578,6 +5579,7 @@ int ll_inode_permission(struct inode *inode, int mask)
 	.mmap			= ll_file_mmap,
 	.llseek			= ll_file_seek,
 	.splice_read		= generic_file_splice_read,
+	.splice_write		= iter_file_splice_write,
 	.fsync			= ll_fsync,
 	.flush			= ll_flush,
 	.flock			= ll_file_flock,
@@ -5595,6 +5597,7 @@ int ll_inode_permission(struct inode *inode, int mask)
 	.mmap			= ll_file_mmap,
 	.llseek			= ll_file_seek,
 	.splice_read		= generic_file_splice_read,
+	.splice_write		= iter_file_splice_write,
 	.fsync			= ll_fsync,
 	.flush			= ll_flush,
 	.flock			= ll_file_noflock,
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [lustre-devel] [PATCH 05/22] lnet: o2iblnd: add verbose debug prints for rx/tx events
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (3 preceding siblings ...)
  2022-11-20 14:16 ` [lustre-devel] [PATCH 04/22] lustre: llite: Explicitly support .splice_write James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 06/22] lnet: use Netlink to support old and new NI APIs James Simmons
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

Add and modify debug messages to sync with the mlnx driver debug
output. On rx/tx events print the message type, size and peer
credits. Make printing of debug messages on o2iblnd conn refcount
change events compile-time optional. Add a compile-time option for
dumping detailed connection state info to the net debug log.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16172
Lustre-commit: b4e06ff1e4856ce08 ("LU-16172 o2iblnd: add verbose debug prints for rx/tx events")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48600
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.h    |  78 +++++++++++++++++-------
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 117 ++++++++++++++++++++++--------------
 2 files changed, 129 insertions(+), 66 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 56d486f..bef7a55 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -588,28 +588,32 @@ static inline int kiblnd_timeout(void)
 	return dev->ibd_can_failover;
 }
 
-#define kiblnd_conn_addref(conn)					\
-do {									\
-	CDEBUG(D_NET, "conn[%p] (%d)++\n",				\
-	       (conn), atomic_read(&(conn)->ibc_refcount));		\
-	atomic_inc(&(conn)->ibc_refcount);				\
-} while (0)
-
-#define kiblnd_conn_decref(conn)					\
-do {									\
-	unsigned long flags;						\
-									\
-	CDEBUG(D_NET, "conn[%p] (%d)--\n",				\
-	       (conn), atomic_read(&(conn)->ibc_refcount));		\
-	LASSERT_ATOMIC_POS(&(conn)->ibc_refcount);			\
-	if (atomic_dec_and_test(&(conn)->ibc_refcount)) {		\
-		spin_lock_irqsave(&kiblnd_data.kib_connd_lock, flags);	\
-		list_add_tail(&(conn)->ibc_list,			\
-				  &kiblnd_data.kib_connd_zombies);	\
-		wake_up(&kiblnd_data.kib_connd_waitq);			\
-		spin_unlock_irqrestore(&kiblnd_data.kib_connd_lock, flags);\
-	}								\
-} while (0)
+static inline void kiblnd_conn_addref(struct kib_conn *conn)
+{
+#ifdef CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK
+	CDEBUG(D_NET, "conn[%p] (%d)++\n",
+	       conn, atomic_read(&conn->ibc_refcount));
+#endif
+	atomic_inc(&(conn)->ibc_refcount);
+}
+
+static inline void kiblnd_conn_decref(struct kib_conn *conn)
+{
+	unsigned long flags;
+
+#ifdef CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK
+	CDEBUG(D_NET, "conn[%p] (%d)--\n",
+	       conn, atomic_read(&conn->ibc_refcount));
+#endif
+	LASSERT_ATOMIC_POS(&conn->ibc_refcount);
+	if (atomic_dec_and_test(&conn->ibc_refcount)) {
+		spin_lock_irqsave(&kiblnd_data.kib_connd_lock, flags);
+		list_add_tail(&conn->ibc_list,
+			      &kiblnd_data.kib_connd_zombies);
+		wake_up(&kiblnd_data.kib_connd_waitq);
+		spin_unlock_irqrestore(&kiblnd_data.kib_connd_lock, flags);
+	}
+}
 
 void kiblnd_destroy_peer(struct kref *kref);
 
@@ -971,3 +975,33 @@ void kiblnd_pack_msg(struct lnet_ni *ni, struct kib_msg *msg, int version,
 int kiblnd_recv(struct lnet_ni *ni, void *private, struct lnet_msg *lntmsg,
 		int delayed, struct iov_iter *to, unsigned int rlen);
 unsigned int kiblnd_get_dev_prio(struct lnet_ni *ni, unsigned int dev_idx);
+
+#define kiblnd_dump_conn_dbg(conn)			\
+({							\
+	if (conn && conn->ibc_cmid)			\
+		CDEBUG(D_NET,				\
+		       "conn %p state %d nposted %d/%d c/o/r %d/%d/%d ce %d : cm_id %p qp_num 0x%x device_name %s\n",    \
+		       conn,				\
+		       conn->ibc_state,			\
+		       conn->ibc_noops_posted,		\
+		       conn->ibc_nsends_posted,		\
+		       conn->ibc_credits,		\
+		       conn->ibc_outstanding_credits,	\
+		       conn->ibc_reserved_credits,	\
+		       conn->ibc_comms_error,		\
+		       conn->ibc_cmid,			\
+		       conn->ibc_cmid->qp ? conn->ibc_cmid->qp->qp_num : 0,	\
+		       conn->ibc_cmid->qp ? (conn->ibc_cmid->qp->device ? dev_name(&conn->ibc_cmid->qp->device->dev) : "NULL") : "NULL");	\
+	else if (conn)                                  \
+		CDEBUG(D_NET,				\
+		       "conn %p state %d nposted %d/%d c/o/r %d/%d/%d ce %d : cm_id NULL\n",	\
+		       conn,				\
+		       conn->ibc_state,			\
+		       conn->ibc_noops_posted,		\
+		       conn->ibc_nsends_posted,		\
+		       conn->ibc_credits,		\
+		       conn->ibc_outstanding_credits,	\
+		       conn->ibc_reserved_credits,	\
+		       conn->ibc_comms_error		\
+		       );				\
+})
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index b16841e..d4de326 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -337,9 +337,12 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 
 	LASSERT(conn->ibc_state >= IBLND_CONN_ESTABLISHED);
 
-	CDEBUG(D_NET, "Received %x[%d] from %s\n",
+	CDEBUG(D_NET, "Received %x[%d] nob %u cm_id %p qp_num 0x%x\n",
 	       msg->ibm_type, credits,
-	       libcfs_nid2str(conn->ibc_peer->ibp_nid));
+	       msg->ibm_nob,
+	       conn->ibc_cmid,
+	       conn->ibc_cmid->qp ? conn->ibc_cmid->qp->qp_num : 0);
+	kiblnd_dump_conn_dbg(conn);
 
 	if (credits) {
 		/* Have I received credits that will let me send? */
@@ -760,8 +763,11 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	}
 
 	if (credit && !conn->ibc_credits) {   /* no credits */
-		CDEBUG(D_NET, "%s: no credits\n",
-		       libcfs_nid2str(peer_ni->ibp_nid));
+		CDEBUG(D_NET, "%s: no credits cm_id %p qp_num 0x%x\n",
+		       libcfs_nid2str(peer_ni->ibp_nid),
+		       conn->ibc_cmid,
+		       conn->ibc_cmid->qp ? conn->ibc_cmid->qp->qp_num : 0);
+		kiblnd_dump_conn_dbg(conn);
 		return -EAGAIN;
 	}
 
@@ -790,12 +796,22 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR;
 		kiblnd_tx_done(tx);
 		spin_lock(&conn->ibc_lock);
-		CDEBUG(D_NET, "%s(%d): redundant or enough NOOP\n",
+		CDEBUG(D_NET, "%s(%d): redundant or enough NOOP cm_id %p qp_num 0x%x\n",
 		       libcfs_nid2str(peer_ni->ibp_nid),
-		       conn->ibc_noops_posted);
+		       conn->ibc_noops_posted,
+		       conn->ibc_cmid,
+		       conn->ibc_cmid->qp ? conn->ibc_cmid->qp->qp_num : 0);
+		kiblnd_dump_conn_dbg(conn);
 		return 0;
 	}
 
+	CDEBUG(D_NET, "Transmit %x[%d] nob %u cm_id %p qp_num 0x%x\n",
+	       msg->ibm_type, credit,
+	       msg->ibm_nob,
+	       conn->ibc_cmid,
+	       conn->ibc_cmid->qp ? conn->ibc_cmid->qp->qp_num : 0);
+	kiblnd_dump_conn_dbg(conn);
+
 	kiblnd_pack_msg(peer_ni->ibp_ni, msg, ver,
 			conn->ibc_outstanding_credits,
 			peer_ni->ibp_nid, conn->ibc_incarnation);
@@ -1000,6 +1016,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_DROPPED;
 		tx->tx_waiting = 0;	/* don't wait for peer_ni */
 		tx->tx_status = -EIO;
+#ifdef CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK
+		kiblnd_dump_conn_dbg(conn);
+#endif
 	}
 
 	idle = !tx->tx_sending &&	/* This is the final callback */
@@ -1982,10 +2001,12 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	    list_empty(&conn->ibc_tx_queue_rsrvd) &&
 	    list_empty(&conn->ibc_tx_queue_nocred) &&
 	    list_empty(&conn->ibc_active_txs)) {
-		CDEBUG(D_NET, "closing conn to %s\n",
+		CDEBUG(D_NET, "closing conn %p to %s\n",
+		       conn,
 		       libcfs_nid2str(peer_ni->ibp_nid));
 	} else {
-		CNETERR("Closing conn to %s: error %d%s%s%s%s%s\n",
+		CNETERR("Closing conn %p to %s: error %d%s%s%s%s%s\n",
+			conn,
 			libcfs_nid2str(peer_ni->ibp_nid), error,
 			list_empty(&conn->ibc_tx_queue) ? "" : "(sending)",
 			list_empty(&conn->ibc_tx_noops) ? "" : "(sending_noops)",
@@ -2660,11 +2681,11 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	cp.retry_count = *kiblnd_tunables.kib_retry_count;
 	cp.rnr_retry_count = *kiblnd_tunables.kib_rnr_retry_count;
 
-	CDEBUG(D_NET, "Accept %s\n", libcfs_nid2str(nid));
+	CDEBUG(D_NET, "Accept %s conn %p\n", libcfs_nid2str(nid), conn);
 
 	rc = rdma_accept(cmid, &cp);
 	if (rc) {
-		CERROR("Can't accept %s: %d\n", libcfs_nid2str(nid), rc);
+		CNETERR("Can't accept %s: %d cm_id %p\n", libcfs_nid2str(nid), rc, cmid);
 		rej.ibr_version = version;
 		rej.ibr_why = IBLND_REJECT_FATAL;
 
@@ -3085,10 +3106,13 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 	rc = rdma_connect(cmid, &cp);
 	if (rc) {
-		CERROR("Can't connect to %s: %d\n",
-		       libcfs_nid2str(peer_ni->ibp_nid), rc);
+		CNETERR("Can't connect to %s: %d cm_id %p\n",
+			libcfs_nid2str(peer_ni->ibp_nid), rc, cmid);
 		kiblnd_connreq_done(conn, rc);
 		kiblnd_conn_decref(conn);
+	} else {
+		CDEBUG(D_NET, "Connected to %s: cm_id %p\n",
+		       libcfs_nid2str(peer_ni->ibp_nid), cmid);
 	}
 
 	return 0;
@@ -3112,13 +3136,13 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		rc = kiblnd_passive_connect(cmid,
 					    (void *)KIBLND_CONN_PARAM(event),
 					    KIBLND_CONN_PARAM_LEN(event));
-		CDEBUG(D_NET, "connreq: %d\n", rc);
+		CDEBUG(D_NET, "connreq: %d cm_id %p\n", rc, cmid);
 		return rc;
 
 	case RDMA_CM_EVENT_ADDR_ERROR:
 		peer_ni = (struct kib_peer_ni *)cmid->context;
-		CNETERR("%s: ADDR ERROR %d\n",
-			libcfs_nid2str(peer_ni->ibp_nid), event->status);
+		CNETERR("%s: ADDR ERROR %d cm_id %p\n",
+			libcfs_nid2str(peer_ni->ibp_nid), event->status, cmid);
 		kiblnd_peer_connect_failed(peer_ni, 1, -EHOSTUNREACH);
 		kiblnd_peer_decref(peer_ni);
 		return -EHOSTUNREACH;      /* rc destroys cmid */
@@ -3126,13 +3150,13 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	case RDMA_CM_EVENT_ADDR_RESOLVED:
 		peer_ni = (struct kib_peer_ni *)cmid->context;
 
-		CDEBUG(D_NET, "%s Addr resolved: %d\n",
-		       libcfs_nid2str(peer_ni->ibp_nid), event->status);
+		CDEBUG(D_NET, "%s Addr resolved: %d cm_id %p\n",
+		       libcfs_nid2str(peer_ni->ibp_nid), event->status, cmid);
 
 		if (event->status) {
-			CNETERR("Can't resolve address for %s: %d\n",
+			CNETERR("Can't resolve address for %s: %d cm_id %p\n",
 				libcfs_nid2str(peer_ni->ibp_nid),
-				event->status);
+				event->status, cmid);
 			rc = event->status;
 		} else {
 			rc = rdma_resolve_route(cmid,
@@ -3151,8 +3175,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			}
 
 			/* Can't initiate route resolution */
-			CERROR("Can't resolve route for %s: %d\n",
-			       libcfs_nid2str(peer_ni->ibp_nid), rc);
+			CNETERR("Can't resolve route for %s: %d cm_id %p\n",
+				libcfs_nid2str(peer_ni->ibp_nid), rc, cmid);
 		}
 		kiblnd_peer_connect_failed(peer_ni, 1, rc);
 		kiblnd_peer_decref(peer_ni);
@@ -3160,8 +3184,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 	case RDMA_CM_EVENT_ROUTE_ERROR:
 		peer_ni = (struct kib_peer_ni *)cmid->context;
-		CNETERR("%s: ROUTE ERROR %d\n",
-			libcfs_nid2str(peer_ni->ibp_nid), event->status);
+		CNETERR("%s: ROUTE ERROR %d cm_id %p\n",
+			libcfs_nid2str(peer_ni->ibp_nid), event->status, cmid);
 		kiblnd_peer_connect_failed(peer_ni, 1, -EHOSTUNREACH);
 		kiblnd_peer_decref(peer_ni);
 		return -EHOSTUNREACH;	/* rc destroys cmid */
@@ -3174,17 +3198,15 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		if (!event->status)
 			return kiblnd_active_connect(cmid);
 
-		CNETERR("Can't resolve route for %s: %d\n",
-			libcfs_nid2str(peer_ni->ibp_nid), event->status);
+		CNETERR("Can't resolve route for %s: %d cm_id %p\n",
+			libcfs_nid2str(peer_ni->ibp_nid), event->status, cmid);
 		kiblnd_peer_connect_failed(peer_ni, 1, event->status);
 		kiblnd_peer_decref(peer_ni);
 		return event->status;	/* rc destroys cmid */
 
 	case RDMA_CM_EVENT_UNREACHABLE:
-		CNETERR("%s: UNREACHABLE %d, ibc_state: %d\n",
-			libcfs_nid2str(conn->ibc_peer->ibp_nid),
-			event->status,
-			conn->ibc_state);
+		CNETERR("%s: UNREACHABLE %d cm_id %p conn %p\n",
+			libcfs_nid2str(conn->ibc_peer->ibp_nid), event->status, cmid, conn);
 		LASSERT(conn->ibc_state != IBLND_CONN_ESTABLISHED &&
 			conn->ibc_state != IBLND_CONN_INIT);
 		if (conn->ibc_state == IBLND_CONN_ACTIVE_CONNECT ||
@@ -3198,8 +3220,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		conn = (struct kib_conn *)cmid->context;
 		LASSERT(conn->ibc_state == IBLND_CONN_ACTIVE_CONNECT ||
 			conn->ibc_state == IBLND_CONN_PASSIVE_WAIT);
-		CNETERR("%s: CONNECT ERROR %d\n",
-			libcfs_nid2str(conn->ibc_peer->ibp_nid), event->status);
+		CNETERR("%s: CONNECT ERROR %d cm_id %p conn %p\n",
+			libcfs_nid2str(conn->ibc_peer->ibp_nid), event->status, cmid, conn);
 		kiblnd_connreq_done(conn, -ENOTCONN);
 		kiblnd_conn_decref(conn);
 		return 0;
@@ -3211,9 +3233,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			LBUG();
 
 		case IBLND_CONN_PASSIVE_WAIT:
-			CERROR("%s: REJECTED %d\n",
+			CERROR("%s: REJECTED %d cm_id %p\n",
 			       libcfs_nid2str(conn->ibc_peer->ibp_nid),
-			       event->status);
+			       event->status, cmid);
 			kiblnd_connreq_done(conn, -ECONNRESET);
 			break;
 
@@ -3233,14 +3255,14 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			LBUG();
 
 		case IBLND_CONN_PASSIVE_WAIT:
-			CDEBUG(D_NET, "ESTABLISHED (passive): %s\n",
-			       libcfs_nid2str(conn->ibc_peer->ibp_nid));
+			CDEBUG(D_NET, "ESTABLISHED (passive): %s cm_id %p conn %p\n",
+			       libcfs_nid2str(conn->ibc_peer->ibp_nid), cmid, conn);
 			kiblnd_connreq_done(conn, 0);
 			break;
 
 		case IBLND_CONN_ACTIVE_CONNECT:
-			CDEBUG(D_NET, "ESTABLISHED(active): %s\n",
-			       libcfs_nid2str(conn->ibc_peer->ibp_nid));
+			CDEBUG(D_NET, "ESTABLISHED(active): %s cm_id %p conn %p\n",
+			       libcfs_nid2str(conn->ibc_peer->ibp_nid), cmid, conn);
 			kiblnd_check_connreply(conn,
 					       (void *)KIBLND_CONN_PARAM(event),
 					       KIBLND_CONN_PARAM_LEN(event));
@@ -3255,8 +3277,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	case RDMA_CM_EVENT_DISCONNECTED:
 		conn = (struct kib_conn *)cmid->context;
 		if (conn->ibc_state < IBLND_CONN_ESTABLISHED) {
-			CERROR("%s DISCONNECTED\n",
-			       libcfs_nid2str(conn->ibc_peer->ibp_nid));
+			CERROR("%s DISCONNECTED cm_id %p conn %p\n",
+			       libcfs_nid2str(conn->ibc_peer->ibp_nid), cmid, conn);
 			kiblnd_connreq_done(conn, -ECONNRESET);
 		} else {
 			kiblnd_close_conn(conn, 0);
@@ -3372,6 +3394,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 				       conn->ibc_credits,
 				       conn->ibc_outstanding_credits,
 				       conn->ibc_reserved_credits);
+#ifdef CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK
+				kiblnd_dump_conn_dbg(conn);
+#endif
 				list_add(&conn->ibc_connd_list, &closes);
 			} else {
 				list_add(&conn->ibc_connd_list, &checksends);
@@ -3425,7 +3450,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	LASSERT(!in_interrupt());
 	LASSERT(current == kiblnd_data.kib_connd);
 	LASSERT(conn->ibc_state == IBLND_CONN_CLOSING);
-
+#ifdef CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK
+	kiblnd_dump_conn_dbg(conn);
+#endif
 	rdma_disconnect(conn->ibc_cmid);
 	kiblnd_finalise_conn(conn);
 
@@ -3716,6 +3743,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	    (conn->ibc_nrx > 0 ||
 	     conn->ibc_nsends_posted > 0)) {
 		kiblnd_conn_addref(conn); /* +1 ref for sched_conns */
+		kiblnd_dump_conn_dbg(conn);
 		conn->ibc_scheduled = 1;
 		list_add_tail(&conn->ibc_sched_list, &sched->ibs_conns);
 
@@ -3788,8 +3816,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 				rc = ib_req_notify_cq(conn->ibc_cq,
 						      IB_CQ_NEXT_COMP);
 				if (rc < 0) {
-					CWARN("%s: ib_req_notify_cq failed: %d, closing connection\n",
-					      libcfs_nid2str(conn->ibc_peer->ibp_nid), rc);
+					CWARN("%s: ib_req_notify_cq failed: %d, closing connection %p\n",
+					      libcfs_nid2str(conn->ibc_peer->ibp_nid),
+					      rc, conn);
 					kiblnd_close_conn(conn, -EIO);
 					kiblnd_conn_decref(conn);
 					spin_lock_irqsave(&sched->ibs_lock,
@@ -3810,9 +3839,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			}
 
 			if (rc < 0) {
-				CWARN("%s: ib_poll_cq failed: %d, closing connection\n",
+				CWARN("%s: ib_poll_cq failed: %d, closing connection %p\n",
 				      libcfs_nid2str(conn->ibc_peer->ibp_nid),
-				      rc);
+				      rc, conn);
 				kiblnd_close_conn(conn, -EIO);
 				kiblnd_conn_decref(conn);
 				spin_lock_irqsave(&sched->ibs_lock, flags);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [lustre-devel] [PATCH 06/22] lnet: use Netlink to support old and new NI APIs.
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (4 preceding siblings ...)
  2022-11-20 14:16 ` [lustre-devel] [PATCH 05/22] lnet: o2iblnd: add verbose debug prints for rx/tx events James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 07/22] lustre: obdclass: improve precision of wakeups for mod_rpcs James Simmons
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

The LNet layer uses two different sets of ioctls. One ioctl set is
for Multi-Rail and the other is an older API. Both are in heavy
use, and with the upcoming support for IPv6 we are looking at an
explosion of ioctls. The solution is to move the LNet layer to
Netlink, which can easily handle the differences between the
APIs. This also resolves a long-standing issue of the userland
API changing in ways that are incompatible with previous
versions.

This patch unifies LNet NI handling over Netlink and is fully
aware of the new large-NID addressing.
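
As a minimal sketch of the new hooks (the "mylnd_"/"MYLND_" names are
invented for illustration; the real hookups for ko2iblnd and ksocklnd
are in the diff below), an LND now publishes its tunable keys and
parses the values handed down by the core Netlink code through two
new struct lnet_lnd fields instead of a private ioctl:

	/* hypothetical LND: one s32 tunable, modelled on the socklnd hookup */
	enum { MYLND_TUNABLES_ATTR_CONNS_PER_PEER = 1 };

	static const struct ln_key_list mylnd_tunables_keys = {
		.lkl_maxattr	= MYLND_TUNABLES_ATTR_CONNS_PER_PEER,
		.lkl_list	= {
			[MYLND_TUNABLES_ATTR_CONNS_PER_PEER] = {
				.lkp_value	= "conns_per_peer",
				.lkp_data_type	= NLA_S32,
			},
		},
	};

	static int mylnd_nl_set(int cmd, struct nlattr *attr, int type, void *data)
	{
		struct lnet_lnd_tunables *tunables = data;

		/* only the LNET_CMD_NETS command carries NI tunables */
		if (cmd != LNET_CMD_NETS)
			return -EOPNOTSUPP;
		if (type != MYLND_TUNABLES_ATTR_CONNS_PER_PEER ||
		    nla_type(attr) != LN_SCALAR_ATTR_INT_VALUE)
			return -EINVAL;

		/* a real LND writes to its own member of lnd_tun_u; the
		 * socklnd member is reused here only to keep the sketch short */
		tunables->lnd_tun_u.lnd_sock.lnd_conns_per_peer = nla_get_s64(attr);
		return 0;
	}

	static const struct lnet_lnd mylnd = {
		/* ... lnd_startup, lnd_send, lnd_recv, ... */
		.lnd_nl_set	= mylnd_nl_set,
		.lnd_keys	= &mylnd_tunables_keys,
	};

Exporting the key list this way is how userland is expected to
discover the available keys at runtime instead of depending on a
fixed ioctl layout.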

WC-bug-id: https://jira.whamcloud.com/browse/LU-10003
Lustre-commit: 8f8f6e2f36e56e53e ("LU-10003 lnet: use Netlink to support old and new NI APIs.")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48814
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 include/linux/lnet/lib-lnet.h          |   6 +-
 include/linux/lnet/lib-types.h         | 103 +++++
 include/uapi/linux/lnet/libcfs_ioctl.h |   2 +-
 include/uapi/linux/lnet/lnet-dlc.h     |  23 +
 include/uapi/linux/lnet/lnet-types.h   |  15 +
 net/lnet/klnds/o2iblnd/o2iblnd.c       |  88 +++-
 net/lnet/klnds/o2iblnd/o2iblnd.h       |  16 +
 net/lnet/klnds/socklnd/socklnd.c       |  37 +-
 net/lnet/klnds/socklnd/socklnd.h       |   9 +
 net/lnet/lnet/api-ni.c                 | 779 +++++++++++++++++++++++++++++++--
 net/lnet/lnet/config.c                 |   4 +-
 net/lnet/lnet/module.c                 |  42 +-
 12 files changed, 1054 insertions(+), 70 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index bd4acef..13ce2bf 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -457,6 +457,7 @@ struct lnet_ni *
 struct lnet_ni *
 lnet_ni_alloc_w_cpt_array(struct lnet_net *net, u32 *cpts, u32 ncpts,
 			  char *iface);
+int lnet_ni_add_interface(struct lnet_ni *ni, char *iface);
 
 static inline int
 lnet_nid2peerhash(struct lnet_nid *nid)
@@ -622,8 +623,9 @@ void lnet_rtr_transfer_to_peer(struct lnet_peer *src,
 struct lnet_remotenet *lnet_find_rnet_locked(u32 net);
 int lnet_dyn_add_net(struct lnet_ioctl_config_data *conf);
 int lnet_dyn_del_net(u32 net);
-int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf);
-int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf);
+int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf, u32 net,
+		    struct lnet_ioctl_config_lnd_tunables *tun);
+int lnet_dyn_del_ni(struct lnet_nid *nid);
 int lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason);
 struct lnet_net *lnet_get_net_locked(u32 net_id);
 void lnet_net_clr_pref_rtrs(struct lnet_net *net);
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 499385b..2d3b044 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -335,6 +335,11 @@ struct lnet_lnd {
 	/* get dma_dev priority */
 	unsigned int (*lnd_get_dev_prio)(struct lnet_ni *ni,
 					 unsigned int dev_idx);
+
+	/* Handle LND specific Netlink handling */
+	int (*lnd_nl_set)(int cmd, struct nlattr *attr, int type, void *data);
+
+	const struct ln_key_list *lnd_keys;
 };
 
 /* FIXME !!!!! The abstract for GPU page support (PCI peer2peer)
@@ -464,6 +469,104 @@ struct lnet_net {
 	struct list_head	net_rtr_pref_nids;
 };
 
+/* Normally Netlink attributes are defined in UAPI headers but Lustre is
+ * different in that the ABI is in a constant state of change unlike other
+ * Netlink interfaces. LNet sends a special header to help user land handle
+ * the differences.
+ */
+
+/** enum lnet_net_attrs		      - LNet NI netlink properties
+ *					attributes that describe LNet 'NI'
+ *					These values are used to piece together
+ *					messages for sending and receiving.
+ *
+ * @LNET_NET_ATTR_UNSPEC:		unspecified attribute to catch errors
+ *
+ * @LNET_NET_ATTR_HDR:			grouping for LNet net data (NLA_NESTED)
+ * @LNET_NET_ATTR_TYPE:			LNet net this NI belongs to (NLA_STRING)
+ * @LNET_NET_ATTR_LOCAL:		Local NI information (NLA_NESTED)
+ */
+enum lnet_net_attrs {
+	LNET_NET_ATTR_UNSPEC = 0,
+
+	LNET_NET_ATTR_HDR,
+	LNET_NET_ATTR_TYPE,
+	LNET_NET_ATTR_LOCAL,
+
+	__LNET_NET_ATTR_MAX_PLUS_ONE,
+};
+
+#define LNET_NET_ATTR_MAX (__LNET_NET_ATTR_MAX_PLUS_ONE - 1)
+
+/** enum lnet_net_local_ni_attrs      - LNet local NI netlink properties
+ *					attributes that describe local NI
+ *
+ * @LNET_NET_LOCAL_NI_ATTR_UNSPEC:	unspecified attribute to catch errors
+ *
+ * @LNET_NET_LOCAL_NI_ATTR_NID:		NID that represents this NI (NLA_STRING)
+ * @LNET_NET_LOCAL_NI_ATTR_STATUS:	State of this NI (NLA_STRING)
+ * @LNET_NET_LOCAL_NI_ATTR_INTERFACE:	Defines physical devices (NLA_NESTED)
+ *					Used to be many devices but no longer.
+ */
+enum lnet_net_local_ni_attrs {
+	LNET_NET_LOCAL_NI_ATTR_UNSPEC = 0,
+
+	LNET_NET_LOCAL_NI_ATTR_NID,
+	LNET_NET_LOCAL_NI_ATTR_STATUS,
+	LNET_NET_LOCAL_NI_ATTR_INTERFACE,
+
+	__LNET_NET_LOCAL_NI_ATTR_MAX_PLUS_ONE,
+};
+
+#define LNET_NET_LOCAL_NI_ATTR_MAX (__LNET_NET_LOCAL_NI_ATTR_MAX_PLUS_ONE - 1)
+
+/** enum lnet_net_local_ni_intf_attrs - LNet NI device netlink properties
+ *					attribute that reports the device
+ *					in use
+ *
+ * @LNET_NET_LOCAL_NI_INTF_ATTR_UNSPEC:	unspecified attribute to catch errors
+ *
+ * @LNET_NET_LOCAL_NI_INTF_ATTR_TYPE:	Physical device interface (NLA_STRING)
+ */
+enum lnet_net_local_ni_intf_attrs {
+	LNET_NET_LOCAL_NI_INTF_ATTR_UNSPEC = 0,
+
+	LNET_NET_LOCAL_NI_INTF_ATTR_TYPE,
+
+	__LNET_NET_LOCAL_NI_INTF_ATTR_MAX_PLUS_ONE,
+};
+
+#define LNET_NET_LOCAL_NI_INTF_ATTR_MAX (__LNET_NET_LOCAL_NI_INTF_ATTR_MAX_PLUS_ONE - 1)
+
+/** enum lnet_net_local_ni_tunables_attrs	      - LNet NI tunables
+ *							netlink properties.
+ *							Performance options
+ *							for your NI.
+ *
+ * @LNET_NET_LOCAL_NI_TUNABLES_ATTR_UNSPEC:		unspecified attribute
+ *							to catch errors
+ *
+ * @LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_TIMEOUT:	Timeout for LNet peer.
+ *							(NLA_S32)
+ * @LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_CREDITS:	Credits for LNet peer.
+ *							(NLA_S32)
+ * @LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_BUFFER_CREDITS: Buffer credits for
+ *							 LNet peer. (NLA_S32)
+ * @LNET_NET_LOCAL_NI_TUNABLES_ATTR_CREDITS:		Credits for LNet peer
+ *							TX. (NLA_S32)
+ */
+enum lnet_net_local_ni_tunables_attr {
+	LNET_NET_LOCAL_NI_TUNABLES_ATTR_UNSPEC = 0,
+
+	LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_TIMEOUT,
+	LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_CREDITS,
+	LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_BUFFER_CREDITS,
+	LNET_NET_LOCAL_NI_TUNABLES_ATTR_CREDITS,
+	__LNET_NET_LOCAL_NI_TUNABLES_ATTR_MAX_PLUS_ONE,
+};
+
+#define LNET_NET_LOCAL_NI_TUNABLES_ATTR_MAX (__LNET_NET_LOCAL_NI_TUNABLES_ATTR_MAX_PLUS_ONE - 1)
+
 struct lnet_ni {
 	spinlock_t		ni_lock;
 	/* chain on the lnet_net structure */
diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h
index f2ae76c..89ac075 100644
--- a/include/uapi/linux/lnet/libcfs_ioctl.h
+++ b/include/uapi/linux/lnet/libcfs_ioctl.h
@@ -94,7 +94,7 @@ struct libcfs_ioctl_data {
 #define IOC_LIBCFS_MARK_DEBUG		_IOWR('e', 32, IOCTL_LIBCFS_TYPE)
 /* IOC_LIBCFS_MEMHOG obsolete in 2.8.0, was _IOWR('e', 36, IOCTL_LIBCFS_TYPE) */
 /* lnet ioctls */
-#define IOC_LIBCFS_GET_NI		_IOWR('e', 50, IOCTL_LIBCFS_TYPE)
+/* IOC_LIBCFS_GET_NI obsolete in 2.16, was _IOWR('e', 50, IOCTL_LIBCFS_TYPE) */
 #define IOC_LIBCFS_FAIL_NID		_IOWR('e', 51, IOCTL_LIBCFS_TYPE)
 #define IOC_LIBCFS_NOTIFY_ROUTER	_IOWR('e', 55, IOCTL_LIBCFS_TYPE)
 #define IOC_LIBCFS_UNCONFIGURE		_IOWR('e', 56, IOCTL_LIBCFS_TYPE)
diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h
index 415968a..58697c1 100644
--- a/include/uapi/linux/lnet/lnet-dlc.h
+++ b/include/uapi/linux/lnet/lnet-dlc.h
@@ -49,6 +49,29 @@
 #define __user
 #endif
 
+#define LNET_GENL_NAME		"lnet"
+#define LNET_GENL_VERSION	0x05
+
+/* enum lnet_commands	      - Supported core LNet Netlink commands
+ *
+ *  @LNET_CMD_UNSPEC:		unspecified command to catch errors
+ *
+ *  @LNET_CMD_NETS:		command to manage the LNet networks
+ */
+enum lnet_commands {
+	LNET_CMD_UNSPEC		= 0,
+
+	LNET_CMD_CONFIGURE	= 1,
+	LNET_CMD_NETS		= 2,
+	LNET_CMD_PEERS		= 3,
+	LNET_CMD_ROUTES		= 4,
+	LNET_CMD_CONNS		= 5,
+
+	__LNET_CMD_MAX_PLUS_ONE
+};
+
+#define LNET_CMD_MAX (__LNET_CMD_MAX_PLUS_ONE - 1)
+
 /*
  * To allow for future enhancements to extend the tunables
  * add a hdr to this structure, so that the version can be set
diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h
index 5a2ea45..304add9 100644
--- a/include/uapi/linux/lnet/lnet-types.h
+++ b/include/uapi/linux/lnet/lnet-types.h
@@ -37,8 +37,12 @@
 #include <linux/types.h>
 #include <linux/lnet/lnet-idl.h>
 
+#include <linux/types.h>
 #include <linux/string.h>
 #include <asm/byteorder.h>
+#ifndef __KERNEL__
+#include <stdbool.h>
+#endif
 
 /** \addtogroup lnet
  * @{
@@ -111,6 +115,17 @@ static inline __u32 LNET_MKNET(__u32 type, __u32 num)
 
 #define LNET_NET_ANY LNET_NIDNET(LNET_NID_ANY)
 
+/* check for address set */
+static inline bool nid_addr_is_set(const struct lnet_nid *nid)
+{
+	int sum = 0, i;
+
+	for (i = 0; i < NID_ADDR_BYTES(nid); i++)
+		sum |= nid->nid_addr[i];
+
+	return sum ? true : false;
+}
+
 static inline int nid_is_nid4(const struct lnet_nid *nid)
 {
 	return NID_ADDR_BYTES(nid) == 4;
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 94ff926..cbb3445 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -491,6 +491,86 @@ void kiblnd_unlink_peer_locked(struct kib_peer_ni *peer_ni)
 	spin_unlock(&conn->ibc_lock);
 }
 
+static const struct ln_key_list kiblnd_tunables_keys = {
+	.lkl_maxattr                    = LNET_NET_O2IBLND_TUNABLES_ATTR_MAX,
+	.lkl_list			= {
+		[LNET_NET_O2IBLND_TUNABLES_ATTR_HIW_PEER_CREDITS]  = {
+			.lkp_value	= "peercredits_hiw",
+			.lkp_data_type	= NLA_U32
+		},
+		[LNET_NET_O2IBLND_TUNABLES_ATTR_MAP_ON_DEMAND]  = {
+			.lkp_value	= "map_on_demand",
+			.lkp_data_type	= NLA_FLAG
+		},
+		[LNET_NET_O2IBLND_TUNABLES_ATTR_CONCURRENT_SENDS]  = {
+			.lkp_value	= "concurrent_sends",
+			.lkp_data_type	= NLA_U32
+		},
+		[LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_POOL_SIZE]  = {
+			.lkp_value	= "fmr_pool_size",
+			.lkp_data_type	= NLA_U32
+		},
+		[LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_FLUSH_TRIGGER]  = {
+			.lkp_value	= "fmr_flush_trigger",
+			.lkp_data_type	= NLA_U32
+		},
+		[LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_CACHE]  = {
+			.lkp_value	= "fmr_cache",
+			.lkp_data_type	= NLA_U32
+		},
+		[LNET_NET_O2IBLND_TUNABLES_ATTR_NTX]  = {
+			.lkp_value	= "ntx",
+			.lkp_data_type	= NLA_U16
+		},
+		[LNET_NET_O2IBLND_TUNABLES_ATTR_CONNS_PER_PEER]  = {
+			.lkp_value	= "conns_per_peer",
+			.lkp_data_type	= NLA_U16
+		},
+	},
+};
+
+static int
+kiblnd_nl_set(int cmd, struct nlattr *attr, int type, void *data)
+{
+	struct lnet_lnd_tunables *tunables = data;
+
+	if (cmd != LNET_CMD_NETS)
+		return -EOPNOTSUPP;
+
+	if (nla_type(attr) != LN_SCALAR_ATTR_INT_VALUE)
+		return -EINVAL;
+
+	switch (type) {
+	case LNET_NET_O2IBLND_TUNABLES_ATTR_HIW_PEER_CREDITS:
+		tunables->lnd_tun_u.lnd_o2ib.lnd_peercredits_hiw = nla_get_s64(attr);
+		break;
+	case LNET_NET_O2IBLND_TUNABLES_ATTR_MAP_ON_DEMAND:
+		tunables->lnd_tun_u.lnd_o2ib.lnd_map_on_demand = nla_get_s64(attr);
+		break;
+	case LNET_NET_O2IBLND_TUNABLES_ATTR_CONCURRENT_SENDS:
+		tunables->lnd_tun_u.lnd_o2ib.lnd_concurrent_sends = nla_get_s64(attr);
+		break;
+	case LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_POOL_SIZE:
+		tunables->lnd_tun_u.lnd_o2ib.lnd_fmr_pool_size = nla_get_s64(attr);
+		break;
+	case LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_FLUSH_TRIGGER:
+		tunables->lnd_tun_u.lnd_o2ib.lnd_fmr_flush_trigger = nla_get_s64(attr);
+		break;
+	case LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_CACHE:
+		tunables->lnd_tun_u.lnd_o2ib.lnd_fmr_cache = nla_get_s64(attr);
+		break;
+	case LNET_NET_O2IBLND_TUNABLES_ATTR_NTX:
+		tunables->lnd_tun_u.lnd_o2ib.lnd_ntx = nla_get_s64(attr);
+		break;
+	case LNET_NET_O2IBLND_TUNABLES_ATTR_CONNS_PER_PEER:
+		tunables->lnd_tun_u.lnd_o2ib.lnd_conns_per_peer = nla_get_s64(attr);
+	default:
+		break;
+	}
+
+	return 0;
+}
+
 static void
 kiblnd_dump_peer_debug_info(struct kib_peer_ni *peer_ni)
 {
@@ -3173,7 +3253,11 @@ static int kiblnd_startup(struct lnet_ni *ni)
 
 	net->ibn_dev = ibdev;
 	ni->ni_nid.nid_addr[0] = cpu_to_be32(ibdev->ibd_ifip);
-
+	if (!ni->ni_interface) {
+		rc = lnet_ni_add_interface(ni, ifaces[i].li_name);
+		if (rc < 0)
+			CWARN("ko2iblnd failed to allocate ni_interface\n");
+	}
 	ni->ni_dev_cpt = ifaces[i].li_cpt;
 
 	rc = kiblnd_dev_start_threads(ibdev, newdev, ni->ni_cpts, ni->ni_ncpts);
@@ -3220,6 +3304,8 @@ static int kiblnd_startup(struct lnet_ni *ni)
 	.lnd_send	= kiblnd_send,
 	.lnd_recv	= kiblnd_recv,
 	.lnd_get_dev_prio = kiblnd_get_dev_prio,
+	.lnd_nl_set	= kiblnd_nl_set,
+	.lnd_keys	= &kiblnd_tunables_keys,
 };
 
 static void ko2inlnd_assert_wire_constants(void)
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index bef7a55..e3c069b 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -65,6 +65,22 @@
 #include <linux/lnet/lib-lnet.h>
 #include "o2iblnd-idl.h"
 
+enum kiblnd_ni_lnd_tunables_attr {
+	LNET_NET_O2IBLND_TUNABLES_ATTR_UNSPEC = 0,
+
+	LNET_NET_O2IBLND_TUNABLES_ATTR_HIW_PEER_CREDITS,
+	LNET_NET_O2IBLND_TUNABLES_ATTR_CONCURRENT_SENDS,
+	LNET_NET_O2IBLND_TUNABLES_ATTR_MAP_ON_DEMAND,
+	LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_POOL_SIZE,
+	LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_FLUSH_TRIGGER,
+	LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_CACHE,
+	LNET_NET_O2IBLND_TUNABLES_ATTR_NTX,
+	LNET_NET_O2IBLND_TUNABLES_ATTR_CONNS_PER_PEER,
+	__LNET_NET_O2IBLND_TUNABLES_ATTR_MAX_PLUS_ONE,
+};
+
+#define LNET_NET_O2IBLND_TUNABLES_ATTR_MAX (__LNET_NET_O2IBLND_TUNABLES_ATTR_MAX_PLUS_ONE - 1)
+
 #define IBLND_PEER_HASH_BITS		7	/* log2 of # peer_ni lists */
 
 #define IBLND_N_SCHED			2
diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index e8f8020..21fccfa 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -840,6 +840,33 @@ struct ksock_peer_ni *
 	return 0;
 }
 
+static const struct ln_key_list ksocknal_tunables_keys = {
+	.lkl_maxattr			= LNET_NET_SOCKLND_TUNABLES_ATTR_MAX,
+	.lkl_list			= {
+		[LNET_NET_SOCKLND_TUNABLES_ATTR_CONNS_PER_PEER]  = {
+			.lkp_value	= "conns_per_peer",
+			.lkp_data_type	= NLA_S32
+		},
+	},
+};
+
+static int
+ksocknal_nl_set(int cmd, struct nlattr *attr, int type, void *data)
+{
+	struct lnet_lnd_tunables *tunables = data;
+
+	if (cmd != LNET_CMD_NETS)
+		return -EOPNOTSUPP;
+
+	if (type != LNET_NET_SOCKLND_TUNABLES_ATTR_CONNS_PER_PEER ||
+	    nla_type(attr) != LN_SCALAR_ATTR_INT_VALUE)
+		return -EINVAL;
+
+	tunables->lnd_tun_u.lnd_sock.lnd_conns_per_peer = nla_get_s64(attr);
+
+	return 0;
+}
+
 static int
 ksocknal_connecting(struct ksock_conn_cb *conn_cb, struct sockaddr *sa)
 {
@@ -2520,16 +2547,20 @@ static int ksocknal_inetaddr_event(struct notifier_block *unused,
 	ksi = &net->ksnn_interface;
 	/* Use the first discovered interface or look in the list */
 	if (ni->ni_interface) {
-		for (i = 0; i < rc; i++)
+		for (i = 0; i < rc; i++) {
 			if (strcmp(ifaces[i].li_name, ni->ni_interface) == 0)
 				break;
-
+		}
 		/* ni_interface doesn't contain the interface we want */
 		if (i == rc) {
 			CERROR("ksocklnd: failed to find interface %s\n",
 			       ni->ni_interface);
 			goto fail_1;
 		}
+	} else {
+		rc = lnet_ni_add_interface(ni, ifaces[i].li_name);
+		if (rc < 0)
+			CWARN("ksocklnd failed to allocate ni_interface\n");
 	}
 
 	ni->ni_dev_cpt = ifaces[i].li_cpt;
@@ -2590,6 +2621,8 @@ static void __exit ksocklnd_exit(void)
 	.lnd_recv		= ksocknal_recv,
 	.lnd_notify_peer_down	= ksocknal_notify_gw_down,
 	.lnd_accept		= ksocknal_accept,
+	.lnd_nl_set		= ksocknal_nl_set,
+	.lnd_keys		= &ksocknal_tunables_keys,
 };
 
 static int __init ksocklnd_init(void)
diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h
index bb68a3d..50892b1 100644
--- a/net/lnet/klnds/socklnd/socklnd.h
+++ b/net/lnet/klnds/socklnd/socklnd.h
@@ -74,6 +74,15 @@
 # define SOCKNAL_RISK_KMAP_DEADLOCK	1
 #endif
 
+enum ksocklnd_ni_lnd_tunables_attr {
+	LNET_NET_SOCKLND_TUNABLES_ATTR_UNSPEC = 0,
+
+	LNET_NET_SOCKLND_TUNABLES_ATTR_CONNS_PER_PEER,
+	__LNET_NET_SOCKLND_TUNABLES_ATTR_MAX_PLUS_ONE,
+};
+
+#define LNET_NET_SOCKLND_TUNABLES_ATTR_MAX (__LNET_NET_SOCKLND_TUNABLES_ATTR_MAX_PLUS_ONE - 1)
+
 /* per scheduler state */
 struct ksock_sched {				/* per scheduler state */
 	spinlock_t		kss_lock;	/* serialise */
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 9459fc0..af875ba 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -34,6 +34,8 @@
 #include <linux/log2.h>
 #include <linux/ktime.h>
 #include <linux/moduleparam.h>
+#include <linux/sched/signal.h>
+#include <net/genetlink.h>
 
 #include <linux/lnet/udsp.h>
 #include <linux/lnet/lib-lnet.h>
@@ -2498,6 +2500,36 @@ static void lnet_push_target_fini(void)
 	return rc;
 }
 
+static struct lnet_lnd *lnet_load_lnd(u32 lnd_type)
+{
+	struct lnet_lnd *lnd;
+	int rc = 0;
+
+	mutex_lock(&the_lnet.ln_lnd_mutex);
+	lnd = lnet_find_lnd_by_type(lnd_type);
+	if (!lnd) {
+		mutex_unlock(&the_lnet.ln_lnd_mutex);
+		rc = request_module("%s", libcfs_lnd2modname(lnd_type));
+		mutex_lock(&the_lnet.ln_lnd_mutex);
+
+		lnd = lnet_find_lnd_by_type(lnd_type);
+		if (!lnd) {
+			mutex_unlock(&the_lnet.ln_lnd_mutex);
+			CERROR("Can't load LND %s, module %s, rc=%d\n",
+			       libcfs_lnd2str(lnd_type),
+			       libcfs_lnd2modname(lnd_type), rc);
+#ifndef HAVE_MODULE_LOADING_SUPPORT
+			LCONSOLE_ERROR_MSG(0x104,
+					   "Your kernel must be compiled with kernel module loading support.");
+#endif
+			return ERR_PTR(-EINVAL);
+		}
+	}
+	mutex_unlock(&the_lnet.ln_lnd_mutex);
+
+	return lnd;
+}
+
 static int
 lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 {
@@ -2525,27 +2557,14 @@ static void lnet_push_target_fini(void)
 	if (lnet_net_unique(net->net_id, &the_lnet.ln_nets, &net_l)) {
 		lnd_type = LNET_NETTYP(net->net_id);
 
-		mutex_lock(&the_lnet.ln_lnd_mutex);
-		lnd = lnet_find_lnd_by_type(lnd_type);
-
-		if (!lnd) {
-			mutex_unlock(&the_lnet.ln_lnd_mutex);
-			rc = request_module("%s", libcfs_lnd2modname(lnd_type));
-			mutex_lock(&the_lnet.ln_lnd_mutex);
-
-			lnd = lnet_find_lnd_by_type(lnd_type);
-			if (!lnd) {
-				mutex_unlock(&the_lnet.ln_lnd_mutex);
-				CERROR("Can't load LND %s, module %s, rc=%d\n",
-				libcfs_lnd2str(lnd_type),
-				libcfs_lnd2modname(lnd_type), rc);
-				rc = -EINVAL;
-				goto failed0;
-			}
+		lnd = lnet_load_lnd(lnd_type);
+		if (IS_ERR(lnd)) {
+			rc = PTR_ERR(lnd);
+			goto failed0;
 		}
 
+		mutex_lock(&the_lnet.ln_lnd_mutex);
 		net->net_lnd = lnd;
-
 		mutex_unlock(&the_lnet.ln_lnd_mutex);
 
 		net_l = net;
@@ -2766,6 +2785,8 @@ int lnet_genl_send_scalar_list(struct sk_buff *msg, u32 portid, u32 seq,
 }
 EXPORT_SYMBOL(lnet_genl_send_scalar_list);
 
+static struct genl_family lnet_family;
+
 /**
  * Initialize LNet library.
  *
@@ -2803,6 +2824,13 @@ int lnet_lib_init(void)
 		return rc;
 	}
 
+	rc = genl_register_family(&lnet_family);
+	if (rc != 0) {
+		lnet_destroy_locks();
+		CERROR("Can't register LNet netlink family: %d\n", rc);
+		return rc;
+	}
+
 	the_lnet.ln_refcount = 0;
 	INIT_LIST_HEAD(&the_lnet.ln_net_zombie);
 	INIT_LIST_HEAD(&the_lnet.ln_msg_resend);
@@ -2846,6 +2874,7 @@ void lnet_lib_exit(void)
 	for (i = 0; i < NUM_LNDS; i++)
 		LASSERT(!the_lnet.ln_lnds[i]);
 	lnet_destroy_locks();
+	genl_unregister_family(&lnet_family);
 }
 
 /**
@@ -3525,31 +3554,24 @@ static int lnet_handle_legacy_ip2nets(char *ip2nets,
 	return rc;
 }
 
-int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf)
+int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf, u32 net_id,
+		    struct lnet_ioctl_config_lnd_tunables *tun)
 {
 	struct lnet_net *net;
 	struct lnet_ni *ni;
-	struct lnet_ioctl_config_lnd_tunables *tun = NULL;
 	int rc, i;
-	u32 net_id, lnd_type;
-
-	/* get the tunables if they are available */
-	if (conf->lic_cfg_hdr.ioc_len >=
-	    sizeof(*conf) + sizeof(*tun))
-		tun = (struct lnet_ioctl_config_lnd_tunables *)
-			conf->lic_bulk;
+	u32 lnd_type;
 
 	/* handle legacy ip2nets from DLC */
 	if (conf->lic_legacy_ip2nets[0] != '\0')
 		return lnet_handle_legacy_ip2nets(conf->lic_legacy_ip2nets,
 						  tun);
 
-	net_id = LNET_NIDNET(conf->lic_nid);
 	lnd_type = LNET_NETTYP(net_id);
 
 	if (!libcfs_isknown_lnd(lnd_type)) {
 		CERROR("No valid net and lnd information provided\n");
-		return -EINVAL;
+		return -ENOENT;
 	}
 
 	net = lnet_net_alloc(net_id, NULL);
@@ -3559,7 +3581,7 @@ int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf)
 	for (i = 0; i < conf->lic_ncpts; i++) {
 		if (conf->lic_cpts[i] >= LNET_CPT_NUMBER) {
 			lnet_net_free(net);
-			return -EINVAL;
+			return -ERANGE;
 		}
 	}
 
@@ -3588,16 +3610,15 @@ int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf)
 	return rc;
 }
 
-int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf)
+int lnet_dyn_del_ni(struct lnet_nid *nid)
 {
 	struct lnet_net *net;
 	struct lnet_ni *ni;
-	u32 net_id = LNET_NIDNET(conf->lic_nid);
+	u32 net_id = LNET_NID_NET(nid);
 	struct lnet_ping_buffer *pbuf;
 	struct lnet_handle_md ping_mdh;
 	int net_bytes, rc;
 	bool net_empty;
-	u32 addr;
 
 	/* don't allow userspace to shutdown the LOLND */
 	if (LNET_NETTYP(net_id) == LOLND)
@@ -3619,8 +3640,7 @@ int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf)
 		goto unlock_net;
 	}
 
-	addr = LNET_NIDADDR(conf->lic_nid);
-	if (addr == 0) {
+	if (!nid_addr_is_set(nid)) {
 		/* remove the entire net */
 		net_bytes = lnet_get_net_ni_bytes_locked(net);
 
@@ -3642,10 +3662,9 @@ int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf)
 		goto unlock_api_mutex;
 	}
 
-	ni = lnet_nid2ni_locked(conf->lic_nid, 0);
+	ni = lnet_nid_to_ni_locked(nid, 0);
 	if (!ni) {
-		CERROR("nid %s not found\n",
-		       libcfs_nid2str(conf->lic_nid));
+		CERROR("nid %s not found\n", libcfs_nidstr(nid));
 		rc = -ENOENT;
 		goto unlock_net;
 	}
@@ -3952,8 +3971,6 @@ u32 lnet_get_dlc_seq_locked(void)
 {
 	struct libcfs_ioctl_data *data = arg;
 	struct lnet_ioctl_config_data *config;
-	struct lnet_process_id id4 = {};
-	struct lnet_processid id = {};
 	struct lnet_ni *ni;
 	struct lnet_nid nid;
 	int rc;
@@ -3963,11 +3980,6 @@ u32 lnet_get_dlc_seq_locked(void)
 		     sizeof(struct lnet_ioctl_config_data));
 
 	switch (cmd) {
-	case IOC_LIBCFS_GET_NI:
-		rc = LNetGetId(data->ioc_count, &id);
-		data->ioc_nid = lnet_nid_to_nid4(&id.nid);
-		return rc;
-
 	case IOC_LIBCFS_FAIL_NID:
 		return lnet_fail_nid(data->ioc_nid, data->ioc_count);
 
@@ -4351,6 +4363,7 @@ u32 lnet_get_dlc_seq_locked(void)
 		return lnet_fault_ctl(data->ioc_flags, data);
 
 	case IOC_LIBCFS_PING: {
+		struct lnet_process_id id4;
 		signed long timeout;
 
 		id4.nid = data->ioc_nid;
@@ -4561,6 +4574,682 @@ u32 lnet_get_dlc_seq_locked(void)
 }
 EXPORT_SYMBOL(LNetCtl);
 
+static const struct ln_key_list net_props_list = {
+	.lkl_maxattr			= LNET_NET_ATTR_MAX,
+	.lkl_list			= {
+		[LNET_NET_ATTR_HDR]		= {
+			.lkp_value		= "net",
+			.lkp_key_format		= LNKF_SEQUENCE | LNKF_MAPPING,
+			.lkp_data_type		= NLA_NUL_STRING,
+		},
+		[LNET_NET_ATTR_TYPE]		= {
+			.lkp_value		= "net type",
+			.lkp_data_type		= NLA_STRING
+		},
+		[LNET_NET_ATTR_LOCAL]           = {
+			.lkp_value		= "local NI(s)",
+			.lkp_key_format		= LNKF_SEQUENCE | LNKF_MAPPING,
+			.lkp_data_type		= NLA_NESTED
+		},
+	},
+};
+
+static struct ln_key_list local_ni_list = {
+	.lkl_maxattr			= LNET_NET_LOCAL_NI_ATTR_MAX,
+	.lkl_list			= {
+		[LNET_NET_LOCAL_NI_ATTR_NID]	= {
+			.lkp_value		= "nid",
+			.lkp_data_type		= NLA_STRING
+		},
+		[LNET_NET_LOCAL_NI_ATTR_STATUS] = {
+			.lkp_value		= "status",
+			.lkp_data_type		= NLA_STRING
+		},
+		[LNET_NET_LOCAL_NI_ATTR_INTERFACE] = {
+			.lkp_value		= "interfaces",
+			.lkp_key_format		= LNKF_MAPPING,
+			.lkp_data_type		= NLA_NESTED
+		},
+	},
+};
+
+static const struct ln_key_list local_ni_interfaces_list = {
+	.lkl_maxattr			= LNET_NET_LOCAL_NI_INTF_ATTR_MAX,
+	.lkl_list			= {
+		[LNET_NET_LOCAL_NI_INTF_ATTR_TYPE] = {
+			.lkp_value	= "0",
+			.lkp_data_type	= NLA_STRING
+		},
+	},
+};
+
+/* Use an index since the traversal is across LNet nets and ni collections */
+struct lnet_genl_net_list {
+	unsigned int	lngl_net_id;
+	unsigned int	lngl_idx;
+};
+
+static inline struct lnet_genl_net_list *
+lnet_net_dump_ctx(struct netlink_callback *cb)
+{
+	return (struct lnet_genl_net_list *)cb->args[0];
+}
+
+static int lnet_net_show_done(struct netlink_callback *cb)
+{
+	struct lnet_genl_net_list *nlist = lnet_net_dump_ctx(cb);
+
+	kfree(nlist);
+	cb->args[0] = 0;
+
+	return 0;
+}
+
+/* LNet net ->start() handler for GET requests */
+static int lnet_net_show_start(struct netlink_callback *cb)
+{
+	struct genlmsghdr *gnlh = nlmsg_data(cb->nlh);
+	struct netlink_ext_ack *extack = cb->extack;
+	struct lnet_genl_net_list *nlist;
+	int msg_len = genlmsg_len(gnlh);
+	struct nlattr *params, *top;
+	int rem, rc = 0;
+
+	if (the_lnet.ln_refcount == 0) {
+		NL_SET_ERR_MSG(extack, "LNet stack down");
+		return -ENETDOWN;
+	}
+
+	nlist = kmalloc(sizeof(*nlist), GFP_KERNEL);
+	if (!nlist)
+		return -ENOMEM;
+
+	nlist->lngl_net_id = LNET_NET_ANY;
+	nlist->lngl_idx = 0;
+	cb->args[0] = (long)nlist;
+
+	if (!msg_len)
+		return 0;
+
+	params = genlmsg_data(gnlh);
+	nla_for_each_attr(top, params, msg_len, rem) {
+		struct nlattr *net;
+		int rem2;
+
+		nla_for_each_nested(net, top, rem2) {
+			char filter[LNET_NIDSTR_SIZE];
+
+			if (nla_type(net) != LN_SCALAR_ATTR_VALUE ||
+			    nla_strcmp(net, "name") != 0)
+				continue;
+
+			net = nla_next(net, &rem2);
+			if (nla_type(net) != LN_SCALAR_ATTR_VALUE) {
+				NL_SET_ERR_MSG(extack, "invalid config param");
+				rc = -EINVAL;
+				goto report_err;
+			}
+
+			rc = nla_strlcpy(filter, net, sizeof(filter));
+			if (rc < 0) {
+				NL_SET_ERR_MSG(extack, "failed to get param");
+				goto report_err;
+			}
+			rc = 0;
+
+			nlist->lngl_net_id = libcfs_str2net(filter);
+			if (nlist->lngl_net_id == LNET_NET_ANY) {
+				NL_SET_ERR_MSG(extack, "cannot parse net");
+				rc = -ENOENT;
+				goto report_err;
+			}
+		}
+	}
+report_err:
+	if (rc < 0)
+		lnet_net_show_done(cb);
+
+	return rc;
+}
+
+static int lnet_net_show_dump(struct sk_buff *msg,
+			      struct netlink_callback *cb)
+{
+	struct lnet_genl_net_list *nlist = lnet_net_dump_ctx(cb);
+	struct netlink_ext_ack *extack = cb->extack;
+	int portid = NETLINK_CB(cb->skb).portid;
+	int seq = cb->nlh->nlmsg_seq;
+	struct lnet_net *net;
+	int idx = 0, rc = 0;
+	bool found = false;
+	void *hdr = NULL;
+
+	if (!nlist->lngl_idx) {
+		const struct ln_key_list *all[] = {
+			&net_props_list, &local_ni_list,
+			&local_ni_interfaces_list,
+			NULL
+		};
+
+		rc = lnet_genl_send_scalar_list(msg, portid, seq,
+						&lnet_family,
+						NLM_F_CREATE | NLM_F_MULTI,
+						LNET_CMD_NETS, all);
+		if (rc < 0) {
+			NL_SET_ERR_MSG(extack, "failed to send key table");
+			goto send_error;
+		}
+	}
+
+	lnet_net_lock(LNET_LOCK_EX);
+
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		struct lnet_ni *ni;
+
+		if (nlist->lngl_net_id != LNET_NET_ANY &&
+		    nlist->lngl_net_id != net->net_id)
+			continue;
+
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+			struct nlattr *local_ni, *ni_attr;
+			char *status = "up";
+
+			if (idx++ < nlist->lngl_idx)
+				continue;
+
+			hdr = genlmsg_put(msg, portid, seq, &lnet_family,
+					  NLM_F_MULTI, LNET_CMD_NETS);
+			if (!hdr) {
+				NL_SET_ERR_MSG(extack, "failed to send values");
+				rc = -EMSGSIZE;
+				goto net_unlock;
+			}
+
+			if (idx == 1)
+				nla_put_string(msg, LNET_NET_ATTR_HDR, "");
+
+			nla_put_string(msg, LNET_NET_ATTR_TYPE,
+				       libcfs_net2str(net->net_id));
+			found = true;
+
+			local_ni = nla_nest_start(msg, LNET_NET_ATTR_LOCAL);
+			ni_attr = nla_nest_start(msg, idx - 1);
+
+			lnet_ni_lock(ni);
+			nla_put_string(msg, LNET_NET_LOCAL_NI_ATTR_NID,
+				       libcfs_nidstr(&ni->ni_nid));
+			if (nid_is_lo0(&ni->ni_nid) &&
+			    *ni->ni_status != LNET_NI_STATUS_UP)
+				status = "down";
+			nla_put_string(msg, LNET_NET_LOCAL_NI_ATTR_STATUS, status);
+
+			if (!nid_is_lo0(&ni->ni_nid) && ni->ni_interface) {
+				struct nlattr *intf_nest, *intf_attr;
+
+				intf_nest = nla_nest_start(msg,
+							   LNET_NET_LOCAL_NI_ATTR_INTERFACE);
+				intf_attr = nla_nest_start(msg, 0);
+				nla_put_string(msg,
+					       LNET_NET_LOCAL_NI_INTF_ATTR_TYPE,
+					       ni->ni_interface);
+				nla_nest_end(msg, intf_attr);
+				nla_nest_end(msg, intf_nest);
+			}
+
+			lnet_ni_unlock(ni);
+			nla_nest_end(msg, ni_attr);
+			nla_nest_end(msg, local_ni);
+
+			genlmsg_end(msg, hdr);
+		}
+	}
+
+	if (!found) {
+		struct nlmsghdr *nlh = nlmsg_hdr(msg);
+
+		nlmsg_cancel(msg, nlh);
+		NL_SET_ERR_MSG(extack, "Network is down");
+		rc = -ESRCH;
+	}
+net_unlock:
+	lnet_net_unlock(LNET_LOCK_EX);
+send_error:
+	nlist->lngl_idx = idx;
+
+	return rc;
+}
+
+static int lnet_genl_parse_tunables(struct nlattr *settings,
+				    struct lnet_ioctl_config_lnd_tunables *tun)
+{
+	struct nlattr *param;
+	int rem, rc = 0;
+
+	nla_for_each_nested(param, settings, rem) {
+		int type = LNET_NET_LOCAL_NI_TUNABLES_ATTR_UNSPEC;
+		s64 num;
+
+		if (nla_type(param) != LN_SCALAR_ATTR_VALUE)
+			continue;
+
+		if (nla_strcmp(param, "peer_timeout") == 0)
+			type = LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_TIMEOUT;
+		else if (nla_strcmp(param, "peer_credits") == 0)
+			type = LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_CREDITS;
+		else if (nla_strcmp(param, "peer_buffer_credits") == 0)
+			type = LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_BUFFER_CREDITS;
+		else if (nla_strcmp(param, "credits") == 0)
+			type = LNET_NET_LOCAL_NI_TUNABLES_ATTR_CREDITS;
+
+		param = nla_next(param, &rem);
+		if (nla_type(param) != LN_SCALAR_ATTR_INT_VALUE)
+			return -EINVAL;
+
+		num = nla_get_s64(param);
+		switch (type) {
+		case LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_TIMEOUT:
+			tun->lt_cmn.lct_peer_timeout = num;
+			break;
+		case LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_CREDITS:
+			tun->lt_cmn.lct_peer_tx_credits = num;
+			break;
+		case LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_BUFFER_CREDITS:
+			tun->lt_cmn.lct_peer_rtr_credits = num;
+			break;
+		case LNET_NET_LOCAL_NI_TUNABLES_ATTR_CREDITS:
+			tun->lt_cmn.lct_max_tx_credits = num;
+			break;
+		default:
+			rc = -EINVAL;
+			break;
+		}
+	}
+	return rc;
+}
+
+static int
+lnet_genl_parse_lnd_tunables(struct nlattr *settings,
+			     struct lnet_ioctl_config_lnd_tunables *tun,
+			     const struct lnet_lnd *lnd)
+{
+	const struct ln_key_list *list = lnd->lnd_keys;
+	struct nlattr *param;
+	int rem, rc = 0;
+	int i = 1;
+
+	if (!list)
+		return 0;
+
+	if (!lnd->lnd_nl_set)
+		return -EOPNOTSUPP;
+
+	if (!list->lkl_maxattr)
+		return -ERANGE;
+
+	nla_for_each_nested(param, settings, rem) {
+		if (nla_type(param) != LN_SCALAR_ATTR_VALUE)
+			continue;
+
+		for (i = 1; i <= list->lkl_maxattr; i++) {
+			if (!list->lkl_list[i].lkp_value ||
+			    nla_strcmp(param, list->lkl_list[i].lkp_value) != 0)
+				continue;
+
+			param = nla_next(param, &rem);
+			rc = lnd->lnd_nl_set(LNET_CMD_NETS, param, i, tun);
+			if (rc < 0)
+				return rc;
+		}
+	}
+
+	return rc;
+}
+
+static int
+lnet_genl_parse_local_ni(struct nlattr *entry, struct genl_info *info,
+			 int net_id, struct lnet_ioctl_config_ni *conf,
+			 struct lnet_ioctl_config_lnd_tunables *tun,
+			 bool *ni_list)
+{
+	struct nlattr *settings;
+	int rem3, rc = 0;
+
+	nla_for_each_nested(settings, entry, rem3) {
+		if (nla_type(settings) != LN_SCALAR_ATTR_VALUE)
+			continue;
+
+		if (nla_strcmp(settings, "interfaces") == 0) {
+			struct nlattr *intf;
+			int rem4;
+
+			settings = nla_next(settings, &rem3);
+			if (nla_type(settings) !=
+			    LN_SCALAR_ATTR_LIST) {
+				GENL_SET_ERR_MSG(info,
+						 "invalid interfaces");
+				rc = -EINVAL;
+				goto out;
+			}
+
+			nla_for_each_nested(intf, settings, rem4) {
+				intf = nla_next(intf, &rem4);
+				if (nla_type(intf) !=
+				    LN_SCALAR_ATTR_VALUE) {
+					GENL_SET_ERR_MSG(info,
+							 "0 key is invalid");
+					rc = -EINVAL;
+					goto out;
+				}
+
+				rc = nla_strlcpy(conf->lic_ni_intf, intf,
+						 sizeof(conf->lic_ni_intf));
+				if (rc < 0) {
+					GENL_SET_ERR_MSG(info,
+							 "failed to parse interfaces");
+					goto out;
+				}
+			}
+			*ni_list = true;
+		} else if (nla_strcmp(settings, "tunables") == 0) {
+			settings = nla_next(settings, &rem3);
+			if (nla_type(settings) !=
+			    LN_SCALAR_ATTR_LIST) {
+				GENL_SET_ERR_MSG(info,
+						 "invalid tunables");
+				rc = -EINVAL;
+				goto out;
+			}
+
+			rc = lnet_genl_parse_tunables(settings, tun);
+			if (rc < 0) {
+				GENL_SET_ERR_MSG(info,
+						 "failed to parse tunables");
+				goto out;
+			}
+		} else if ((nla_strcmp(settings, "lnd tunables") == 0)) {
+			const struct lnet_lnd *lnd;
+
+			lnd = lnet_load_lnd(LNET_NETTYP(net_id));
+			if (IS_ERR(lnd)) {
+				GENL_SET_ERR_MSG(info,
+						 "LND type not supported");
+				rc = PTR_ERR(lnd);
+				goto out;
+			}
+
+			settings = nla_next(settings, &rem3);
+			if (nla_type(settings) !=
+			    LN_SCALAR_ATTR_LIST) {
+				GENL_SET_ERR_MSG(info,
+						 "lnd tunables should be list\n");
+				rc = -EINVAL;
+				goto out;
+			}
+
+			rc = lnet_genl_parse_lnd_tunables(settings,
+							  tun, lnd);
+			if (rc < 0) {
+				GENL_SET_ERR_MSG(info,
+						 "failed to parse lnd tunables");
+				goto out;
+			}
+		} else if (nla_strcmp(settings, "CPT") == 0) {
+			struct nlattr *cpt;
+			int rem4;
+
+			settings = nla_next(settings, &rem3);
+			if (nla_type(settings) != LN_SCALAR_ATTR_LIST) {
+				GENL_SET_ERR_MSG(info,
+						 "CPT should be list");
+				rc = -EINVAL;
+				goto out;
+			}
+
+			nla_for_each_nested(cpt, settings, rem4) {
+				s64 core;
+
+				if (nla_type(cpt) !=
+				    LN_SCALAR_ATTR_INT_VALUE) {
+					GENL_SET_ERR_MSG(info,
+							 "invalid CPT config");
+					rc = -EINVAL;
+					goto out;
+				}
+
+				core = nla_get_s64(cpt);
+				if (core >= LNET_CPT_NUMBER) {
+					GENL_SET_ERR_MSG(info,
+							 "invalid CPT value");
+					rc = -ERANGE;
+					goto out;
+				}
+
+				conf->lic_cpts[conf->lic_ncpts] = core;
+				conf->lic_ncpts++;
+			}
+		}
+	}
+out:
+	return rc;
+}
+
+static int lnet_net_cmd(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlmsghdr *nlh = nlmsg_hdr(skb);
+	struct genlmsghdr *gnlh = nlmsg_data(nlh);
+	struct nlattr *params = genlmsg_data(gnlh);
+	int msg_len, rem, rc = 0;
+	struct nlattr *attr;
+
+	msg_len = genlmsg_len(gnlh);
+	if (!msg_len) {
+		GENL_SET_ERR_MSG(info, "no configuration");
+		return -ENOMSG;
+	}
+
+	nla_for_each_attr(attr, params, msg_len, rem) {
+		struct lnet_ioctl_config_ni conf;
+		u32 net_id = LNET_NET_ANY;
+		struct nlattr *entry;
+		bool ni_list = false;
+		int rem2;
+
+		if (nla_type(attr) != LN_SCALAR_ATTR_LIST)
+			continue;
+
+		nla_for_each_nested(entry, attr, rem2) {
+			switch (nla_type(entry)) {
+			case LN_SCALAR_ATTR_VALUE: {
+				ssize_t len;
+
+				memset(&conf, 0, sizeof(conf));
+				if (nla_strcmp(entry, "ip2net") == 0) {
+					entry = nla_next(entry, &rem2);
+					if (nla_type(entry) !=
+					    LN_SCALAR_ATTR_VALUE) {
+						GENL_SET_ERR_MSG(info,
+								 "ip2net has invalid key");
+						rc = -EINVAL;
+						goto out;
+					}
+
+					len = nla_strlcpy(conf.lic_legacy_ip2nets,
+							  entry,
+							  sizeof(conf.lic_legacy_ip2nets));
+					if (len < 0) {
+						GENL_SET_ERR_MSG(info,
+								 "ip2net key string is invalid");
+						rc = len;
+						goto out;
+					}
+					ni_list = true;
+				} else if (nla_strcmp(entry, "net type") == 0) {
+					char tmp[LNET_NIDSTR_SIZE];
+
+					entry = nla_next(entry, &rem2);
+					if (nla_type(entry) !=
+					    LN_SCALAR_ATTR_VALUE) {
+						GENL_SET_ERR_MSG(info,
+								 "net type has invalid key");
+						rc = -EINVAL;
+						goto out;
+					}
+
+					len = nla_strlcpy(tmp, entry,
+							  sizeof(tmp));
+					if (len < 0) {
+						GENL_SET_ERR_MSG(info,
+								 "net type key string is invalid");
+						rc = len;
+						goto out;
+					}
+
+					net_id = libcfs_str2net(tmp);
+					if (!net_id) {
+						GENL_SET_ERR_MSG(info,
+								 "cannot parse net");
+						rc = -ENODEV;
+						goto out;
+					}
+					if (LNET_NETTYP(net_id) == LOLND) {
+						GENL_SET_ERR_MSG(info,
+								 "setting @lo not allowed");
+						rc = -ENODEV;
+						goto out;
+					}
+					conf.lic_legacy_ip2nets[0] = '\0';
+					conf.lic_ni_intf[0] = '\0';
+					ni_list = false;
+				}
+				if (rc < 0)
+					goto out;
+				break;
+			}
+			case LN_SCALAR_ATTR_LIST: {
+				bool create = info->nlhdr->nlmsg_flags &
+					      NLM_F_CREATE;
+				struct lnet_ioctl_config_lnd_tunables tun;
+
+				memset(&tun, 0, sizeof(tun));
+				tun.lt_cmn.lct_peer_timeout = -1;
+				conf.lic_ncpts = 0;
+
+				rc = lnet_genl_parse_local_ni(entry, info,
+							      net_id, &conf,
+							      &tun, &ni_list);
+				if (rc < 0)
+					goto out;
+
+				if (!create) {
+					struct lnet_net *net;
+					struct lnet_ni *ni;
+
+					rc = -ENODEV;
+					if (!strlen(conf.lic_ni_intf)) {
+						GENL_SET_ERR_MSG(info,
+								 "interface is missing");
+						goto out;
+					}
+
+					lnet_net_lock(LNET_LOCK_EX);
+					net = lnet_get_net_locked(net_id);
+					if (!net) {
+						GENL_SET_ERR_MSG(info,
+								 "LNet net doesn't exist");
+						goto out;
+					}
+					list_for_each_entry(ni, &net->net_ni_list,
+							    ni_netlist) {
+						if (!ni->ni_interface ||
+						    strncmp(ni->ni_interface,
+							    conf.lic_ni_intf,
+							    strlen(conf.lic_ni_intf)) != 0) {
+							ni = NULL;
+							continue;
+						}
+
+						lnet_net_unlock(LNET_LOCK_EX);
+						rc = lnet_dyn_del_ni(&ni->ni_nid);
+						lnet_net_lock(LNET_LOCK_EX);
+						if (rc < 0) {
+							GENL_SET_ERR_MSG(info,
+									 "cannot del LNet NI");
+							goto out;
+						}
+						break;
+					}
+
+					lnet_net_unlock(LNET_LOCK_EX);
+				} else {
+					rc = lnet_dyn_add_ni(&conf, net_id, &tun);
+					switch (rc) {
+					case -ENOENT:
+						GENL_SET_ERR_MSG(info,
+								 "cannot parse net");
+						break;
+					case -ERANGE:
+						GENL_SET_ERR_MSG(info,
+								 "invalid CPT set");
+					fallthrough;
+					default:
+						GENL_SET_ERR_MSG(info,
+								 "cannot add LNet NI");
+					case 0:
+						break;
+					}
+					if (rc < 0)
+						goto out;
+				}
+				break;
+			}
+			/* it is possible that a newer version of userland
+			 * sends values older kernels don't handle, so
+			 * silently ignore these values
+			 */
+			default:
+				break;
+			}
+		}
+
+		/* Handle the case where only a NET was sent, with no list of NIDs */
+		if (!(info->nlhdr->nlmsg_flags & NLM_F_CREATE) && !ni_list) {
+			rc = lnet_dyn_del_net(net_id);
+			if (rc < 0) {
+				GENL_SET_ERR_MSG(info,
+						 "cannot del network");
+			}
+		}
+	}
+out:
+	return rc;
+}
+
+static const struct genl_multicast_group lnet_mcast_grps[] = {
+	{ .name	=	"ip2net",	},
+	{ .name =	"net",		},
+};
+
+static const struct genl_ops lnet_genl_ops[] = {
+	{
+		.cmd		= LNET_CMD_NETS,
+		.start		= lnet_net_show_start,
+		.dumpit		= lnet_net_show_dump,
+		.done		= lnet_net_show_done,
+		.doit		= lnet_net_cmd,
+	},
+};
+
+static struct genl_family lnet_family = {
+	.name		= LNET_GENL_NAME,
+	.version	= LNET_GENL_VERSION,
+	.module		= THIS_MODULE,
+	.netnsok	= true,
+	.ops		= lnet_genl_ops,
+	.n_ops		= ARRAY_SIZE(lnet_genl_ops),
+	.mcgrps		= lnet_mcast_grps,
+	.n_mcgrps	= ARRAY_SIZE(lnet_mcast_grps),
+};
+
 void LNetDebugPeer(struct lnet_processid *id)
 {
 	lnet_debug_peer(lnet_nid_to_nid4(&id->nid));
diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c
index cebc725..4b2d776 100644
--- a/net/lnet/lnet/config.c
+++ b/net/lnet/lnet/config.c
@@ -367,8 +367,7 @@ struct lnet_net *
 	return net;
 }
 
-static int
-lnet_ni_add_interface(struct lnet_ni *ni, char *iface)
+int lnet_ni_add_interface(struct lnet_ni *ni, char *iface)
 {
 	if (!ni)
 		return -ENOMEM;
@@ -395,6 +394,7 @@ struct lnet_net *
 
 	return 0;
 }
+EXPORT_SYMBOL(lnet_ni_add_interface);
 
 static struct lnet_ni *
 lnet_ni_alloc_common(struct lnet_net *net, char *iface)
diff --git a/net/lnet/lnet/module.c b/net/lnet/lnet/module.c
index 9d7b39a..6e41e4b 100644
--- a/net/lnet/lnet/module.c
+++ b/net/lnet/lnet/module.c
@@ -41,8 +41,7 @@
 
 static DEFINE_MUTEX(lnet_config_mutex);
 
-static int
-lnet_configure(void *arg)
+int lnet_configure(void *arg)
 {
 	/* 'arg' only there so I can be passed to cfs_create_thread() */
 	int rc = 0;
@@ -68,8 +67,7 @@
 	return rc;
 }
 
-static int
-lnet_unconfigure(void)
+int lnet_unconfigure(void)
 {
 	int refcount;
 
@@ -134,16 +132,26 @@
 {
 	struct lnet_ioctl_config_ni *conf =
 		(struct lnet_ioctl_config_ni *)hdr;
-	int rc;
+	int rc = -EINVAL;
 
 	if (conf->lic_cfg_hdr.ioc_len < sizeof(*conf))
-		return -EINVAL;
+		return rc;
 
 	mutex_lock(&lnet_config_mutex);
-	if (the_lnet.ln_niinit_self)
-		rc = lnet_dyn_add_ni(conf);
-	else
-		rc = -EINVAL;
+	if (the_lnet.ln_niinit_self) {
+		struct lnet_ioctl_config_lnd_tunables *tun = NULL;
+		struct lnet_nid nid;
+		u32 net_id;
+
+		/* get the tunables if they are available */
+		if (conf->lic_cfg_hdr.ioc_len >=
+		    sizeof(*conf) + sizeof(*tun))
+			tun = (struct lnet_ioctl_config_lnd_tunables *) conf->lic_bulk;
+
+		lnet_nid4_to_nid(conf->lic_nid, &nid);
+		net_id = LNET_NID_NET(&nid);
+		rc = lnet_dyn_add_ni(conf, net_id, tun);
+	}
 	mutex_unlock(&lnet_config_mutex);
 
 	return rc;
@@ -154,16 +162,16 @@
 {
 	struct lnet_ioctl_config_ni *conf =
 		(struct lnet_ioctl_config_ni *)hdr;
-	int rc;
+	struct lnet_nid nid;
+	int rc = -EINVAL;
 
-	if (conf->lic_cfg_hdr.ioc_len < sizeof(*conf))
-		return -EINVAL;
+	if (conf->lic_cfg_hdr.ioc_len < sizeof(*conf) ||
+	    !the_lnet.ln_niinit_self)
+		return rc;
 
+	lnet_nid4_to_nid(conf->lic_nid, &nid);
 	mutex_lock(&lnet_config_mutex);
-	if (the_lnet.ln_niinit_self)
-		rc = lnet_dyn_del_ni(conf);
-	else
-		rc = -EINVAL;
+	rc = lnet_dyn_del_ni(&nid);
 	mutex_unlock(&lnet_config_mutex);
 
 	return rc;
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [lustre-devel] [PATCH 07/22] lustre: obdclass: improve precision of wakeups for mod_rpcs
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (5 preceding siblings ...)
  2022-11-20 14:16 ` [lustre-devel] [PATCH 06/22] lnet: use Netlink to support old and new NI APIs James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 08/22] lnet: allow ping packet to contain large nids James Simmons
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

There is a limit on the number of in-flight mod rpcs, with the
complication that a 'close' rpc is always permitted if there are no
other close rpcs in flight, even if that would exceed the limit.

When a non-close request completes, we just wake the first waiting
request and assume it will use the slot we released.  When a
close request completes, the first waiting request may not find a slot
if the close was using the 'extra' slot.  So in that case we wake all
waiting requests and let them fight it out.  This is wasteful and
unfair.

To correct this we revise the wait/wake approach to use a dedicated
wakeup function which atomically checks if a given task can proceed,
and updates the counters when permission to proceed is given.  This
means that once a task has been woken, it has already been accounted
and it can proceed.

To minimise locking, cl_mod_rpcs_lock is discarded and
cl_mod_rpcs_waitq.lock is used to protect the counters.  For the
fast-path where the max has not been reached, this means we take and
release that spinlock just once.  We call wake_up_locked while still
holding the lock, and if that woke the process, then we don't drop the
spinlock to wait, but proceed directly to the remainder of the task.

When the last 'close' rpc completes, the wake function will iterate
the whole wait queue until it finds a task waiting to submit a close
request.  When any other rpc completes, the queue will only be
searched until the maximum is reached.

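For context, the reason a per-waiter wake function can stop the walk
early is how the core waitqueue scan interprets its return value.  The
function below is a simplified, illustrative rendering of mainline's
__wake_up_common() loop (not part of this patch; bookmark handling is
omitted), shown only to explain the return values used by the new
claim_mod_rpc_function():

static int wake_walk_sketch(struct wait_queue_head *wq_head,
			    unsigned int mode, int nr_exclusive,
			    int wake_flags, void *key)
{
	wait_queue_entry_t *curr, *next;

	list_for_each_entry_safe(curr, next, &wq_head->head, entry) {
		unsigned int flags = curr->flags;
		int ret = curr->func(curr, mode, wake_flags, key);

		if (ret < 0)		/* waiter asked to stop the scan */
			break;
		if (ret && (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
			break;		/* enough exclusive waiters woken */
	}
	return nr_exclusive;
}

With that contract, the callback below returns a negative value when no
further waiter could possibly be woken, 0 to keep the walk going, and a
positive value (via woken_wake_function()) for the waiter that actually
claimed a slot.
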
WC-bug-id: https://jira.whamcloud.com/browse/LU-15947
Lustre-commit: 5243630b09d22e0b5 ("LU-15947 obdclass: improve precision of wakeups for mod_rpcs")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44041
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd.h     |   1 -
 fs/lustre/ldlm/ldlm_lib.c   |   1 -
 fs/lustre/obdclass/genops.c | 158 ++++++++++++++++++++++++--------------------
 3 files changed, 88 insertions(+), 72 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index 16f66ea..56e5641 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -326,7 +326,6 @@ struct client_obd {
 	/* modify rpcs in flight
 	 * currently used for metadata only
 	 */
-	spinlock_t		cl_mod_rpcs_lock;
 	u16			cl_max_mod_rpcs_in_flight;
 	u16			cl_mod_rpcs_in_flight;
 	u16			cl_close_rpcs_in_flight;
diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c
index 08aff4f..e4262c3 100644
--- a/fs/lustre/ldlm/ldlm_lib.c
+++ b/fs/lustre/ldlm/ldlm_lib.c
@@ -444,7 +444,6 @@ int client_obd_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 	else
 		cli->cl_max_rpcs_in_flight = OBD_MAX_RIF_DEFAULT;
 
-	spin_lock_init(&cli->cl_mod_rpcs_lock);
 	spin_lock_init(&cli->cl_mod_rpcs_hist.oh_lock);
 	cli->cl_max_mod_rpcs_in_flight = 0;
 	cli->cl_mod_rpcs_in_flight = 0;
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index 2031320..6e4d240 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -1426,16 +1426,16 @@ int obd_set_max_mod_rpcs_in_flight(struct client_obd *cli, u16 max)
 		return -ERANGE;
 	}
 
-	spin_lock(&cli->cl_mod_rpcs_lock);
+	spin_lock_irq(&cli->cl_mod_rpcs_waitq.lock);
 
 	prev = cli->cl_max_mod_rpcs_in_flight;
 	cli->cl_max_mod_rpcs_in_flight = max;
 
 	/* wakeup waiters if limit has been increased */
 	if (cli->cl_max_mod_rpcs_in_flight > prev)
-		wake_up(&cli->cl_mod_rpcs_waitq);
+		wake_up_locked(&cli->cl_mod_rpcs_waitq);
 
-	spin_unlock(&cli->cl_mod_rpcs_lock);
+	spin_unlock_irq(&cli->cl_mod_rpcs_waitq.lock);
 
 	return 0;
 }
@@ -1446,7 +1446,7 @@ int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq)
 	unsigned long mod_tot = 0, mod_cum;
 	int i;
 
-	spin_lock(&cli->cl_mod_rpcs_lock);
+	spin_lock_irq(&cli->cl_mod_rpcs_waitq.lock);
 	lprocfs_stats_header(seq, ktime_get(), cli->cl_mod_rpcs_init, 25,
 			     ":", true);
 	seq_printf(seq, "modify_RPCs_in_flight:  %hu\n",
@@ -1469,13 +1469,13 @@ int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq)
 			break;
 	}
 
-	spin_unlock(&cli->cl_mod_rpcs_lock);
+	spin_unlock_irq(&cli->cl_mod_rpcs_waitq.lock);
 
 	return 0;
 }
 EXPORT_SYMBOL(obd_mod_rpc_stats_seq_show);
-/*
- * The number of modify RPCs sent in parallel is limited
+
+/* The number of modify RPCs sent in parallel is limited
  * because the server has a finite number of slots per client to
  * store request result and ensure reply reconstruction when needed.
  * On the client, this limit is stored in cl_max_mod_rpcs_in_flight
@@ -1484,34 +1484,55 @@ int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq)
  * On the MDC client, to avoid a potential deadlock (see Bugzilla 3462),
  * one close request is allowed above the maximum.
  */
-static inline bool obd_mod_rpc_slot_avail_locked(struct client_obd *cli,
-						 bool close_req)
+struct mod_waiter {
+	struct client_obd *cli;
+	bool close_req;
+	wait_queue_entry_t wqe;
+};
+static int claim_mod_rpc_function(wait_queue_entry_t *wq_entry,
+				  unsigned int mode, int flags, void *key)
 {
+	struct mod_waiter *w = container_of(wq_entry, struct mod_waiter, wqe);
+	struct client_obd *cli = w->cli;
+	bool close_req = w->close_req;
 	bool avail;
+	int ret;
+
+	/* As woken_wake_function() doesn't remove us from the wait_queue,
+	 * we could get called twice for the same thread - take care.
+	 */
+	if (wq_entry->flags & WQ_FLAG_WOKEN)
+		/* Already woke this thread, don't try again */
+		return 0;
 
 	/* A slot is available if
 	 * - number of modify RPCs in flight is less than the max
 	 * - it's a close RPC and no other close request is in flight
 	 */
 	avail = cli->cl_mod_rpcs_in_flight < cli->cl_max_mod_rpcs_in_flight ||
-		(close_req && !cli->cl_close_rpcs_in_flight);
-
-	return avail;
-}
-
-static inline bool obd_mod_rpc_slot_avail(struct client_obd *cli,
-					  bool close_req)
-{
-	bool avail;
-
-	spin_lock(&cli->cl_mod_rpcs_lock);
-	avail = obd_mod_rpc_slot_avail_locked(cli, close_req);
-	spin_unlock(&cli->cl_mod_rpcs_lock);
-	return avail;
+		(close_req && cli->cl_close_rpcs_in_flight == 0);
+	if (avail) {
+		cli->cl_mod_rpcs_in_flight++;
+		if (w->close_req)
+			cli->cl_close_rpcs_in_flight++;
+		ret = woken_wake_function(wq_entry, mode, flags, key);
+	} else if (cli->cl_close_rpcs_in_flight)
+		/* No other waiter could be woken */
+		ret = -1;
+	else if (!key)
+		/* This was not a wakeup from a close completion, so there is no
+		 * point seeing if there are close waiters to be woken
+		 */
+		ret = -1;
+	else
+		/* There might be a close so we could wake, keep looking */
+		ret = 0;
+	return ret;
 }
 
 /* Get a modify RPC slot from the obd client @cli according
- * to the kind of operation @opc that is going to be sent.
+ * to the kind of operation @opc that is going to be sent
+ * and the intent @it of the operation if it applies.
  * If the maximum number of modify RPCs in flight is reached
  * the thread is put to sleep.
  * Returns the tag to be set in the request message. Tag 0
@@ -1519,51 +1540,51 @@ static inline bool obd_mod_rpc_slot_avail(struct client_obd *cli,
  */
 u16 obd_get_mod_rpc_slot(struct client_obd *cli, u32 opc)
 {
-	bool close_req = false;
+	struct mod_waiter wait = {
+		.cli = cli,
+		.close_req = (opc == MDS_CLOSE),
+	};
 	u16 i, max;
 
-	if (opc == MDS_CLOSE)
-		close_req = true;
-
-	do {
-		spin_lock(&cli->cl_mod_rpcs_lock);
-		max = cli->cl_max_mod_rpcs_in_flight;
-		if (obd_mod_rpc_slot_avail_locked(cli, close_req)) {
-			/* there is a slot available */
-			cli->cl_mod_rpcs_in_flight++;
-			if (close_req)
-				cli->cl_close_rpcs_in_flight++;
-			lprocfs_oh_tally(&cli->cl_mod_rpcs_hist,
-					 cli->cl_mod_rpcs_in_flight);
-			/* find a free tag */
-			i = find_first_zero_bit(cli->cl_mod_tag_bitmap,
-						max + 1);
-			LASSERT(i < OBD_MAX_RIF_MAX);
-			LASSERT(!test_and_set_bit(i, cli->cl_mod_tag_bitmap));
-			spin_unlock(&cli->cl_mod_rpcs_lock);
-			/* tag 0 is reserved for non-modify RPCs */
-
-			CDEBUG(D_RPCTRACE,
-			       "%s: modify RPC slot %u is allocated opc %u, max %hu\n",
-			       cli->cl_import->imp_obd->obd_name,
-			       i + 1, opc, max);
-
-			return i + 1;
-		}
-		spin_unlock(&cli->cl_mod_rpcs_lock);
-
-		CDEBUG(D_RPCTRACE, "%s: sleeping for a modify RPC slot opc %u, max %hu\n",
-		       cli->cl_import->imp_obd->obd_name, opc, max);
+	init_wait(&wait.wqe);
+	wait.wqe.func = claim_mod_rpc_function;
 
-		wait_event_idle_exclusive(cli->cl_mod_rpcs_waitq,
-					  obd_mod_rpc_slot_avail(cli,
-								 close_req));
-	} while (true);
+	spin_lock_irq(&cli->cl_mod_rpcs_waitq.lock);
+	__add_wait_queue(&cli->cl_mod_rpcs_waitq, &wait.wqe);
+	/* This wakeup will only succeed if the maximums haven't
+	 * been reached.  If that happens, WQ_FLAG_WOKEN will be cleared
+	 * and there will be no need to wait.
+	 */
+	wake_up_locked(&cli->cl_mod_rpcs_waitq);
+	if (!(wait.wqe.flags & WQ_FLAG_WOKEN)) {
+		spin_unlock_irq(&cli->cl_mod_rpcs_waitq.lock);
+		wait_woken(&wait.wqe, TASK_UNINTERRUPTIBLE,
+			   MAX_SCHEDULE_TIMEOUT);
+		spin_lock_irq(&cli->cl_mod_rpcs_waitq.lock);
+	}
+	__remove_wait_queue(&cli->cl_mod_rpcs_waitq, &wait.wqe);
+
+	max = cli->cl_max_mod_rpcs_in_flight;
+	lprocfs_oh_tally(&cli->cl_mod_rpcs_hist,
+			 cli->cl_mod_rpcs_in_flight);
+	/* find a free tag */
+	i = find_first_zero_bit(cli->cl_mod_tag_bitmap,
+				max + 1);
+	LASSERT(i < OBD_MAX_RIF_MAX);
+	LASSERT(!test_and_set_bit(i, cli->cl_mod_tag_bitmap));
+	spin_unlock_irq(&cli->cl_mod_rpcs_waitq.lock);
+	/* tag 0 is reserved for non-modify RPCs */
+
+	CDEBUG(D_RPCTRACE,
+	       "%s: modify RPC slot %u is allocated opc %u, max %hu\n",
+	       cli->cl_import->imp_obd->obd_name,
+	       i + 1, opc, max);
+
+	return i + 1;
 }
 EXPORT_SYMBOL(obd_get_mod_rpc_slot);
 
-/*
- * Put a modify RPC slot from the obd client @cli according
+/* Put a modify RPC slot from the obd client @cli according
  * to the kind of operation @opc that has been sent.
  */
 void obd_put_mod_rpc_slot(struct client_obd *cli, u32 opc, u16 tag)
@@ -1576,18 +1597,15 @@ void obd_put_mod_rpc_slot(struct client_obd *cli, u32 opc, u16 tag)
 	if (opc == MDS_CLOSE)
 		close_req = true;
 
-	spin_lock(&cli->cl_mod_rpcs_lock);
+	spin_lock_irq(&cli->cl_mod_rpcs_waitq.lock);
 	cli->cl_mod_rpcs_in_flight--;
 	if (close_req)
 		cli->cl_close_rpcs_in_flight--;
 	/* release the tag in the bitmap */
 	LASSERT(tag - 1 < OBD_MAX_RIF_MAX);
 	LASSERT(test_and_clear_bit(tag - 1, cli->cl_mod_tag_bitmap) != 0);
-	spin_unlock(&cli->cl_mod_rpcs_lock);
-	/* LU-14741 - to prevent close RPCs stuck behind normal ones */
-	if (close_req)
-		wake_up_all(&cli->cl_mod_rpcs_waitq);
-	else
-		wake_up(&cli->cl_mod_rpcs_waitq);
+	__wake_up_locked_key(&cli->cl_mod_rpcs_waitq, TASK_NORMAL,
+			     (void *)close_req);
+	spin_unlock_irq(&cli->cl_mod_rpcs_waitq.lock);
 }
 EXPORT_SYMBOL(obd_put_mod_rpc_slot);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [lustre-devel] [PATCH 08/22] lnet: allow ping packet to contain large nids
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (6 preceding siblings ...)
  2022-11-20 14:16 ` [lustre-devel] [PATCH 07/22] lustre: obdclass: improve precision of wakeups for mod_rpcs James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 09/22] lustre: llog: skip bad records in llog James Simmons
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The ping packet has an array of fixed-size status entries that only
have room for a 4-byte-address nid.

This patch adds a feature flag which activates a list of
variable-sized entries after the initial array.

Each entry contains a 4-byte status and then a nid, rounded to a
multiple of 4 bytes.  The total number of bytes of the ping_info
(header, first array, subsequent list) is stored in the ns_unused
field of the first entry in the array.

The user-space interfaces only see the initial array.

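As a rough illustration of the resulting layout, a receiver could walk
both parts of the ping info with the helpers this patch introduces
(lnet_ping_info_size() and lnet_ping_sts_next()).  The sketch below is
not from the patch and the debug printing is only for demonstration:

/* Illustrative only: iterate the fixed nid4 array and then, when the
 * LNET_PING_FEAT_LARGE_ADDR feature bit is set, the variable-size
 * lnet_ni_large_status entries that follow it.
 */
static void ping_info_walk_sketch(struct lnet_ping_info *pi)
{
	struct lnet_ni_status *stat = &pi->pi_ni[0];
	void *end = (void *)pi + lnet_ping_info_size(pi);
	int i;

	for (i = 0; i < pi->pi_nnis; i++, stat++)
		CDEBUG(D_NET, "nid4 %s status %#x\n",
		       libcfs_nid2str(stat->ns_nid), stat->ns_status);

	if (pi->pi_features & LNET_PING_FEAT_LARGE_ADDR) {
		struct lnet_ni_large_status *lns = (void *)stat;

		while ((void *)(lns + 1) <= end) {
			CDEBUG(D_NET, "large nid %s status %#x\n",
			       libcfs_nidstr(&lns->ns_nid), lns->ns_status);
			lns = lnet_ping_sts_next(lns);
		}
	}
}
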
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: db0fb8f2b649c0c38 ("LU-10391 lnet: allow ping packet to contain large nids")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44628
Tested-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-types.h     |  39 +++++++++++
 include/uapi/linux/lnet/lnet-idl.h |  58 +++++++++++-----
 net/lnet/lnet/api-ni.c             | 131 +++++++++++++++++++++++--------------
 net/lnet/lnet/lib-msg.c            |   2 +-
 4 files changed, 165 insertions(+), 65 deletions(-)

diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 2d3b044..73d962f 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -684,6 +684,45 @@ struct lnet_ping_buffer {
 #define LNET_PING_INFO_TO_BUFFER(PINFO)	\
 	container_of((PINFO), struct lnet_ping_buffer, pb_info)
 
+static inline int
+lnet_ping_sts_size(const struct lnet_nid *nid)
+{
+	int size;
+
+	if (nid_is_nid4(nid))
+		return sizeof(struct lnet_ni_status);
+
+	size = offsetof(struct lnet_ni_large_status, ns_nid) +
+	       NID_BYTES(nid);
+
+	return round_up(size, 4);
+}
+
+static inline struct lnet_ni_large_status *
+lnet_ping_sts_next(const struct lnet_ni_large_status *nis)
+{
+	return (void *)nis + lnet_ping_sts_size(&nis->ns_nid);
+}
+
+static inline bool
+lnet_ping_at_least_two_entries(const struct lnet_ping_info *pi)
+{
+	/* Return true if we have at least two entries.  There is always at
+	 * least one, a 4-byte lo0 interface.
+	 */
+	struct lnet_ni_large_status *lns;
+
+	if ((pi->pi_features & LNET_PING_FEAT_LARGE_ADDR) == 0)
+		return pi->pi_nnis <= 2;
+	/* There is at least 1 large-address entry */
+	if (pi->pi_nnis != 1)
+		return false;
+	lns = (void *)&pi->pi_ni[1];
+	lns = lnet_ping_sts_next(lns);
+
+	return ((void *)pi + lnet_ping_info_size(pi) <= (void *)lns);
+}
+
 struct lnet_nid_list {
 	struct list_head nl_list;
 	struct lnet_nid nl_nid;
diff --git a/include/uapi/linux/lnet/lnet-idl.h b/include/uapi/linux/lnet/lnet-idl.h
index 41bbb40..479e7fa 100644
--- a/include/uapi/linux/lnet/lnet-idl.h
+++ b/include/uapi/linux/lnet/lnet-idl.h
@@ -247,7 +247,6 @@ struct lnet_counters_common {
 	__u64	lcc_drop_length;
 } __attribute__((packed));
 
-
 #define LNET_NI_STATUS_UP	0x15aac0de
 #define LNET_NI_STATUS_DOWN	0xdeadface
 #define LNET_NI_STATUS_INVALID	0x00000000
@@ -255,19 +254,32 @@ struct lnet_counters_common {
 struct lnet_ni_status {
 	lnet_nid_t ns_nid;
 	__u32      ns_status;
-	__u32      ns_unused;
+	__u32      ns_msg_size;	/* represents ping buffer size if message
+				 * contains large NID addresses.
+				 */
 } __attribute__((packed));
 
-/*
- * NB: value of these features equal to LNET_PROTO_PING_VERSION_x
+/* When this appears in lnet_ping_info, it will be large
+ * enough to hold whatever nid is present, rounded up
+ * to a multiple of 4 bytes.
+ * NOTE: all users MUST check ns_nid.nid_size is usable.
+ */
+struct lnet_ni_large_status {
+	__u32		ns_status;
+	struct lnet_nid	ns_nid;
+} __attribute__((packed));
+
+/* NB: value of these features equal to LNET_PROTO_PING_VERSION_x
  * of old LNet, so there shouldn't be any compatibility issue
  */
 #define LNET_PING_FEAT_INVAL		(0)		/* no feature */
 #define LNET_PING_FEAT_BASE		(1 << 0)	/* just a ping */
 #define LNET_PING_FEAT_NI_STATUS	(1 << 1)	/* return NI status */
-#define LNET_PING_FEAT_RTE_DISABLED	(1 << 2)        /* Routing enabled */
-#define LNET_PING_FEAT_MULTI_RAIL	(1 << 3)        /* Multi-Rail aware */
+#define LNET_PING_FEAT_RTE_DISABLED	(1 << 2)	/* Routing enabled */
+#define LNET_PING_FEAT_MULTI_RAIL	(1 << 3)	/* Multi-Rail aware */
 #define LNET_PING_FEAT_DISCOVERY	(1 << 4)	/* Supports Discovery */
+#define LNET_PING_FEAT_LARGE_ADDR	(1 << 5)	/* Large addr nids present */
+#define LNET_PING_FEAT_PRIMARY_LARGE	(1 << 6)	/* Primary is first Large addr */
 
 /*
  * All ping feature bits fit to hit the wire.
@@ -277,17 +289,26 @@ struct lnet_ni_status {
  * New feature bits can be added, just be aware that this does change the
  * over-the-wire protocol.
  */
-#define LNET_PING_FEAT_BITS		(LNET_PING_FEAT_BASE | \
-					 LNET_PING_FEAT_NI_STATUS | \
-					 LNET_PING_FEAT_RTE_DISABLED | \
-					 LNET_PING_FEAT_MULTI_RAIL | \
-					 LNET_PING_FEAT_DISCOVERY)
-
+#define LNET_PING_FEAT_BITS		(LNET_PING_FEAT_BASE |		\
+					 LNET_PING_FEAT_NI_STATUS |	\
+					 LNET_PING_FEAT_RTE_DISABLED |	\
+					 LNET_PING_FEAT_MULTI_RAIL |	\
+					 LNET_PING_FEAT_DISCOVERY |	\
+					 LNET_PING_FEAT_LARGE_ADDR |	\
+					 LNET_PING_FEAT_PRIMARY_LARGE)
+
+/* NOTE:
+ * The first address in pi_ni *must* be the loop-back nid: LNET_NID_LO_0
+ * The second address must be the primary nid for the host unless
+ * LNET_PING_FEAT_PRIMARY_LARGE is set, then the first large address
+ * is the preferred primary.  However nodes that do not recognise that
+ * flag will quietly ignore it.
+ */
 struct lnet_ping_info {
 	__u32			pi_magic;
 	__u32			pi_features;
 	lnet_pid_t		pi_pid;
-	__u32			pi_nnis;
+	__u32			pi_nnis;	/* number of nid4 entries */
 	struct lnet_ni_status	pi_ni[0];
 } __attribute__((packed));
 
@@ -297,7 +318,14 @@ struct lnet_ping_info {
 	offsetof(struct lnet_ping_info, pi_ni[LNET_INTERFACES_MIN])
 #define LNET_PING_INFO_LONI(PINFO)      ((PINFO)->pi_ni[0].ns_nid)
 #define LNET_PING_INFO_SEQNO(PINFO)     ((PINFO)->pi_ni[0].ns_status)
-#define lnet_ping_info_size(pinfo)	\
-	offsetof(struct lnet_ping_info, pi_ni[(pinfo)->pi_nnis])
+/* If LNET_PING_FEAT_LARGE_ADDR set, pi_nnis is the number of nid4 entries
+ * and pi_ni[0].ns_msg_size is the total number of bytes, including header and
+ * lnet_ni_large_status entries which follow the lnet_ni_status entries.
+ * This must be a multiple of 4.
+ */
+#define lnet_ping_info_size(pinfo)				\
+	(((pinfo)->pi_features & LNET_PING_FEAT_LARGE_ADDR)	\
+	? ((pinfo)->pi_ni[0].ns_msg_size & ~3)			\
+	: offsetof(struct lnet_ping_info, pi_ni[(pinfo)->pi_nnis]))
 
 #endif
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index af875ba..935c848 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -823,8 +823,15 @@ static void lnet_assert_wire_constants(void)
 	BUILD_BUG_ON((int)sizeof(((struct lnet_ni_status *)0)->ns_nid) != 8);
 	BUILD_BUG_ON((int)offsetof(struct lnet_ni_status, ns_status) != 8);
 	BUILD_BUG_ON((int)sizeof(((struct lnet_ni_status *)0)->ns_status) != 4);
-	BUILD_BUG_ON((int)offsetof(struct lnet_ni_status, ns_unused) != 12);
-	BUILD_BUG_ON((int)sizeof(((struct lnet_ni_status *)0)->ns_unused) != 4);
+	BUILD_BUG_ON((int)offsetof(struct lnet_ni_status, ns_msg_size) != 12);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_ni_status *)0)->ns_msg_size) != 4);
+
+	/* Checks for struct lnet_ni_large_status */
+	BUILD_BUG_ON((int)sizeof(struct lnet_ni_large_status) != 24);
+	BUILD_BUG_ON((int)offsetof(struct lnet_ni_large_status, ns_status) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_ni_large_status *)0)->ns_status) != 4);
+	BUILD_BUG_ON((int)offsetof(struct lnet_ni_large_status, ns_nid) != 4);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_ni_large_status *)0)->ns_nid) != 20);
 
 	/* Checks for struct lnet_ping_info and related constants */
 	BUILD_BUG_ON(LNET_PROTO_PING_MAGIC != 0x70696E67);
@@ -834,7 +841,9 @@ static void lnet_assert_wire_constants(void)
 	BUILD_BUG_ON(LNET_PING_FEAT_RTE_DISABLED != 4);
 	BUILD_BUG_ON(LNET_PING_FEAT_MULTI_RAIL != 8);
 	BUILD_BUG_ON(LNET_PING_FEAT_DISCOVERY != 16);
-	BUILD_BUG_ON(LNET_PING_FEAT_BITS != 31);
+	BUILD_BUG_ON(LNET_PING_FEAT_LARGE_ADDR != 32);
+	BUILD_BUG_ON(LNET_PING_FEAT_PRIMARY_LARGE != 64);
+	BUILD_BUG_ON(LNET_PING_FEAT_BITS != 127);
 
 	/* Checks for struct lnet_ping_info */
 	BUILD_BUG_ON((int)sizeof(struct lnet_ping_info) != 16);
@@ -1770,21 +1779,7 @@ struct lnet_ping_buffer *
 	int bytes = 0;
 
 	list_for_each_entry(ni, &net->net_ni_list, ni_netlist)
-		if (nid_is_nid4(&ni->ni_nid))
-			bytes += sizeof(struct lnet_ni_status);
-
-	return bytes;
-}
-
-static inline int
-lnet_get_net_ni_bytes_pre(struct lnet_net *net)
-{
-	struct lnet_ni *ni;
-	int bytes = 0;
-
-	list_for_each_entry(ni, &net->net_ni_added, ni_netlist)
-		if (nid_is_nid4(&ni->ni_nid))
-			bytes += sizeof(struct lnet_ni_status);
+		bytes += lnet_ping_sts_size(&ni->ni_nid);
 
 	return bytes;
 }
@@ -1800,9 +1795,7 @@ struct lnet_ping_buffer *
 
 	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
 		list_for_each_entry(ni, &net->net_ni_list, ni_netlist)
-			if (nid_is_nid4(&ni->ni_nid))
-				bytes += sizeof(struct lnet_ni_status);
-
+			bytes += lnet_ping_sts_size(&ni->ni_nid);
 	}
 
 	lnet_net_unlock(0);
@@ -1813,6 +1806,7 @@ struct lnet_ping_buffer *
 void
 lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf)
 {
+	struct lnet_ni_large_status *lstat, *lend;
 	struct lnet_ni_status *stat, *end;
 	int nnis;
 	int i;
@@ -1827,6 +1821,19 @@ struct lnet_ping_buffer *
 	for (i = 0; i < nnis && stat + 1 <= end; i++, stat++) {
 		__swab64s(&stat->ns_nid);
 		__swab32s(&stat->ns_status);
+		if (i == 0)
+			/* Might be total size */
+			__swab32s(&stat->ns_msg_size);
+	}
+	if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_LARGE_ADDR))
+		return;
+
+	lstat = (struct lnet_ni_large_status *)stat;
+	lend = (void *)end;
+	while (lstat + 1 <= lend) {
+		__swab32s(&lstat->ns_status);
+		/* struct lnet_nid never needs to be swabbed */
+		lstat = lnet_ping_sts_next(lstat);
 	}
 }
 
@@ -1954,6 +1961,7 @@ struct lnet_ping_buffer *
 static void
 lnet_ping_target_install_locked(struct lnet_ping_buffer *pbuf)
 {
+	struct lnet_ni_large_status *lns, *lend;
 	struct lnet_ni_status *ns, *end;
 	struct lnet_ni *ni;
 	struct lnet_net *net;
@@ -1964,8 +1972,14 @@ struct lnet_ping_buffer *
 	end = (void *)&pbuf->pb_info + pbuf->pb_nbytes;
 	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
 		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
-			if (!nid_is_nid4(&ni->ni_nid))
+			if (!nid_is_nid4(&ni->ni_nid)) {
+				if (ns == &pbuf->pb_info.pi_ni[1]) {
+					/* This is primary, and it is long */
+					pbuf->pb_info.pi_features |=
+						LNET_PING_FEAT_PRIMARY_LARGE;
+				}
 				continue;
+			}
 			LASSERT(ns + 1 <= end);
 			ns->ns_nid = lnet_nid_to_nid4(&ni->ni_nid);
 
@@ -1979,6 +1993,31 @@ struct lnet_ping_buffer *
 		}
 	}
 
+	lns = (void *)ns;
+	lend = (void *)end;
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+			if (nid_is_nid4(&ni->ni_nid))
+				continue;
+			LASSERT(lns + 1 <= lend);
+
+			lns->ns_nid = ni->ni_nid;
+
+			lnet_ni_lock(ni);
+			lns->ns_status = lnet_ni_get_status_locked(ni);
+			ni->ni_status = &lns->ns_status;
+			lnet_ni_unlock(ni);
+
+			lns = lnet_ping_sts_next(lns);
+		}
+	}
+	if ((void *)lns > (void *)ns) {
+		/* Record total info size */
+		pbuf->pb_info.pi_ni[0].ns_msg_size =
+			(void *)lns - (void *)&pbuf->pb_info;
+		pbuf->pb_info.pi_features |= LNET_PING_FEAT_LARGE_ADDR;
+	}
+
 	/* We (ab)use the ns_status of the loopback interface to
 	 * transmit the sequence number. The first interface listed
 	 * must be the loopback interface.
@@ -3397,7 +3436,6 @@ static int lnet_add_net_common(struct lnet_net *net,
 	struct lnet_ping_buffer *pbuf;
 	struct lnet_remotenet *rnet;
 	struct lnet_ni *ni;
-	int net_ni_bytes;
 	u32 net_id;
 	int rc;
 
@@ -3415,39 +3453,32 @@ static int lnet_add_net_common(struct lnet_net *net,
 		return -EUSERS;
 	}
 
-	/*
-	 * make sure you calculate the correct number of slots in the ping
+	if (tun)
+		memcpy(&net->net_tunables,
+		       &tun->lt_cmn, sizeof(net->net_tunables));
+	else
+		memset(&net->net_tunables, -1, sizeof(net->net_tunables));
+
+	net_id = net->net_id;
+
+	rc = lnet_startup_lndnet(net, (tun ? &tun->lt_tun : NULL));
+	if (rc < 0)
+		return rc;
+
+	/* make sure you calculate the correct number of slots in the ping
 	 * buffer. Since the ping info is a flattened list of all the NIs,
 	 * we should allocate enough slots to accomodate the number of NIs
 	 * which will be added.
-	 *
-	 * since ni hasn't been configured yet, use
-	 * lnet_get_net_ni_bytes_pre() which checks the net_ni_added list
 	 */
-	net_ni_bytes = lnet_get_net_ni_bytes_pre(net);
-
 	rc = lnet_ping_target_setup(&pbuf, &ping_mdh,
 				    LNET_PING_INFO_HDR_SIZE +
-				    net_ni_bytes + lnet_get_ni_bytes(),
+				    lnet_get_ni_bytes(),
 				    false);
 	if (rc < 0) {
-		lnet_net_free(net);
+		lnet_shutdown_lndnet(net);
 		return rc;
 	}
 
-	if (tun)
-		memcpy(&net->net_tunables,
-		       &tun->lt_cmn, sizeof(net->net_tunables));
-	else
-		memset(&net->net_tunables, -1, sizeof(net->net_tunables));
-
-	net_id = net->net_id;
-
-	rc = lnet_startup_lndnet(net, (tun ?
-				     &tun->lt_tun : NULL));
-	if (rc < 0)
-		goto failed;
-
 	lnet_net_lock(LNET_LOCK_EX);
 	net = lnet_get_net_locked(net_id);
 	LASSERT(net);
@@ -3678,7 +3709,7 @@ int lnet_dyn_del_ni(struct lnet_nid *nid)
 	rc = lnet_ping_target_setup(&pbuf, &ping_mdh,
 				    (LNET_PING_INFO_HDR_SIZE +
 				     lnet_get_ni_bytes() -
-				     sizeof(pbuf->pb_info.pi_ni[0])),
+				     lnet_ping_sts_size(&ni->ni_nid)),
 				    false);
 	if (rc != 0)
 		goto unlock_api_mutex;
@@ -5428,10 +5459,12 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 		goto fail_ping_buffer_decref;
 	}
 
-	/* Test if smaller than lnet_pinginfo with no pi_ni status info */
-	if (nob < LNET_PING_INFO_HDR_SIZE) {
+	/* Test if smaller than lnet_pinginfo with just one pi_ni status info.
+	 * That one might contain size when large nids are used.
+	 */
+	if (nob < LNET_PING_INFO_SIZE(1)) {
 		CERROR("%s: Short reply %d(%lu min)\n",
-		       libcfs_idstr(&id), nob, LNET_PING_INFO_HDR_SIZE);
+		       libcfs_idstr(&id), nob, LNET_PING_INFO_SIZE(1));
 		goto fail_ping_buffer_decref;
 	}
 
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 9fb001e..898d867 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -831,7 +831,7 @@
 		 * I only have a single (non-lolnd) interface.
 		 */
 		pi = &the_lnet.ln_ping_target->pb_info;
-		if (pi->pi_nnis <= 2) {
+		if (lnet_ping_at_least_two_entries(pi)) {
 			handle_local_health = false;
 			attempt_local_resend = false;
 		}
-- 
1.8.3.1


* [lustre-devel] [PATCH 09/22] lustre: llog: skip bad records in llog
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (7 preceding siblings ...)
  2022-11-20 14:16 ` [lustre-devel] [PATCH 08/22] lnet: allow ping packet to contain large nids James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 10/22] lnet: fix build issue when IPv6 is disabled James Simmons
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Mikhail Pershin, Lustre Development List

From: Mikhail Pershin <mpershin@whamcloud.com>

This patch is a further development of the idea to skip bad
(corrupted) llog data. If a llog has fixed-size records then it is
possible to skip only the one bad record rather than the rest of
the llog block.
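
For illustration only (not part of this patch), a minimal sketch of
that decision, assuming 'llh' points at the llog header and 'rec' at
a record that failed llog_verify_record(); the actual handling is in
the llog_process_thread() hunk below:

  /* minimal sketch: choose how far to skip past a corrupted record */
  if (llh->llh_flags & LLOG_F_IS_FIXSIZE) {
  	/* fixed-size records: the next record starts exactly
  	 * llh_size bytes further on, so only this record is lost
  	 */
  	rec->lrh_len = llh->llh_size;
  } else {
  	/* variable-size records: the rest of the chunk cannot be
  	 * trusted, so skip to the next chunk instead
  	 */
  }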

The patch also fixes skipping to the next chunk:
 - make sure to skip to the next block for a partial chunk,
   otherwise the same block is re-read.
 - handle index == 0 as the goal for llog_next_block() as an
   expected exclusion and just return the requested block
 - after a block is skipped, set the new index to the first
   record in that block
 - don't create a fake padding record in llog_osd_next_block()
   since the caller can handle this case and should know about it
 - restore test_8 functionality to check corruption handling

Fixes: b79e7c205e40 ("lustre: llog: add synchronization for the last record")
WC-bug-id: https://jira.whamcloud.com/browse/LU-16203
Lustre-commit: cf121b16685fe2a27 ("LU-16203 llog: skip bad records in llog")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48776
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdclass/llog.c | 86 ++++++++++++++++++++++++++++-------------------
 1 file changed, 52 insertions(+), 34 deletions(-)

diff --git a/fs/lustre/obdclass/llog.c b/fs/lustre/obdclass/llog.c
index eb8f7e5..90bb8bd 100644
--- a/fs/lustre/obdclass/llog.c
+++ b/fs/lustre/obdclass/llog.c
@@ -233,27 +233,26 @@ int llog_init_handle(const struct lu_env *env, struct llog_handle *handle,
 }
 EXPORT_SYMBOL(llog_init_handle);
 
+#define LLOG_ERROR_REC(lgh, rec, format, a...) \
+	CERROR("%s: "DFID" rec type=%x idx=%u len=%u, " format "\n", \
+	       loghandle2name(lgh), PLOGID(&lgh->lgh_id), (rec)->lrh_type, \
+	       (rec)->lrh_index, (rec)->lrh_len, ##a)
+
 int llog_verify_record(const struct llog_handle *llh, struct llog_rec_hdr *rec)
 {
 	int chunk_size = llh->lgh_hdr->llh_hdr.lrh_len;
 
-	if (rec->lrh_len == 0 || rec->lrh_len > chunk_size) {
-		CERROR("%s: record is too large: %d > %d\n",
-		       loghandle2name(llh), rec->lrh_len, chunk_size);
-		return -EINVAL;
-	}
-	if (rec->lrh_index >= LLOG_HDR_BITMAP_SIZE(llh->lgh_hdr)) {
-		CERROR("%s: index is too high: %d\n",
-		       loghandle2name(llh), rec->lrh_index);
-		return -EINVAL;
-	}
-	if ((rec->lrh_type & LLOG_OP_MASK) != LLOG_OP_MAGIC) {
-		CERROR("%s: magic %x is bad\n",
-		       loghandle2name(llh), rec->lrh_type);
-		return -EINVAL;
-	}
+	if ((rec->lrh_type & LLOG_OP_MASK) != LLOG_OP_MAGIC)
+		LLOG_ERROR_REC(llh, rec, "magic is bad");
+	else if (rec->lrh_len == 0 || rec->lrh_len > chunk_size)
+		LLOG_ERROR_REC(llh, rec, "bad record len, chunk size is %d",
+			       chunk_size);
+	else if (rec->lrh_index >= LLOG_HDR_BITMAP_SIZE(llh->lgh_hdr))
+		LLOG_ERROR_REC(llh, rec, "index is too high");
+	else
+		return 0;
 
-	return 0;
+	return -EINVAL;
 }
 
 static inline bool llog_is_index_skipable(int idx, struct llog_log_hdr *llh,
@@ -278,7 +277,6 @@ static int llog_process_thread(void *arg)
 	int saved_index = 0;
 	int last_called_index = 0;
 	bool repeated = false;
-	bool refresh_idx = false;
 
 	if (!llh)
 		return -EINVAL;
@@ -346,6 +344,11 @@ static int llog_process_thread(void *arg)
 			rc = 0;
 			goto out;
 		}
+		/* EOF while trying to skip to the next chunk */
+		if (!index && rc == -EBADR) {
+			rc = 0;
+			goto out;
+		}
 		if (rc)
 			goto out;
 
@@ -377,6 +380,15 @@ static int llog_process_thread(void *arg)
 			CDEBUG(D_OTHER, "after swabbing, type=%#x idx=%d\n",
 			       rec->lrh_type, rec->lrh_index);
 
+			/* start with first rec if block was skipped */
+			if (!index) {
+				CDEBUG(D_OTHER,
+				       "%s: skipping to the index %u\n",
+				       loghandle2name(loghandle),
+				       rec->lrh_index);
+				index = rec->lrh_index;
+			}
+
 			if (index == (synced_idx + 1) &&
 			    synced_idx == LLOG_HDR_TAIL(llh)->lrt_index) {
 				rc = 0;
@@ -399,11 +411,15 @@ static int llog_process_thread(void *arg)
 			 * it turns to
 			 * lh_last_idx != LLOG_HDR_TAIL(llh)->lrt_index
 			 * This exception is working for catalog only.
+			 * The last check is for the partial chunk boundary,
+			 * if it is reached then try to re-read for possible
+			 * new records once.
 			 */
 			if ((index == lh_last_idx && synced_idx != index) ||
 			    (index == (lh_last_idx + 1) &&
 			     lh_last_idx != LLOG_HDR_TAIL(llh)->lrt_index) ||
-			    (rec->lrh_index == 0 && !repeated)) {
+			    (((char *)rec - buf >= cur_offset - chunk_offset) &&
+			    !repeated)) {
 				/* save offset inside buffer for the re-read */
 				buf_offset = (char *)rec - (char *)buf;
 				cur_offset = chunk_offset;
@@ -415,24 +431,27 @@ static int llog_process_thread(void *arg)
 				CDEBUG(D_OTHER, "synced_idx: %d\n", synced_idx);
 				goto repeat;
 			}
-
 			repeated = false;
 
 			rc = llog_verify_record(loghandle, rec);
 			if (rc) {
-				CERROR("%s: invalid record in llog "DFID" record for index %d/%d: rc = %d\n",
-				       loghandle2name(loghandle),
-				       PLOGID(&loghandle->lgh_id),
-				       rec->lrh_len, index, rc);
+				CDEBUG(D_OTHER, "invalid record at index %d\n",
+				       index);
 				/*
-				 * the block seem to be corrupted, let's try
-				 * with the next one. reset rc to go to the
-				 * next chunk.
+				 * for fixed-sized llogs we can skip one record
+				 * by using llh_size from llog header.
+				 * Otherwise skip the next llog chunk.
 				 */
-				refresh_idx = true;
-				index = 0;
 				rc = 0;
-				goto repeat;
+				if (llh->llh_flags & LLOG_F_IS_FIXSIZE) {
+					rec->lrh_len = llh->llh_size;
+					goto next_rec;
+				}
+				/* make sure that is always next block */
+				cur_offset = chunk_offset + chunk_size;
+				/* no goal to find, just next block to read */
+				index = 0;
+				break;
 			}
 
 			if (rec->lrh_index < index) {
@@ -446,10 +465,9 @@ static int llog_process_thread(void *arg)
 				 * gap which can be result of old bugs, just
 				 * keep going
 				 */
-				CERROR("%s: "DFID" index %u, expected %u\n",
-				       loghandle2name(loghandle),
-				       PLOGID(&loghandle->lgh_id),
-				       rec->lrh_index, index);
+				LLOG_ERROR_REC(loghandle, rec,
+					       "gap in index, expected %u",
+					       index);
 				index = rec->lrh_index;
 			}
 
@@ -470,7 +488,7 @@ static int llog_process_thread(void *arg)
 				if (rc)
 					goto out;
 			}
-
+next_rec:
 			/* exit if the last index is reached */
 			if (index >= last_index) {
 				rc = 0;
-- 
1.8.3.1


* [lustre-devel] [PATCH 10/22] lnet: fix build issue when IPv6 is disabled.
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (8 preceding siblings ...)
  2022-11-20 14:16 ` [lustre-devel] [PATCH 09/22] lustre: llog: skip bad records in llog James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 11/22] lustre: obdclass: fill jobid in a safe way James Simmons
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

struct inet6_dev and struct inet6_ifaddr are not defined if IPv6
is not configured for the Linux kernel.

Fixes: 351a6df78c3 ("lnet: support IPv6 in lnet_inet_enumerate()")
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 896cd5b7bcf94d4fd ("LU-10391 lnet: fix build issue when IPv6 is disabled.")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48990
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 net/lnet/lnet/config.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c
index 4b2d776..5bfae4e 100644
--- a/net/lnet/lnet/config.c
+++ b/net/lnet/lnet/config.c
@@ -1501,8 +1501,10 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns, bool v6)
 		int flags = dev_get_flags(dev);
 		const struct in_ifaddr *ifa;
 		struct in_device *in_dev;
+#if IS_ENABLED(CONFIG_IPV6)
 		struct inet6_dev *in6_dev;
 		const struct inet6_ifaddr *ifa6;
+#endif
 		int node_id;
 		int cpt;
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 11/22] lustre: obdclass: fill jobid in a safe way
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (9 preceding siblings ...)
  2022-11-20 14:16 ` [lustre-devel] [PATCH 10/22] lnet: fix build issue when IPv6 is disabled James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 12/22] lustre: llite: remove linefeed from LDLM_DEBUG James Simmons
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lei Feng, Lustre Development List

From: Lei Feng <flei@whamcloud.com>

Ensure jobid_interpret_string() fills jobid in an atomic way.
Make sure we use the proper length. The Linux native client
got this mostly right.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16251
Lustre-commit: 9a0a89520e8b57bd6 ("LU-16251 obdclass: fill jobid in a safe way")
Signed-off-by: Lei Feng <flei@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48915
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdclass/jobid.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/obdclass/jobid.c b/fs/lustre/obdclass/jobid.c
index da1af51..77ea5b2 100644
--- a/fs/lustre/obdclass/jobid.c
+++ b/fs/lustre/obdclass/jobid.c
@@ -308,7 +308,8 @@ static int jobid_interpret_string(const char *jobfmt, char *jobid,
  */
 int lustre_get_jobid(char *jobid, size_t joblen)
 {
-	char tmp_jobid[LUSTRE_JOBID_SIZE] = "";
+	char id[LUSTRE_JOBID_SIZE] = "";
+	int len = min_t(int, joblen, LUSTRE_JOBID_SIZE);
 
 	if (unlikely(joblen < 2)) {
 		if (joblen == 1)
@@ -324,14 +325,14 @@ int lustre_get_jobid(char *jobid, size_t joblen)
 	if (strcmp(obd_jobid_var, JOBSTATS_NODELOCAL) == 0 ||
 	    strnstr(obd_jobid_name, "%j", LUSTRE_JOBID_SIZE)) {
 		int rc2 = jobid_interpret_string(obd_jobid_name,
-						 tmp_jobid, joblen);
+						 id, len);
 		if (!rc2)
 			goto out_cache_jobid;
 	}
 
 	/* Use process name + fsuid as jobid */
 	if (strcmp(obd_jobid_var, JOBSTATS_PROCNAME_UID) == 0) {
-		snprintf(tmp_jobid, LUSTRE_JOBID_SIZE, "%s.%u",
+		snprintf(id, LUSTRE_JOBID_SIZE, "%s.%u",
 			 current->comm,
 			 from_kuid(&init_user_ns, current_fsuid()));
 		goto out_cache_jobid;
@@ -343,7 +344,7 @@ int lustre_get_jobid(char *jobid, size_t joblen)
 		rcu_read_lock();
 		jid = jobid_current();
 		if (jid)
-			strlcpy(tmp_jobid, jid, sizeof(tmp_jobid));
+			strlcpy(id, jid, sizeof(id));
 		rcu_read_unlock();
 		goto out_cache_jobid;
 	}
@@ -352,8 +353,8 @@ int lustre_get_jobid(char *jobid, size_t joblen)
 
 out_cache_jobid:
 	/* Only replace the job ID if it changed. */
-	if (strcmp(jobid, tmp_jobid) != 0)
-		strcpy(jobid, tmp_jobid);
+	if (strcmp(jobid, id) != 0)
+		strcpy(jobid, id);
 
 	return 0;
 }
-- 
1.8.3.1


* [lustre-devel] [PATCH 12/22] lustre: llite: remove linefeed from LDLM_DEBUG
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (10 preceding siblings ...)
  2022-11-20 14:16 ` [lustre-devel] [PATCH 11/22] lustre: obdclass: fill jobid in a safe way James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:16 ` [lustre-devel] [PATCH 13/22] lnet: selftest: migrate LNet selftest session handling to Netlink James Simmons
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Alex Zhuravlev <bzzz@whamcloud.com>

to make the corresponding messages single-line

WC-bug-id: https://jira.whamcloud.com/browse/LU-15825
Lustre-commit: 93784852c8f20b27c ("LU-15825 ldlm: remove linefeed from LDLM_DEBUG")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47219
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/namei.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 5ac634c..93abec8 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -233,7 +233,7 @@ static void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel)
 		 */
 		if (lock->l_resource->lr_lvb_inode)
 			LDLM_DEBUG(lock,
-				   "can't take inode for the lock (%sevicted)\n",
+				   "can't take inode for the lock (%sevicted)",
 				   lock->l_resource->lr_lvb_inode->i_state &
 				   I_FREEING ? "" : "not ");
 		return;
-- 
1.8.3.1


* [lustre-devel] [PATCH 13/22] lnet: selftest: migrate LNet selftest session handling to Netlink
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (11 preceding siblings ...)
  2022-11-20 14:16 ` [lustre-devel] [PATCH 12/22] lustre: llite: remove linefeed from LDLM_DEBUG James Simmons
@ 2022-11-20 14:16 ` James Simmons
  2022-11-20 14:17 ` [lustre-devel] [PATCH 14/22] lustre: clio: append to non-existent component James Simmons
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:16 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

The current LNet selftest ioctl interface has a few issues which
can be resolved using Netlink. The first is that the current API
uses struct list_head, which is disliked by the Linux VFS
maintainers. While we technically don't need to use the
struct list_head directly, it is still confusing, and passing
pointers from userland to kernel space is also frowned upon.

The second issue, exposed with debug kernels, is that the ioctl
handling done with lstcon_ioctl_handler can easily end up in a
might_sleep state.

The new Netlink work is also needed for IPv6 support. Update the
session handling to work with large NIDs. Internally use
struct lst_session_id, which supports large NIDs, instead of
struct lst_sid.

Lastly, we have been wanting YAML handling for LNet selftest
(LU-10975), which comes naturally with this work.
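
For illustration only (not part of this patch), a minimal sketch of
going between the old wire-format struct lst_sid and the new internal
struct lst_session_id; the helper names below are hypothetical, the
patch open-codes these conversions where they are needed:

  /* hypothetical helpers, assuming the lnet_nid4_to_nid() and
   * lnet_nid_to_nid4() converters used elsewhere in the patch
   */
  static inline void lst_sid_to_session_id(struct lst_sid old,
  					 struct lst_session_id *sid)
  {
  	sid->ses_stamp = old.ses_stamp;
  	lnet_nid4_to_nid(old.ses_nid, &sid->ses_nid);
  }

  static inline struct lst_sid
  lst_session_id_to_sid(const struct lst_session_id *sid)
  {
  	struct lst_sid old;

  	old.ses_stamp = sid->ses_stamp;
  	old.ses_nid = lnet_nid_to_nid4(&sid->ses_nid);
  	return old;
  }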

WC-bug-id: https://jira.whamcloud.com/browse/LU-8915
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/43298
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
---
 include/uapi/linux/lnet/lnetst.h |  21 ++-
 net/lnet/selftest/conctl.c       | 349 +++++++++++++++++++++++++++++++--------
 net/lnet/selftest/conrpc.c       |  28 +++-
 net/lnet/selftest/console.c      |  81 +++------
 net/lnet/selftest/console.h      |  68 ++++----
 net/lnet/selftest/framework.c    |  43 +++--
 net/lnet/selftest/selftest.h     |  78 +++++++--
 7 files changed, 477 insertions(+), 191 deletions(-)

diff --git a/include/uapi/linux/lnet/lnetst.h b/include/uapi/linux/lnet/lnetst.h
index af0435f1..d04496d 100644
--- a/include/uapi/linux/lnet/lnetst.h
+++ b/include/uapi/linux/lnet/lnetst.h
@@ -84,8 +84,6 @@ struct lst_sid {
 	__s64		ses_stamp;	/* time stamp */
 };					/*** session id */
 
-extern struct lst_sid LST_INVALID_SID;
-
 struct lst_bid {
 	__u64	bat_id;		/* unique id in session */
 };				/*** batch id (group of tests) */
@@ -577,4 +575,23 @@ struct sfw_counters {
 	__u32 ping_errors;
 } __packed;
 
+#define LNET_SELFTEST_GENL_NAME		"lnet_selftest"
+#define LNET_SELFTEST_GENL_VERSION	0x1
+
+/* enum lnet_selftest_commands	      - Supported core LNet Selftest Netlink
+ *					commands
+ *
+ * @LNET_SELFTEST_CMD_UNSPEC:		unspecified command to catch errors
+ * @LNET_SELFTEST_CMD_SESSIONS:		command to manage sessions
+ */
+enum lnet_selftest_commands {
+	LNET_SELFTEST_CMD_UNSPEC	= 0,
+
+	LNET_SELFTEST_CMD_SESSIONS	= 1,
+
+	__LNET_SELFTEST_CMD_MAX_PLUS_ONE,
+};
+
+#define LNET_SELFTEST_CMD_MAX (__LNET_SELFTEST_CMD_MAX_PLUS_ONE - 1)
+
 #endif
diff --git a/net/lnet/selftest/conctl.c b/net/lnet/selftest/conctl.c
index ede7fe5..aa11885 100644
--- a/net/lnet/selftest/conctl.c
+++ b/net/lnet/selftest/conctl.c
@@ -40,67 +40,6 @@
 #include "console.h"
 
 static int
-lst_session_new_ioctl(struct lstio_session_new_args *args)
-{
-	char name[LST_NAME_SIZE + 1];
-	int rc;
-
-	if (!args->lstio_ses_idp ||	/* address for output sid */
-	    !args->lstio_ses_key ||	/* no key is specified */
-	    !args->lstio_ses_namep ||	/* session name */
-	    args->lstio_ses_nmlen <= 0 ||
-	    args->lstio_ses_nmlen > LST_NAME_SIZE)
-		return -EINVAL;
-
-	if (copy_from_user(name, args->lstio_ses_namep,
-			   args->lstio_ses_nmlen)) {
-		return -EFAULT;
-	}
-
-	name[args->lstio_ses_nmlen] = 0;
-
-	rc = lstcon_session_new(name,
-				args->lstio_ses_key,
-				args->lstio_ses_feats,
-				args->lstio_ses_timeout,
-				args->lstio_ses_force,
-				args->lstio_ses_idp);
-
-	return rc;
-}
-
-static int
-lst_session_end_ioctl(struct lstio_session_end_args *args)
-{
-	if (args->lstio_ses_key != console_session.ses_key)
-		return -EACCES;
-
-	return lstcon_session_end();
-}
-
-static int
-lst_session_info_ioctl(struct lstio_session_info_args *args)
-{
-	/* no checking of key */
-
-	if (!args->lstio_ses_idp ||	/* address for output sid */
-	    !args->lstio_ses_keyp ||	/* address for output key */
-	    !args->lstio_ses_featp ||	/* address for output features */
-	    !args->lstio_ses_ndinfo ||	/* address for output ndinfo */
-	    !args->lstio_ses_namep ||	/* address for output name */
-	    args->lstio_ses_nmlen <= 0 ||
-	    args->lstio_ses_nmlen > LST_NAME_SIZE)
-		return -EINVAL;
-
-	return lstcon_session_info(args->lstio_ses_idp,
-				   args->lstio_ses_keyp,
-				   args->lstio_ses_featp,
-				   args->lstio_ses_ndinfo,
-				   args->lstio_ses_namep,
-				   args->lstio_ses_nmlen);
-}
-
-static int
 lst_debug_ioctl(struct lstio_debug_args *args)
 {
 	char name[LST_NAME_SIZE + 1];
@@ -729,13 +668,11 @@ static int lst_test_add_ioctl(struct lstio_test_args *args)
 
 	switch (opc) {
 	case LSTIO_SESSION_NEW:
-		rc = lst_session_new_ioctl((struct lstio_session_new_args *)buf);
-		break;
+		fallthrough;
 	case LSTIO_SESSION_END:
-		rc = lst_session_end_ioctl((struct lstio_session_end_args *)buf);
-		break;
+		fallthrough;
 	case LSTIO_SESSION_INFO:
-		rc = lst_session_info_ioctl((struct lstio_session_info_args *)buf);
+		rc = -EOPNOTSUPP;
 		break;
 	case LSTIO_DEBUG:
 		rc = lst_debug_ioctl((struct lstio_debug_args *)buf);
@@ -797,3 +734,283 @@ static int lst_test_add_ioctl(struct lstio_test_args *args)
 
 	return notifier_from_ioctl_errno(rc);
 }
+
+static struct genl_family lst_family;
+
+static const struct ln_key_list lst_session_keys = {
+	.lkl_maxattr			= LNET_SELFTEST_SESSION_MAX,
+	.lkl_list			= {
+		[LNET_SELFTEST_SESSION_HDR]	= {
+			.lkp_value		= "session",
+			.lkp_key_format		= LNKF_MAPPING,
+			.lkp_data_type		= NLA_NUL_STRING,
+		},
+		[LNET_SELFTEST_SESSION_NAME]	= {
+			.lkp_value		= "name",
+			.lkp_data_type		= NLA_STRING,
+		},
+		[LNET_SELFTEST_SESSION_KEY]	= {
+			.lkp_value		= "key",
+			.lkp_data_type		= NLA_U32,
+		},
+		[LNET_SELFTEST_SESSION_TIMESTAMP] = {
+			.lkp_value		= "timestamp",
+			.lkp_data_type		= NLA_S64,
+		},
+		[LNET_SELFTEST_SESSION_NID]	= {
+			.lkp_value		= "nid",
+			.lkp_data_type		= NLA_STRING,
+		},
+		[LNET_SELFTEST_SESSION_NODE_COUNT] = {
+			.lkp_value		= "nodes",
+			.lkp_data_type		= NLA_U16,
+		},
+	},
+};
+
+static int lst_sessions_show_dump(struct sk_buff *msg,
+				  struct netlink_callback *cb)
+{
+	const struct ln_key_list *all[] = {
+		&lst_session_keys, NULL
+	};
+	struct netlink_ext_ack *extack = cb->extack;
+	int portid = NETLINK_CB(cb->skb).portid;
+	int seq = cb->nlh->nlmsg_seq;
+	unsigned int node_count = 0;
+	struct lstcon_ndlink *ndl;
+	int flag = NLM_F_MULTI;
+	int rc = 0;
+	void *hdr;
+
+	if (console_session.ses_state != LST_SESSION_ACTIVE) {
+		NL_SET_ERR_MSG(extack, "session is not active");
+		rc = -ESRCH;
+		goto out_unlock;
+	}
+
+	list_for_each_entry(ndl, &console_session.ses_ndl_list, ndl_link)
+		node_count++;
+
+	rc = lnet_genl_send_scalar_list(msg, portid, seq, &lst_family,
+					NLM_F_CREATE | NLM_F_MULTI,
+					LNET_SELFTEST_CMD_SESSIONS, all);
+	if (rc < 0) {
+		NL_SET_ERR_MSG(extack, "failed to send key table");
+		goto out_unlock;
+	}
+
+	if (console_session.ses_force)
+		flag |= NLM_F_REPLACE;
+
+	hdr = genlmsg_put(msg, portid, seq, &lst_family, flag,
+			  LNET_SELFTEST_CMD_SESSIONS);
+	if (!hdr) {
+		NL_SET_ERR_MSG(extack, "failed to send values");
+		genlmsg_cancel(msg, hdr);
+		rc = -EMSGSIZE;
+		goto out_unlock;
+	}
+
+	nla_put_string(msg, LNET_SELFTEST_SESSION_NAME,
+		       console_session.ses_name);
+	nla_put_u32(msg, LNET_SELFTEST_SESSION_KEY,
+		    console_session.ses_key);
+	nla_put_u64_64bit(msg, LNET_SELFTEST_SESSION_TIMESTAMP,
+			  console_session.ses_id.ses_stamp,
+			  LNET_SELFTEST_SESSION_PAD);
+	nla_put_string(msg, LNET_SELFTEST_SESSION_NID,
+		       libcfs_nidstr(&console_session.ses_id.ses_nid));
+	nla_put_u16(msg, LNET_SELFTEST_SESSION_NODE_COUNT,
+		    node_count);
+	genlmsg_end(msg, hdr);
+out_unlock:
+	return rc;
+}
+
+static int lst_sessions_cmd(struct sk_buff *skb, struct genl_info *info)
+{
+	struct sk_buff *msg = NULL;
+	int rc = 0;
+
+	mutex_lock(&console_session.ses_mutex);
+
+	console_session.ses_laststamp = ktime_get_real_seconds();
+
+	if (console_session.ses_shutdown) {
+		GENL_SET_ERR_MSG(info, "session is shutdown");
+		rc = -ESHUTDOWN;
+		goto out_unlock;
+	}
+
+	if (console_session.ses_expired)
+		lstcon_session_end();
+
+	if (!(info->nlhdr->nlmsg_flags & NLM_F_CREATE) &&
+	    console_session.ses_state == LST_SESSION_NONE) {
+		GENL_SET_ERR_MSG(info, "session is not active");
+		rc = -ESRCH;
+		goto out_unlock;
+	}
+
+	memset(&console_session.ses_trans_stat, 0,
+	       sizeof(struct lstcon_trans_stat));
+
+	if (!(info->nlhdr->nlmsg_flags & NLM_F_CREATE)) {
+		lstcon_session_end();
+		goto out_unlock;
+	}
+
+	if (info->attrs[LN_SCALAR_ATTR_LIST]) {
+		struct genlmsghdr *gnlh = nlmsg_data(info->nlhdr);
+		const struct ln_key_list *all[] = {
+			&lst_session_keys, NULL
+		};
+		char name[LST_NAME_SIZE];
+		struct nlmsghdr *nlh;
+		struct nlattr *item;
+		bool force = false;
+		s64 timeout = 300;
+		void *hdr;
+		int rem;
+
+		if (info->nlhdr->nlmsg_flags & NLM_F_REPLACE)
+			force = true;
+
+		nla_for_each_nested(item, info->attrs[LN_SCALAR_ATTR_LIST],
+				    rem) {
+			if (nla_type(item) != LN_SCALAR_ATTR_VALUE)
+				continue;
+
+			if (nla_strcmp(item, "name") == 0) {
+				ssize_t len;
+
+				item = nla_next(item, &rem);
+				if (nla_type(item) != LN_SCALAR_ATTR_VALUE) {
+					rc = -EINVAL;
+					goto err_conf;
+				}
+
+				len = nla_strlcpy(name, item, sizeof(name));
+				if (len < 0)
+					rc = len;
+			} else if (nla_strcmp(item, "timeout") == 0) {
+				item = nla_next(item, &rem);
+				if (nla_type(item) !=
+				    LN_SCALAR_ATTR_INT_VALUE) {
+					rc = -EINVAL;
+					goto err_conf;
+				}
+
+				timeout = nla_get_s64(item);
+				if (timeout < 0)
+					rc = -ERANGE;
+			}
+			if (rc < 0) {
+err_conf:
+				GENL_SET_ERR_MSG(info,
+						 "failed to get config");
+				goto out_unlock;
+			}
+		}
+
+		rc = lstcon_session_new(name, info->nlhdr->nlmsg_pid,
+					gnlh->version, timeout,
+					force);
+		if (rc < 0) {
+			GENL_SET_ERR_MSG(info, "new session creation failed");
+			lstcon_session_end();
+			goto out_unlock;
+		}
+
+		msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+		if (!msg) {
+			GENL_SET_ERR_MSG(info, "msg allocation failed");
+			rc = -ENOMEM;
+			goto out_unlock;
+		}
+
+		rc = lnet_genl_send_scalar_list(msg, info->snd_portid,
+						info->snd_seq, &lst_family,
+						NLM_F_CREATE | NLM_F_MULTI,
+						LNET_SELFTEST_CMD_SESSIONS,
+						all);
+		if (rc < 0) {
+			GENL_SET_ERR_MSG(info, "failed to send key table");
+			goto out_unlock;
+		}
+
+		hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq,
+				  &lst_family, NLM_F_MULTI,
+				  LNET_SELFTEST_CMD_SESSIONS);
+		if (!hdr) {
+			GENL_SET_ERR_MSG(info, "failed to send values");
+			genlmsg_cancel(msg, hdr);
+			rc = -EMSGSIZE;
+			goto out_unlock;
+		}
+
+		nla_put_string(msg, LNET_SELFTEST_SESSION_NAME,
+			       console_session.ses_name);
+		nla_put_u32(msg, LNET_SELFTEST_SESSION_KEY,
+			    console_session.ses_key);
+		nla_put_u64_64bit(msg, LNET_SELFTEST_SESSION_TIMESTAMP,
+				  console_session.ses_id.ses_stamp,
+				  LNET_SELFTEST_SESSION_PAD);
+		nla_put_string(msg, LNET_SELFTEST_SESSION_NID,
+			       libcfs_nidstr(&console_session.ses_id.ses_nid));
+		nla_put_u16(msg, LNET_SELFTEST_SESSION_NODE_COUNT, 0);
+
+		genlmsg_end(msg, hdr);
+
+		nlh = nlmsg_put(msg, info->snd_portid, info->snd_seq,
+				NLMSG_DONE, 0, NLM_F_MULTI);
+		if (!nlh) {
+			GENL_SET_ERR_MSG(info, "failed to complete message");
+			genlmsg_cancel(msg, hdr);
+			rc = -ENOMEM;
+			goto out_unlock;
+		}
+		rc = genlmsg_reply(msg, info);
+		if (rc)
+			GENL_SET_ERR_MSG(info, "failed to send reply");
+	}
+out_unlock:
+	if (rc < 0 && msg)
+		nlmsg_free(msg);
+	mutex_unlock(&console_session.ses_mutex);
+	return rc;
+}
+
+static const struct genl_multicast_group lst_mcast_grps[] = {
+	{ .name = "sessions",		},
+};
+
+static const struct genl_ops lst_genl_ops[] = {
+	{
+		.cmd		= LNET_SELFTEST_CMD_SESSIONS,
+		.dumpit		= lst_sessions_show_dump,
+		.doit		= lst_sessions_cmd,
+	},
+};
+
+static struct genl_family lst_family = {
+	.name		= LNET_SELFTEST_GENL_NAME,
+	.version	= LNET_SELFTEST_GENL_VERSION,
+	.maxattr	= LN_SCALAR_MAX,
+	.module		= THIS_MODULE,
+	.ops		= lst_genl_ops,
+	.n_ops		= ARRAY_SIZE(lst_genl_ops),
+	.mcgrps		= lst_mcast_grps,
+	.n_mcgrps	= ARRAY_SIZE(lst_mcast_grps),
+};
+
+int lstcon_init_netlink(void)
+{
+	return genl_register_family(&lst_family);
+}
+
+void lstcon_fini_netlink(void)
+{
+	genl_unregister_family(&lst_family);
+}
diff --git a/net/lnet/selftest/conrpc.c b/net/lnet/selftest/conrpc.c
index 0170219..8096c46 100644
--- a/net/lnet/selftest/conrpc.c
+++ b/net/lnet/selftest/conrpc.c
@@ -602,7 +602,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *,
 			return rc;
 
 		msrq = &(*crpc)->crp_rpc->crpc_reqstmsg.msg_body.mksn_reqst;
-		msrq->mksn_sid = console_session.ses_id;
+		msrq->mksn_sid.ses_stamp = console_session.ses_id.ses_stamp;
+		msrq->mksn_sid.ses_nid =
+			lnet_nid_to_nid4(&console_session.ses_id.ses_nid);
 		msrq->mksn_force = console_session.ses_force;
 		strlcpy(msrq->mksn_name, console_session.ses_name,
 			sizeof(msrq->mksn_name));
@@ -615,7 +617,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *,
 			return rc;
 
 		rsrq = &(*crpc)->crp_rpc->crpc_reqstmsg.msg_body.rmsn_reqst;
-		rsrq->rmsn_sid = console_session.ses_id;
+		rsrq->rmsn_sid.ses_stamp = console_session.ses_id.ses_stamp;
+		rsrq->rmsn_sid.ses_nid =
+			lnet_nid_to_nid4(&console_session.ses_id.ses_nid);
 		break;
 
 	default:
@@ -638,7 +642,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *,
 
 	drq = &(*crpc)->crp_rpc->crpc_reqstmsg.msg_body.dbg_reqst;
 
-	drq->dbg_sid = console_session.ses_id;
+	drq->dbg_sid.ses_stamp = console_session.ses_id.ses_stamp;
+	drq->dbg_sid.ses_nid =
+		lnet_nid_to_nid4(&console_session.ses_id.ses_nid);
 	drq->dbg_flags = 0;
 
 	return rc;
@@ -658,7 +664,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *,
 
 	brq = &(*crpc)->crp_rpc->crpc_reqstmsg.msg_body.bat_reqst;
 
-	brq->bar_sid = console_session.ses_id;
+	brq->bar_sid.ses_stamp = console_session.ses_id.ses_stamp;
+	brq->bar_sid.ses_nid =
+		lnet_nid_to_nid4(&console_session.ses_id.ses_nid);
 	brq->bar_bid = tsb->tsb_id;
 	brq->bar_testidx = tsb->tsb_index;
 	brq->bar_opc = transop == LST_TRANS_TSBRUN ? SRPC_BATCH_OPC_RUN :
@@ -690,7 +698,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *,
 
 	srq = &(*crpc)->crp_rpc->crpc_reqstmsg.msg_body.stat_reqst;
 
-	srq->str_sid = console_session.ses_id;
+	srq->str_sid.ses_stamp = console_session.ses_id.ses_stamp;
+	srq->str_sid.ses_nid =
+		lnet_nid_to_nid4(&console_session.ses_id.ses_nid);
 	srq->str_type = 0; /* XXX remove it */
 
 	return 0;
@@ -877,7 +887,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *,
 		trq->tsr_loop = test->tes_loop;
 	}
 
-	trq->tsr_sid = console_session.ses_id;
+	trq->tsr_sid.ses_stamp = console_session.ses_id.ses_stamp;
+	trq->tsr_sid.ses_nid =
+		lnet_nid_to_nid4(&console_session.ses_id.ses_nid);
 	trq->tsr_bid = test->tes_hdr.tsb_id;
 	trq->tsr_concur = test->tes_concur;
 	trq->tsr_is_client = (transop == LST_TRANS_TSBCLIADD) ? 1 : 0;
@@ -1259,7 +1271,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *,
 
 		drq = &crpc->crp_rpc->crpc_reqstmsg.msg_body.dbg_reqst;
 
-		drq->dbg_sid = console_session.ses_id;
+		drq->dbg_sid.ses_stamp = console_session.ses_id.ses_stamp;
+		drq->dbg_sid.ses_nid =
+			lnet_nid_to_nid4(&console_session.ses_id.ses_nid);
 		drq->dbg_flags = 0;
 
 		lstcon_rpc_trans_addreq(trans, crpc);
diff --git a/net/lnet/selftest/console.c b/net/lnet/selftest/console.c
index 85e9300..1ed6191 100644
--- a/net/lnet/selftest/console.c
+++ b/net/lnet/selftest/console.c
@@ -1679,27 +1679,32 @@ static void lstcon_group_ndlink_release(struct lstcon_group *,
 }
 
 int
-lstcon_session_match(struct lst_sid sid)
+lstcon_session_match(struct lst_sid id)
 {
-	return (console_session.ses_id.ses_nid == sid.ses_nid &&
-		console_session.ses_id.ses_stamp == sid.ses_stamp) ? 1 : 0;
+	struct lst_session_id sid;
+
+	sid.ses_stamp = id.ses_stamp;
+	lnet_nid4_to_nid(id.ses_nid, &sid.ses_nid);
+
+	return (nid_same(&console_session.ses_id.ses_nid, &sid.ses_nid) &&
+		console_session.ses_id.ses_stamp == sid.ses_stamp) ?  1 : 0;
 }
 
 static void
-lstcon_new_session_id(struct lst_sid *sid)
+lstcon_new_session_id(struct lst_session_id *sid)
 {
 	struct lnet_processid id;
 
 	LASSERT(console_session.ses_state == LST_SESSION_NONE);
 
 	LNetGetId(1, &id);
-	sid->ses_nid = lnet_nid_to_nid4(&id.nid);
+	sid->ses_nid = id.nid;
 	sid->ses_stamp = div_u64(ktime_get_ns(), NSEC_PER_MSEC);
 }
 
 int
 lstcon_session_new(char *name, int key, unsigned int feats,
-		   int timeout, int force, struct lst_sid __user *sid_up)
+		   int timeout, int force)
 {
 	int rc = 0;
 	int i;
@@ -1731,7 +1736,6 @@ static void lstcon_group_ndlink_release(struct lstcon_group *,
 	lstcon_new_session_id(&console_session.ses_id);
 
 	console_session.ses_key = key;
-	console_session.ses_state = LST_SESSION_ACTIVE;
 	console_session.ses_force = !!force;
 	console_session.ses_features = feats;
 	console_session.ses_feats_updated = 0;
@@ -1757,52 +1761,12 @@ static void lstcon_group_ndlink_release(struct lstcon_group *,
 		return rc;
 	}
 
-	if (!copy_to_user(sid_up, &console_session.ses_id,
-			  sizeof(struct lst_sid)))
-		return rc;
-
-	lstcon_session_end();
-
-	return -EFAULT;
-}
-
-int
-lstcon_session_info(struct lst_sid __user *sid_up, int __user *key_up,
-		    unsigned __user *featp,
-		    struct lstcon_ndlist_ent __user *ndinfo_up,
-		    char __user *name_up, int len)
-{
-	struct lstcon_ndlist_ent *entp;
-	struct lstcon_ndlink *ndl;
-	int rc = 0;
-
-	if (console_session.ses_state != LST_SESSION_ACTIVE)
-		return -ESRCH;
-
-	entp = kzalloc(sizeof(*entp), GFP_NOFS);
-	if (!entp)
-		return -ENOMEM;
-
-	list_for_each_entry(ndl, &console_session.ses_ndl_list, ndl_link)
-		LST_NODE_STATE_COUNTER(ndl->ndl_node, entp);
-
-	if (copy_to_user(sid_up, &console_session.ses_id,
-			 sizeof(*sid_up)) ||
-	    copy_to_user(key_up, &console_session.ses_key,
-			 sizeof(*key_up)) ||
-	    copy_to_user(featp, &console_session.ses_features,
-			 sizeof(*featp)) ||
-	    copy_to_user(ndinfo_up, entp, sizeof(*entp)) ||
-	    copy_to_user(name_up, console_session.ses_name, len))
-		rc = -EFAULT;
-
-	kfree(entp);
+	console_session.ses_state = LST_SESSION_ACTIVE;
 
 	return rc;
 }
 
-int
-lstcon_session_end(void)
+int lstcon_session_end(void)
 {
 	struct lstcon_rpc_trans *trans;
 	struct lstcon_group *grp;
@@ -1907,9 +1871,10 @@ static void lstcon_group_ndlink_release(struct lstcon_group *,
 
 	mutex_lock(&console_session.ses_mutex);
 
-	jrep->join_sid = console_session.ses_id;
+	jrep->join_sid.ses_stamp = console_session.ses_id.ses_stamp;
+	jrep->join_sid.ses_nid = lnet_nid_to_nid4(&console_session.ses_id.ses_nid);
 
-	if (console_session.ses_id.ses_nid == LNET_NID_ANY) {
+	if (LNET_NID_IS_ANY(&console_session.ses_id.ses_nid)) {
 		jrep->join_status = ESRCH;
 		goto out;
 	}
@@ -2041,14 +2006,21 @@ static void lstcon_init_acceptor_service(void)
 		goto out;
 	}
 
+	rc = lstcon_init_netlink();
+	if (rc < 0)
+		goto out;
+
 	rc = blocking_notifier_chain_register(&libcfs_ioctl_list,
 					      &lstcon_ioctl_handler);
 
-	if (!rc) {
-		lstcon_rpc_module_init();
-		return 0;
+	if (rc < 0) {
+		lstcon_fini_netlink();
+		goto out;
 	}
 
+	lstcon_rpc_module_init();
+	return 0;
+
 out:
 	srpc_shutdown_service(&lstcon_acceptor_service);
 	srpc_remove_service(&lstcon_acceptor_service);
@@ -2067,6 +2039,7 @@ static void lstcon_init_acceptor_service(void)
 
 	blocking_notifier_chain_unregister(&libcfs_ioctl_list,
 					   &lstcon_ioctl_handler);
+	lstcon_fini_netlink();
 
 	mutex_lock(&console_session.ses_mutex);
 
diff --git a/net/lnet/selftest/console.h b/net/lnet/selftest/console.h
index 93aa515..dd416dc 100644
--- a/net/lnet/selftest/console.h
+++ b/net/lnet/selftest/console.h
@@ -136,36 +136,34 @@ struct lstcon_test {
 #define LST_CONSOLE_TIMEOUT	300	/* default console timeout */
 
 struct lstcon_session {
-	struct mutex	   ses_mutex;		/* only 1 thread in session */
-	struct lst_sid	   ses_id;		/* global session id */
-	int		   ses_key;		/* local session key */
-	int		   ses_state;		/* state of session */
-	int		   ses_timeout;		/* timeout in seconds */
-	time64_t	   ses_laststamp;	/* last operation stamp (secs) */
-	unsigned int	   ses_features;	/* tests features of the session */
-	unsigned int	   ses_feats_updated:1; /* features are synced with
-						 * remote test nodes
-						 */
-	unsigned int	   ses_force:1;		/* force creating */
-	unsigned int	   ses_shutdown:1;	/* session is shutting down */
-	unsigned int	   ses_expired:1;	/* console is timedout */
-	u64		   ses_id_cookie;	/* batch id cookie */
-	char		   ses_name[LST_NAME_SIZE];/* session name */
-	struct lstcon_rpc_trans
-			   *ses_ping;		/* session pinger */
-	struct stt_timer   ses_ping_timer;	/* timer for pinger */
-	struct lstcon_trans_stat
-			   ses_trans_stat;	/* transaction stats */
-
-	struct list_head   ses_trans_list;	/* global list of transaction */
-	struct list_head   ses_grp_list;	/* global list of groups */
-	struct list_head   ses_bat_list;	/* global list of batches */
-	struct list_head   ses_ndl_list;	/* global list of nodes */
-	struct list_head   *ses_ndl_hash;	/* hash table of nodes */
-
-	spinlock_t	   ses_rpc_lock;	/* serialize */
-	atomic_t	   ses_rpc_counter;	/* # of initialized RPCs */
-	struct list_head   ses_rpc_freelist;	/* idle console rpc */
+	struct mutex		ses_mutex;	/* only 1 thread in session */
+	struct lst_session_id	ses_id;		/* global session id */
+	u32			ses_key;	/* local session key */
+	int			ses_state;	/* state of session */
+	int			ses_timeout;	/* timeout in seconds */
+	time64_t		ses_laststamp;	/* last operation stamp (secs) */
+	unsigned int		ses_features;	/* tests features of the session */
+	unsigned int		ses_feats_updated:1; /* features are synced with
+						      * remote test nodes
+						      */
+	unsigned int		ses_force:1;	/* force creating */
+	unsigned int		ses_shutdown:1;	/* session is shutting down */
+	unsigned int		ses_expired:1;	/* console is timedout */
+	u64			ses_id_cookie;	/* batch id cookie */
+	char			ses_name[LST_NAME_SIZE];/* session name */
+	struct lstcon_rpc_trans *ses_ping;	/* session pinger */
+	struct stt_timer	ses_ping_timer;	/* timer for pinger */
+	struct lstcon_trans_stat ses_trans_stat;/* transaction stats */
+
+	struct list_head	ses_trans_list;	/* global list of transaction */
+	struct list_head	ses_grp_list;	/* global list of groups */
+	struct list_head	ses_bat_list;	/* global list of batches */
+	struct list_head	ses_ndl_list;	/* global list of nodes */
+	struct list_head	*ses_ndl_hash;	/* hash table of nodes */
+
+	spinlock_t		ses_rpc_lock;	/* serialize */
+	atomic_t		ses_rpc_counter;/* # of initialized RPCs */
+	struct list_head	ses_rpc_freelist;/* idle console rpc */
 }; /* session descriptor */
 
 extern struct lstcon_session	console_session;
@@ -186,14 +184,16 @@ struct lstcon_session {
 
 int lstcon_ioctl_entry(struct notifier_block *nb,
 		       unsigned long cmd, void *vdata);
+
+int lstcon_init_netlink(void);
+void lstcon_fini_netlink(void);
+
 int lstcon_console_init(void);
 int lstcon_console_fini(void);
+
 int lstcon_session_match(struct lst_sid sid);
 int lstcon_session_new(char *name, int key, unsigned int version,
-		       int timeout, int flags, struct lst_sid __user *sid_up);
-int lstcon_session_info(struct lst_sid __user *sid_up, int __user *key,
-			unsigned __user *verp, struct lstcon_ndlist_ent __user *entp,
-			char __user *name_up, int len);
+		       int timeout, int flags);
 int lstcon_session_end(void);
 int lstcon_session_debug(int timeout, struct list_head __user *result_up);
 int lstcon_session_feats_check(unsigned int feats);
diff --git a/net/lnet/selftest/framework.c b/net/lnet/selftest/framework.c
index e84904e..0dd0421 100644
--- a/net/lnet/selftest/framework.c
+++ b/net/lnet/selftest/framework.c
@@ -39,7 +39,7 @@
 
 #include "selftest.h"
 
-struct lst_sid LST_INVALID_SID = { .ses_nid = LNET_NID_ANY, .ses_stamp = -1 };
+struct lst_session_id LST_INVALID_SID = { .ses_nid = LNET_ANY_NID, .ses_stamp = -1};
 
 static int session_timeout = 100;
 module_param(session_timeout, int, 0444);
@@ -244,7 +244,7 @@
 	LASSERT(sn == sfw_data.fw_session);
 
 	CWARN("Session expired! sid: %s-%llu, name: %s\n",
-	      libcfs_nid2str(sn->sn_id.ses_nid),
+	      libcfs_nidstr(&sn->sn_id.ses_nid),
 	      sn->sn_id.ses_stamp, &sn->sn_name[0]);
 
 	sn->sn_timer_active = 0;
@@ -268,7 +268,8 @@
 	strlcpy(&sn->sn_name[0], name, sizeof(sn->sn_name));
 
 	sn->sn_timer_active = 0;
-	sn->sn_id = sid;
+	sn->sn_id.ses_stamp = sid.ses_stamp;
+	lnet_nid4_to_nid(sid.ses_nid, &sn->sn_id.ses_nid);
 	sn->sn_features = features;
 	sn->sn_timeout = session_timeout;
 	sn->sn_started = ktime_get();
@@ -357,6 +358,18 @@
 	return bat;
 }
 
+static struct lst_sid get_old_sid(struct sfw_session *sn)
+{
+	struct lst_sid sid = { .ses_nid = LNET_NID_ANY, .ses_stamp = -1 };
+
+	if (sn) {
+		sid.ses_stamp = sn->sn_id.ses_stamp;
+		sid.ses_nid = lnet_nid_to_nid4(&sn->sn_id.ses_nid);
+	}
+
+	return sid;
+}
+
 static int
 sfw_get_stats(struct srpc_stat_reqst *request, struct srpc_stat_reply *reply)
 {
@@ -364,7 +377,7 @@
 	struct sfw_counters *cnt = &reply->str_fw;
 	struct sfw_batch *bat;
 
-	reply->str_sid = !sn ? LST_INVALID_SID : sn->sn_id;
+	reply->str_sid = get_old_sid(sn);
 
 	if (request->str_sid.ses_nid == LNET_NID_ANY) {
 		reply->str_status = EINVAL;
@@ -407,14 +420,14 @@
 	int cplen = 0;
 
 	if (request->mksn_sid.ses_nid == LNET_NID_ANY) {
-		reply->mksn_sid = !sn ? LST_INVALID_SID : sn->sn_id;
+		reply->mksn_sid = get_old_sid(sn);
 		reply->mksn_status = EINVAL;
 		return 0;
 	}
 
 	if (sn) {
 		reply->mksn_status = 0;
-		reply->mksn_sid = sn->sn_id;
+		reply->mksn_sid = get_old_sid(sn);
 		reply->mksn_timeout = sn->sn_timeout;
 
 		if (sfw_sid_equal(request->mksn_sid, sn->sn_id)) {
@@ -464,7 +477,7 @@
 	spin_unlock(&sfw_data.fw_lock);
 
 	reply->mksn_status = 0;
-	reply->mksn_sid = sn->sn_id;
+	reply->mksn_sid = get_old_sid(sn);
 	reply->mksn_timeout = sn->sn_timeout;
 	return 0;
 }
@@ -475,7 +488,7 @@
 {
 	struct sfw_session *sn = sfw_data.fw_session;
 
-	reply->rmsn_sid = !sn ? LST_INVALID_SID : sn->sn_id;
+	reply->rmsn_sid = get_old_sid(sn);
 
 	if (request->rmsn_sid.ses_nid == LNET_NID_ANY) {
 		reply->rmsn_status = EINVAL;
@@ -497,7 +510,7 @@
 	spin_unlock(&sfw_data.fw_lock);
 
 	reply->rmsn_status = 0;
-	reply->rmsn_sid = LST_INVALID_SID;
+	reply->rmsn_sid = get_old_sid(NULL);
 	LASSERT(!sfw_data.fw_session);
 	return 0;
 }
@@ -510,12 +523,12 @@
 
 	if (!sn) {
 		reply->dbg_status = ESRCH;
-		reply->dbg_sid = LST_INVALID_SID;
+		reply->dbg_sid = get_old_sid(NULL);
 		return 0;
 	}
 
 	reply->dbg_status = 0;
-	reply->dbg_sid = sn->sn_id;
+	reply->dbg_sid = get_old_sid(sn);
 	reply->dbg_timeout = sn->sn_timeout;
 	if (strlcpy(reply->dbg_name, &sn->sn_name[0], sizeof(reply->dbg_name))
 	    >= sizeof(reply->dbg_name))
@@ -1119,7 +1132,7 @@
 	struct sfw_batch *bat;
 
 	request = &rpc->srpc_reqstbuf->buf_msg.msg_body.tes_reqst;
-	reply->tsr_sid = !sn ? LST_INVALID_SID : sn->sn_id;
+	reply->tsr_sid = get_old_sid(sn);
 
 	if (!request->tsr_loop ||
 	    !request->tsr_concur ||
@@ -1187,7 +1200,7 @@
 	int rc = 0;
 	struct sfw_batch *bat;
 
-	reply->bar_sid = !sn ? LST_INVALID_SID : sn->sn_id;
+	reply->bar_sid = get_old_sid(sn);
 
 	if (!sn || !sfw_sid_equal(request->bar_sid, sn->sn_id)) {
 		reply->bar_status = ESRCH;
@@ -1266,7 +1279,9 @@
 			CNETERR("Features of framework RPC don't match features of current session: %x/%x\n",
 				request->msg_ses_feats, sn->sn_features);
 			reply->msg_body.reply.status = EPROTO;
-			reply->msg_body.reply.sid = sn->sn_id;
+			reply->msg_body.reply.sid.ses_stamp = sn->sn_id.ses_stamp;
+			reply->msg_body.reply.sid.ses_nid =
+				lnet_nid_to_nid4(&sn->sn_id.ses_nid);
 			goto out;
 		}
 
diff --git a/net/lnet/selftest/selftest.h b/net/lnet/selftest/selftest.h
index 223a432..5bffe73 100644
--- a/net/lnet/selftest/selftest.h
+++ b/net/lnet/selftest/selftest.h
@@ -49,6 +49,39 @@
 #define MADE_WITHOUT_COMPROMISE
 #endif
 
+/* enum lnet_selftest_session_attrs   - LNet selftest session Netlink
+ *					attributes
+ *
+ * @LNET_SELFTEST_SESSION_UNSPEC:	unspecified attribute to catch errors
+ * @LNET_SELFTEST_SESSION_PAD:		padding for 64-bit attributes, ignore
+ *
+ * @LNET_SELFTEST_SESSION_HDR:		Netlink group this data is for
+ *					(NLA_NUL_STRING)
+ * @LNET_SELFTEST_SESSION_NAME:	name of this session (NLA_STRING)
+ * @LNET_SELFTEST_SESSION_KEY:		key used to represent the session
+ *					(NLA_U32)
+ * @LNET_SELFTEST_SESSION_TIMESTAMP:	timestamp when the session was created
+ *					(NLA_S64)
+ * @LNET_SELFTEST_SESSION_NID:		NID of the node selftest ran on
+ *					(NLA_STRING)
+ * @LNET_SELFTEST_SESSION_NODE_COUNT:	Number of nodes in use (NLA_U16)
+ */
+enum lnet_selftest_session_attrs {
+	LNET_SELFTEST_SESSION_UNSPEC = 0,
+	LNET_SELFTEST_SESSION_PAD = LNET_SELFTEST_SESSION_UNSPEC,
+
+	LNET_SELFTEST_SESSION_HDR,
+	LNET_SELFTEST_SESSION_NAME,
+	LNET_SELFTEST_SESSION_KEY,
+	LNET_SELFTEST_SESSION_TIMESTAMP,
+	LNET_SELFTEST_SESSION_NID,
+	LNET_SELFTEST_SESSION_NODE_COUNT,
+
+	__LNET_SELFTEST_SESSION_MAX_PLUS_ONE,
+};
+
+#define LNET_SELFTEST_SESSION_MAX	(__LNET_SELFTEST_SESSION_MAX_PLUS_ONE - 1)
+
 #define SWI_STATE_NEWBORN		0
 #define SWI_STATE_REPLY_SUBMITTED	1
 #define SWI_STATE_REPLY_SENT		2
@@ -318,23 +351,40 @@ struct srpc_service {
 	int (*sv_bulk_ready)(struct srpc_server_rpc *, int);
 };
 
+struct lst_session_id {
+	s64			ses_stamp;	/* time stamp in milliseconds */
+	struct lnet_nid		ses_nid;	/* nid of console node */
+};						/*** session id (large addr) */
+
+extern struct lst_session_id LST_INVALID_SID;
+
 struct sfw_session {
-	struct list_head sn_list;    /* chain on fw_zombie_sessions */
-	struct lst_sid	 sn_id;      /* unique identifier */
-	unsigned int	 sn_timeout; /* # seconds' inactivity to expire */
-	int		 sn_timer_active;
-	unsigned int	 sn_features;
-	struct stt_timer      sn_timer;
-	struct list_head sn_batches; /* list of batches */
-	char		 sn_name[LST_NAME_SIZE];
-	atomic_t	 sn_refcount;
-	atomic_t	 sn_brw_errors;
-	atomic_t	 sn_ping_errors;
-	ktime_t		 sn_started;
+	/* chain on fw_zombie_sessions */
+	struct list_head	sn_list;
+	struct lst_session_id	sn_id;		/* unique identifier */
+	/* # seconds' inactivity to expire */
+	unsigned int		sn_timeout;
+	int			sn_timer_active;
+	unsigned int		sn_features;
+	struct stt_timer	sn_timer;
+	struct list_head	sn_batches; /* list of batches */
+	char			sn_name[LST_NAME_SIZE];
+	atomic_t		sn_refcount;
+	atomic_t		sn_brw_errors;
+	atomic_t		sn_ping_errors;
+	ktime_t			sn_started;
 };
 
-#define sfw_sid_equal(sid0, sid1)     ((sid0).ses_nid == (sid1).ses_nid && \
-				       (sid0).ses_stamp == (sid1).ses_stamp)
+static inline int sfw_sid_equal(struct lst_sid sid0,
+				struct lst_session_id sid1)
+{
+	struct lnet_nid ses_nid;
+
+	lnet_nid4_to_nid(sid0.ses_nid, &ses_nid);
+
+	return ((sid0.ses_stamp == sid1.ses_stamp) &&
+		nid_same(&ses_nid, &sid1.ses_nid));
+}
 
 struct sfw_batch {
 	struct list_head bat_list;	/* chain on sn_batches */
-- 
1.8.3.1


* [lustre-devel] [PATCH 14/22] lustre: clio: append to non-existent component
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (12 preceding siblings ...)
  2022-11-20 14:16 ` [lustre-devel] [PATCH 13/22] lnet: selftest: migrate LNet selftest session handling to Netlink James Simmons
@ 2022-11-20 14:17 ` James Simmons
  2022-11-20 14:17 ` [lustre-devel] [PATCH 15/22] lnet: fix debug message in lnet_discovery_event_reply James Simmons
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:17 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Vitaly Fertman, Lustre Development List

From: Vitaly Fertman <vitaly.fertman@hpe.com>

Appending to a non-existent component should return an error, but it
currently fails with the BUG below because the return value of
lov_io_layout_at() is not checked for < 0

BUG: unable to handle kernel paging request at ffff99d3c2f74030
    Call Trace:
      lov_stripe_number+0x19/0x40 [lov]
      lov_page_init_composite+0x103/0x5f0 [lov]
      ? kmem_cache_alloc+0x12e/0x270
      cl_page_alloc+0x19f/0x660 [obdclass]
      cl_page_find+0x1a0/0x250 [obdclass]
      ll_write_begin+0x1f7/0xfb0 [lustre]

HPE-bug-id: LUS-11075
WC-bug-id: https://jira.whamcloud.com/browse/LU-16281
Lustre-commit: 8fdeca3b6faf22c72 ("LU-16281 clio: append to non-existent component")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/161123
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48994
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_page.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/lustre/lov/lov_page.c b/fs/lustre/lov/lov_page.c
index a22b71f..6e28e62 100644
--- a/fs/lustre/lov/lov_page.c
+++ b/fs/lustre/lov/lov_page.c
@@ -84,6 +84,8 @@ int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj,
 		suboff = lio->lis_cached_suboff + offset - lio->lis_cached_off;
 	} else {
 		entry = lov_io_layout_at(lio, offset);
+		if (entry < 0)
+			return -ENODATA;
 
 		stripe = lov_stripe_number(loo->lo_lsm, entry, offset);
 		rc = lov_stripe_offset(loo->lo_lsm, entry, offset, stripe,
-- 
1.8.3.1


* [lustre-devel] [PATCH 15/22] lnet: fix debug message in lnet_discovery_event_reply
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (13 preceding siblings ...)
  2022-11-20 14:17 ` [lustre-devel] [PATCH 14/22] lustre: clio: append to non-existent component James Simmons
@ 2022-11-20 14:17 ` James Simmons
  2022-11-20 14:17 ` [lustre-devel] [PATCH 16/22] lustre: ldlm: group lock unlock fix James Simmons
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:17 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

The message in lnet_discovery_event_reply currently says
"Peer X has discovery disabled" even though the same path
may be taken if discovery is disabled locally.
Change the debug message to indicate whether discovery is
disabled on the peer side or locally.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16282
Lustre-commit: 9f45a79e983c11def ("LU-16282 lnet: fix debug message in lnet_discovery_event_reply")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48997
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 52ad791..35b135e 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2592,6 +2592,7 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
 	struct lnet_ping_buffer *pbuf;
 	int infobytes;
 	int rc;
+	bool ping_feat_disc;
 
 	spin_lock(&lp->lp_lock);
 
@@ -2629,14 +2630,15 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
 		goto out;
 	}
 
-	/*
-	 * The peer may have discovery disabled at its end. Set
+	/* The peer may have discovery disabled at its end. Set
 	 * NO_DISCOVERY as appropriate.
 	 */
-	if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY) &&
-	    lnet_peer_discovery_disabled) {
-		CDEBUG(D_NET, "Peer %s has discovery enabled\n",
-		       libcfs_nidstr(&lp->lp_primary_nid));
+	ping_feat_disc = pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY;
+	if (!ping_feat_disc || lnet_peer_discovery_disabled) {
+		CDEBUG(D_NET, "Peer %s has discovery %s, local discovery %s\n",
+		       libcfs_nidstr(&lp->lp_primary_nid),
+		       ping_feat_disc ? "enabled" : "disabled",
+		       lnet_peer_discovery_disabled ? "disabled" : "enabled");
 
 		/* Detect whether this peer has toggled discovery from on to
 		 * off and whether we can delete and re-create the peer. Peers
-- 
1.8.3.1


* [lustre-devel] [PATCH 16/22] lustre: ldlm: group lock unlock fix
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (14 preceding siblings ...)
  2022-11-20 14:17 ` [lustre-devel] [PATCH 15/22] lnet: fix debug message in lnet_discovery_event_reply James Simmons
@ 2022-11-20 14:17 ` James Simmons
  2022-11-20 14:17 ` [lustre-devel] [PATCH 17/22] lnet: Signal completion on ping send failure James Simmons
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:17 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Vitaly Fertman, Lustre Development List

From: Vitaly Fertman <vitaly.fertman@hpe.com>

The original LU-9964 fix had a problem: with many pages in memory,
grouplock unlock takes 10+ seconds just to discard them.

The current patch makes the grouplock unlock non-atomic, and instead
makes a new grouplock enqueue wait until the previous CBPENDING lock
gets destroyed.
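
For illustration only (not part of this patch), a minimal sketch of
the waiting side, assuming the kernel's wait_event_idle() and the
generated ldlm_is_destroyed() flag helper; the patch wakes l_waitq
when a lock is destroyed (see the ldlm_lock.c hunk below) and wires
the wait into the osc/mdc enqueue paths:

  /* minimal sketch: 'old_lockh' is a hypothetical handle returned by
   * an earlier LDLM_MATCH_GROUP match against a CBPENDING group lock
   */
  struct ldlm_lock *old = ldlm_handle2lock(&old_lockh);

  if (old) {
  	/* wait until the old group lock reaches DESTROYED */
  	wait_event_idle(old->l_waitq, ldlm_is_destroyed(old));
  	LDLM_LOCK_PUT(old);
  }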

HPE-bug-id: LUS-10644

WC-bug-id: https://jira.whamcloud.com/browse/LU-16046
Lustre-commit: 3dc261c06434eceee ("LU-16046 ldlm: group lock unlock fix")
Lustre-commit: 62fd8f9b498ae3d16 ("Revert "LU-16046 revert: "LU-9964 llite: prevent mulitple group locks"")
Lustre-commit: dd609c6f31adeadab ("Revert "LU-16046 ldlm: group lock fix")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/161411
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Alexander Lezhoev <alexander.lezhoev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49008
Reviewed-by: Alexander <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_dlm.h   |   1 +
 fs/lustre/include/lustre_osc.h   |  15 ----
 fs/lustre/ldlm/ldlm_lock.c       |  28 ++++++-
 fs/lustre/llite/file.c           |  76 ++++++++++++-------
 fs/lustre/llite/llite_internal.h |   3 +
 fs/lustre/llite/llite_lib.c      |   3 +
 fs/lustre/mdc/mdc_dev.c          |  58 ++++-----------
 fs/lustre/osc/osc_lock.c         | 157 ++-------------------------------------
 fs/lustre/osc/osc_object.c       |  16 ----
 fs/lustre/osc/osc_request.c      |  14 ++--
 10 files changed, 110 insertions(+), 261 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index 6053e01..d08c48f 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -855,6 +855,7 @@ enum ldlm_match_flags {
 	LDLM_MATCH_AST		= BIT(1),
 	LDLM_MATCH_AST_ANY	= BIT(2),
 	LDLM_MATCH_RIGHT	= BIT(3),
+	LDLM_MATCH_GROUP	= BIT(4),
 };
 
 /**
diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index a0f1afc..d15f46b 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -319,11 +319,6 @@ struct osc_object {
 
 	const struct osc_object_operations *oo_obj_ops;
 	bool			oo_initialized;
-
-	wait_queue_head_t	oo_group_waitq;
-	struct mutex		oo_group_mutex;
-	u64			oo_group_users;
-	unsigned long		oo_group_gid;
 };
 
 static inline void osc_build_res_name(struct osc_object *osc,
@@ -660,16 +655,6 @@ int osc_object_glimpse(const struct lu_env *env, const struct cl_object *obj,
 int osc_object_find_cbdata(const struct lu_env *env, struct cl_object *obj,
 			   ldlm_iterator_t iter, void *data);
 int osc_object_prune(const struct lu_env *env, struct cl_object *obj);
-void osc_grouplock_inc_locked(struct osc_object *osc, struct ldlm_lock *lock);
-void osc_grouplock_dec(struct osc_object *osc, struct ldlm_lock *lock);
-int osc_grouplock_enqueue_init(const struct lu_env *env,
-			       struct osc_object *obj,
-			       struct osc_lock *oscl,
-			       struct lustre_handle *lh);
-void osc_grouplock_enqueue_fini(const struct lu_env *env,
-				struct osc_object *obj,
-				struct osc_lock *oscl,
-				struct lustre_handle *lh);
 
 /* osc_request.c */
 void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd);
diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index 39ab2a0..8659aa5 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -324,6 +324,7 @@ static int ldlm_lock_destroy_internal(struct ldlm_lock *lock)
 		return 0;
 	}
 	ldlm_set_destroyed(lock);
+	wake_up(&lock->l_waitq);
 
 	ldlm_lock_remove_from_lru(lock);
 	class_handle_unhash(&lock->l_handle);
@@ -1067,10 +1068,12 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata)
 	 * can still happen.
 	 */
 	if (ldlm_is_cbpending(lock) &&
-	    !(data->lmd_flags & LDLM_FL_CBPENDING))
+	    !(data->lmd_flags & LDLM_FL_CBPENDING) &&
+	    !(data->lmd_match & LDLM_MATCH_GROUP))
 		return false;
 
-	if (!(data->lmd_match & LDLM_MATCH_UNREF) && ldlm_is_cbpending(lock) &&
+	if (!(data->lmd_match & (LDLM_MATCH_UNREF | LDLM_MATCH_GROUP)) &&
+	    ldlm_is_cbpending(lock) &&
 	    !lock->l_readers && !lock->l_writers)
 		return false;
 
@@ -1136,7 +1139,12 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata)
 		return false;
 
 matched:
-	if (data->lmd_flags & LDLM_FL_TEST_LOCK) {
+	/**
+	 * In case the lock is a CBPENDING grouplock, just pin it and return,
+	 * we need to wait until it gets to DESTROYED.
+	 */
+	if ((data->lmd_flags & LDLM_FL_TEST_LOCK) ||
+	    (ldlm_is_cbpending(lock) && (data->lmd_match & LDLM_MATCH_GROUP))) {
 		LDLM_LOCK_GET(lock);
 		ldlm_lock_touch_in_lru(lock);
 	} else {
@@ -1296,6 +1304,7 @@ enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns,
 	};
 	struct ldlm_resource *res;
 	struct ldlm_lock *lock;
+	struct ldlm_lock *group_lock;
 	int matched;
 
 	if (!ns) {
@@ -1314,6 +1323,8 @@ enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns,
 		return 0;
 	}
 
+repeat:
+	group_lock = NULL;
 	LDLM_RESOURCE_ADDREF(res);
 	lock_res(res);
 	if (res->lr_type == LDLM_EXTENT)
@@ -1323,8 +1334,19 @@ enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns,
 	if (!lock && !(flags & LDLM_FL_BLOCK_GRANTED))
 		lock = search_queue(&res->lr_waiting, &data);
 	matched = lock ? mode : 0;
+
+	if (lock && ldlm_is_cbpending(lock) &&
+	    (data.lmd_match & LDLM_MATCH_GROUP))
+		group_lock = lock;
 	unlock_res(res);
 	LDLM_RESOURCE_DELREF(res);
+
+	if (group_lock) {
+		l_wait_event_abortable(group_lock->l_waitq,
+				       ldlm_is_destroyed(lock));
+		LDLM_LOCK_RELEASE(lock);
+		goto repeat;
+	}
 	ldlm_resource_putref(res);
 
 	if (lock) {
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 34a449e..dac829f 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -2522,15 +2522,30 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file,
 	if (ll_file_nolock(file))
 		return -EOPNOTSUPP;
 
-	read_lock(&lli->lli_lock);
+retry:
+	if (file->f_flags & O_NONBLOCK) {
+		if (!mutex_trylock(&lli->lli_group_mutex))
+			return -EAGAIN;
+	} else
+		mutex_lock(&lli->lli_group_mutex);
+
 	if (fd->fd_flags & LL_FILE_GROUP_LOCKED) {
 		CWARN("group lock already existed with gid %lu\n",
 		      fd->fd_grouplock.lg_gid);
-		read_unlock(&lli->lli_lock);
-		return -EINVAL;
+		rc = -EINVAL;
+		goto out;
+	}
+	if (arg != lli->lli_group_gid && lli->lli_group_users != 0) {
+		if (file->f_flags & O_NONBLOCK) {
+			rc = -EAGAIN;
+			goto out;
+		}
+		mutex_unlock(&lli->lli_group_mutex);
+		wait_var_event(&lli->lli_group_users, !lli->lli_group_users);
+		rc = 0;
+		goto retry;
 	}
 	LASSERT(!fd->fd_grouplock.lg_lock);
-	read_unlock(&lli->lli_lock);
 
 	/**
 	 * XXX: group lock needs to protect all OST objects while PFL
@@ -2549,8 +2564,10 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file,
 		u16 refcheck;
 
 		env = cl_env_get(&refcheck);
-		if (IS_ERR(env))
-			return PTR_ERR(env);
+		if (IS_ERR(env)) {
+			rc = PTR_ERR(env);
+			goto out;
+		}
 
 		rc = cl_object_layout_get(env, obj, &cl);
 		if (rc >= 0 && cl.cl_is_composite)
@@ -2559,28 +2576,26 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file,
 
 		cl_env_put(env, &refcheck);
 		if (rc < 0)
-			return rc;
+			goto out;
 	}
 
 	rc = cl_get_grouplock(ll_i2info(inode)->lli_clob,
 			      arg, (file->f_flags & O_NONBLOCK), &grouplock);
-	if (rc)
-		return rc;
 
-	write_lock(&lli->lli_lock);
-	if (fd->fd_flags & LL_FILE_GROUP_LOCKED) {
-		write_unlock(&lli->lli_lock);
-		CERROR("another thread just won the race\n");
-		cl_put_grouplock(&grouplock);
-		return -EINVAL;
-	}
+	if (rc)
+		goto out;
 
 	fd->fd_flags |= LL_FILE_GROUP_LOCKED;
 	fd->fd_grouplock = grouplock;
-	write_unlock(&lli->lli_lock);
+	if (lli->lli_group_users == 0)
+		lli->lli_group_gid = grouplock.lg_gid;
+	lli->lli_group_users++;
 
 	CDEBUG(D_INFO, "group lock %lu obtained\n", arg);
-	return 0;
+out:
+	mutex_unlock(&lli->lli_group_mutex);
+
+	return rc;
 }
 
 static int ll_put_grouplock(struct inode *inode, struct file *file,
@@ -2589,31 +2604,40 @@ static int ll_put_grouplock(struct inode *inode, struct file *file,
 	struct ll_inode_info *lli = ll_i2info(inode);
 	struct ll_file_data *fd = file->private_data;
 	struct ll_grouplock grouplock;
+	int rc;
 
-	write_lock(&lli->lli_lock);
+	mutex_lock(&lli->lli_group_mutex);
 	if (!(fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
-		write_unlock(&lli->lli_lock);
 		CWARN("no group lock held\n");
-		return -EINVAL;
+		rc = -EINVAL;
+		goto out;
 	}
-
 	LASSERT(fd->fd_grouplock.lg_lock);
 
 	if (fd->fd_grouplock.lg_gid != arg) {
 		CWARN("group lock %lu doesn't match current id %lu\n",
 		      arg, fd->fd_grouplock.lg_gid);
-		write_unlock(&lli->lli_lock);
-		return -EINVAL;
+		rc = -EINVAL;
+		goto out;
 	}
 
 	grouplock = fd->fd_grouplock;
 	memset(&fd->fd_grouplock, 0, sizeof(fd->fd_grouplock));
 	fd->fd_flags &= ~LL_FILE_GROUP_LOCKED;
-	write_unlock(&lli->lli_lock);
 
 	cl_put_grouplock(&grouplock);
+
+	lli->lli_group_users--;
+	if (lli->lli_group_users == 0) {
+		lli->lli_group_gid = 0;
+		wake_up_var(&lli->lli_group_users);
+	}
 	CDEBUG(D_INFO, "group lock %lu released\n", arg);
-	return 0;
+	rc = 0;
+out:
+	mutex_unlock(&lli->lli_group_mutex);
+
+	return rc;
 }
 
 /**
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index d245dd8..998eed8 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -253,6 +253,9 @@ struct ll_inode_info {
 			u64				lli_pcc_generation;
 			enum pcc_dataset_flags		lli_pcc_dsflags;
 			struct pcc_inode		*lli_pcc_inode;
+			struct mutex			lli_group_mutex;
+			u64				lli_group_users;
+			unsigned long			lli_group_gid;
 
 			u64				lli_attr_valid;
 			u64				lli_lazysize;
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 3dc0030..176e61b5 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -1194,6 +1194,9 @@ void ll_lli_init(struct ll_inode_info *lli)
 		lli->lli_pcc_inode = NULL;
 		lli->lli_pcc_dsflags = PCC_DATASET_INVALID;
 		lli->lli_pcc_generation = 0;
+		mutex_init(&lli->lli_group_mutex);
+		lli->lli_group_users = 0;
+		lli->lli_group_gid = 0;
 	}
 	mutex_init(&lli->lli_layout_mutex);
 	memset(lli->lli_jobid, 0, sizeof(lli->lli_jobid));
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index 978fee3..e0f5b45 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -330,7 +330,6 @@ static int mdc_dlm_canceling(const struct lu_env *env,
 	 */
 	if (obj) {
 		struct cl_attr *attr = &osc_env_info(env)->oti_attr;
-		void *data;
 
 		/* Destroy pages covered by the extent of the DLM lock */
 		result = mdc_lock_flush(env, cl2osc(obj), cl_index(obj, 0),
@@ -340,17 +339,12 @@ static int mdc_dlm_canceling(const struct lu_env *env,
 		 */
 		/* losing a lock, update kms */
 		lock_res_and_lock(dlmlock);
-		data = dlmlock->l_ast_data;
 		dlmlock->l_ast_data = NULL;
 		cl_object_attr_lock(obj);
 		attr->cat_kms = 0;
 		cl_object_attr_update(env, obj, attr, CAT_KMS);
 		cl_object_attr_unlock(obj);
 		unlock_res_and_lock(dlmlock);
-
-		/* Skip dec in case mdc_object_ast_clear() did it */
-		if (data && dlmlock->l_req_mode == LCK_GROUP)
-			osc_grouplock_dec(cl2osc(obj), dlmlock);
 		cl_object_put(env, obj);
 	}
 	return result;
@@ -457,7 +451,7 @@ void mdc_lock_lvb_update(const struct lu_env *env, struct osc_object *osc,
 }
 
 static void mdc_lock_granted(const struct lu_env *env, struct osc_lock *oscl,
-			     struct lustre_handle *lockh, int errcode)
+			     struct lustre_handle *lockh)
 {
 	struct osc_object *osc = cl2osc(oscl->ols_cl.cls_obj);
 	struct ldlm_lock *dlmlock;
@@ -510,9 +504,6 @@ static void mdc_lock_granted(const struct lu_env *env, struct osc_lock *oscl,
 
 	LASSERT(oscl->ols_state != OLS_GRANTED);
 	oscl->ols_state = OLS_GRANTED;
-
-	if (errcode != ELDLM_LOCK_MATCHED && dlmlock->l_req_mode == LCK_GROUP)
-		osc_grouplock_inc_locked(osc, dlmlock);
 }
 
 /**
@@ -544,7 +535,7 @@ static int mdc_lock_upcall(void *cookie, struct lustre_handle *lockh,
 
 	CDEBUG(D_INODE, "rc %d, err %d\n", rc, errcode);
 	if (rc == 0)
-		mdc_lock_granted(env, oscl, lockh, errcode);
+		mdc_lock_granted(env, oscl, lockh);
 
 	/* Error handling, some errors are tolerable. */
 	if (oscl->ols_glimpse && rc == -ENAVAIL) {
@@ -706,7 +697,8 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp,
 	struct ldlm_intent *lit;
 	enum ldlm_mode mode;
 	bool glimpse = *flags & LDLM_FL_HAS_INTENT;
-	u64 match_flags = *flags;
+	u64 search_flags = *flags;
+	u64 match_flags = 0;
 	LIST_HEAD(cancels);
 	int rc, count;
 	int lvb_size;
@@ -716,11 +708,14 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp,
 	if (einfo->ei_mode == LCK_PR)
 		mode |= LCK_PW;
 
-	match_flags |= LDLM_FL_LVB_READY;
+	search_flags |= LDLM_FL_LVB_READY;
 	if (glimpse)
-		match_flags |= LDLM_FL_BLOCK_GRANTED;
-	mode = ldlm_lock_match(obd->obd_namespace, match_flags, res_id,
-			       einfo->ei_type, policy, mode, &lockh);
+		search_flags |= LDLM_FL_BLOCK_GRANTED;
+	if (mode == LCK_GROUP)
+		match_flags = LDLM_MATCH_GROUP;
+	mode = ldlm_lock_match_with_skip(obd->obd_namespace, search_flags, 0,
+					 res_id, einfo->ei_type, policy, mode,
+					 &lockh, match_flags);
 	if (mode) {
 		struct ldlm_lock *matched;
 
@@ -833,9 +828,9 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp,
  *
  * This function does not wait for the network communication to complete.
  */
-static int __mdc_lock_enqueue(const struct lu_env *env,
-			      const struct cl_lock_slice *slice,
-			      struct cl_io *unused, struct cl_sync_io *anchor)
+static int mdc_lock_enqueue(const struct lu_env *env,
+			    const struct cl_lock_slice *slice,
+			    struct cl_io *unused, struct cl_sync_io *anchor)
 {
 	struct osc_thread_info *info = osc_env_info(env);
 	struct osc_io *oio = osc_env_io(env);
@@ -921,28 +916,6 @@ static int __mdc_lock_enqueue(const struct lu_env *env,
 	return result;
 }
 
-static int mdc_lock_enqueue(const struct lu_env *env,
-			    const struct cl_lock_slice *slice,
-			    struct cl_io *unused, struct cl_sync_io *anchor)
-{
-	struct osc_object *obj = cl2osc(slice->cls_obj);
-	struct osc_lock	*oscl = cl2osc_lock(slice);
-	struct lustre_handle lh = { 0 };
-	int rc;
-
-	if (oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP) {
-		rc = osc_grouplock_enqueue_init(env, obj, oscl, &lh);
-		if (rc < 0)
-			return rc;
-	}
-
-	rc = __mdc_lock_enqueue(env, slice, unused, anchor);
-
-	if (oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP)
-		osc_grouplock_enqueue_fini(env, obj, oscl, &lh);
-	return rc;
-}
-
 static const struct cl_lock_operations mdc_lock_lockless_ops = {
 	.clo_fini	= osc_lock_fini,
 	.clo_enqueue	= mdc_lock_enqueue,
@@ -1468,9 +1441,6 @@ static int mdc_object_ast_clear(struct ldlm_lock *lock, void *data)
 		memcpy(lvb, &oinfo->loi_lvb, sizeof(oinfo->loi_lvb));
 		cl_object_attr_unlock(&osc->oo_cl);
 		ldlm_clear_lvb_cached(lock);
-
-		if (lock->l_req_mode == LCK_GROUP)
-			osc_grouplock_dec(osc, lock);
 	}
 	return LDLM_ITER_CONTINUE;
 }
diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c
index a3e72a6..3b22688 100644
--- a/fs/lustre/osc/osc_lock.c
+++ b/fs/lustre/osc/osc_lock.c
@@ -198,7 +198,7 @@ void osc_lock_lvb_update(const struct lu_env *env,
 }
 
 static void osc_lock_granted(const struct lu_env *env, struct osc_lock *oscl,
-			     struct lustre_handle *lockh, int errcode)
+			     struct lustre_handle *lockh)
 {
 	struct osc_object *osc = cl2osc(oscl->ols_cl.cls_obj);
 	struct ldlm_lock *dlmlock;
@@ -254,126 +254,7 @@ static void osc_lock_granted(const struct lu_env *env, struct osc_lock *oscl,
 
 	LASSERT(oscl->ols_state != OLS_GRANTED);
 	oscl->ols_state = OLS_GRANTED;
-
-	if (errcode != ELDLM_LOCK_MATCHED && dlmlock->l_req_mode == LCK_GROUP)
-		osc_grouplock_inc_locked(osc, dlmlock);
-}
-
-void osc_grouplock_inc_locked(struct osc_object *osc, struct ldlm_lock *lock)
-{
-	LASSERT(lock->l_req_mode == LCK_GROUP);
-
-	if (osc->oo_group_users == 0)
-		osc->oo_group_gid = lock->l_policy_data.l_extent.gid;
-	osc->oo_group_users++;
-
-	LDLM_DEBUG(lock, "users %llu gid %llu\n",
-		   osc->oo_group_users,
-		   lock->l_policy_data.l_extent.gid);
-}
-EXPORT_SYMBOL(osc_grouplock_inc_locked);
-
-void osc_grouplock_dec(struct osc_object *osc, struct ldlm_lock *lock)
-{
-	LASSERT(lock->l_req_mode == LCK_GROUP);
-
-	mutex_lock(&osc->oo_group_mutex);
-
-	LASSERT(osc->oo_group_users > 0);
-	osc->oo_group_users--;
-	if (osc->oo_group_users == 0) {
-		osc->oo_group_gid = 0;
-		wake_up_all(&osc->oo_group_waitq);
-	}
-	mutex_unlock(&osc->oo_group_mutex);
-
-	LDLM_DEBUG(lock, "users %llu gid %lu\n",
-		   osc->oo_group_users, osc->oo_group_gid);
 }
-EXPORT_SYMBOL(osc_grouplock_dec);
-
-int osc_grouplock_enqueue_init(const struct lu_env *env,
-			       struct osc_object *obj,
-			       struct osc_lock *oscl,
-			       struct lustre_handle *lh)
-{
-	struct cl_lock_descr *need = &oscl->ols_cl.cls_lock->cll_descr;
-	int rc = 0;
-
-	LASSERT(need->cld_mode == CLM_GROUP);
-
-	while (true) {
-		bool check_gid = true;
-
-		if (oscl->ols_flags & LDLM_FL_BLOCK_NOWAIT) {
-			if (!mutex_trylock(&obj->oo_group_mutex))
-				return -EAGAIN;
-		} else {
-			mutex_lock(&obj->oo_group_mutex);
-		}
-
-		/**
-		 * If a grouplock of the same gid already exists, match it
-		 * here in advance. Otherwise, if that lock is being cancelled
-		 * there is a chance to get 2 grouplocks for the same file.
-		 */
-		if (obj->oo_group_users &&
-		    obj->oo_group_gid == need->cld_gid) {
-			struct osc_thread_info *info = osc_env_info(env);
-			struct ldlm_res_id *resname = &info->oti_resname;
-			union ldlm_policy_data *policy = &info->oti_policy;
-			struct cl_lock *lock = oscl->ols_cl.cls_lock;
-			u64 flags = oscl->ols_flags | LDLM_FL_BLOCK_GRANTED;
-			struct ldlm_namespace *ns;
-			enum ldlm_mode mode;
-
-			ns = osc_export(obj)->exp_obd->obd_namespace;
-			ostid_build_res_name(&obj->oo_oinfo->loi_oi, resname);
-			osc_lock_build_policy(env, lock, policy);
-			mode = ldlm_lock_match(ns, flags, resname,
-					       oscl->ols_einfo.ei_type, policy,
-					       oscl->ols_einfo.ei_mode, lh);
-			if (mode)
-				oscl->ols_flags |= LDLM_FL_MATCH_LOCK;
-			else
-				check_gid = false;
-		}
-
-		/**
-		 * If a grouplock exists but cannot be matched, let it to flush
-		 * and wait just for zero users for now.
-		 */
-		if (obj->oo_group_users == 0 ||
-		    (check_gid && obj->oo_group_gid == need->cld_gid))
-			break;
-
-		mutex_unlock(&obj->oo_group_mutex);
-		if (oscl->ols_flags & LDLM_FL_BLOCK_NOWAIT)
-			return -EAGAIN;
-
-		rc = l_wait_event_abortable(obj->oo_group_waitq,
-					    !obj->oo_group_users);
-		if (rc)
-			return rc;
-	}
-
-	return 0;
-}
-EXPORT_SYMBOL(osc_grouplock_enqueue_init);
-
-void osc_grouplock_enqueue_fini(const struct lu_env *env,
-				struct osc_object *obj,
-				struct osc_lock *oscl,
-				struct lustre_handle *lh)
-{
-	LASSERT(oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP);
-
-	/* If a user was added on enqueue_init, decref it */
-	if (lustre_handle_is_used(lh))
-		ldlm_lock_decref(lh, oscl->ols_einfo.ei_mode);
-	mutex_unlock(&obj->oo_group_mutex);
-}
-EXPORT_SYMBOL(osc_grouplock_enqueue_fini);
 
 /**
  * Lock upcall function that is executed either when a reply to ENQUEUE rpc is
@@ -403,7 +284,7 @@ static int osc_lock_upcall(void *cookie, struct lustre_handle *lockh,
 	}
 
 	if (rc == 0)
-		osc_lock_granted(env, oscl, lockh, errcode);
+		osc_lock_granted(env, oscl, lockh);
 
 	/* Error handling, some errors are tolerable. */
 	if (oscl->ols_glimpse && rc == -ENAVAIL) {
@@ -540,7 +421,6 @@ static int __osc_dlm_blocking_ast(const struct lu_env *env,
 		struct ldlm_extent *extent = &dlmlock->l_policy_data.l_extent;
 		struct cl_attr *attr = &osc_env_info(env)->oti_attr;
 		u64 old_kms;
-		void *data;
 
 		/* Destroy pages covered by the extent of the DLM lock */
 		result = osc_lock_flush(cl2osc(obj),
@@ -553,7 +433,6 @@ static int __osc_dlm_blocking_ast(const struct lu_env *env,
 		/* clearing l_ast_data after flushing data,
 		 * to let glimpse ast find the lock and the object
 		 */
-		data = dlmlock->l_ast_data;
 		dlmlock->l_ast_data = NULL;
 		cl_object_attr_lock(obj);
 		/* Must get the value under the lock to avoid race. */
@@ -567,9 +446,6 @@ static int __osc_dlm_blocking_ast(const struct lu_env *env,
 		cl_object_attr_unlock(obj);
 		unlock_res_and_lock(dlmlock);
 
-		/* Skip dec in case osc_object_ast_clear() did it */
-		if (data && dlmlock->l_req_mode == LCK_GROUP)
-			osc_grouplock_dec(cl2osc(obj), dlmlock);
 		cl_object_put(env, obj);
 	}
 	return result;
@@ -1055,9 +931,9 @@ int osc_lock_enqueue_wait(const struct lu_env *env, struct osc_object *obj,
  *
  * This function does not wait for the network communication to complete.
  */
-static int __osc_lock_enqueue(const struct lu_env *env,
-			      const struct cl_lock_slice *slice,
-			      struct cl_io *unused, struct cl_sync_io *anchor)
+static int osc_lock_enqueue(const struct lu_env *env,
+			    const struct cl_lock_slice *slice,
+			    struct cl_io *unused, struct cl_sync_io *anchor)
 {
 	struct osc_thread_info *info = osc_env_info(env);
 	struct osc_io *oio = osc_env_io(env);
@@ -1177,29 +1053,6 @@ static int __osc_lock_enqueue(const struct lu_env *env,
 	return result;
 }
 
-static int osc_lock_enqueue(const struct lu_env *env,
-			    const struct cl_lock_slice *slice,
-			    struct cl_io *unused, struct cl_sync_io *anchor)
-{
-	struct osc_object *obj = cl2osc(slice->cls_obj);
-	struct osc_lock	*oscl = cl2osc_lock(slice);
-	struct lustre_handle lh = { 0 };
-	int rc;
-
-	if (oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP) {
-		rc = osc_grouplock_enqueue_init(env, obj, oscl, &lh);
-		if (rc < 0)
-			return rc;
-	}
-
-	rc = __osc_lock_enqueue(env, slice, unused, anchor);
-
-	if (oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP)
-		osc_grouplock_enqueue_fini(env, obj, oscl, &lh);
-
-	return rc;
-}
-
 /**
  * Breaks a link between osc_lock and dlm_lock.
  */
diff --git a/fs/lustre/osc/osc_object.c b/fs/lustre/osc/osc_object.c
index c3667a3..efb0533 100644
--- a/fs/lustre/osc/osc_object.c
+++ b/fs/lustre/osc/osc_object.c
@@ -74,10 +74,6 @@ int osc_object_init(const struct lu_env *env, struct lu_object *obj,
 
 	atomic_set(&osc->oo_nr_ios, 0);
 	init_waitqueue_head(&osc->oo_io_waitq);
-	init_waitqueue_head(&osc->oo_group_waitq);
-	mutex_init(&osc->oo_group_mutex);
-	osc->oo_group_users = 0;
-	osc->oo_group_gid = 0;
 
 	osc->oo_root.rb_node = NULL;
 	INIT_LIST_HEAD(&osc->oo_hp_exts);
@@ -117,7 +113,6 @@ void osc_object_free(const struct lu_env *env, struct lu_object *obj)
 	LASSERT(atomic_read(&osc->oo_nr_writes) == 0);
 	LASSERT(list_empty(&osc->oo_ol_list));
 	LASSERT(!atomic_read(&osc->oo_nr_ios));
-	LASSERT(!osc->oo_group_users);
 
 	lu_object_fini(obj);
 	/* osc doen't contain an lu_object_header, so we don't need call_rcu */
@@ -230,17 +225,6 @@ static int osc_object_ast_clear(struct ldlm_lock *lock, void *data)
 		memcpy(lvb, &oinfo->loi_lvb, sizeof(oinfo->loi_lvb));
 		cl_object_attr_unlock(&osc->oo_cl);
 		ldlm_clear_lvb_cached(lock);
-
-		/**
-		 * Object is being destroyed and gets unlinked from the lock,
-		 * IO is finished and no cached data is left under the lock. As
-		 * grouplock is immediately marked CBPENDING it is not reused.
-		 * It will also be not possible to flush data later due to a
-		 * NULL l_ast_data - enough conditions to let new grouplocks to
-		 * be enqueued even if the lock still exists on client.
-		 */
-		if (lock->l_req_mode == LCK_GROUP)
-			osc_grouplock_dec(osc, lock);
 	}
 	return LDLM_ITER_CONTINUE;
 }
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 7577fad..5a3f418 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -3009,7 +3009,8 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 	struct lustre_handle lockh = { 0 };
 	struct ptlrpc_request *req = NULL;
 	int intent = *flags & LDLM_FL_HAS_INTENT;
-	u64 match_flags = *flags;
+	u64 search_flags = *flags;
+	u64 match_flags = 0;
 	enum ldlm_mode mode;
 	int rc;
 
@@ -3040,11 +3041,14 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id,
 	 * because they will not actually use the lock.
 	 */
 	if (!speculative)
-		match_flags |= LDLM_FL_LVB_READY;
+		search_flags |= LDLM_FL_LVB_READY;
 	if (intent != 0)
-		match_flags |= LDLM_FL_BLOCK_GRANTED;
-	mode = ldlm_lock_match(obd->obd_namespace, match_flags, res_id,
-			       einfo->ei_type, policy, mode, &lockh);
+		search_flags |= LDLM_FL_BLOCK_GRANTED;
+	if (mode == LCK_GROUP)
+		match_flags = LDLM_MATCH_GROUP;
+	mode = ldlm_lock_match_with_skip(obd->obd_namespace, search_flags, 0,
+					 res_id, einfo->ei_type, policy, mode,
+					 &lockh, match_flags);
 	if (mode) {
 		struct ldlm_lock *matched;
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 17/22] lnet: Signal completion on ping send failure
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (15 preceding siblings ...)
  2022-11-20 14:17 ` [lustre-devel] [PATCH 16/22] lustre: ldlm: group lock unlock fix James Simmons
@ 2022-11-20 14:17 ` James Simmons
  2022-11-20 14:17 ` [lustre-devel] [PATCH 18/22] lnet: extend lnet_is_nid_in_ping_info() James Simmons
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:17 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

Call complete() on the ping_data::completion if we get
LNET_EVENT_SEND with non-zero status. Otherwise the thread which
issued the ping is stuck waiting for the full ping timeout.

A pd_unlinked member is added to struct ping_data to indicate whether
the associated MD has been unlinked. This is checked by lnet_ping() to
determine whether it needs to explicitly call LNetMDUnlink().

Lastly, in cases where we do not receive a reply, we now return the
value of pd.rc, if it is non-zero, rather than -EIO. This can provide
more information about the underlying ping failure.
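
Condensed, the resulting event-handler and wait logic (matching the
hunks in the diff below) is:

    /* event handler: signal the waiter on unlink or send failure */
    if (event->unlinked)
            pd->pd_unlinked = 1;
    if (event->unlinked ||
        (event->type == LNET_EVENT_SEND && event->status))
            complete(&pd->completion);

    /* lnet_ping(): bounded wait, then unlink only if the MD is still live */
    wait_for_completion_timeout(&pd.completion, timeout);
    if (!pd.pd_unlinked) {
            LNetMDUnlink(pd.mdh);
            wait_for_completion(&pd.completion);
    }
    if (!pd.replied)
            rc = pd.rc ?: -EIO;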

HPE-bug-id: LUS-11317
WC-bug-id: https://jira.whamcloud.com/browse/LU-16290
Lustre-commit: 48c34c71de65e8a25 ("LU-16290 lnet: Signal completion on ping send failure")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49020
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/api-ni.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 935c848..8b53adf 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -5333,6 +5333,7 @@ void LNetDebugPeer(struct lnet_processid *id)
 struct ping_data {
 	int rc;
 	int replied;
+	int pd_unlinked;
 	struct lnet_handle_md mdh;
 	struct completion completion;
 };
@@ -5353,7 +5354,12 @@ struct ping_data {
 		pd->replied = 1;
 		pd->rc = event->mlength;
 	}
+
 	if (event->unlinked)
+		pd->pd_unlinked = 1;
+
+	if (event->unlinked ||
+	    (event->type == LNET_EVENT_SEND && event->status))
 		complete(&pd->completion);
 }
 
@@ -5424,13 +5430,14 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 		/* NB must wait for the UNLINK event below... */
 	}
 
-	if (wait_for_completion_timeout(&pd.completion, timeout) == 0) {
-		/* Ensure completion in finite time... */
+	/* Ensure completion in finite time... */
+	wait_for_completion_timeout(&pd.completion, timeout);
+	if (!pd.pd_unlinked) {
 		LNetMDUnlink(pd.mdh);
 		wait_for_completion(&pd.completion);
 	}
 	if (!pd.replied) {
-		rc = -EIO;
+		rc = pd.rc ?: -EIO;
 		goto fail_ping_buffer_decref;
 	}
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 18/22] lnet: extend lnet_is_nid_in_ping_info()
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (16 preceding siblings ...)
  2022-11-20 14:17 ` [lustre-devel] [PATCH 17/22] lnet: Signal completion on ping send failure James Simmons
@ 2022-11-20 14:17 ` James Simmons
  2022-11-20 14:17 ` [lustre-devel] [PATCH 19/22] lnet: find correct primary for peer James Simmons
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:17 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

lnet_is_nid_in_ping_info() now checks the ping_info for both
nid4 and larger nids.
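
This is built on a new ping-buffer iterator (ping_iter_first() /
ping_iter_next(), added in the diff below). A minimal usage sketch,
mirroring the reworked lnet_is_nid_in_ping_info():

    struct lnet_ping_iter pi;
    struct lnet_nid pnid;
    u32 *st;

    /* walk every status entry, nid4 or large, in the ping buffer */
    for (st = ping_iter_first(&pi, pbuf, &pnid);
         st;
         st = ping_iter_next(&pi, &pnid)) {
            /* pnid holds the entry's NID, *st its ns_status */
            if (nid_same(nid, &pnid))
                    return true;
    }
    return false;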

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 56bcfbf22d91b96c3 ("LU-10391 lnet: extend lnet_is_nid_in_ping_info()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44629
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |  9 ++++++
 net/lnet/lnet/peer.c          | 71 +++++++++++++++++++++++++++++++++++++------
 2 files changed, 70 insertions(+), 10 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 13ce2bf..7ce6cff 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -886,6 +886,15 @@ static inline void lnet_ping_buffer_decref(struct lnet_ping_buffer *pbuf)
 	}
 }
 
+struct lnet_ping_iter {
+	struct lnet_ping_info	*pinfo;
+	void			*pos, *end;
+};
+
+u32 *ping_iter_first(struct lnet_ping_iter *pi, struct lnet_ping_buffer *pbuf,
+		     struct lnet_nid *nid);
+u32 *ping_iter_next(struct lnet_ping_iter *pi, struct lnet_nid *nid);
+
 static inline int lnet_push_target_resize_needed(void)
 {
 	return the_lnet.ln_push_target->pb_nbytes < the_lnet.ln_push_target_nbytes;
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 35b135e..b33d6ac 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2875,6 +2875,56 @@ static void lnet_discovery_event_handler(struct lnet_event *event)
 	lnet_net_unlock(LNET_LOCK_EX);
 }
 
+u32 *ping_iter_first(struct lnet_ping_iter *pi,
+		     struct lnet_ping_buffer *pbuf,
+		     struct lnet_nid *nid)
+{
+	pi->pinfo = &pbuf->pb_info;
+	pi->pos = &pbuf->pb_info.pi_ni;
+	pi->end = (void *)pi->pinfo +
+		  min_t(int, pbuf->pb_nbytes,
+			lnet_ping_info_size(pi->pinfo));
+	/* lnet_ping_info_validiate ensures there will be one
+	 * lnet_ni_status at the start
+	 */
+	if (nid)
+		lnet_nid4_to_nid(pbuf->pb_info.pi_ni[0].ns_nid, nid);
+	return &pbuf->pb_info.pi_ni[0].ns_status;
+}
+
+u32 *ping_iter_next(struct lnet_ping_iter *pi, struct lnet_nid *nid)
+{
+	int off = offsetof(struct lnet_ping_info, pi_ni[pi->pinfo->pi_nnis]);
+
+	if (pi->pos < ((void *)pi->pinfo + off)) {
+		struct lnet_ni_status *ns = pi->pos;
+
+		pi->pos = ns + 1;
+		if (pi->pos > pi->end)
+			return NULL;
+		if (nid)
+			lnet_nid4_to_nid(ns->ns_nid, nid);
+		return &ns->ns_status;
+	}
+
+	while (pi->pinfo->pi_features & LNET_PING_FEAT_LARGE_ADDR) {
+		struct lnet_ni_large_status *lns = pi->pos;
+
+		if (pi->pos + 8 > pi->end)
+			/* Not safe to examine next */
+			return NULL;
+		pi->pos = lnet_ping_sts_next(lns);
+		if (pi->pos > pi->end)
+			return NULL;
+		if (NID_BYTES(&lns->ns_nid) > sizeof(struct lnet_nid))
+			continue;
+		if (nid)
+			*nid = lns->ns_nid;
+		return &lns->ns_status;
+	}
+	return NULL;
+}
+
 /*
  * Build a peer from incoming data.
  *
@@ -3140,16 +3190,18 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 	return 0;
 }
 
-static bool lnet_is_nid_in_ping_info(lnet_nid_t nid,
-				     struct lnet_ping_info *pinfo)
+static bool lnet_is_nid_in_ping_info(struct lnet_nid *nid,
+				     struct lnet_ping_buffer *pbuf)
 {
-	int i;
-
-	for (i = 0; i < pinfo->pi_nnis; i++) {
-		if (pinfo->pi_ni[i].ns_nid == nid)
+	struct lnet_ping_iter pi;
+	struct lnet_nid pnid;
+	u32 *st;
+
+	for (st = ping_iter_first(&pi, pbuf, &pnid);
+	     st;
+	     st = ping_iter_next(&pi, &pnid))
+		if (nid_same(nid, &pnid))
 			return true;
-	}
-
 	return false;
 }
 
@@ -3308,8 +3360,7 @@ static int lnet_peer_data_present(struct lnet_peer *lp)
 	 * recorded in that peer.
 	 */
 	} else if (nid_same(&lp->lp_primary_nid, &nid) ||
-		   (lnet_is_nid_in_ping_info(lnet_nid_to_nid4(&lp->lp_primary_nid),
-					     &pbuf->pb_info) &&
+		   (lnet_is_nid_in_ping_info(&lp->lp_primary_nid, pbuf) &&
 		    lnet_is_discovery_disabled(lp))) {
 		rc = lnet_peer_merge_data(lp, pbuf);
 	} else {
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 19/22] lnet: find correct primary for peer
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (17 preceding siblings ...)
  2022-11-20 14:17 ` [lustre-devel] [PATCH 18/22] lnet: extend lnet_is_nid_in_ping_info() James Simmons
@ 2022-11-20 14:17 ` James Simmons
  2022-11-20 14:17 ` [lustre-devel] [PATCH 20/22] lnet: change lnet_notify() to take struct lnet_nid James Simmons
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:17 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

If the peer uses a large-address NID as its primary, it can now be found.
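
Callers now use the new find_primary() helper instead of reading
pi_ni[1].ns_nid directly; roughly, as in the reworked
lnet_peer_data_present() below:

    struct lnet_nid nid;

    /* find_primary() handles both the LNET_PING_FEAT_PRIMARY_LARGE
     * layout and the plain nid4 layout
     */
    if (!find_primary(&nid, pbuf)) {
            /* no usable primary NID in the ping buffer */
            lnet_ping_buffer_decref(pbuf);
            goto out;
    }
    /* nid now holds the peer's primary NID, large or nid4 */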

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 022b46d887603f703 ("LU-10391 lnet: find correct primary for peer")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44632
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 41 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index b33d6ac..a1305b6 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2585,11 +2585,40 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
 	       libcfs_nidstr(&lp->lp_primary_nid), ev->status);
 }
 
+static bool find_primary(struct lnet_nid *nid,
+			 struct lnet_ping_buffer *pbuf)
+{
+	struct lnet_ping_info *pi = &pbuf->pb_info;
+	struct lnet_ping_iter piter;
+	u32 *stp;
+
+	if (pi->pi_features & LNET_PING_FEAT_PRIMARY_LARGE) {
+		/* First large nid is primary */
+		for (stp = ping_iter_first(&piter, pbuf, nid);
+		     stp;
+		     stp = ping_iter_next(&piter, nid)) {
+			if (nid_is_nid4(nid))
+				continue;
+			/* nid has already been copied in */
+			return true;
+		}
+		/* no large nids ... weird ... ignore the flag
+		 * and use first nid.
+		 */
+	}
+	/* pi_nids[1] is primary */
+	if (pi->pi_nnis < 2)
+		return false;
+	lnet_nid4_to_nid(pbuf->pb_info.pi_ni[1].ns_nid, nid);
+	return true;
+}
+
 /* Handle a Reply message. This is the reply to a Ping message. */
 static void
 lnet_discovery_event_reply(struct lnet_peer *lp, struct lnet_event *ev)
 {
 	struct lnet_ping_buffer *pbuf;
+	struct lnet_nid primary;
 	int infobytes;
 	int rc;
 	bool ping_feat_disc;
@@ -2731,9 +2760,8 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
 	 * available if the reply came from a Multi-Rail peer.
 	 */
 	if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL &&
-	    pbuf->pb_info.pi_nnis > 1 &&
-	    lnet_nid_to_nid4(&lp->lp_primary_nid) ==
-	    pbuf->pb_info.pi_ni[1].ns_nid) {
+	    find_primary(&primary, pbuf) &&
+	    nid_same(&lp->lp_primary_nid, &primary)) {
 		if (LNET_PING_BUFFER_SEQNO(pbuf) < lp->lp_peer_seqno)
 			CDEBUG(D_NET,
 			       "peer %s: seq# got %u have %u. peer rebooted?\n",
@@ -3081,11 +3109,11 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 	 * peer's lp_peer_nets list, and the peer NI for the primary NID should
 	 * be the first entry in its peer net's lpn_peer_nis list.
 	 */
-	lnet_nid4_to_nid(pbuf->pb_info.pi_ni[1].ns_nid, &nid);
+	find_primary(&nid, pbuf);
 	lpni = lnet_peer_ni_find_locked(&nid);
 	if (!lpni) {
 		CERROR("Internal error: Failed to lookup peer NI for primary NID: %s\n",
-		       libcfs_nid2str(pbuf->pb_info.pi_ni[1].ns_nid));
+		       libcfs_nidstr(&nid));
 		goto out;
 	}
 
@@ -3341,11 +3369,10 @@ static int lnet_peer_data_present(struct lnet_peer *lp)
 	 * primary NID to the correct value here. Moreover, this peer
 	 * can show up with only the loopback NID in the ping buffer.
 	 */
-	if (pbuf->pb_info.pi_nnis <= 1) {
+	if (!find_primary(&nid, pbuf)) {
 		lnet_ping_buffer_decref(pbuf);
 		goto out;
 	}
-	lnet_nid4_to_nid(pbuf->pb_info.pi_ni[1].ns_nid, &nid);
 	if (nid_is_lo0(&lp->lp_primary_nid)) {
 		rc = lnet_peer_set_primary_nid(lp, &nid, flags);
 		if (rc)
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 20/22] lnet: change lnet_notify() to take struct lnet_nid
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (18 preceding siblings ...)
  2022-11-20 14:17 ` [lustre-devel] [PATCH 19/22] lnet: find correct primary for peer James Simmons
@ 2022-11-20 14:17 ` James Simmons
  2022-11-20 14:17 ` [lustre-devel] [PATCH 21/22] lnet: discard lnet_nid2ni_*() James Simmons
  2022-11-20 14:17 ` [lustre-devel] [PATCH 22/22] lnet: change lnet_debug_peer() to struct lnet_nid James Simmons
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:17 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

lnet_notify() now takes a 'struct lnet_nid *' instead of a
lnet_nid_t.
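
Callers that still hold a legacy lnet_nid_t convert it first, for
example (as in the o2iblnd hunk below):

    struct lnet_nid nid;

    /* convert the legacy nid4 before notifying */
    lnet_nid4_to_nid(peer_ni->ibp_nid, &nid);
    lnet_notify(peer_ni->ibp_ni, &nid, false, false, last_alive);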

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 4a88236f40a47c05d ("LU-10391 lnet: change lnet_notify() to take struct lnet_nid")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44633
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h       |  4 ++--
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 10 +++++++---
 net/lnet/klnds/socklnd/socklnd.c    |  2 +-
 net/lnet/lnet/api-ni.c              |  3 ++-
 net/lnet/lnet/router.c              | 15 +++++++--------
 5 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 7ce6cff..3bcea11 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -574,8 +574,8 @@ unsigned int lnet_nid_cpt_hash(struct lnet_nid *nid,
 
 void lnet_mt_event_handler(struct lnet_event *event);
 
-int lnet_notify(struct lnet_ni *ni, lnet_nid_t peer, bool alive, bool reset,
-		time64_t when);
+int lnet_notify(struct lnet_ni *ni, struct lnet_nid *peer, bool alive,
+		bool reset, time64_t when);
 void lnet_notify_locked(struct lnet_peer_ni *lp, int notifylnd, int alive,
 			time64_t when);
 int lnet_add_route(u32 net, u32 hops, struct lnet_nid *gateway,
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index d4de326..451363b 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1967,9 +1967,13 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 	read_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags);
 
-	if (error)
-		lnet_notify(peer_ni->ibp_ni,
-			    peer_ni->ibp_nid, false, false, last_alive);
+	if (error != 0) {
+		struct lnet_nid nid;
+
+		lnet_nid4_to_nid(peer_ni->ibp_nid, &nid);
+		lnet_notify(peer_ni->ibp_ni, &nid,
+			    false, false, last_alive);
+	}
 }
 
 void
diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index 21fccfa..d8d1071 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -1424,7 +1424,7 @@ struct ksock_peer_ni *
 
 	if (notify)
 		lnet_notify(peer_ni->ksnp_ni,
-			    lnet_nid_to_nid4(&peer_ni->ksnp_id.nid),
+			    &peer_ni->ksnp_id.nid,
 			    false, false, last_alive);
 }
 
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 8b53adf..5be2aff 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -4372,7 +4372,8 @@ u32 lnet_get_dlc_seq_locked(void)
 		 * that deadline to the wall clock.
 		 */
 		deadline += ktime_get_seconds();
-		return lnet_notify(NULL, data->ioc_nid, data->ioc_flags, false,
+		lnet_nid4_to_nid(data->ioc_nid, &nid);
+		return lnet_notify(NULL, &nid, data->ioc_flags, false,
 				   deadline);
 	}
 
diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index ee4f1d8..358c3f1 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -1672,26 +1672,25 @@ bool lnet_router_checker_active(void)
  * when: notificaiton time.
  */
 int
-lnet_notify(struct lnet_ni *ni, lnet_nid_t nid4, bool alive, bool reset,
+lnet_notify(struct lnet_ni *ni, struct lnet_nid *nid, bool alive, bool reset,
 	    time64_t when)
 {
 	struct lnet_peer_ni *lpni = NULL;
 	struct lnet_route *route;
 	struct lnet_peer *lp;
 	time64_t now = ktime_get_seconds();
-	struct lnet_nid nid;
 	int cpt;
 
 	LASSERT(!in_interrupt());
 
 	CDEBUG(D_NET, "%s notifying %s: %s\n",
 	       !ni ? "userspace" : libcfs_nidstr(&ni->ni_nid),
-	       libcfs_nidstr(&nid), alive ? "up" : "down");
+	       libcfs_nidstr(nid), alive ? "up" : "down");
 
 	if (ni &&
-	    LNET_NID_NET(&ni->ni_nid) != LNET_NID_NET(&nid)) {
+	    LNET_NID_NET(&ni->ni_nid) != LNET_NID_NET(nid)) {
 		CWARN("Ignoring notification of %s %s by %s (different net)\n",
-		      libcfs_nidstr(&nid), alive ? "birth" : "death",
+		      libcfs_nidstr(nid), alive ? "birth" : "death",
 		      libcfs_nidstr(&ni->ni_nid));
 		return -EINVAL;
 	}
@@ -1700,7 +1699,7 @@ bool lnet_router_checker_active(void)
 	if (when > now) {
 		CWARN("Ignoring prediction from %s of %s %s %lld seconds in the future\n",
 		      ni ? libcfs_nidstr(&ni->ni_nid) : "userspace",
-		      libcfs_nidstr(&nid), alive ? "up" : "down", when - now);
+		      libcfs_nidstr(nid), alive ? "up" : "down", when - now);
 		return -EINVAL;
 	}
 
@@ -1718,11 +1717,11 @@ bool lnet_router_checker_active(void)
 		return -ESHUTDOWN;
 	}
 
-	lpni = lnet_peer_ni_find_locked(&nid);
+	lpni = lnet_peer_ni_find_locked(nid);
 	if (!lpni) {
 		/* nid not found */
 		lnet_net_unlock(0);
-		CDEBUG(D_NET, "%s not found\n", libcfs_nidstr(&nid));
+		CDEBUG(D_NET, "%s not found\n", libcfs_nidstr(nid));
 		return 0;
 	}
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 21/22] lnet: discard lnet_nid2ni_*()
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (19 preceding siblings ...)
  2022-11-20 14:17 ` [lustre-devel] [PATCH 20/22] lnet: change lnet_notify() to take struct lnet_nid James Simmons
@ 2022-11-20 14:17 ` James Simmons
  2022-11-20 14:17 ` [lustre-devel] [PATCH 22/22] lnet: change lnet_debug_peer() to struct lnet_nid James Simmons
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:17 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

These 'struct lnet_ni' lookup functions, which take a nid4, are
discarded in favour of the versions that take a 'struct lnet_nid'.
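
Former users of the nid4 lookups convert and call the struct lnet_nid
variants instead, for example (as in the o2iblnd hunk below):

    struct lnet_nid destnid;

    /* former lnet_nid2ni_addref(reqmsg->ibm_dstnid) callers now do: */
    lnet_nid4_to_nid(reqmsg->ibm_dstnid, &destnid);
    ni = lnet_nid_to_ni_addref(&destnid);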

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: cbfbe6d132c6d0fe5 ("LU-10391 lnet: discard lnet_nid2ni_*()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44634
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h       |  2 --
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c |  9 +++++----
 net/lnet/lnet/api-ni.c              | 33 +++------------------------------
 3 files changed, 8 insertions(+), 36 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 3bcea11..a2d5adc 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -542,9 +542,7 @@ unsigned int lnet_nid_cpt_hash(struct lnet_nid *nid,
 int lnet_cpt_of_nid_locked(struct lnet_nid *nid, struct lnet_ni *ni);
 int lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni);
 int lnet_nid2cpt(struct lnet_nid *nid, struct lnet_ni *ni);
-struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
 struct lnet_ni *lnet_nid_to_ni_locked(struct lnet_nid *nid, int cpt);
-struct lnet_ni *lnet_nid2ni_addref(lnet_nid_t nid);
 struct lnet_ni *lnet_net2ni_locked(u32 net, int cpt);
 struct lnet_ni *lnet_net2ni_addref(u32 net);
 struct lnet_ni *lnet_nid_to_ni_addref(struct lnet_nid *nid);
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 451363b..6fc1730 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -2397,8 +2397,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	struct kib_peer_ni *peer_ni;
 	struct kib_peer_ni *peer2;
 	struct kib_conn *conn;
-	struct lnet_ni *ni  = NULL;
+	struct lnet_ni *ni = NULL;
 	struct kib_net *net = NULL;
+	struct lnet_nid destnid;
 	lnet_nid_t nid;
 	struct rdma_conn_param cp;
 	struct kib_rej rej;
@@ -2461,7 +2462,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	}
 
 	nid = reqmsg->ibm_srcnid;
-	ni = lnet_nid2ni_addref(reqmsg->ibm_dstnid);
+	lnet_nid4_to_nid(reqmsg->ibm_dstnid, &destnid);
+	ni  = lnet_nid_to_ni_addref(&destnid);
 
 	if (ni) {
 		net = (struct kib_net *)ni->ni_data;
@@ -2469,8 +2471,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	}
 
 	if (!ni ||				/* no matching net */
-	    lnet_nid_to_nid4(&ni->ni_nid) !=
-	    reqmsg->ibm_dstnid ||		/* right NET, wrong NID! */
+	    !nid_same(&ni->ni_nid, &destnid) ||	/* right NET, wrong NID! */
 	    net->ibn_dev != ibdev) {		/* wrong device */
 		CERROR("Can't accept conn from %s on %s (%s:%d:%pI4h): bad dst nid %s\n",
 		       libcfs_nid2str(nid),
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 5be2aff..0146509 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -1654,33 +1654,6 @@ struct lnet_ni *
 	return NULL;
 }
 
-struct lnet_ni  *
-lnet_nid2ni_locked(lnet_nid_t nid4, int cpt)
-{
-	struct lnet_nid nid;
-
-	lnet_nid4_to_nid(nid4, &nid);
-	return lnet_nid_to_ni_locked(&nid, cpt);
-}
-
-struct lnet_ni *
-lnet_nid2ni_addref(lnet_nid_t nid4)
-{
-	struct lnet_ni *ni;
-	struct lnet_nid nid;
-
-	lnet_nid4_to_nid(nid4, &nid);
-
-	lnet_net_lock(0);
-	ni = lnet_nid_to_ni_locked(&nid, 0);
-	if (ni)
-		lnet_ni_addref_locked(ni, 0);
-	lnet_net_unlock(0);
-
-	return ni;
-}
-EXPORT_SYMBOL(lnet_nid2ni_addref);
-
 struct lnet_ni *
 lnet_nid_to_ni_addref(struct lnet_nid *nid)
 {
@@ -3918,11 +3891,11 @@ u32 lnet_get_dlc_seq_locked(void)
 {
 	int cpt, rc = 0;
 	struct lnet_ni *ni;
-	lnet_nid_t nid = stats->hlni_nid;
+	struct lnet_nid nid;
 
+	lnet_nid4_to_nid(stats->hlni_nid, &nid);
 	cpt = lnet_net_lock_current();
-	ni = lnet_nid2ni_locked(nid, cpt);
-
+	ni = lnet_nid_to_ni_locked(&nid, cpt);
 	if (!ni) {
 		rc = -ENOENT;
 		goto unlock;
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 22/22] lnet: change lnet_debug_peer() to struct lnet_nid
  2022-11-20 14:16 [lustre-devel] [PATCH 00/22] lustre: backport OpenSFS work as of Nov 20, 2022 James Simmons
                   ` (20 preceding siblings ...)
  2022-11-20 14:17 ` [lustre-devel] [PATCH 21/22] lnet: discard lnet_nid2ni_*() James Simmons
@ 2022-11-20 14:17 ` James Simmons
  21 siblings, 0 replies; 23+ messages in thread
From: James Simmons @ 2022-11-20 14:17 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

lnet_debug_peer() now takes 'struct lnet_nid *'.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: e834ad5992adef598 ("LU-10391 lnet: change lnet_debug_peer() to struct lnet_nid")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44635
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |  2 +-
 net/lnet/lnet/api-ni.c        |  2 +-
 net/lnet/lnet/peer.c          | 10 ++++------
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index a2d5adc..ba68d50 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -936,7 +936,7 @@ void lnet_peer_primary_nid_locked(struct lnet_nid *nid,
 void lnet_peer_tables_cleanup(struct lnet_net *net);
 void lnet_peer_uninit(void);
 int lnet_peer_tables_create(void);
-void lnet_debug_peer(lnet_nid_t nid);
+void lnet_debug_peer(struct lnet_nid *nid);
 struct lnet_peer_net *lnet_peer_get_net_locked(struct lnet_peer *peer,
 					       u32 net_id);
 bool lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni,
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 0146509..e400de7 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -5257,7 +5257,7 @@ static int lnet_net_cmd(struct sk_buff *skb, struct genl_info *info)
 
 void LNetDebugPeer(struct lnet_processid *id)
 {
-	lnet_debug_peer(lnet_nid_to_nid4(&id->nid));
+	lnet_debug_peer(&id->nid);
 }
 EXPORT_SYMBOL(LNetDebugPeer);
 
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index a1305b6..8c603c9 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -3966,21 +3966,19 @@ void lnet_peer_discovery_stop(void)
 /* Debugging */
 
 void
-lnet_debug_peer(lnet_nid_t nid4)
+lnet_debug_peer(struct lnet_nid *nid)
 {
 	char *aliveness = "NA";
 	struct lnet_peer_ni *lp;
-	struct lnet_nid nid;
 	int cpt;
 
-	lnet_nid4_to_nid(nid4, &nid);
-	cpt = lnet_nid2cpt(&nid, NULL);
+	cpt = lnet_nid2cpt(nid, NULL);
 	lnet_net_lock(cpt);
 
-	lp = lnet_peerni_by_nid_locked(&nid, NULL, cpt);
+	lp = lnet_peerni_by_nid_locked(nid, NULL, cpt);
 	if (IS_ERR(lp)) {
 		lnet_net_unlock(cpt);
-		CDEBUG(D_WARNING, "No peer %s\n", libcfs_nidstr(&nid));
+		CDEBUG(D_WARNING, "No peer %s\n", libcfs_nidstr(nid));
 		return;
 	}
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
