lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1
@ 2021-04-05  0:50 James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 01/41] lustre: llite: data corruption due to RPC reordering James Simmons
                   ` (40 more replies)
  0 siblings, 41 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Backport various patches for the start of the 2.15 development cycle.
This covers the UDSP feature landing. I did hold back the fake
symlink work as well as IPv6 LNet support. Newer kernels have
issues with the ports which I need to investigate.

Alex Zhuravlev (1):
  lustre: obdclass: try to skip corrupted llog records

Amir Shehata (9):
  lnet: UDSP storage and marshalled structs
  lnet: foundation patch for selection mod
  lnet: Preferred gateway selection
  lnet: Select NI/peer NI with highest prio
  lnet: select best peer and local net
  lnet: UDSP handling
  lnet: Apply UDSP on local and remote NIs
  lnet: Add the kernel level Marshalling API
  lnet: ioctl handler for get policy info

Andreas Dilger (4):
  lustre: lov: avoid NULL dereference in cleanup
  lustre: llite: quiet spurious ioctl warning
  lustre: ldlm: return error from ldlm_namespace_new()
  lustre: llite: remove unused ll_teardown_mmaps()

Andrew Perepechko (1):
  lustre: llite: data corruption due to RPC reordering

Andriy Skulysh (1):
  lustre: lov: grant deadlock if same OSC in two components

Bobi Jam (1):
  lustre: lov: fix layout generation inc for mirror split

James Simmons (1):
  lnet: place wire protocol data into own headers

John L. Hammond (1):
  lustre: change EWOULDBLOCK to EAGAIN

Mr NeilBrown (11):
  lustre: lov: style cleanups in lov_set_osc_active()
  lustre: change various operations structs to const
  lustre: mark strings in char arrays as const
  lustre: convert snprintf to scnprintf as appropriate
  lustre: remove non-static 'inline' markings.
  lustre: llite: use is_root_inode()
  lnet: libcfs: discard cfs_firststr
  lnet: libcfs: use wait_event_timeout() in tracefiled().
  lnet: use init_wait() rather than init_waitqueue_entry()
  lnet: discard LNET_MD_PHYS
  lnet: o2iblnd: convert peers hash table to hashtable.h

Oleg Drokin (2):
  lustre: update version to 2.14.0
  lustre: update version to 2.14.50

Rahul Deshmukh (1):
  lustre: lov: fixes bitfield in lod qos code

Sebastien Buisson (3):
  lustre: ptlrpc: do not output error when imp_sec is freed
  lustre: gss: handle empty reqmsg in sptlrpc_req_ctx_switch
  lustre: sec: file ioctls to handle encryption policies

Serguei Smirnov (1):
  lnet: modify assertion in lnet_post_send_locked

Sonia Sharma (3):
  lnet: Add the kernel level De-Marshalling API
  lnet: Add the ioctl handler for "add policy"
  lnet: ioctl handler for "delete policy"

Wang Shilong (1):
  lustre: llite: make readahead aware of hints

 fs/lustre/fld/fld_request.c                |    2 +-
 fs/lustre/include/cl_object.h              |   18 +-
 fs/lustre/include/lu_object.h              |   17 +-
 fs/lustre/include/lustre_log.h             |   10 +-
 fs/lustre/include/lustre_net.h             |    2 +-
 fs/lustre/include/lustre_osc.h             |    2 +
 fs/lustre/include/obd_cksum.h              |    2 +-
 fs/lustre/{llite => include}/range_lock.h  |    0
 fs/lustre/ldlm/ldlm_lib.c                  |   10 +-
 fs/lustre/ldlm/ldlm_resource.c             |   32 +-
 fs/lustre/llite/Makefile                   |    2 +-
 fs/lustre/llite/crypto.c                   |    8 +-
 fs/lustre/llite/file.c                     |   33 +-
 fs/lustre/llite/glimpse.c                  |    2 +-
 fs/lustre/llite/llite_internal.h           |    5 +-
 fs/lustre/llite/llite_mmap.c               |   59 +-
 fs/lustre/llite/lproc_llite.c              |   32 +-
 fs/lustre/llite/pcc.c                      |   38 +-
 fs/lustre/llite/rw.c                       |   20 +-
 fs/lustre/llite/super25.c                  |    2 +-
 fs/lustre/llite/vvp_page.c                 |    2 +-
 fs/lustre/lmv/lmv_obd.c                    |    2 +-
 fs/lustre/lmv/lproc_lmv.c                  |    6 +-
 fs/lustre/lov/lov_ea.c                     |    4 +-
 fs/lustre/lov/lov_io.c                     |    4 +
 fs/lustre/lov/lov_obd.c                    |   17 +-
 fs/lustre/lov/lov_object.c                 |    3 +-
 fs/lustre/lov/lov_pack.c                   |    8 -
 fs/lustre/mdc/mdc_dev.c                    |    3 +-
 fs/lustre/obdclass/Makefile                |    3 +-
 fs/lustre/obdclass/cl_io.c                 |   14 +-
 fs/lustre/obdclass/llog.c                  |   80 +-
 fs/lustre/obdclass/llog_cat.c              |   14 +-
 fs/lustre/obdclass/llog_internal.h         |    5 +
 fs/lustre/obdclass/llog_obd.c              |    2 +-
 fs/lustre/obdclass/lu_tgt_descs.c          |   38 +-
 fs/lustre/obdclass/obd_sysfs.c             |    6 +-
 fs/lustre/{llite => obdclass}/range_lock.c |    6 +-
 fs/lustre/osc/osc_io.c                     |   17 +-
 fs/lustre/osc/osc_lock.c                   |    2 +-
 fs/lustre/osc/osc_request.c                |    2 +-
 fs/lustre/ptlrpc/client.c                  |    2 +-
 fs/lustre/ptlrpc/errno.c                   |    4 +-
 fs/lustre/ptlrpc/llog_client.c             |    2 +-
 fs/lustre/ptlrpc/pers.c                    |    1 -
 fs/lustre/ptlrpc/sec.c                     |   14 +-
 include/linux/lnet/lib-lnet.h              |   45 +
 include/linux/lnet/lib-types.h             |   72 +-
 include/linux/lnet/udsp.h                  |  144 +++
 include/uapi/linux/lnet/libcfs_debug.h     |    1 +
 include/uapi/linux/lnet/libcfs_ioctl.h     |    8 +-
 include/uapi/linux/lnet/lnet-dlc.h         |   91 +-
 include/uapi/linux/lnet/lnet-idl.h         |  241 +++++
 include/uapi/linux/lnet/lnet-types.h       |  220 +---
 include/uapi/linux/lnet/lnetctl.h          |    1 +
 include/uapi/linux/lnet/lnetst.h           |    2 +
 include/uapi/linux/lnet/nidstr.h           |    5 +
 include/uapi/linux/lnet/socklnd.h          |    1 +
 include/uapi/linux/lustre/lustre_user.h    |    2 +-
 include/uapi/linux/lustre/lustre_ver.h     |    6 +-
 net/lnet/klnds/o2iblnd/o2iblnd-idl.h       |  157 +++
 net/lnet/klnds/o2iblnd/o2iblnd.c           |  267 +++--
 net/lnet/klnds/o2iblnd/o2iblnd.h           |  133 +--
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c        |   18 +-
 net/lnet/klnds/socklnd/socklnd_cb.c        |    4 +-
 net/lnet/libcfs/libcfs_string.c            |   28 -
 net/lnet/libcfs/tracefile.c                |   37 +-
 net/lnet/lnet/Makefile                     |    2 +-
 net/lnet/lnet/api-ni.c                     |  278 ++++-
 net/lnet/lnet/config.c                     |    4 +
 net/lnet/lnet/lib-move.c                   |  399 ++++---
 net/lnet/lnet/nidstrings.c                 |   66 ++
 net/lnet/lnet/peer.c                       |  267 +++--
 net/lnet/lnet/udsp.c                       | 1553 ++++++++++++++++++++++++++++
 74 files changed, 3744 insertions(+), 865 deletions(-)
 rename fs/lustre/{llite => include}/range_lock.h (100%)
 rename fs/lustre/{llite => obdclass}/range_lock.c (96%)
 create mode 100644 include/linux/lnet/udsp.h
 create mode 100644 include/uapi/linux/lnet/lnet-idl.h
 create mode 100644 net/lnet/klnds/o2iblnd/o2iblnd-idl.h
 create mode 100644 net/lnet/lnet/udsp.c

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 01/41] lustre: llite: data corruption due to RPC reordering
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 02/41] lustre: llite: make readahead aware of hints James Simmons
                   ` (39 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Andrew Perepechko, Lustre Development List

From: Andrew Perepechko <c17827@cray.com>

Without a read-only cache, a client can resend a BRW RPC,
receive the reply to the original BRW RPC, modify the same
data and send a new BRW RPC. Because the RPCs can be
reordered, stale data can still reach the disk.

Use range locking to protect against this race. For Linux
clients this is just a simple move of the range lock code
to obdclass.
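
As a rough illustration of why a range lock closes this race, the
sketch below shows the overlap test that a byte-range lock relies on:
two writers conflict only when their inclusive [start, end] extents
intersect. It is a userspace toy with hypothetical names
(struct demo_range, demo_ranges_overlap()), not the range_lock.c code
moved by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inclusive byte extent; not struct range_lock. */
struct demo_range {
	uint64_t start;	/* first byte covered */
	uint64_t end;	/* last byte covered, inclusive */
};

/*
 * Two inclusive extents conflict when neither one ends before the
 * other begins.  A range lock would make the second writer wait
 * whenever this returns true.
 */
static bool demo_ranges_overlap(const struct demo_range *a,
				const struct demo_range *b)
{
	assert(a->start <= a->end && b->start <= b->end);
	return a->start <= b->end && b->start <= a->end;
}
```

With extents 0-4095 and 1024-2047 the test reports a conflict, so the
second write would be serialized behind the first; 0-4095 and
4096-8191 do not conflict and could proceed in parallel.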

WC-bug-id: https://jira.whamcloud.com/browse/LU-10958
Lustre-commit: 35679a730bf0b7a ("LU-10958 ofd: data corruption due to RPC reordering")
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Cray-bug-id: LUS-5578,LUS-8943
Reviewed-on: https://review.whamcloud.com/32281
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/{llite => include}/range_lock.h  | 0
 fs/lustre/llite/Makefile                   | 2 +-
 fs/lustre/llite/llite_internal.h           | 2 +-
 fs/lustre/obdclass/Makefile                | 3 ++-
 fs/lustre/{llite => obdclass}/range_lock.c | 6 +++++-
 5 files changed, 9 insertions(+), 4 deletions(-)
 rename fs/lustre/{llite => include}/range_lock.h (100%)
 rename fs/lustre/{llite => obdclass}/range_lock.c (96%)

diff --git a/fs/lustre/llite/range_lock.h b/fs/lustre/include/range_lock.h
similarity index 100%
rename from fs/lustre/llite/range_lock.h
rename to fs/lustre/include/range_lock.h
diff --git a/fs/lustre/llite/Makefile b/fs/lustre/llite/Makefile
index aa388bb6..3bad19c 100644
--- a/fs/lustre/llite/Makefile
+++ b/fs/lustre/llite/Makefile
@@ -3,7 +3,7 @@ ccflags-y += -I$(srctree)/$(src)/../include
 
 obj-$(CONFIG_LUSTRE_FS) += lustre.o
 lustre-y := dcache.o dir.o file.o llite_lib.o llite_nfs.o \
-	    rw.o rw26.o namei.o symlink.o llite_mmap.o range_lock.o \
+	    rw.o rw26.o namei.o symlink.o llite_mmap.o \
 	    xattr.o xattr_cache.o xattr_security.o \
 	    super25.o statahead.o glimpse.o lcommon_cl.o lcommon_misc.o \
 	    vvp_dev.o vvp_page.o vvp_io.o vvp_object.o \
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 797dfea..0fe0b562 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -45,12 +45,12 @@
 #include <lustre_intent.h>
 #include <linux/compat.h>
 #include <lustre_crypto.h>
+#include <range_lock.h>
 #include <linux/namei.h>
 #include <linux/xattr.h>
 #include <linux/posix_acl_xattr.h>
 
 #include "vvp_internal.h"
-#include "range_lock.h"
 #include "pcc.h"
 
 /** Only used on client-side for indicating the tail of dir hash/offset. */
diff --git a/fs/lustre/obdclass/Makefile b/fs/lustre/obdclass/Makefile
index 9693a5e..de37a89 100644
--- a/fs/lustre/obdclass/Makefile
+++ b/fs/lustre/obdclass/Makefile
@@ -8,4 +8,5 @@ obdclass-y := llog.o llog_cat.o llog_obd.o llog_swab.o class_obd.o \
 	      lustre_handles.o lustre_peer.o statfs_pack.o linkea.o \
 	      obdo.o obd_config.o obd_mount.o lu_object.o lu_ref.o \
 	      cl_object.o cl_page.o cl_lock.o cl_io.o kernelcomm.o \
-	      jobid.o integrity.o obd_cksum.o lu_tgt_descs.o
+	      jobid.o integrity.o obd_cksum.o lu_tgt_descs.o \
+	      range_lock.o
diff --git a/fs/lustre/llite/range_lock.c b/fs/lustre/obdclass/range_lock.c
similarity index 96%
rename from fs/lustre/llite/range_lock.c
rename to fs/lustre/obdclass/range_lock.c
index 772b8ac..2af6385 100644
--- a/fs/lustre/llite/range_lock.c
+++ b/fs/lustre/obdclass/range_lock.c
@@ -35,8 +35,8 @@
  * Author: Bobi Jam <bobijam.xu@intel.com>
  */
 #include <linux/sched/signal.h>
-#include "range_lock.h"
 #include <uapi/linux/lustre/lustre_idl.h>
+#include <range_lock.h>
 #include <linux/libcfs/libcfs.h>
 #include <linux/interval_tree_generic.h>
 
@@ -59,6 +59,7 @@ void range_lock_tree_init(struct range_lock_tree *tree)
 	tree->rlt_sequence = 0;
 	spin_lock_init(&tree->rlt_lock);
 }
+EXPORT_SYMBOL(range_lock_tree_init);
 
 /**
  * Initialize a range lock node
@@ -86,6 +87,7 @@ int range_lock_init(struct range_lock *lock, u64 start, u64 end)
 	lock->rl_sequence = 0;
 	return 0;
 }
+EXPORT_SYMBOL(range_lock_init);
 
 /**
  * Unlock a range lock, wake up locks blocked by this lock.
@@ -117,6 +119,7 @@ void range_unlock(struct range_lock_tree *tree, struct range_lock *lock)
 
 	spin_unlock(&tree->rlt_lock);
 }
+EXPORT_SYMBOL(range_unlock);
 
 /**
  * Lock a region
@@ -167,3 +170,4 @@ int range_lock(struct range_lock_tree *tree, struct range_lock *lock)
 out:
 	return rc;
 }
+EXPORT_SYMBOL(range_lock);
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/41] lustre: llite: make readahead aware of hints
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 01/41] lustre: llite: data corruption due to RPC reordering James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 03/41] lustre: lov: avoid NULL dereference in cleanup James Simmons
                   ` (38 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Wang Shilong, Lustre Development List

From: Wang Shilong <wshilong@ddn.com>

Calling madvise(MADV_SEQUENTIAL) and madvise(MADV_RANDOM) sets the
VM_SEQ_READ and VM_RAND_READ hints in vma->vm_flags.  These should
be used to guide the Lustre readahead for better performance.

Disable the kernel readahead for mmap() pages and use the llite
readahead instead.  There was also a bug in __ll_fault() that would
set both VM_SEQ_READ and VM_RAND_READ at the same time, which was
confusing the detection of the VM_SEQ_READ case, since VM_RAND_READ
was being checked first.
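
The ordering problem can be sketched in isolation. In the toy below
the flag bits and names (DEMO_SEQ_READ, DEMO_RAND_READ,
demo_pick_hint()) are hypothetical stand-ins rather than the kernel's
VM_* definitions; it only shows that testing the sequential bit first
gives a stable answer even if both bits happen to be set.

```c
/* Hypothetical flag bits for the demo; the real VM_* values live in the kernel. */
#define DEMO_SEQ_READ	0x1UL
#define DEMO_RAND_READ	0x2UL

enum demo_hint { DEMO_HINT_NONE, DEMO_HINT_SEQ, DEMO_HINT_RAND };

/*
 * Sequential is tested first, mirroring the if / else-if order this
 * patch uses in ll_fault_io_init().  With the opposite order, a vma
 * carrying both bits would always be classified as random.
 */
static enum demo_hint demo_pick_hint(unsigned long vm_flags)
{
	if (vm_flags & DEMO_SEQ_READ)
		return DEMO_HINT_SEQ;
	if (vm_flags & DEMO_RAND_READ)
		return DEMO_HINT_RAND;
	return DEMO_HINT_NONE;
}
```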

This changes the readahead for mmap from submitting mostly 4KB RPCs
to a large number of 1MB RPCs for the application profiled:

  llite.*.read_ahead_stats     before        patched
  ------------------------     ------        -------
  hits                           2408         135924 samples [pages]
  misses                        34160           2384 samples [pages]

  osc.*.rpc_stats           read before    read patched
  ---------------          -------------  --------------
  pages per rpc            rpcs   % cum%   rpcs   % cum%
     1:                    6542  95  95     351  55  55
     2:                     224   3  99      76  12  67
     4:                      32   0  99      28   4  72
     8:                       2   0  99       9   1  73
    16:                      25   0  99      32   5  78
    32:                       0   0  99       8   1  80
    64:                       0   0  99       5   0  80
   128:                       0   0  99      15   2  83
   256:                       2   0  99     102  16  99
   512:                       0   0  99       0   0  99
  1024:                       1   0 100       3   0 100

The readahead hit rate improved from 6% to 98%, 4KB RPCs dropped from
95% to 55%, and 1MB+ RPCs increased from 0% to 16% (79% of all pages).
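
The percentages above can be re-derived from the two tables. The
helpers below are a sketch with hypothetical names; the page-share
estimate assumes every RPC in a histogram bucket carries exactly the
bucket's labelled page count, which is only an approximation. Under
that assumption the 256-page bucket of the patched run works out to
about 79% of all pages.

```c
#include <stddef.h>
#include <stdint.h>

/* Percentage of readahead lookups that hit, truncated toward zero. */
static unsigned int hit_rate_percent(uint64_t hits, uint64_t misses)
{
	uint64_t total = hits + misses;

	return total ? (unsigned int)(hits * 100 / total) : 0;
}

/*
 * Estimated share of pages moved by bucket idx, assuming each RPC in
 * a bucket carries exactly pages_per_rpc[idx] pages.
 */
static unsigned int bucket_page_share_percent(const uint64_t *pages_per_rpc,
					      const uint64_t *rpcs,
					      size_t nbuckets, size_t idx)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < nbuckets; i++)
		total += pages_per_rpc[i] * rpcs[i];

	return total ?
	       (unsigned int)(pages_per_rpc[idx] * rpcs[idx] * 100 / total) : 0;
}
```

Feeding in hits=2408, misses=34160 gives 6, and hits=135924,
misses=2384 gives 98, matching the figures quoted above.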

Add debug to ll_file_mmap(), ll_fault() and ll_fault_io_init() to
allow tracing VMA state functions for future IO optimizations.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13669
Lustre-commit: 7542820698696ed ("LU-13669 llite: make readahead aware of hints")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/41228
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h | 10 +++++++++-
 fs/lustre/llite/file.c        |  2 ++
 fs/lustre/llite/llite_mmap.c  | 42 ++++++++++++++++++++++--------------------
 fs/lustre/llite/rw.c          | 20 ++++++++++++++++----
 4 files changed, 49 insertions(+), 25 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 4f34e5d..739fe5b 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -1974,7 +1974,15 @@ struct cl_io {
 	 * the read IO will check to-be-read OSCs' status, and make fast-switch
 	 * another mirror if some of the OSTs are not healthy.
 	 */
-				ci_tried_all_mirrors:1;
+				ci_tried_all_mirrors:1,
+	/**
+	 * Random read hints, readahead will be disabled.
+	 */
+				ci_rand_read:1,
+	/**
+	 * Sequential read hints.
+	 */
+				ci_seq_read:1;
 	/**
 	 * Bypass quota check
 	 */
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 7c7ac01..fd01e14 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -736,6 +736,8 @@ static int ll_local_open(struct file *file, struct lookup_intent *it,
 	file->private_data = fd;
 	ll_readahead_init(inode, &fd->fd_ras);
 	fd->fd_omode = it->it_flags & (FMODE_READ | FMODE_WRITE | FMODE_EXEC);
+	/* turn off the kernel's read-ahead */
+	file->f_ra.ra_pages = 0;
 
 	/* ll_cl_context initialize */
 	rwlock_init(&fd->fd_lock);
diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c
index f0be7ba..b9a73e0 100644
--- a/fs/lustre/llite/llite_mmap.c
+++ b/fs/lustre/llite/llite_mmap.c
@@ -84,13 +84,11 @@ struct vm_area_struct *our_vma(struct mm_struct *mm, unsigned long addr,
  * @vma		virtual memory area addressed to page fault
  * @env		corespondent lu_env to processing
  * @index	page index corespondent to fault.
- * @ra_flags	vma readahead flags.
  *
- * \return error codes from cl_io_init.
+ * RETURN	error codes from cl_io_init.
  */
 static struct cl_io *
-ll_fault_io_init(struct lu_env *env, struct vm_area_struct *vma,
-		 pgoff_t index, unsigned long *ra_flags)
+ll_fault_io_init(struct lu_env *env, struct vm_area_struct *vma, pgoff_t index)
 {
 	struct file *file = vma->vm_file;
 	struct inode *inode = file_inode(file);
@@ -110,18 +108,15 @@ struct vm_area_struct *our_vma(struct mm_struct *mm, unsigned long addr,
 	fio->ft_index = index;
 	fio->ft_executable = vma->vm_flags & VM_EXEC;
 
-	/*
-	 * disable VM_SEQ_READ and use VM_RAND_READ to make sure that
-	 * the kernel will not read other pages not covered by ldlm in
-	 * filemap_nopage. we do our readahead in ll_readpage.
-	 */
-	if (ra_flags)
-		*ra_flags = vma->vm_flags & (VM_RAND_READ | VM_SEQ_READ);
-	vma->vm_flags &= ~VM_SEQ_READ;
-	vma->vm_flags |= VM_RAND_READ;
+	CDEBUG(D_MMAP,
+	       DFID": vma=%p start=%#lx end=%#lx vm_flags=%#lx idx=%lu\n",
+	       PFID(&ll_i2info(inode)->lli_fid), vma, vma->vm_start,
+	       vma->vm_end, vma->vm_flags, fio->ft_index);
 
-	CDEBUG(D_MMAP, "vm_flags: %lx (%lu %d)\n", vma->vm_flags,
-	       fio->ft_index, fio->ft_executable);
+	if (vma->vm_flags & VM_SEQ_READ)
+		io->ci_seq_read = 1;
+	else if (vma->vm_flags & VM_RAND_READ)
+		io->ci_rand_read = 1;
 
 	rc = cl_io_init(env, io, CIT_FAULT, io->ci_obj);
 	if (rc == 0) {
@@ -161,7 +156,7 @@ static int __ll_page_mkwrite(struct vm_area_struct *vma, struct page *vmpage,
 	if (IS_ERR(env))
 		return PTR_ERR(env);
 
-	io = ll_fault_io_init(env, vma, vmpage->index, NULL);
+	io = ll_fault_io_init(env, vma, vmpage->index);
 	if (IS_ERR(io)) {
 		result = PTR_ERR(io);
 		goto out;
@@ -277,7 +272,6 @@ static vm_fault_t __ll_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct cl_io *io;
 	struct vvp_io *vio = NULL;
 	struct page *vmpage;
-	unsigned long ra_flags;
 	int result = 0;
 	vm_fault_t fault_ret = 0;
 	u16 refcheck;
@@ -314,7 +308,7 @@ static vm_fault_t __ll_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		fault_ret = 0;
 	}
 
-	io = ll_fault_io_init(env, vma, vmf->pgoff, &ra_flags);
+	io = ll_fault_io_init(env, vma, vmf->pgoff);
 	if (IS_ERR(io)) {
 		fault_ret = to_fault_error(PTR_ERR(io));
 		goto out;
@@ -350,8 +344,6 @@ static vm_fault_t __ll_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	}
 	cl_io_fini(env, io);
 
-	vma->vm_flags |= ra_flags;
-
 out:
 	cl_env_put(env, &refcheck);
 	if (result != 0 && !(fault_ret & VM_FAULT_RETRY))
@@ -375,6 +367,10 @@ static vm_fault_t ll_fault(struct vm_fault *vmf)
 	if (cached)
 		goto out;
 
+	CDEBUG(D_MMAP, DFID": vma=%p start=%#lx end=%#lx vm_flags=%#lx\n",
+	       PFID(&ll_i2info(file_inode(vma->vm_file))->lli_fid),
+	       vma, vma->vm_start, vma->vm_end, vma->vm_flags);
+
 	/* Only SIGKILL and SIGTERM are allowed for fault/nopage/mkwrite
 	 * so that it can be killed by admin but not cause segfault by
 	 * other signals.
@@ -385,6 +381,7 @@ static vm_fault_t ll_fault(struct vm_fault *vmf)
 	/* make sure offset is not a negative number */
 	if (vmf->pgoff > (MAX_LFS_FILESIZE >> PAGE_SHIFT))
 		return VM_FAULT_SIGBUS;
+
 restart:
 	result = __ll_fault(vmf->vma, vmf);
 	if (vmf->page &&
@@ -545,6 +542,11 @@ int ll_file_mmap(struct file *file, struct vm_area_struct *vma)
 	bool cached;
 	int rc;
 
+	CDEBUG(D_VFSTRACE | D_MMAP,
+	       "VFS_Op: fid="DFID" vma=%p start=%#lx end=%#lx vm_flags=%#lx\n",
+	       PFID(&ll_i2info(inode)->lli_fid),
+	       vma, vma->vm_start, vma->vm_end, vma->vm_flags);
+
 	if (ll_file_nolock(file))
 		return -EOPNOTSUPP;
 
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 096e015..8bba97f 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -1255,7 +1255,7 @@ static bool index_in_stride_window(struct ll_readahead_state *ras,
  */
 static void ras_update(struct ll_sb_info *sbi, struct inode *inode,
 		       struct ll_readahead_state *ras, pgoff_t index,
-		       enum ras_update_flags flags)
+		       enum ras_update_flags flags, struct cl_io *io)
 {
 	struct ll_ra_info *ra = &sbi->ll_ra_info;
 	bool hit = flags & LL_RAS_HIT;
@@ -1276,6 +1276,18 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode,
 	if (ras->ras_no_miss_check)
 		goto out_unlock;
 
+	if (io && io->ci_rand_read)
+		goto out_unlock;
+
+	if (io && io->ci_seq_read) {
+		if (!hit) {
+			/* to avoid many small read RPC here */
+			ras->ras_window_pages = sbi->ll_ra_info.ra_range_pages;
+			ll_ra_stats_inc_sbi(sbi, RA_STAT_MMAP_RANGE_READ);
+		}
+		goto skip;
+	}
+
 	if (flags & LL_RAS_MMAP) {
 		unsigned long ra_pages;
 
@@ -1594,7 +1606,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
 			flags |= LL_RAS_HIT;
 		if (!vio->vui_ra_valid)
 			flags |= LL_RAS_MMAP;
-		ras_update(sbi, inode, ras, vvp_index(vpg), flags);
+		ras_update(sbi, inode, ras, vvp_index(vpg), flags, io);
 	}
 
 	cl_2queue_init(queue);
@@ -1613,7 +1625,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
 	io_start_index = cl_index(io->ci_obj, io->u.ci_rw.crw_pos);
 	io_end_index = cl_index(io->ci_obj, io->u.ci_rw.crw_pos +
 				io->u.ci_rw.crw_count - 1);
-	if (ll_readahead_enabled(sbi) && ras) {
+	if (ll_readahead_enabled(sbi) && ras && !io->ci_rand_read) {
 		pgoff_t skip_index = 0;
 
 		if (ras->ras_next_readahead_idx < vvp_index(vpg))
@@ -1802,7 +1814,7 @@ int ll_readpage(struct file *file, struct page *vmpage)
 			 * if the page is hit in cache because non cache page
 			 * case will be handled by slow read later.
 			 */
-			ras_update(sbi, inode, ras, vvp_index(vpg), flags);
+			ras_update(sbi, inode, ras, vvp_index(vpg), flags, io);
 			/* avoid duplicate ras_update() call */
 			vpg->vpg_ra_updated = 1;
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 03/41] lustre: lov: avoid NULL dereference in cleanup
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 01/41] lustre: llite: data corruption due to RPC reordering James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 02/41] lustre: llite: make readahead aware of hints James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 04/41] lustre: llite: quiet spurious ioctl warning James Simmons
                   ` (37 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Running racer concurrently with file migration crashes easily
when the layout changes for a file in an unexpected way:

  lov_init_composite() lustre-clilov: DOM entries with different sizes
  lov_layout_change() lustre-clilov: cannot apply new layout on
    [0x200000402:0x3e6a:0x0] : rc = -22
    BUG: unable to handle kernel NULL pointer dereference at 0x00000014
    IP: [<ffffffffa08baef4>] lov_delete_composite+0x104/0x540 [lov]
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 1 PID: 20227 Comm: ln

Avoid the NULL dereference if the entry is not fully initialized
during cleanup.

Fixes: 3219c662a46 ("lustre: flr: skip unknown FLR component types")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14389
Lustre-commit: 5da049d9ef1d26e ("LU-14389 lov: avoid NULL dereference in cleanup")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41398
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_object.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c
index 5d0e257..5d618c1 100644
--- a/fs/lustre/lov/lov_object.c
+++ b/fs/lustre/lov/lov_object.c
@@ -860,7 +860,7 @@ static int lov_delete_composite(const struct lu_env *env,
 	lov_layout_wait(env, lov);
 	if (comp->lo_entries)
 		lov_foreach_layout_entry(lov, entry) {
-			if (lsme_is_foreign(entry->lle_lsme))
+			if (entry->lle_lsme && lsme_is_foreign(entry->lle_lsme))
 				continue;
 
 			lov_delete_raid0(env, lov, entry);
-- 
1.8.3.1


* [lustre-devel] [PATCH 04/41] lustre: llite: quiet spurious ioctl warning
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (2 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 03/41] lustre: lov: avoid NULL dereference in cleanup James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 05/41] lustre: ptlrpc: do not output error when imp_sec is freed James Simmons
                   ` (36 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Calling "lfs setstripe" prints a spurious warning about using the old
ioctl(LL_IOC_LOV_GETSTRIPE) when that is not actually the case.

Remove the ioctl warning for now and deal with related issues later.

Fixes: 1288417cb488 ("lustre: llite: restore ll_file_getstripe in ll_lov_setstripe")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14316
Lustre-commit: c6f65d8af11647 ("LU-14316 llite: quiet spurious ioctl warning")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41427
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_pack.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/fs/lustre/lov/lov_pack.c b/fs/lustre/lov/lov_pack.c
index ffe9687..1962472 100644
--- a/fs/lustre/lov/lov_pack.c
+++ b/fs/lustre/lov/lov_pack.c
@@ -351,7 +351,6 @@ int lov_getstripe(const struct lu_env *env, struct lov_object *obj,
 	struct lov_user_md_v1 lum;
 	size_t lmmk_size, lum_size = 0;
 	ssize_t lmm_size;
-	static bool printed;
 	int rc = 0;
 
 	if (lsm->lsm_magic != LOV_MAGIC_V1 && lsm->lsm_magic != LOV_MAGIC_V3 &&
@@ -363,13 +362,6 @@ int lov_getstripe(const struct lu_env *env, struct lov_object *obj,
 		goto out;
 	}
 
-	if (!printed) {
-		LCONSOLE_WARN("%s: using old ioctl(LL_IOC_LOV_GETSTRIPE) on " DFID ", use llapi_layout_get_by_path()\n",
-			      current->comm,
-			      PFID(&obj->lo_cl.co_lu.lo_header->loh_fid));
-		printed = true;
-	}
-
 	lmmk_size = lov_comp_md_size(lsm);
 	lmmk = kvzalloc(lmmk_size, GFP_KERNEL);
 	if (!lmmk) {
-- 
1.8.3.1


* [lustre-devel] [PATCH 05/41] lustre: ptlrpc: do not output error when imp_sec is freed
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (3 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 04/41] lustre: llite: quiet spurious ioctl warning James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 06/41] lustre: update version to 2.14.0 James Simmons
                   ` (35 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

There is a race condition on client reconnect when the import is being
destroyed.  Some outstanding client-bound requests are still being
processed after the imp_sec has already been freed.
Output the error message in import_sec_validate_get() only if the
import is not already in the zombie work queue.

Fixes: 1a6cc3b1a9 ("staging: lustre: obdclass: use workqueue for zombie management")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14355
Lustre-commit: 20cbbb084b671a1 ("LU-14355 ptlrpc: do not output error when imp_sec is freed")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/41310
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/sec.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/ptlrpc/sec.c b/fs/lustre/ptlrpc/sec.c
index 43d4f76..5b5faac 100644
--- a/fs/lustre/ptlrpc/sec.c
+++ b/fs/lustre/ptlrpc/sec.c
@@ -345,8 +345,11 @@ static int import_sec_validate_get(struct obd_import *imp,
 
 	*sec = sptlrpc_import_sec_ref(imp);
 	if (!*sec) {
-		CERROR("import %p (%s) with no sec\n",
-		       imp, ptlrpc_import_state_name(imp->imp_state));
+		/* Only output an error when the import is still active */
+		if (!test_bit(WORK_STRUCT_PENDING_BIT,
+			      work_data_bits(&imp->imp_zombie_work)))
+			CERROR("import %p (%s) with no sec\n",
+			       imp, ptlrpc_import_state_name(imp->imp_state));
 		return -EACCES;
 	}
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 06/41] lustre: update version to 2.14.0
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (4 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 05/41] lustre: ptlrpc: do not output error when imp_sec is freed James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 07/41] lnet: UDSP storage and marshalled structs James Simmons
                   ` (34 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

New tag 2.14.0

Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_ver.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h
index 2a6c050..c02a322 100644
--- a/include/uapi/linux/lustre/lustre_ver.h
+++ b/include/uapi/linux/lustre/lustre_ver.h
@@ -2,10 +2,10 @@
 #define _LUSTRE_VER_H_
 
 #define LUSTRE_MAJOR 2
-#define LUSTRE_MINOR 13
-#define LUSTRE_PATCH 57
+#define LUSTRE_MINOR 14
+#define LUSTRE_PATCH 0
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.13.57"
+#define LUSTRE_VERSION_STRING "2.14.0"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
1.8.3.1

* [lustre-devel] [PATCH 07/41] lnet: UDSP storage and marshalled structs
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (5 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 06/41] lustre: update version to 2.14.0 James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 08/41] lnet: foundation patch for selection mod James Simmons
                   ` (33 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

Commit the structures which will be used by kernel space
to store the UDSPs. This commit also adds the IOCTL structures
which are used for marshalling the UDSPs between user and
kernel space.
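
As a rough illustration of how the marshalled range descriptors could
be evaluated, the sketch below matches a value against a lo/hi/stride
triple and an IPv4 address against four such triples. The field names
follow struct lnet_range_expr from this patch; the names
range_expr_sketch, range_expr_match() and ipv4_match() are
hypothetical, and the matching semantics (inclusive bounds, stride
counted from re_lo) are an assumption, not something this patch
defines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the three fields of struct lnet_range_expr (user-space types). */
struct range_expr_sketch {
	uint32_t re_lo;
	uint32_t re_hi;
	uint32_t re_stride;
};

/* Assumed semantics: inclusive bounds, values step from re_lo by re_stride. */
static bool range_expr_match(const struct range_expr_sketch *re, uint32_t val)
{
	if (re->re_stride == 0)
		return false;	/* treat a zero stride as malformed */
	if (val < re->re_lo || val > re->re_hi)
		return false;
	return (val - re->re_lo) % re->re_stride == 0;
}

/* An IPv4 address part is matched octet by octet against 4 expressions. */
static bool ipv4_match(const struct range_expr_sketch exprs[4],
		       const uint8_t octets[4])
{
	size_t i;

	for (i = 0; i < 4; i++) {
		if (!range_expr_match(&exprs[i], octets[i]))
			return false;
	}
	return true;
}
```

With expressions describing 192.168.0.[2-10/2], this sketch would
accept 192.168.0.4 and reject 192.168.0.5.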

WC-bug-id: https://jira.whamcloud.com/browse/LU-9121
Lustre-commit: 6427da97ef6e90e5 ("LU-9121 lnet: UDSP storage and marshalled structs")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34253
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-types.h       | 45 ++++++++++++++++++
 include/uapi/linux/lnet/lnet-dlc.h   | 90 ++++++++++++++++++++++++++++++++++--
 include/uapi/linux/lnet/lnet-types.h | 14 ++++++
 net/lnet/lnet/api-ni.c               |  1 +
 4 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 7c9d7e2..a8bd5a5 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -995,6 +995,49 @@ struct lnet_msg_container {
 	void			**msc_resenders;
 };
 
+/* These UDSP structures need to match the user space liblnetconfig structures
+ * in order for the marshall and unmarshall functions to be common.
+ */
+
+/* Net is described as a
+ *  1. net type
+ *  2. num range
+ */
+struct lnet_ud_net_descr {
+	u32 udn_net_type;
+	struct list_head udn_net_num_range;
+};
+
+/* each NID range is defined as
+ *  1. net descriptor
+ *  2. address range descriptor
+ */
+struct lnet_ud_nid_descr {
+	struct lnet_ud_net_descr ud_net_id;
+	struct list_head ud_addr_range;
+	u32 ud_mem_size;
+};
+
+/* a UDSP rule can have up to three user defined NID descriptors
+ *	- src: defines the local NID range for the rule
+ *	- dst: defines the peer NID range for the rule
+ *	- rte: defines the router NID range for the rule
+ *
+ * An action union defines the action to take when the rule
+ * is matched
+ */
+struct lnet_udsp {
+	struct list_head udsp_on_list;
+	u32 udsp_idx;
+	struct lnet_ud_nid_descr udsp_src;
+	struct lnet_ud_nid_descr udsp_dst;
+	struct lnet_ud_nid_descr udsp_rte;
+	enum lnet_udsp_action_type udsp_action_type;
+	union {
+		u32 udsp_priority;
+	} udsp_action;
+};
+
 /* Peer Discovery states */
 #define LNET_DC_STATE_SHUTDOWN		0	/* not started */
 #define LNET_DC_STATE_RUNNING		1	/* started up OK */
@@ -1176,6 +1219,8 @@ struct lnet {
 	 * work loops
 	 */
 	struct completion		ln_started;
+	/* UDSP list */
+	struct list_head		ln_udsp_list;
 };
 
 #endif
diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h
index 2a87071..ca1f8ae 100644
--- a/include/uapi/linux/lnet/lnet-dlc.h
+++ b/include/uapi/linux/lnet/lnet-dlc.h
@@ -26,8 +26,8 @@
  * Author: Amir Shehata <amir.shehata@intel.com>
  */
 
-#ifndef LNET_DLC_H
-#define LNET_DLC_H
+#ifndef _LNET_DLC_H_
+#define _LNET_DLC_H_
 
 #include <linux/lnet/libcfs_ioctl.h>
 #include <linux/lnet/lnet-types.h>
@@ -292,4 +292,88 @@ struct lnet_ioctl_lnet_stats {
 	struct lnet_counters st_cntrs;
 };
 
-#endif /* LNET_DLC_H */
+/* An IP, numeric NID or a Net number is composed of 1 or more of these
+ * descriptor structures.
+ */
+struct lnet_range_expr {
+	__u32 re_lo;
+	__u32 re_hi;
+	__u32 re_stride;
+};
+
+/* le_count identifies the number of lnet_range_expr in the bulk
+ * which follows
+ */
+struct lnet_expressions {
+	__u32 le_count;
+};
+
+/* A net descriptor has the net type, IE: O2IBLND, SOCKLND, etc and an
+ * expression describing a net number range.
+ */
+struct lnet_ioctl_udsp_net_descr {
+	__u32 ud_net_type;
+	struct lnet_expressions ud_net_num_expr;
+};
+
+/* The UDSP descriptor header contains the type of matching criteria, SRC,
+ * DST, RTE, etc and how many lnet_expressions compose the LNet portion of
+ * the LNet NID. For example an IP can be
+ * composed of 4 lnet_expressions , a gni can be composed of 1
+ */
+struct lnet_ioctl_udsp_descr_hdr {
+	/* The literals SRC, DST and RTE are encoded
+	 * here.
+	 */
+	__u32 ud_descr_type;
+	__u32 ud_descr_count;
+};
+
+/* each matching expression in the UDSP is described with this.
+ * The bulk format is as follows:
+ *	1. 1x struct lnet_ioctl_udsp_net_descr
+ *		-> the net part of the NID
+ *	2. >=0 struct lnet_expressions
+ *		-> the address part of the NID
+ */
+struct lnet_ioctl_udsp_descr {
+	struct lnet_ioctl_udsp_descr_hdr iud_src_hdr;
+	struct lnet_ioctl_udsp_net_descr iud_net;
+};
+
+/* The cumulative UDSP descriptor
+ * The bulk format is as follows:
+ *	1. >=1 struct lnet_ioctl_udsp_descr
+ *
+ * The size indicated in iou_hdr is the total size of the UDSP.
+ *
+ */
+struct lnet_ioctl_udsp {
+	struct libcfs_ioctl_hdr iou_hdr;
+	__s32 iou_idx;
+	__u32 iou_action_type;
+	__u32 iou_bulk_size;
+	union {
+		__u32 priority;
+	} iou_action;
+	void __user *iou_bulk;
+};
+
+/* structure used to request udsp instantiation information on the
+ * specified construct.
+ *   cud_nid: the NID of the local or remote NI to pull info on.
+ *   cud_nid_priority: NID prio of the requested NID.
+ *   cud_net_priority: net prio of network of the requested NID.
+ *   cud_pref_nid: array of preferred NIDs if it exists.
+ */
+struct lnet_ioctl_construct_udsp_info {
+	struct libcfs_ioctl_hdr cud_hdr;
+	__u32 cud_peer:1;
+	lnet_nid_t cud_nid;
+	__u32 cud_nid_priority;
+	__u32 cud_net_priority;
+	lnet_nid_t cud_pref_nid[LNET_MAX_SHOW_NUM_NID];
+	lnet_nid_t cud_pref_rtr_nid[LNET_MAX_SHOW_NUM_NID];
+};
+
+#endif /* _LNET_DLC_H_ */
diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h
index 3324792..5bf9917 100644
--- a/include/uapi/linux/lnet/lnet-types.h
+++ b/include/uapi/linux/lnet/lnet-types.h
@@ -709,6 +709,20 @@ enum lnet_ack_req {
 	/** Request that no acknowledgment should be generated. */
 	LNET_NOACK_REQ
 };
+
+/**
+ * UDSP action types. There are two available actions:
+ *	1. PRIORITY - set priority of matching LNet constructs
+ *	2. PREFERRED LIST - set preferred list of matching LNet constructs
+ */
+enum lnet_udsp_action_type {
+	EN_LNET_UDSP_ACTION_NONE = 0,
+	/** assign a priority to matching constructs */
+	EN_LNET_UDSP_ACTION_PRIORITY = 1,
+	/** assign a preferred list of NIDs to matching constructs */
+	EN_LNET_UDSP_ACTION_PREFERRED_LIST = 2,
+};
+
 /** @} lnet_data */
 
 /** @} lnet */
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index c3bf444..3acc86e 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -1154,6 +1154,7 @@ struct list_head **
 	INIT_LIST_HEAD(&the_lnet.ln_dc_expired);
 	INIT_LIST_HEAD(&the_lnet.ln_mt_localNIRecovq);
 	INIT_LIST_HEAD(&the_lnet.ln_mt_peerNIRecovq);
+	INIT_LIST_HEAD(&the_lnet.ln_udsp_list);
 	init_waitqueue_head(&the_lnet.ln_dc_waitq);
 	the_lnet.ln_mt_handler = NULL;
 	init_completion(&the_lnet.ln_started);
-- 
1.8.3.1

* [lustre-devel] [PATCH 08/41] lnet: foundation patch for selection mod
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (6 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 07/41] lnet: UDSP storage and marshalled structs James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 09/41] lnet: Preferred gateway selection James Simmons
                   ` (32 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Amir Shehata, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

Add the priority and preferred NIDs fields to lnet_ni, lnet_net,
lnet_peer_net and lnet_peer_ni. Switch the implementation of the
preferred NIDs list from an array to a list_head, because the code
is more straightforward. The list_head has more memory overhead, but
these lists are expected to be small, so I chose code simplicity
over memory.
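
The trade-off can be seen in a small user-space sketch: with a linked
list, adding or removing a NID touches one node instead of
reallocating and copying a whole array. This is only an illustration.
It uses a plain singly linked list rather than the kernel's list_head,
omits the inline single-NID case and all locking, and the names
nid_sketch_t, nid_entry, pref_nids, pref_contains(), pref_add() and
pref_del() are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t nid_sketch_t;

struct nid_entry {
	struct nid_entry *next;
	nid_sketch_t nid;
};

struct pref_nids {
	struct nid_entry *head;
	unsigned int count;
};

/* Walk the list and report whether the NID is already preferred. */
static bool pref_contains(const struct pref_nids *p, nid_sketch_t nid)
{
	const struct nid_entry *e;

	for (e = p->head; e; e = e->next) {
		if (e->nid == nid)
			return true;
	}
	return false;
}

/* Add one node; no reallocation or copying of existing entries. */
static int pref_add(struct pref_nids *p, nid_sketch_t nid)
{
	struct nid_entry *e;

	if (pref_contains(p, nid))
		return -EEXIST;
	e = calloc(1, sizeof(*e));
	if (!e)
		return -ENOMEM;
	e->nid = nid;
	e->next = p->head;
	p->head = e;
	p->count++;
	return 0;
}

/* Unlink and free the matching node, if any. */
static int pref_del(struct pref_nids *p, nid_sketch_t nid)
{
	struct nid_entry **link;

	for (link = &p->head; *link; link = &(*link)->next) {
		if ((*link)->nid == nid) {
			struct nid_entry *victim = *link;

			*link = victim->next;
			free(victim);
			p->count--;
			return 0;
		}
	}
	return -ENOENT;
}
```

Each entry costs a pointer in addition to the NID, which is the extra
memory overhead the commit message accepts.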

WC-bug-id: https://jira.whamcloud.com/browse/LU-9121
Lustre-commit: 51b2c0f75f727f0 ("LU-9121 lnet: foundation patch for selection mod")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34350
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-types.h |  24 +++++++-
 net/lnet/lnet/config.c         |   4 ++
 net/lnet/lnet/peer.c           | 134 ++++++++++++++++++++++-------------------
 3 files changed, 100 insertions(+), 62 deletions(-)

diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index a8bd5a5..187e1f3 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -58,6 +58,7 @@
  * All local and peer NIs created have their health default to this value.
  */
 #define LNET_MAX_HEALTH_VALUE 1000
+#define LNET_MAX_SELECTION_PRIORITY UINT_MAX
 
 /* forward refs */
 struct lnet_libmd;
@@ -364,6 +365,9 @@ struct lnet_net {
 	/* cumulative CPTs of all NIs in this net */
 	u32			*net_cpts;
 
+	/* relative net selection priority */
+	u32			net_sel_priority;
+
 	/* network tunables */
 	struct lnet_ioctl_config_lnd_cmn_tunables net_tunables;
 
@@ -388,6 +392,9 @@ struct lnet_net {
 
 	/* protects access to net_last_alive */
 	spinlock_t		net_lock;
+
+	/* list of router nids preferred for this network */
+	struct list_head	net_rtr_pref_nids;
 };
 
 struct lnet_ni {
@@ -466,6 +473,9 @@ struct lnet_ni {
 	 */
 	atomic_t		ni_fatal_error_on;
 
+	/* the relative selection priority of this NI */
+	u32			ni_sel_priority;
+
 	/*
 	 * equivalent interfaces to use
 	 * This is an array because socklnd bonding can still be configured
@@ -498,6 +508,11 @@ struct lnet_ping_buffer {
 #define LNET_PING_INFO_TO_BUFFER(PINFO)	\
 	container_of((PINFO), struct lnet_ping_buffer, pb_info)
 
+struct lnet_nid_list {
+	struct list_head nl_list;
+	lnet_nid_t nl_nid;
+};
+
 struct lnet_peer_ni {
 	/* chain on lpn_peer_nis */
 	struct list_head	 lpni_peer_nis;
@@ -557,8 +572,12 @@ struct lnet_peer_ni {
 	/* preferred local nids: if only one, use lpni_pref.nid */
 	union lpni_pref {
 		lnet_nid_t	 nid;
-		lnet_nid_t	*nids;
+		struct list_head nids;
 	} lpni_pref;
+	/* list of router nids preferred for this peer NI */
+	struct list_head	lpni_rtr_pref_nids;
+	/* The relative selection priority of this peer NI */
+	u32			lpni_sel_priority;
 	/* number of preferred NIDs in lnpi_pref_nids */
 	u32			 lpni_pref_nnids;
 };
@@ -752,6 +771,9 @@ struct lnet_peer_net {
 	/* selection sequence number */
 	u32			lpn_seq;
 
+	/* relative peer net selection priority */
+	u32			lpn_sel_priority;
+
 	/* reference count */
 	atomic_t		lpn_refcount;
 };
diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c
index b078bc8..10a7fe9 100644
--- a/net/lnet/lnet/config.c
+++ b/net/lnet/lnet/config.c
@@ -366,11 +366,14 @@ struct lnet_net *
 	INIT_LIST_HEAD(&net->net_ni_list);
 	INIT_LIST_HEAD(&net->net_ni_added);
 	INIT_LIST_HEAD(&net->net_ni_zombie);
+	INIT_LIST_HEAD(&net->net_rtr_pref_nids);
 	spin_lock_init(&net->net_lock);
 
 	net->net_id = net_id;
 	net->net_last_alive = ktime_get_real_seconds();
 
+	net->net_sel_priority = LNET_MAX_SELECTION_PRIORITY;
+
 	/* initialize global paramters to undefiend */
 	net->net_tunables.lct_peer_timeout = -1;
 	net->net_tunables.lct_max_tx_credits = -1;
@@ -470,6 +473,7 @@ struct lnet_net *
 		ni->ni_net_ns = get_net(&init_net);
 
 	ni->ni_state = LNET_NI_STATE_INIT;
+	ni->ni_sel_priority = LNET_MAX_SELECTION_PRIORITY;
 	list_add_tail(&ni->ni_netlist, &net->net_ni_added);
 
 	/*
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 70df37a..60e6b51 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -124,8 +124,10 @@
 	INIT_LIST_HEAD(&lpni->lpni_peer_nis);
 	INIT_LIST_HEAD(&lpni->lpni_recovery);
 	INIT_LIST_HEAD(&lpni->lpni_on_remote_peer_ni_list);
+	INIT_LIST_HEAD(&lpni->lpni_rtr_pref_nids);
 	LNetInvalidateMDHandle(&lpni->lpni_recovery_ping_mdh);
 	atomic_set(&lpni->lpni_refcount, 1);
+	lpni->lpni_sel_priority = LNET_MAX_SELECTION_PRIORITY;
 
 	spin_lock_init(&lpni->lpni_lock);
 
@@ -175,6 +177,7 @@
 	INIT_LIST_HEAD(&lpn->lpn_peer_nets);
 	INIT_LIST_HEAD(&lpn->lpn_peer_nis);
 	lpn->lpn_net_id = net_id;
+	lpn->lpn_sel_priority = LNET_MAX_SELECTION_PRIORITY;
 
 	CDEBUG(D_NET, "%p net %s\n", lpn, libcfs_net2str(lpn->lpn_net_id));
 
@@ -899,14 +902,14 @@ struct lnet_peer_ni *
 bool
 lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, lnet_nid_t nid)
 {
-	int i;
+	struct lnet_nid_list *ne;
 
 	if (lpni->lpni_pref_nnids == 0)
 		return false;
 	if (lpni->lpni_pref_nnids == 1)
 		return lpni->lpni_pref.nid == nid;
-	for (i = 0; i < lpni->lpni_pref_nnids; i++) {
-		if (lpni->lpni_pref.nids[i] == nid)
+	list_for_each_entry(ne, &lpni->lpni_pref.nids, nl_list) {
+		if (ne->nl_nid == nid)
 			return true;
 	}
 	return false;
@@ -978,11 +981,10 @@ struct lnet_peer_ni *
 int
 lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid)
 {
-	lnet_nid_t *nids = NULL;
-	lnet_nid_t *oldnids = NULL;
 	struct lnet_peer *lp = lpni->lpni_peer_net->lpn_peer;
-	int size;
-	int i;
+	struct lnet_nid_list *ne1 = NULL;
+	struct lnet_nid_list *ne2 = NULL;
+	lnet_nid_t tmp_nid = LNET_NID_ANY;
 	int rc = 0;
 
 	if (nid == LNET_NID_ANY) {
@@ -996,29 +998,46 @@ struct lnet_peer_ni *
 	}
 
 	/* A non-MR node may have only one preferred NI per peer_ni */
-	if (lpni->lpni_pref_nnids > 0) {
-		if (!(lp->lp_state & LNET_PEER_MULTI_RAIL)) {
-			rc = -EPERM;
-			goto out;
-		}
+	if (lpni->lpni_pref_nnids > 0 &&
+	    !(lp->lp_state & LNET_PEER_MULTI_RAIL)) {
+		rc = -EPERM;
+		goto out;
 	}
 
+	/* add the new preferred nid to the list of preferred nids */
 	if (lpni->lpni_pref_nnids != 0) {
-		size = sizeof(*nids) * (lpni->lpni_pref_nnids + 1);
-		nids = kzalloc_cpt(size, GFP_KERNEL, lpni->lpni_cpt);
-		if (!nids) {
+		size_t alloc_size = sizeof(*ne1);
+
+		if (lpni->lpni_pref_nnids == 1) {
+			tmp_nid = lpni->lpni_pref.nid;
+			INIT_LIST_HEAD(&lpni->lpni_pref.nids);
+		}
+
+		list_for_each_entry(ne1, &lpni->lpni_pref.nids, nl_list) {
+			if (ne1->nl_nid == nid) {
+				rc = -EEXIST;
+				goto out;
+			}
+		}
+
+		ne1 = kzalloc_cpt(alloc_size, GFP_KERNEL, lpni->lpni_cpt);
+		if (!ne1) {
 			rc = -ENOMEM;
 			goto out;
 		}
-		for (i = 0; i < lpni->lpni_pref_nnids; i++) {
-			if (lpni->lpni_pref.nids[i] == nid) {
-				kfree(nids);
-				rc = -EEXIST;
+
+		/* move the originally stored nid to the list */
+		if (lpni->lpni_pref_nnids == 1) {
+			ne2 = kzalloc_cpt(alloc_size, GFP_KERNEL,
+					  lpni->lpni_cpt);
+			if (!ne2) {
+				rc = -ENOMEM;
 				goto out;
 			}
-			nids[i] = lpni->lpni_pref.nids[i];
+			INIT_LIST_HEAD(&ne2->nl_list);
+			ne2->nl_nid = tmp_nid;
 		}
-		nids[i] = nid;
+		ne1->nl_nid = nid;
 	}
 
 	lnet_net_lock(LNET_LOCK_EX);
@@ -1026,15 +1045,15 @@ struct lnet_peer_ni *
 	if (lpni->lpni_pref_nnids == 0) {
 		lpni->lpni_pref.nid = nid;
 	} else {
-		oldnids = lpni->lpni_pref.nids;
-		lpni->lpni_pref.nids = nids;
+		if (ne2)
+			list_add_tail(&ne2->nl_list, &lpni->lpni_pref.nids);
+		list_add_tail(&ne1->nl_list, &lpni->lpni_pref.nids);
 	}
 	lpni->lpni_pref_nnids++;
 	lpni->lpni_state &= ~LNET_PEER_NI_NON_MR_PREF;
 	spin_unlock(&lpni->lpni_lock);
 	lnet_net_unlock(LNET_LOCK_EX);
 
-	kfree(oldnids);
 out:
 	if (rc == -EEXIST && (lpni->lpni_state & LNET_PEER_NI_NON_MR_PREF)) {
 		spin_lock(&lpni->lpni_lock);
@@ -1049,11 +1068,8 @@ struct lnet_peer_ni *
 int
 lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid)
 {
-	lnet_nid_t *nids = NULL;
-	lnet_nid_t *oldnids = NULL;
 	struct lnet_peer *lp = lpni->lpni_peer_net->lpn_peer;
-	int size;
-	int i, j;
+	struct lnet_nid_list *ne = NULL;
 	int rc = 0;
 
 	if (lpni->lpni_pref_nnids == 0) {
@@ -1066,52 +1082,41 @@ struct lnet_peer_ni *
 			rc = -ENOENT;
 			goto out;
 		}
-	} else if (lpni->lpni_pref_nnids == 2) {
-		if (lpni->lpni_pref.nids[0] != nid &&
-		    lpni->lpni_pref.nids[1] != nid) {
-			rc = -ENOENT;
-			goto out;
-		}
 	} else {
-		size = sizeof(*nids) * (lpni->lpni_pref_nnids - 1);
-		nids = kzalloc_cpt(size, GFP_KERNEL, lpni->lpni_cpt);
-		if (!nids) {
-			rc = -ENOMEM;
-			goto out;
-		}
-		for (i = 0, j = 0; i < lpni->lpni_pref_nnids; i++) {
-			if (lpni->lpni_pref.nids[i] != nid)
-				continue;
-			nids[j++] = lpni->lpni_pref.nids[i];
-		}
-		/* Check if we actually removed a nid. */
-		if (j == lpni->lpni_pref_nnids) {
-			kfree(nids);
-			rc = -ENOENT;
-			goto out;
+		list_for_each_entry(ne, &lpni->lpni_pref.nids, nl_list) {
+			if (ne->nl_nid == nid)
+				goto remove_nid_entry;
 		}
+		rc = -ENOENT;
+		ne = NULL;
+		goto out;
 	}
 
+remove_nid_entry:
 	lnet_net_lock(LNET_LOCK_EX);
 	spin_lock(&lpni->lpni_lock);
 	if (lpni->lpni_pref_nnids == 1) {
 		lpni->lpni_pref.nid = LNET_NID_ANY;
-	} else if (lpni->lpni_pref_nnids == 2) {
-		oldnids = lpni->lpni_pref.nids;
-		if (oldnids[0] == nid)
-			lpni->lpni_pref.nid = oldnids[1];
-		else
-			lpni->lpni_pref.nid = oldnids[2];
 	} else {
-		oldnids = lpni->lpni_pref.nids;
-		lpni->lpni_pref.nids = nids;
+		list_del_init(&ne->nl_list);
+		if (lpni->lpni_pref_nnids == 2) {
+			struct lnet_nid_list *ne, *tmp;
+
+			list_for_each_entry_safe(ne, tmp,
+						 &lpni->lpni_pref.nids,
+						 nl_list) {
+				lpni->lpni_pref.nid = ne->nl_nid;
+				list_del_init(&ne->nl_list);
+				kfree(ne);
+			}
+		}
 	}
 	lpni->lpni_pref_nnids--;
 	lpni->lpni_state &= ~LNET_PEER_NI_NON_MR_PREF;
 	spin_unlock(&lpni->lpni_lock);
 	lnet_net_unlock(LNET_LOCK_EX);
 
-	kfree(oldnids);
+	kfree(ne);
 out:
 	CDEBUG(D_NET, "peer %s nid %s: %d\n",
 	       libcfs_nid2str(lp->lp_primary_nid), libcfs_nid2str(nid), rc);
@@ -1707,8 +1712,15 @@ struct lnet_peer_net *
 		spin_unlock(&ptable->pt_zombie_lock);
 	}
 
-	if (lpni->lpni_pref_nnids > 1)
-		kfree(lpni->lpni_pref.nids);
+	if (lpni->lpni_pref_nnids > 1) {
+		struct lnet_nid_list *ne, *tmp;
+
+		list_for_each_entry_safe(ne, tmp, &lpni->lpni_pref.nids,
+					 nl_list) {
+			list_del_init(&ne->nl_list);
+			kfree(ne);
+		}
+	}
 	kfree(lpni);
 
 	if (lpn)
-- 
1.8.3.1

* [lustre-devel] [PATCH 09/41] lnet: Preferred gateway selection
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (7 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 08/41] lnet: foundation patch for selection mod James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 10/41] lnet: Select NI/peer NI with highest prio James Simmons
                   ` (31 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

Add a mechanism for managing preferred gateway lists.
When selecting a route through a gateway, if a preferred gateway
list exists for the destination peer, choose the preferred gateway.
If there are multiple preferred gateways, make the selection using,
in order of decreasing importance: route priority, number of hops,
number of available tx credits on the associated lpni, and route
sequence counters. If there are no preferred routes, select the best
route available using the same criteria.
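
The ordering described above can be sketched as a comparison function
in user-space C. This is a simplified model, not the code in
lnet_find_route_locked(): the struct route_sketch and the functions
route_cmp() and route_select() are hypothetical, and the tie-break
directions (lower priority value wins, fewer hops win, more credits
win, older sequence number wins) are assumptions made for the
illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct route_sketch {
	bool preferred;		/* gateway is on the peer's preferred list */
	uint32_t priority;	/* assumed: lower value is better */
	uint32_t hops;		/* fewer is better */
	int32_t tx_credits;	/* more available credits is better */
	uint32_t seq;		/* assumed: older (smaller) is better */
};

/* Returns >0 if a is the better route, <0 if b is, 0 on a full tie. */
static int route_cmp(const struct route_sketch *a,
		     const struct route_sketch *b)
{
	if (a->preferred != b->preferred)
		return a->preferred ? 1 : -1;
	if (a->priority != b->priority)
		return a->priority < b->priority ? 1 : -1;
	if (a->hops != b->hops)
		return a->hops < b->hops ? 1 : -1;
	if (a->tx_credits != b->tx_credits)
		return a->tx_credits > b->tx_credits ? 1 : -1;
	if (a->seq != b->seq)
		return (int32_t)(a->seq - b->seq) < 0 ? 1 : -1;
	return 0;
}

/* Pick the best route; the first one seen wins a full tie. */
static const struct route_sketch *
route_select(const struct route_sketch *routes, size_t n)
{
	const struct route_sketch *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!best || route_cmp(&routes[i], best) > 0)
			best = &routes[i];
	}
	return best;
}
```

Because the preferred flag is compared first, a preferred gateway
beats any non-preferred one regardless of the other criteria, and the
same criteria apply unchanged when no route is preferred.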

WC-bug-id: https://jira.whamcloud.com/browse/LU-9121
Lustre-commit: 66acff74d0da31e ("LU-9121 lnet: Preferred gateway selection")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34353
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |   5 ++
 net/lnet/lnet/lib-move.c      | 119 ++++++++++++++++++++++++++++++------------
 net/lnet/lnet/peer.c          | 111 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 201 insertions(+), 34 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 927ca44..90f18a0 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -806,6 +806,11 @@ struct lnet_peer_ni *lnet_peer_get_ni_locked(struct lnet_peer *lp,
 struct lnet_peer_net *lnet_peer_get_net_locked(struct lnet_peer *peer,
 					       u32 net_id);
 bool lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, lnet_nid_t nid);
+int lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
+void lnet_peer_clr_pref_nids(struct lnet_peer_ni *lpni);
+bool lnet_peer_is_pref_rtr_locked(struct lnet_peer_ni *lpni, lnet_nid_t gw_nid);
+void lnet_peer_clr_pref_rtrs(struct lnet_peer_ni *lpni);
+int lnet_peer_add_pref_rtr(struct lnet_peer_ni *lpni, lnet_nid_t nid);
 int lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
 int lnet_add_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid, bool mr);
 int lnet_del_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid);
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 4687acd..8763c3f 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1097,24 +1097,6 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	}
 }
 
-static int
-lnet_compare_gw_lpnis(struct lnet_peer_ni *p1, struct lnet_peer_ni *p2)
-{
-	if (p1->lpni_txqnob < p2->lpni_txqnob)
-		return 1;
-
-	if (p1->lpni_txqnob > p2->lpni_txqnob)
-		return -1;
-
-	if (p1->lpni_txcredits > p2->lpni_txcredits)
-		return 1;
-
-	if (p1->lpni_txcredits < p2->lpni_txcredits)
-		return -1;
-
-	return 0;
-}
-
 static struct lnet_peer_ni *
 lnet_select_peer_ni(struct lnet_ni *best_ni, lnet_nid_t dst_nid,
 		    struct lnet_peer *peer,
@@ -1246,6 +1228,24 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	return NULL;
 }
 
+static int
+lnet_compare_gw_lpnis(struct lnet_peer_ni *lpni1, struct lnet_peer_ni *lpni2)
+{
+	if (lpni1->lpni_txqnob < lpni2->lpni_txqnob)
+		return 1;
+
+	if (lpni1->lpni_txqnob > lpni2->lpni_txqnob)
+		return -1;
+
+	if (lpni1->lpni_txcredits > lpni2->lpni_txcredits)
+		return 1;
+
+	if (lpni1->lpni_txcredits < lpni2->lpni_txcredits)
+		return -1;
+
+	return 0;
+}
+
 /* Compare route priorities and hop counts */
 static int
 lnet_compare_routes(struct lnet_route *r1, struct lnet_route *r2)
@@ -1270,6 +1270,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 
 static struct lnet_route *
 lnet_find_route_locked(struct lnet_remotenet *rnet, u32 src_net,
+		       struct lnet_peer_ni *remote_lpni,
 		       struct lnet_route **prev_route,
 		       struct lnet_peer_ni **gwni)
 {
@@ -1278,6 +1279,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	struct lnet_route *last_route;
 	struct lnet_route *route;
 	int rc;
+	bool best_rte_is_preferred = false;
+	lnet_nid_t gw_pnid;
 
 	CDEBUG(D_NET, "Looking up a route to %s, from %s\n",
 	       libcfs_net2str(rnet->lrn_net), libcfs_net2str(src_net));
@@ -1287,44 +1290,76 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	list_for_each_entry(route, &rnet->lrn_routes, lr_list) {
 		if (!lnet_is_route_alive(route))
 			continue;
+		gw_pnid = route->lr_gateway->lp_primary_nid;
+
+		/* no protection on below fields, but it's harmless */
+		if (last_route && (last_route->lr_seq - route->lr_seq < 0))
+			last_route = route;
 
-		/* Restrict the selection of the router NI on the src_net
-		 * provided. If the src_net is LNET_NID_ANY, then select
-		 * the best interface available.
+		/* if the best route found is in the preferred list then
+		 * tag it as preferred and use it later on. But if we
+		 * didn't find any routes which are on the preferred list
+		 * then just use the best route possible.
 		 */
-		if (!best_route) {
+		rc = lnet_peer_is_pref_rtr_locked(remote_lpni, gw_pnid);
+
+		if (!best_route || (rc && !best_rte_is_preferred)) {
+			/* Restrict the selection of the router NI on the
+			 * src_net provided. If the src_net is LNET_NID_ANY,
+			 * then select the best interface available.
+			 */
 			lpni = lnet_find_best_lpni(NULL, LNET_NID_ANY,
 						   route->lr_gateway,
 						   src_net);
-			if (lpni) {
-				best_route = route;
-				last_route = route;
-				best_gw_ni = lpni;
-			} else {
+			if (!lpni) {
 				CDEBUG(D_NET,
 				       "Gateway %s does not have a peer NI on net %s\n",
-				       libcfs_nid2str(route->lr_gateway->lp_primary_nid),
+				       libcfs_nid2str(gw_pnid),
 				       libcfs_net2str(src_net));
+				continue;
 			}
-			continue;
 		}
 
-		/* no protection on below fields, but it's harmless */
-		if (last_route->lr_seq - route->lr_seq < 0)
+		if (rc && !best_rte_is_preferred) {
+			/* This is the first preferred route we found,
+			 * so it beats any route found previously
+			 */
+			best_route = route;
+			if (!last_route)
+				last_route = route;
+			best_gw_ni = lpni;
+			best_rte_is_preferred = true;
+			CDEBUG(D_NET, "preferred gw = %s\n",
+			       libcfs_nid2str(gw_pnid));
+			continue;
+		} else if ((!rc) && best_rte_is_preferred)
+			/* The best route we found so far is in the preferred
+			 * list, so it beats any non-preferred route
+			 */
+			continue;
+
+		if (!best_route) {
+			best_route = route;
 			last_route = route;
+			best_gw_ni = lpni;
+			continue;
+		}
 
 		rc = lnet_compare_routes(route, best_route);
 		if (rc == -1)
 			continue;
 
+		/* Restrict the selection of the router NI on the
+		 * src_net provided. If the src_net is LNET_NID_ANY,
+		 * then select the best interface available.
+		 */
 		lpni = lnet_find_best_lpni(NULL, LNET_NID_ANY,
 					   route->lr_gateway,
 					   src_net);
-		/* restrict the lpni on the src_net if specified */
 		if (!lpni) {
 			CDEBUG(D_NET,
 			       "Gateway %s does not have a peer NI on net %s\n",
-			       libcfs_nid2str(route->lr_gateway->lp_primary_nid),
+			       libcfs_nid2str(gw_pnid),
 			       libcfs_net2str(src_net));
 			continue;
 		}
@@ -1805,6 +1840,8 @@ struct lnet_ni *
 	lnet_nid_t src_nid = (sd->sd_src_nid != LNET_NID_ANY) ? sd->sd_src_nid :
 			      sd->sd_best_ni ? sd->sd_best_ni->ni_nid :
 			      LNET_NID_ANY;
+	int best_lpn_healthv = 0;
+	u32 best_lpn_sel_prio = LNET_MAX_SELECTION_PRIORITY;
 
 	CDEBUG(D_NET, "using src nid %s for route restriction\n",
 	       libcfs_nid2str(src_nid));
@@ -1861,9 +1898,22 @@ struct lnet_ni *
 					best_rnet = rnet;
 				}
 
-				if (best_lpn->lpn_seq <= lpn->lpn_seq)
+				/* select the preferred peer net */
+				if (best_lpn_healthv > lpn->lpn_healthv)
 					continue;
+				else if (best_lpn_healthv < lpn->lpn_healthv)
+					goto use_lpn;
 
+				if (best_lpn_sel_prio < lpn->lpn_sel_priority)
+					continue;
+				else if (best_lpn_sel_prio > lpn->lpn_sel_priority)
+					goto use_lpn;
+
+				if (best_lpn->lpn_seq <= lpn->lpn_seq)
+					continue;
+use_lpn:
+				best_lpn_healthv = lpn->lpn_healthv;
+				best_lpn_sel_prio = lpn->lpn_sel_priority;
 				best_lpn = lpn;
 				best_rnet = rnet;
 			}
@@ -1905,6 +1955,7 @@ struct lnet_ni *
 		 */
 		best_route = lnet_find_route_locked(best_rnet,
 						    LNET_NIDNET(src_nid),
+						    sd->sd_best_lpni,
 						    &last_route, &gwni);
 		if (!best_route) {
 			CERROR("no route to %s from %s\n",
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 60e6b51..bbd43c8 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -894,6 +894,94 @@ struct lnet_peer_ni *
 	wake_up(&the_lnet.ln_dc_waitq);
 }
 
+/* find the NID in the preferred gateways for the remote peer
+ * return:
+ *	false: list is not empty and NID is not preferred
+ *	false: list is empty
+ *	true: nid is found in the list
+ */
+bool
+lnet_peer_is_pref_rtr_locked(struct lnet_peer_ni *lpni,
+			     lnet_nid_t gw_nid)
+{
+	struct lnet_nid_list *ne;
+
+	CDEBUG(D_NET, "%s: rtr pref empty: %d\n",
+	       libcfs_nid2str(lpni->lpni_nid),
+	       list_empty(&lpni->lpni_rtr_pref_nids));
+
+	if (list_empty(&lpni->lpni_rtr_pref_nids))
+		return false;
+
+	/* iterate through all the preferred NIDs and see if any of them
+	 * matches the provided gw_nid
+	 */
+	list_for_each_entry(ne, &lpni->lpni_rtr_pref_nids, nl_list) {
+		CDEBUG(D_NET, "Comparing pref %s with gw %s\n",
+		       libcfs_nid2str(ne->nl_nid),
+		       libcfs_nid2str(gw_nid));
+		if (ne->nl_nid == gw_nid)
+			return true;
+	}
+
+	return false;
+}
+
+void
+lnet_peer_clr_pref_rtrs(struct lnet_peer_ni *lpni)
+{
+	struct list_head zombies;
+	struct lnet_nid_list *ne;
+	struct lnet_nid_list *tmp;
+	int cpt = lpni->lpni_cpt;
+
+	INIT_LIST_HEAD(&zombies);
+
+	lnet_net_lock(cpt);
+	list_splice_init(&lpni->lpni_rtr_pref_nids, &zombies);
+	lnet_net_unlock(cpt);
+
+	list_for_each_entry_safe(ne, tmp, &zombies, nl_list) {
+		list_del(&ne->nl_list);
+		kfree(ne);
+	}
+}
+
+int
+lnet_peer_add_pref_rtr(struct lnet_peer_ni *lpni,
+		       lnet_nid_t gw_nid)
+{
+	int cpt = lpni->lpni_cpt;
+	struct lnet_nid_list *ne = NULL;
+
+	/* This function is called with api_mutex held. When the api_mutex
+	 * is held the list can not be modified, as it is only modified as
+	 * a result of applying a UDSP and that happens under api_mutex
+	 * lock.
+	 */
+	__must_hold(&the_lnet.ln_api_mutex);
+
+	list_for_each_entry(ne, &lpni->lpni_rtr_pref_nids, nl_list) {
+		if (ne->nl_nid == gw_nid)
+			return -EEXIST;
+	}
+
+	ne = kzalloc_cpt(sizeof(*ne), GFP_KERNEL, cpt);
+	if (!ne)
+		return -ENOMEM;
+
+	ne->nl_nid = gw_nid;
+
+	/* Lock the cpt to protect against addition and checks in the
+	 * selection algorithm
+	 */
+	lnet_net_lock(cpt);
+	list_add(&ne->nl_list, &lpni->lpni_rtr_pref_nids);
+	lnet_net_unlock(cpt);
+
+	return 0;
+}
+
 /*
  * Test whether a ni is a preferred ni for this peer_ni, e.g, whether
  * this is a preferred point-to-point path. Call with lnet_net_lock in
@@ -1123,6 +1211,29 @@ struct lnet_peer_ni *
 	return rc;
 }
 
+void
+lnet_peer_clr_pref_nids(struct lnet_peer_ni *lpni)
+{
+	struct list_head zombies;
+	struct lnet_nid_list *ne;
+	struct lnet_nid_list *tmp;
+
+	INIT_LIST_HEAD(&zombies);
+
+	lnet_net_lock(LNET_LOCK_EX);
+	if (lpni->lpni_pref_nnids == 1)
+		lpni->lpni_pref.nid = LNET_NID_ANY;
+	else if (lpni->lpni_pref_nnids > 1)
+		list_splice_init(&lpni->lpni_pref.nids, &zombies);
+	lpni->lpni_pref_nnids = 0;
+	lnet_net_unlock(LNET_LOCK_EX);
+
+	list_for_each_entry_safe(ne, tmp, &zombies, nl_list) {
+		list_del_init(&ne->nl_list);
+		kfree(ne);
+	}
+}
+
 lnet_nid_t
 lnet_peer_primary_nid_locked(lnet_nid_t nid)
 {
-- 
1.8.3.1


* [lustre-devel] [PATCH 10/41] lnet: Select NI/peer NI with highest prio
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (8 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 09/41] lnet: Preferred gateway selection James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 11/41] lnet: select best peer and local net James Simmons
                   ` (30 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Amir Shehata, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

Modify the selection algorithm to select the highest priority
local and peer NI. Health always trumps all other selection
criteria.
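
The comparison order applied to peer NIs by this patch can be
sketched as follows. This is a minimal illustration only, not the
code in lib-move.c: struct cand, cand_beats() and pick_best() are
hypothetical names, and the fields merely stand in for the health
value, selection priority, preferred flag, tx credits and sequence
number that the diff below compares.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields compared per peer NI; this is
 * not the real struct lnet_peer_ni.
 */
struct cand {
	int healthv;		/* higher is better */
	uint32_t sel_prio;	/* lower value means higher priority */
	bool preferred;		/* chosen local NI is on this candidate's list */
	int txcredits;		/* higher is better */
	unsigned int seq;	/* lower means used less recently */
};

/* Return true if c should replace best (best may be NULL). */
static bool cand_beats(const struct cand *c, const struct cand *best)
{
	if (!best)
		return true;
	/* health trumps every other criterion */
	if (c->healthv != best->healthv)
		return c->healthv > best->healthv;
	/* then selection priority, lower value wins */
	if (c->sel_prio != best->sel_prio)
		return c->sel_prio < best->sel_prio;
	/* then the preferred flag */
	if (c->preferred != best->preferred)
		return c->preferred;
	/* then available credits */
	if (c->txcredits != best->txcredits)
		return c->txcredits > best->txcredits;
	/* finally round robin: only a strictly lower sequence wins */
	return c->seq < best->seq;
}

/* Return the index of the best candidate, or -1 if there is none. */
static int pick_best(const struct cand *v, size_t n)
{
	const struct cand *best = NULL;
	int best_idx = -1;
	size_t i;

	assert(v != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (cand_beats(&v[i], best)) {
			best = &v[i];
			best_idx = (int)i;
		}
	}
	return best_idx;
}
```

The local NI loop in the same patch follows the same pattern, with
NUMA distance compared in place of the preferred flag.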

WC-bug-id: https://jira.whamcloud.com/browse/LU-9121
Lustre-commit: 374fcb2caea3ca0 ("LU-9121 lnet: Select NI/peer NI with highest prio")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34351
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 148 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 95 insertions(+), 53 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 8763c3f..166ebcc 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1112,65 +1112,91 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	 */
 	struct lnet_peer_ni *lpni = NULL;
 	int best_lpni_credits =  (best_lpni) ? best_lpni->lpni_txcredits :
-					       INT_MIN;
+				 INT_MIN;
 	int best_lpni_healthv = (best_lpni) ?
 				atomic_read(&best_lpni->lpni_healthv) : 0;
-	bool preferred = false;
-	bool ni_is_pref;
+	bool best_lpni_is_preferred = false;
+	bool lpni_is_preferred;
 	int lpni_healthv;
+	u32 lpni_sel_prio;
+	u32 best_sel_prio = LNET_MAX_SELECTION_PRIORITY;
 
 	while ((lpni = lnet_get_next_peer_ni_locked(peer, peer_net, lpni))) {
 		/* if the best_ni we've chosen aleady has this lpni
 		 * preferred, then let's use it
 		 */
 		if (best_ni) {
-			ni_is_pref = lnet_peer_is_pref_nid_locked(lpni,
-								  best_ni->ni_nid);
-			CDEBUG(D_NET, "%s ni_is_pref = %d\n",
-			       libcfs_nid2str(best_ni->ni_nid), ni_is_pref);
+			lpni_is_preferred = lnet_peer_is_pref_nid_locked(lpni,
+									 best_ni->ni_nid);
+			CDEBUG(D_NET, "%s lpni_is_preferred = %d\n",
+			       libcfs_nid2str(best_ni->ni_nid),
+			       lpni_is_preferred);
 		} else {
-			ni_is_pref = false;
+			lpni_is_preferred = false;
 		}
 
 		lpni_healthv = atomic_read(&lpni->lpni_healthv);
+		lpni_sel_prio = lpni->lpni_sel_priority;
 
 		if (best_lpni)
-			CDEBUG(D_NET, "%s c:[%d, %d], s:[%d, %d]\n",
+			CDEBUG(D_NET,
+			       "n:[%s, %s] h:[%d, %d] p:[%d, %d] c:[%d, %d] s:[%d, %d]\n",
 			       libcfs_nid2str(lpni->lpni_nid),
+			       libcfs_nid2str(best_lpni->lpni_nid),
+			       lpni_healthv, best_lpni_healthv,
+			       lpni_sel_prio, best_sel_prio,
 			       lpni->lpni_txcredits, best_lpni_credits,
 			       lpni->lpni_seq, best_lpni->lpni_seq);
+		else
+			goto select_lpni;
 
 		/* pick the healthiest peer ni */
 		if (lpni_healthv < best_lpni_healthv) {
 			continue;
 		} else if (lpni_healthv > best_lpni_healthv) {
-			best_lpni_healthv = lpni_healthv;
+			if (best_lpni_is_preferred)
+				best_lpni_is_preferred = false;
+			goto select_lpni;
+		}
+
+		if (lpni_sel_prio > best_sel_prio) {
+			continue;
+		} else if (lpni_sel_prio < best_sel_prio) {
+			if (best_lpni_is_preferred)
+				best_lpni_is_preferred = false;
+			goto select_lpni;
+		}
+
 		/* if this is a preferred peer use it */
-		} else if (!preferred && ni_is_pref) {
-			preferred = true;
-		} else if (preferred && !ni_is_pref) {
+		if (!best_lpni_is_preferred && lpni_is_preferred) {
+			best_lpni_is_preferred = true;
+			goto select_lpni;
+		} else if (best_lpni_is_preferred && !lpni_is_preferred) {
 			/* this is not the preferred peer so let's ignore
 			 * it.
 			 */
 			continue;
-		} else if (lpni->lpni_txcredits < best_lpni_credits) {
+		}
+
+		if (lpni->lpni_txcredits < best_lpni_credits)
 			/* We already have a peer that has more credits
 			 * available than this one. No need to consider
 			 * this peer further.
 			 */
 			continue;
-		} else if (lpni->lpni_txcredits == best_lpni_credits) {
-			/* The best peer found so far and the current peer
-			 * have the same number of available credits let's
-			 * make sure to select between them using Round
-			 * Robin
-			 */
-			if (best_lpni) {
-				if (best_lpni->lpni_seq <= lpni->lpni_seq)
-					continue;
-			}
-		}
+		else if (lpni->lpni_txcredits > best_lpni_credits)
+			goto select_lpni;
 
+		/* The best peer found so far and the current peer
+		 * have the same number of available credits let's
+		 * make sure to select between them using Round Robin
+		 */
+		if (best_lpni && best_lpni->lpni_seq <= lpni->lpni_seq)
+			continue;
+select_lpni:
+		best_lpni_is_preferred = lpni_is_preferred;
+		best_lpni_healthv = lpni_healthv;
+		best_sel_prio = lpni_sel_prio;
 		best_lpni = lpni;
 		best_lpni_credits = lpni->lpni_txcredits;
 	}
@@ -1178,7 +1204,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	/* if we still can't find a peer ni then we can't reach it */
 	if (!best_lpni) {
 		u32 net_id = (peer_net) ? peer_net->lpn_net_id :
-			LNET_NIDNET(dst_nid);
+			     LNET_NIDNET(dst_nid);
 		CDEBUG(D_NET, "no peer_ni found on peer net %s\n",
 		       libcfs_net2str(net_id));
 		return NULL;
@@ -1396,6 +1422,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	unsigned int shortest_distance;
 	int best_credits;
 	int best_healthv;
+	u32 best_sel_prio;
 
 	/* If there is no peer_ni that we can send to on this network,
 	 * then there is no point in looking for a new best_ni here.
@@ -1404,6 +1431,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		return best_ni;
 
 	if (!best_ni) {
+		best_sel_prio = LNET_MAX_SELECTION_PRIORITY;
 		shortest_distance = UINT_MAX;
 		best_credits = INT_MIN;
 		best_healthv = 0;
@@ -1412,6 +1440,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 						     best_ni->ni_dev_cpt);
 		best_credits = atomic_read(&best_ni->ni_tx_credits);
 		best_healthv = atomic_read(&best_ni->ni_healthv);
+		best_sel_prio = best_ni->ni_sel_priority;
 	}
 
 	while ((ni = lnet_get_next_ni_locked(local_net, ni))) {
@@ -1419,10 +1448,12 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		int ni_credits;
 		int ni_healthv;
 		int ni_fatal;
+		u32 ni_sel_prio;
 
 		ni_credits = atomic_read(&ni->ni_tx_credits);
 		ni_healthv = atomic_read(&ni->ni_healthv);
 		ni_fatal = atomic_read(&ni->ni_fatal_error_on);
+		ni_sel_prio = ni->ni_sel_priority;
 
 		/*
 		 * calculate the distance from the CPT on which
@@ -1433,13 +1464,6 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 					    md_cpt,
 					    ni->ni_dev_cpt);
 
-		CDEBUG(D_NET,
-		       "compare ni %s [c:%d, d:%d, s:%d] with best_ni %s [c:%d, d:%d, s:%d]\n",
-		       libcfs_nid2str(ni->ni_nid), ni_credits, distance,
-		       ni->ni_seq, (best_ni) ? libcfs_nid2str(best_ni->ni_nid)
-			: "not seleced", best_credits, shortest_distance,
-			(best_ni) ? best_ni->ni_seq : 0);
-
 		/*
 		 * All distances smaller than the NUMA range
 		 * are treated equally.
@@ -1451,30 +1475,48 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		 * Select on health, shorter distance, available
 		 * credits, then round-robin.
 		 */
-		if (ni_fatal) {
+		if (ni_fatal)
 			continue;
-		} else if (ni_healthv < best_healthv) {
+
+		if (best_ni)
+			CDEBUG(D_NET,
+			       "compare ni %s [c:%d, d:%d, s:%d, p:%u] with best_ni %s [c:%d, d:%d, s:%d, p:%u]\n",
+			       libcfs_nid2str(ni->ni_nid), ni_credits, distance,
+			       ni->ni_seq, ni_sel_prio,
+			       (best_ni) ? libcfs_nid2str(best_ni->ni_nid)
+			       : "not selected", best_credits, shortest_distance,
+			       (best_ni) ? best_ni->ni_seq : 0,
+			       best_sel_prio);
+		else
+			goto select_ni;
+
+		if (ni_healthv < best_healthv)
 			continue;
-		} else if (ni_healthv > best_healthv) {
-			best_healthv = ni_healthv;
-			/* If we're going to prefer this ni because it's
-			 * the healthiest, then we should set the
-			 * shortest_distance in the algorithm in case
-			 * there are multiple NIs with the same health but
-			 * different distances.
-			 */
-			if (distance < shortest_distance)
-				shortest_distance = distance;
-		} else if (distance > shortest_distance) {
+		else if (ni_healthv > best_healthv)
+			goto select_ni;
+
+		if (ni_sel_prio > best_sel_prio)
 			continue;
-		} else if (distance < shortest_distance) {
-			shortest_distance = distance;
-		} else if (ni_credits < best_credits) {
+		else if (ni_sel_prio < best_sel_prio)
+			goto select_ni;
+
+		if (distance > shortest_distance)
 			continue;
-		} else if (ni_credits == best_credits) {
-			if (best_ni && best_ni->ni_seq <= ni->ni_seq)
-				continue;
-		}
+		else if (distance < shortest_distance)
+			goto select_ni;
+
+		if (ni_credits < best_credits)
+			continue;
+		else if (ni_credits > best_credits)
+			goto select_ni;
+
+		if (best_ni && best_ni->ni_seq <= ni->ni_seq)
+			continue;
+
+select_ni:
+		best_sel_prio = ni_sel_prio;
+		shortest_distance = distance;
+		best_healthv = ni_healthv;
 		best_ni = ni;
 		best_credits = ni_credits;
 	}
-- 
1.8.3.1


* [lustre-devel] [PATCH 11/41] lnet: select best peer and local net
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (9 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 10/41] lnet: Select NI/peer NI with highest prio James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 12/41] lnet: UDSP handling James Simmons
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

Select the healthiest and highest priority peer and local net when
sending a message.
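
As a rough sketch of the ordering used in the diff below: a local
net's health is taken as the best health value among its NIs, and
peer net / local net pairs are then compared on peer net health,
peer net priority, local net health, local net priority and finally
the two round-robin sequence numbers. The names struct net_cand,
net_healthv() and net_cand_beats() are hypothetical, and the integer
widths of the sequence fields are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one peer net and its matching local net. */
struct net_cand {
	int lpn_healthv;	/* peer net health, higher is better */
	uint32_t lpn_sel_prio;	/* peer net priority, lower wins */
	int net_healthv;	/* local net health, higher is better */
	uint32_t net_sel_prio;	/* local net priority, lower wins */
	uint32_t lpn_seq;	/* peer net round-robin counter */
	uint32_t net_seq;	/* local net round-robin counter */
};

/* Health of a net: the best health value among its NIs, 0 if none. */
static int net_healthv(const int *ni_healthv, size_t n)
{
	int best = 0;
	size_t i;

	assert(ni_healthv != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (ni_healthv[i] > best)
			best = ni_healthv[i];
	return best;
}

/* Return true if c should replace best (best may be NULL). */
static bool net_cand_beats(const struct net_cand *c,
			   const struct net_cand *best)
{
	if (!best)
		return true;
	if (c->lpn_healthv != best->lpn_healthv)
		return c->lpn_healthv > best->lpn_healthv;
	if (c->lpn_sel_prio != best->lpn_sel_prio)
		return c->lpn_sel_prio < best->lpn_sel_prio;
	if (c->net_healthv != best->net_healthv)
		return c->net_healthv > best->net_healthv;
	if (c->net_sel_prio != best->net_sel_prio)
		return c->net_sel_prio < best->net_sel_prio;
	if (c->lpn_seq != best->lpn_seq)
		return c->lpn_seq < best->lpn_seq;
	/* round robin over the local nets: strictly lower seq wins */
	return c->net_seq < best->net_seq;
}
```

The sketch leaves out the discovery special case, where the peer net
matching lp_disc_net_id is selected immediately.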

WC-bug-id: https://jira.whamcloud.com/browse/LU-9121
Lustre-commit: 7d309d57fd843f1 ("LU-9121 lnet: select best peer and local net")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34352
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h  |   2 +
 include/linux/lnet/lib-types.h |   3 +
 net/lnet/lnet/api-ni.c         |  15 +++++
 net/lnet/lnet/lib-move.c       | 125 +++++++++++++++++++++++++++++++----------
 4 files changed, 116 insertions(+), 29 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 90f18a0..5152c0a70 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -507,6 +507,8 @@ int lnet_get_route(int idx, u32 *net, u32 *hops,
 struct lnet_ni *lnet_get_next_ni_locked(struct lnet_net *mynet,
 					struct lnet_ni *prev);
 struct lnet_ni *lnet_get_ni_idx_locked(int idx);
+int lnet_get_net_healthv_locked(struct lnet_net *net);
+
 int lnet_get_peer_list(u32 *countp, u32 *sizep,
 		       struct lnet_process_id __user *ids);
 extern void lnet_peer_ni_set_healthv(lnet_nid_t nid, int value, bool all);
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 187e1f3..f1f4eac5 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -359,6 +359,9 @@ struct lnet_net {
 	 * lnet/include/lnet/nidstr.h */
 	u32			net_id;
 
+	/* round robin selection */
+	u32			net_seq;
+
 	/* total number of CPTs in the array */
 	u32			net_ncpts;
 
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 3acc86e..2c31b06 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -2931,6 +2931,21 @@ struct lnet_ni *
 	return NULL;
 }
 
+int lnet_get_net_healthv_locked(struct lnet_net *net)
+{
+	struct lnet_ni *ni;
+	int best_healthv = 0;
+	int healthv;
+
+	list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+		healthv = atomic_read(&ni->ni_healthv);
+		if (healthv > best_healthv)
+			best_healthv = healthv;
+	}
+
+	return best_healthv;
+}
+
 struct lnet_ni *
 lnet_get_next_ni_locked(struct lnet_net *mynet, struct lnet_ni *prev)
 {
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 166ebcc..4dcc68a 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1602,10 +1602,25 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	u32 routing = send_case & REMOTE_DST;
 	struct lnet_rsp_tracker *rspt;
 
-	/* Increment sequence number of the selected peer so that we
-	 * pick the next one in Round Robin.
+	/* Increment sequence number of the selected peer, peer net,
+	 * local ni and local net so that we pick the next ones
+	 * in Round Robin.
 	 */
 	best_lpni->lpni_seq++;
+	best_lpni->lpni_peer_net->lpn_seq++;
+	best_ni->ni_seq++;
+	best_ni->ni_net->net_seq++;
+
+	CDEBUG(D_NET,
+	       "%s NI seq info: [%d:%d:%d:%u] %s LPNI seq info [%d:%d:%d:%u]\n",
+	       libcfs_nid2str(best_ni->ni_nid),
+	       best_ni->ni_seq, best_ni->ni_net->net_seq,
+	       atomic_read(&best_ni->ni_tx_credits),
+	       best_ni->ni_sel_priority,
+	       libcfs_nid2str(best_lpni->lpni_nid),
+	       best_lpni->lpni_seq, best_lpni->lpni_peer_net->lpn_seq,
+	       best_lpni->lpni_txcredits,
+	       best_lpni->lpni_sel_priority);
 
 	/* grab a reference on the peer_ni so it sticks around even if
 	 * we need to drop and relock the lnet_net_lock below.
@@ -1787,8 +1802,7 @@ struct lnet_ni *
 lnet_find_best_ni_on_spec_net(struct lnet_ni *cur_best_ni,
 			      struct lnet_peer *peer,
 			      struct lnet_peer_net *peer_net,
-			      int cpt,
-			      bool incr_seq)
+			      int cpt)
 {
 	struct lnet_net *local_net;
 	struct lnet_ni *best_ni;
@@ -1807,9 +1821,6 @@ struct lnet_ni *
 	best_ni = lnet_get_best_ni(local_net, cur_best_ni,
 				   peer, peer_net, cpt);
 
-	if (incr_seq && best_ni)
-		best_ni->ni_seq++;
-
 	return best_ni;
 }
 
@@ -2032,8 +2043,7 @@ struct lnet_ni *
 
 		lpeer = lnet_peer_get_net_locked(gw, local_lnet);
 		sd->sd_best_ni = lnet_find_best_ni_on_spec_net(NULL, gw, lpeer,
-							       sd->sd_md_cpt,
-							       true);
+							       sd->sd_md_cpt);
 	}
 
 	if (!sd->sd_best_ni) {
@@ -2115,9 +2125,19 @@ struct lnet_ni *
 lnet_find_best_ni_on_local_net(struct lnet_peer *peer, int md_cpt,
 			       bool discovery)
 {
-	struct lnet_peer_net *peer_net = NULL;
+	struct lnet_peer_net *lpn = NULL;
+	struct lnet_peer_net *best_lpn = NULL;
+	struct lnet_net *net = NULL;
+	struct lnet_net *best_net = NULL;
 	struct lnet_ni *best_ni = NULL;
-	int lpn_healthv = 0;
+	int best_lpn_healthv = 0;
+	int best_net_healthv = 0;
+	int net_healthv;
+	u32 best_lpn_sel_prio = LNET_MAX_SELECTION_PRIORITY;
+	u32 lpn_sel_prio;
+	u32 best_net_sel_prio = LNET_MAX_SELECTION_PRIORITY;
+	u32 net_sel_prio;
+	bool exit = false;
 
 	/* The peer can have multiple interfaces, some of them can be on
 	 * the local network and others on a routed network. We should
@@ -2126,32 +2146,80 @@ struct lnet_ni *
 	 */
 
 	/* go through all the peer nets and find the best_ni */
-	list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) {
+	list_for_each_entry(lpn, &peer->lp_peer_nets, lpn_peer_nets) {
 		/* The peer's list of nets can contain non-local nets. We
 		 * want to only examine the local ones.
 		 */
-		if (!lnet_get_net_locked(peer_net->lpn_net_id))
+		net = lnet_get_net_locked(lpn->lpn_net_id);
+		if (!net)
 			continue;
 
-		/* always select the lpn with the best health */
-		if (lpn_healthv <= peer_net->lpn_healthv)
-			lpn_healthv = peer_net->lpn_healthv;
-		else
-			continue;
+		lpn_sel_prio = lpn->lpn_sel_priority;
+		net_healthv = lnet_get_net_healthv_locked(net);
+		net_sel_prio = net->net_sel_priority;
 
-		best_ni = lnet_find_best_ni_on_spec_net(best_ni, peer, peer_net,
-							md_cpt, false);
 		/* if this is a discovery message and lp_disc_net_id is
 		 * specified then use that net to send the discovery on.
 		 */
-		if (peer->lp_disc_net_id == peer_net->lpn_net_id &&
-		    discovery)
+		if (peer->lp_disc_net_id == lpn->lpn_net_id &&
+		    discovery) {
+			exit = true;
+			goto select_lpn;
+		}
+
+		if (!best_lpn)
+			goto select_lpn;
+
+		/* always select the lpn with the best health */
+		if (best_lpn_healthv > lpn->lpn_healthv)
+			continue;
+		else if (best_lpn_healthv < lpn->lpn_healthv)
+			goto select_lpn;
+
+		/* select the preferred peer and local nets */
+		if (best_lpn_sel_prio < lpn_sel_prio)
+			continue;
+		else if (best_lpn_sel_prio > lpn_sel_prio)
+			goto select_lpn;
+
+		if (best_net_healthv > net_healthv)
+			continue;
+		else if (best_net_healthv < net_healthv)
+			goto select_lpn;
+
+		if (best_net_sel_prio < net_sel_prio)
+			continue;
+		else if (best_net_sel_prio > net_sel_prio)
+			goto select_lpn;
+
+		if (best_lpn->lpn_seq < lpn->lpn_seq)
+			continue;
+		else if (best_lpn->lpn_seq > lpn->lpn_seq)
+			goto select_lpn;
+
+		/* round robin over the local networks */
+		if (best_net->net_seq <= net->net_seq)
+			continue;
+
+select_lpn:
+		best_net_healthv = net_healthv;
+		best_net_sel_prio = net_sel_prio;
+		best_lpn_healthv = lpn->lpn_healthv;
+		best_lpn_sel_prio = lpn_sel_prio;
+		best_lpn = lpn;
+		best_net = net;
+
+		if (exit)
 			break;
 	}
 
-	if (best_ni)
-		/* increment sequence number so we can round robin */
-		best_ni->ni_seq++;
+	if (best_lpn) {
+		/* Select the best NI on the same net as best_lpn chosen
+		 * above
+		 */
+		best_ni = lnet_find_best_ni_on_spec_net(NULL, peer,
+							best_lpn, md_cpt);
+	}
 
 	return best_ni;
 }
@@ -2210,7 +2278,7 @@ struct lnet_ni *
 		best_ni =
 			lnet_find_best_ni_on_spec_net(NULL, sd->sd_peer,
 						      sd->sd_best_lpni->lpni_peer_net,
-						      sd->sd_md_cpt, true);
+						      sd->sd_md_cpt);
 		/* If there is no best_ni we don't have a route */
 		if (!best_ni) {
 			CERROR("no path to %s from net %s\n",
@@ -2262,8 +2330,7 @@ struct lnet_ni *
 		sd->sd_best_ni = lnet_find_best_ni_on_spec_net(NULL,
 							       sd->sd_peer,
 							       sd->sd_best_lpni->lpni_peer_net,
-							       sd->sd_md_cpt,
-							       true);
+							       sd->sd_md_cpt);
 		if (!sd->sd_best_ni) {
 			CERROR("Unable to forward message to %s. No local NI available\n",
 			       libcfs_nid2str(sd->sd_dst_nid));
@@ -2295,7 +2362,7 @@ struct lnet_ni *
 		sd->sd_best_ni =
 		  lnet_find_best_ni_on_spec_net(NULL, sd->sd_peer,
 						sd->sd_best_lpni->lpni_peer_net,
-						sd->sd_md_cpt, true);
+						sd->sd_md_cpt);
 
 		if (!sd->sd_best_ni) {
 			/* We're not going to deal with not able to send
-- 
1.8.3.1


* [lustre-devel] [PATCH 12/41] lnet: UDSP handling
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (10 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 11/41] lnet: select best peer and local net James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 13/41] lnet: Apply UDSP on local and remote NIs James Simmons
                   ` (28 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Amir Shehata, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

This patch adds the following functionality:
1. Add UDSPs
2. Delete UDSPs
3. Apply UDSPs

- Adding a local network UDSP: if multiple local networks are
available, each one can have a priority.
- Adding a local NID UDSP: after a local network is chosen,
if there are multiple NIs, each one can have a priority.
- Adding a remote NID UDSP: assign priority to peer NIDs.
- Adding a NID pair UDSP: allows local NIDs to be specified
and added to a list on the specified peer NIs. When
selecting a peer NI, the one with the local NID being used
on its list is preferred.
- Adding a router UDSP: similar to the NID pair UDSP.
Specified router NIDs are added to a list on the specified
peer NIs. When sending to a remote peer, the remote net is
selected first and then the peer NID. The router which has
its NID on the peer NI list is preferred.
- Deleting a UDSP: use the specified policy index to remove it
from the policy list.
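
The NID pair and router rules both come down to a membership test
on a per-peer-NI list of NIDs. A minimal sketch of that test, using
an array instead of the kernel list and hypothetical names
(nid_sketch_t, struct peer_ni_sketch, is_pref_nid); the 64-bit NID
type is an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t nid_sketch_t;	/* stand-in for lnet_nid_t (assumed 64-bit) */

/* Hypothetical peer NI holding the NIDs a policy marked as preferred. */
struct peer_ni_sketch {
	const nid_sketch_t *pref_nids;
	size_t pref_count;
};

/* Return true only if nid is on the peer NI's preferred list.
 * An empty list means nothing is preferred, so the result is false.
 */
static bool is_pref_nid(const struct peer_ni_sketch *p, nid_sketch_t nid)
{
	size_t i;

	assert(p != NULL);

	for (i = 0; i < p->pref_count; i++)
		if (p->pref_nids[i] == nid)
			return true;
	return false;
}
```

For a NID pair rule the list would hold local NIDs and the lookup
key is the NID of the local NI in use; for a router rule the list
would hold gateway NIDs.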

Generally, the syntax is as follows:
 lnetctl policy <add | del | show>
  --src: ip2nets syntax specifying the local NID to match
  --dst: ip2nets syntax specifying the remote NID to match
  --rte: ip2nets syntax specifying the router NID to match
  --priority: Priority to apply to rule matches
  --idx: Index of where to insert the rule. By default it appends
     to the end of the rule list
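
The --idx behaviour described above (insert at a position, append by
default) can be pictured with a small array-backed list. This is a
sketch only and not the udsp.c implementation: struct rule_list and
rule_insert() are hypothetical, a negative index is assumed to mean
"no index given", and treating an out-of-range index as an append is
also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RULES 16

/* Hypothetical rule list; each rule is reduced to its priority. */
struct rule_list {
	int prio[MAX_RULES];
	size_t count;
};

/* Insert prio at position idx, shifting later rules down by one.
 * A negative or out-of-range idx appends to the end of the list.
 * Returns the position used, or -1 if the list is full.
 */
static int rule_insert(struct rule_list *rl, int idx, int prio)
{
	size_t pos;

	assert(rl != NULL);

	if (rl->count == MAX_RULES)
		return -1;

	pos = (idx < 0 || (size_t)idx > rl->count) ? rl->count : (size_t)idx;

	memmove(&rl->prio[pos + 1], &rl->prio[pos],
		(rl->count - pos) * sizeof(rl->prio[0]));
	rl->prio[pos] = prio;
	rl->count++;

	return (int)pos;
}
```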

WC-bug-id: https://jira.whamcloud.com/browse/LU-9121
Lustre-commit: e5ea6387eb9f882 ("LU-9121 lnet: UDSP handling")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34354
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h    |   38 ++
 include/linux/lnet/udsp.h        |  117 +++++
 include/uapi/linux/lnet/nidstr.h |    4 +
 net/lnet/lnet/Makefile           |    2 +-
 net/lnet/lnet/api-ni.c           |   87 ++++
 net/lnet/lnet/nidstrings.c       |   66 +++
 net/lnet/lnet/peer.c             |    6 +
 net/lnet/lnet/udsp.c             | 1051 ++++++++++++++++++++++++++++++++++++++
 8 files changed, 1370 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/lnet/udsp.h
 create mode 100644 net/lnet/lnet/udsp.c

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 5152c0a70..1efac9b 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -95,6 +95,7 @@
 extern struct kmem_cache *lnet_small_mds_cachep; /* <= LNET_SMALL_MD_SIZE bytes
 						  * MDs kmem_cache
 						  */
+extern struct kmem_cache *lnet_udsp_cachep;
 extern struct kmem_cache *lnet_rspt_cachep;
 extern struct kmem_cache *lnet_msg_cachep;
 
@@ -513,6 +514,11 @@ int lnet_get_peer_list(u32 *countp, u32 *sizep,
 		       struct lnet_process_id __user *ids);
 extern void lnet_peer_ni_set_healthv(lnet_nid_t nid, int value, bool all);
 extern void lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni);
+extern int lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
+extern void lnet_peer_clr_pref_nids(struct lnet_peer_ni *lpni);
+extern int lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
+void lnet_peer_ni_set_selection_priority(struct lnet_peer_ni *lpni,
+					 u32 priority);
 
 void lnet_router_debugfs_init(void);
 void lnet_router_debugfs_fini(void);
@@ -531,6 +537,8 @@ void lnet_rtr_transfer_to_peer(struct lnet_peer *src,
 int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf);
 int lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason);
 struct lnet_net *lnet_get_net_locked(u32 net_id);
+void lnet_net_clr_pref_rtrs(struct lnet_net *net);
+int lnet_net_add_pref_rtr(struct lnet_net *net, lnet_nid_t gw_nid);
 
 int lnet_islocalnid(lnet_nid_t nid);
 int lnet_islocalnet(u32 net);
@@ -670,6 +678,17 @@ int lnet_delay_rule_list(int pos, struct lnet_fault_attr *attr,
 void lnet_counters_get_common(struct lnet_counters_common *common);
 int lnet_counters_get(struct lnet_counters *counters);
 void lnet_counters_reset(void);
+static inline void
+lnet_ni_set_sel_priority_locked(struct lnet_ni *ni, u32 priority)
+{
+	ni->ni_sel_priority = priority;
+}
+
+static inline void
+lnet_net_set_sel_priority_locked(struct lnet_net *net, u32 priority)
+{
+	net->net_sel_priority = priority;
+}
 
 unsigned int lnet_iov_nob(unsigned int niov, struct kvec *iov);
 unsigned int lnet_kiov_nob(unsigned int niov, struct bio_vec *iov);
@@ -825,6 +844,13 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid,
 			  u32 *peer_tx_qnob);
 int lnet_get_peer_ni_hstats(struct lnet_ioctl_peer_ni_hstats *stats);
 
+static inline void
+lnet_peer_net_set_sel_priority_locked(struct lnet_peer_net *lpn, u32 priority)
+{
+	lpn->lpn_sel_priority = priority;
+}
+
+
 static inline struct lnet_peer_net *
 lnet_find_peer_net_locked(struct lnet_peer *peer, u32 net_id)
 {
@@ -968,6 +994,18 @@ int lnet_get_peer_ni_info(u32 peer_index, u64 *nid,
 	lnet_atomic_add_unless_max(healthv, value, LNET_MAX_HEALTH_VALUE);
 }
 
+static inline int
+lnet_get_list_len(struct list_head *list)
+{
+	struct list_head *l;
+	int count = 0;
+
+	list_for_each(l, list)
+		count++;
+
+	return count;
+}
+
 void lnet_incr_stats(struct lnet_element_stats *stats,
 		     enum lnet_msg_type msg_type,
 		     enum lnet_stats_type stats_type);
diff --git a/include/linux/lnet/udsp.h b/include/linux/lnet/udsp.h
new file mode 100644
index 0000000..265cb42
--- /dev/null
+++ b/include/linux/lnet/udsp.h
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
+ *
+ * Copyright (c) 2011, 2017, Intel Corporation.
+ *
+ * Copyright (c) 2018-2020 Data Direct Networks.
+ *
+ *   This file is part of Lustre, https://wiki.whamcloud.com/
+ *
+ *   Portals is free software; you can redistribute it and/or
+ *   modify it under the terms of version 2 of the GNU General Public
+ *   License as published by the Free Software Foundation.
+ *
+ *   Portals is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   version 2 along with this program; If not, see
+ *   http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * Author: Amir Shehata
+ */
+
+#ifndef UDSP_H
+#define UDSP_H
+
+#include <linux/lnet/lib-lnet.h>
+
+/**
+ * lnet_udsp_add_policy
+ *	Add a policy \new in position \idx
+ *	Must be called with api_mutex held
+ */
+int lnet_udsp_add_policy(struct lnet_udsp *new, int idx);
+
+/**
+ * lnet_udsp_get_policy
+ *	get a policy in position \idx
+ *	Must be called with api_mutex held
+ */
+struct lnet_udsp *lnet_udsp_get_policy(int idx);
+
+/**
+ * lnet_udsp_del_policy
+ *	Delete a policy from position \idx
+ *	Must be called with api_mutex held
+ */
+int lnet_udsp_del_policy(int idx);
+
+/**
+ * lnet_udsp_apply_policies
+ *	apply all stored policies across the system
+ *	Must be called with api_mutex held
+ *	Must NOT be called with lnet_net_lock held
+ *	udsp: NULL to apply on all existing udsps
+ *	      non-NULL to apply to specified udsp
+ *	revert: true to revert policy application
+ */
+int lnet_udsp_apply_policies(struct lnet_udsp *udsp, bool revert);
+
+/**
+ * lnet_udsp_apply_policies_on_lpni
+ *	apply all stored policies on specified \lpni
+ *	Must be called with api_mutex held
+ *	Must be called with LNET_LOCK_EX
+ */
+int lnet_udsp_apply_policies_on_lpni(struct lnet_peer_ni *lpni);
+
+/**
+ * lnet_udsp_apply_policies_on_lpn
+ *	Must be called with api_mutex held
+ *	apply all stored policies on specified \lpn
+ *	Must be called with LNET_LOCK_EX
+ */
+int lnet_udsp_apply_policies_on_lpn(struct lnet_peer_net *lpn);
+
+/**
+ * lnet_udsp_apply_policies_on_ni
+ *	apply all stored policies on specified \ni
+ *	Must be called with api_mutex held
+ *	Must be called with LNET_LOCK_EX
+ */
+int lnet_udsp_apply_policies_on_ni(struct lnet_ni *ni);
+
+/**
+ * lnet_udsp_apply_policies_on_net
+ *	apply all stored policies on specified \net
+ *	Must be called with api_mutex held
+ *	Must be called with LNET_LOCK_EX
+ */
+int lnet_udsp_apply_policies_on_net(struct lnet_net *net);
+
+/**
+ * lnet_udsp_alloc
+ *	Allocates a UDSP block and initializes it.
+ *	Return NULL if allocation fails
+ *	pointer to UDSP otherwise.
+ */
+struct lnet_udsp *lnet_udsp_alloc(void);
+
+/**
+ * lnet_udsp_free
+ *	Free a UDSP and all its descriptors
+ */
+void lnet_udsp_free(struct lnet_udsp *udsp);
+
+/**
+ * lnet_udsp_destroy
+ *	Free all the UDSPs
+ *	shutdown: true to indicate shutdown in progress
+ */
+void lnet_udsp_destroy(bool shutdown);
+
+#endif /* UDSP_H */
diff --git a/include/uapi/linux/lnet/nidstr.h b/include/uapi/linux/lnet/nidstr.h
index 34ba497..021ee0e 100644
--- a/include/uapi/linux/lnet/nidstr.h
+++ b/include/uapi/linux/lnet/nidstr.h
@@ -97,6 +97,10 @@ static inline char *libcfs_nid2str(lnet_nid_t nid)
 int cfs_parse_nidlist(char *str, int len, struct list_head *list);
 int cfs_print_nidlist(char *buffer, int count, struct list_head *list);
 int cfs_match_nid(lnet_nid_t nid, struct list_head *list);
+int cfs_match_nid_net(lnet_nid_t nid, __u32 net, struct list_head *net_num_list,
+		      struct list_head *addr);
+int cfs_match_net(__u32 net_id, __u32 net_type,
+		  struct list_head *net_num_list);
 
 int cfs_ip_addr_parse(char *str, int len, struct list_head *list);
 int cfs_ip_addr_match(__u32 addr, struct list_head *list);
diff --git a/net/lnet/lnet/Makefile b/net/lnet/lnet/Makefile
index 4442e07..9918008 100644
--- a/net/lnet/lnet/Makefile
+++ b/net/lnet/lnet/Makefile
@@ -2,7 +2,7 @@
 
 obj-$(CONFIG_LNET) += lnet.o
 
-lnet-y := api-ni.o config.o nidstrings.o net_fault.o		\
+lnet-y := api-ni.o config.o nidstrings.o net_fault.o udsp.o	\
 	  lib-me.o lib-msg.o lib-md.o lib-ptl.o			\
 	  lib-socket.o lib-move.o module.o lo.o			\
 	  router.o router_proc.o acceptor.o peer.o
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 2c31b06..4809c76 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -36,6 +36,7 @@
 #include <linux/ktime.h>
 #include <linux/moduleparam.h>
 
+#include <linux/lnet/udsp.h>
 #include <linux/lnet/lib-lnet.h>
 #include <uapi/linux/lnet/lnet-dlc.h>
 
@@ -538,6 +539,7 @@ static int lnet_discover(struct lnet_process_id id, u32 force,
 struct kmem_cache *lnet_small_mds_cachep;  /* <= LNET_SMALL_MD_SIZE bytes
 					    *  MDs kmem_cache
 					    */
+struct kmem_cache *lnet_udsp_cachep;	   /* udsp cache */
 struct kmem_cache *lnet_rspt_cachep;	   /* response tracker cache */
 struct kmem_cache *lnet_msg_cachep;
 
@@ -558,6 +560,12 @@ static int lnet_discover(struct lnet_process_id id, u32 force,
 	if (!lnet_small_mds_cachep)
 		return -ENOMEM;
 
+	lnet_udsp_cachep = kmem_cache_create("lnet_udsp",
+					     sizeof(struct lnet_udsp),
+					     0, 0, NULL);
+	if (!lnet_udsp_cachep)
+		return -ENOMEM;
+
 	lnet_rspt_cachep = kmem_cache_create("lnet_rspt",
 					     sizeof(struct lnet_rsp_tracker),
 					     0, 0, NULL);
@@ -582,6 +590,9 @@ static int lnet_discover(struct lnet_process_id id, u32 force,
 	kmem_cache_destroy(lnet_rspt_cachep);
 	lnet_rspt_cachep = NULL;
 
+	kmem_cache_destroy(lnet_udsp_cachep);
+	lnet_udsp_cachep = NULL;
+
 	kmem_cache_destroy(lnet_small_mds_cachep);
 	lnet_small_mds_cachep = NULL;
 
@@ -1261,6 +1272,7 @@ struct list_head **
 		the_lnet.ln_counters = NULL;
 	}
 	lnet_destroy_remote_nets_table();
+	lnet_udsp_destroy(true);
 	lnet_slab_cleanup();
 
 	return 0;
@@ -1313,6 +1325,81 @@ struct lnet_net *
 	return NULL;
 }
 
+void
+lnet_net_clr_pref_rtrs(struct lnet_net *net)
+{
+	struct list_head zombies;
+	struct lnet_nid_list *ne;
+	struct lnet_nid_list *tmp;
+
+	INIT_LIST_HEAD(&zombies);
+
+	lnet_net_lock(LNET_LOCK_EX);
+	list_splice_init(&net->net_rtr_pref_nids, &zombies);
+	lnet_net_unlock(LNET_LOCK_EX);
+
+	list_for_each_entry_safe(ne, tmp, &zombies, nl_list) {
+		list_del_init(&ne->nl_list);
+		kfree(ne);
+	}
+}
+
+int
+lnet_net_add_pref_rtr(struct lnet_net *net,
+		      lnet_nid_t gw_nid)
+__must_hold(&the_lnet.ln_api_mutex)
+{
+	struct lnet_nid_list *ne;
+
+	/* This function is called with api_mutex held. When the api_mutex
+	 * is held the list can not be modified, as it is only modified as
+	 * a result of applying a UDSP and that happens under api_mutex
+	 * lock.
+	 */
+	list_for_each_entry(ne, &net->net_rtr_pref_nids, nl_list) {
+		if (ne->nl_nid == gw_nid)
+			return -EEXIST;
+	}
+
+	ne = kzalloc(sizeof(*ne), GFP_KERNEL);
+	if (!ne)
+		return -ENOMEM;
+
+	ne->nl_nid = gw_nid;
+
+	/* Lock the cpt to protect against addition and checks in the
+	 * selection algorithm
+	 */
+	lnet_net_lock(LNET_LOCK_EX);
+	list_add(&ne->nl_list, &net->net_rtr_pref_nids);
+	lnet_net_unlock(LNET_LOCK_EX);
+
+	return 0;
+}
+
+bool
+lnet_net_is_pref_rtr_locked(struct lnet_net *net, lnet_nid_t rtr_nid)
+{
+	struct lnet_nid_list *ne;
+
+	CDEBUG(D_NET, "%s: rtr pref empty: %d\n",
+	       libcfs_net2str(net->net_id),
+	       list_empty(&net->net_rtr_pref_nids));
+
+	if (list_empty(&net->net_rtr_pref_nids))
+		return false;
+
+	list_for_each_entry(ne, &net->net_rtr_pref_nids, nl_list) {
+		CDEBUG(D_NET, "Comparing pref %s with gw %s\n",
+		       libcfs_nid2str(ne->nl_nid),
+		       libcfs_nid2str(rtr_nid));
+		if (rtr_nid == ne->nl_nid)
+			return true;
+	}
+
+	return false;
+}
+
 unsigned int
 lnet_nid_cpt_hash(lnet_nid_t nid, unsigned int number)
 {
diff --git a/net/lnet/lnet/nidstrings.c b/net/lnet/lnet/nidstrings.c
index f260092..b1cd86b 100644
--- a/net/lnet/lnet/nidstrings.c
+++ b/net/lnet/lnet/nidstrings.c
@@ -706,6 +706,72 @@ int cfs_print_nidlist(char *buffer, int count, struct list_head *nidlist)
 static const size_t libcfs_nnetstrfns = ARRAY_SIZE(libcfs_netstrfns);
 
 static struct netstrfns *
+type2net_info(u32 net_type)
+{
+	int i;
+
+	for (i = 0; i < libcfs_nnetstrfns; i++) {
+		if (libcfs_netstrfns[i].nf_type == net_type)
+			return &libcfs_netstrfns[i];
+	}
+
+	return NULL;
+}
+
+int
+cfs_match_net(u32 net_id, u32 net_type, struct list_head *net_num_list)
+{
+	u32 net_num;
+
+	if (!net_num_list)
+		return 0;
+
+	if (net_type != LNET_NETTYP(net_id))
+		return 0;
+
+	net_num = LNET_NETNUM(net_id);
+
+	/* An empty list matches only net number 0; any other net number
+	 * combined with an empty list is not a match.
+	 */
+	if (!net_num && list_empty(net_num_list))
+		return 1;
+	else if (list_empty(net_num_list))
+		return 0;
+
+	if (!libcfs_num_match(net_num, net_num_list))
+		return 0;
+
+	return 1;
+}
+
+int
+cfs_match_nid_net(lnet_nid_t nid, u32 net_type,
+		  struct list_head *net_num_list,
+		  struct list_head *addr)
+{
+	u32 address;
+	struct netstrfns *nf;
+
+	if (!addr || !net_num_list)
+		return 0;
+
+	nf = type2net_info(LNET_NETTYP(LNET_NIDNET(nid)));
+	if (!nf)
+		return 0;
+
+	address = LNET_NIDADDR(nid);
+
+	/* if either the address or net number don't match then no match */
+	if (!nf->nf_match_addr(address, addr) ||
+	    !cfs_match_net(LNET_NIDNET(nid), net_type, net_num_list))
+		return 0;
+
+	return 1;
+}
+EXPORT_SYMBOL(cfs_match_nid_net);
+
+static struct netstrfns *
 libcfs_lnd2netstrfns(u32 lnd)
 {
 	int i;
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index bbd43c8..b4b8edd 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -1054,6 +1054,12 @@ struct lnet_peer_ni *
 	return rc;
 }
 
+void
+lnet_peer_ni_set_selection_priority(struct lnet_peer_ni *lpni, u32 priority)
+{
+	lpni->lpni_sel_priority = priority;
+}
+
 /*
  * Clear the preferred NIDs from a non-multi-rail peer.
  */
diff --git a/net/lnet/lnet/udsp.c b/net/lnet/lnet/udsp.c
new file mode 100644
index 0000000..85e31fe
--- /dev/null
+++ b/net/lnet/lnet/udsp.c
@@ -0,0 +1,1051 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
+ *
+ * Copyright (c) 2011, 2017, Intel Corporation.
+ *
+ * Copyright (c) 2018-2020 Data Direct Networks.
+ *
+ *   This file is part of Lustre, https://wiki.whamcloud.com/
+ *
+ *   Portals is free software; you can redistribute it and/or
+ *   modify it under the terms of version 2 of the GNU General Public
+ *   License as published by the Free Software Foundation.
+ *
+ *   Portals is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   version 2 along with this program; If not, see
+ *   http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ *   net/lnet/lnet/udsp.c
+ *
+ *   User Defined Selection Policies (UDSP) add fine-grained traffic
+ *   control. The policies are instantiated on LNet constructs and
+ *   allow some constructs to be preferred over others as an extension
+ *   of the selection algorithm.
+ *   The order of operation is defined by the selection algorithm logical flow:
+ *
+ *   1. Iterate over all the networks that a peer can be reached on
+ *      and select the best local network
+ *      - The remote network with the highest priority is examined
+ *        (Network Rule)
+ *      - The local network with the highest priority is selected
+ *        (Network Rule)
+ *      - The local NI with the highest priority is selected
+ *        (NID Rule)
+ *   2. If the peer is a remote peer and has no local networks,
+ *      - then select the remote peer network with the highest priority
+ *        (Network Rule)
+ *      - Select the highest priority remote peer_ni on the network selected
+ *        (NID Rule)
+ *      - Now that the peer's network and NI are decided, select the router
+ *        in round robin from the peer NI's preferred router list.
+ *        (Router Rule)
+ *      - Select the highest priority local NI on the local net of the
+ *        selected route.
+ *        (NID Rule)
+ *   3. Otherwise for local peers, select the peer_ni from the peer.
+ *      - highest priority peer NI is selected
+ *        (NID Rule)
+ *      - Select the peer NI which has the local NI selected on its
+ *        preferred list.
+ *        (NID Pair Rule)
+ *
+ *   Accordingly, the User Interface allows for the following:
+ *   - Adding a local network udsp: if multiple local networks are
+ *     available, each one can have a priority.
+ *   - Adding a local NID udsp: after a local network is chosen,
+ *     if there are multiple NIs, each one can have a priority.
+ *   - Adding a remote NID udsp: assign priority to a peer NID.
+ *   - Adding a NID pair udsp: specifies local NIDs to be added
+ *     to the preferred list of the specified peer NIs.
+ *     When selecting a peer NI, the one with the
+ *     local NID being used on its list is preferred.
+ *   - Adding a Router udsp: similar to the NID pair udsp.
+ *     Specified router NIDs are added to the list of the specified peer
+ *     NIs. When sending to a remote peer, the remote net is selected, then
+ *     the peer NID. The router which has its NID on the peer NI list
+ *     is preferred.
+ *   - Deleting a udsp: use the specified policy index to remove it
+ *     from the policy list.
+ *
+ *   Generally, the syntax is as follows
+ *     lnetctl policy <add | del | show>
+ *      --src:      ip2nets syntax specifying the local NID to match
+ *      --dst:      ip2nets syntax specifying the remote NID to match
+ *      --rte:      ip2nets syntax specifying the router NID to match
+ *      --priority: Priority to apply to rule matches
+ *      --idx:      Index of where to insert or delete the rule
+ *                  By default add appends to the end of the rule list
+ *
+ * Author: Amir Shehata
+ */
+
+#include <linux/uaccess.h>
+
+#include <linux/lnet/udsp.h>
+#include <linux/libcfs/libcfs.h>
+
+struct udsp_info {
+	struct lnet_peer_ni *udi_lpni;
+	struct lnet_peer_net *udi_lpn;
+	struct lnet_ni *udi_ni;
+	struct lnet_net *udi_net;
+	struct lnet_ud_nid_descr *udi_match;
+	struct lnet_ud_nid_descr *udi_action;
+	u32 udi_priority;
+	enum lnet_udsp_action_type udi_type;
+	bool udi_local;
+	bool udi_revert;
+};
+
+typedef int (*udsp_apply_rule)(struct udsp_info *);
+
+enum udsp_apply {
+	UDSP_APPLY_ON_PEERS = 0,
+	UDSP_APPLY_PRIO_ON_NIS = 1,
+	UDSP_APPLY_RTE_ON_NETS = 2,
+	UDSP_APPLY_MAX_ENUM = 3,
+};
+
+#define RULE_NOT_APPLICABLE -1
+
+static inline bool
+lnet_udsp_is_net_rule(struct lnet_ud_nid_descr *match)
+{
+	return list_empty(&match->ud_addr_range);
+}
+
+static bool
+lnet_udsp_expr_list_equal(struct list_head *e1,
+			  struct list_head *e2)
+{
+	struct cfs_expr_list *expr1;
+	struct cfs_expr_list *expr2;
+	struct cfs_range_expr *range1, *range2;
+
+	if (list_empty(e1) && list_empty(e2))
+		return true;
+
+	if (lnet_get_list_len(e1) != lnet_get_list_len(e2))
+		return false;
+
+	expr2 = list_first_entry(e2, struct cfs_expr_list, el_link);
+
+	list_for_each_entry(expr1, e1, el_link) {
+		if (lnet_get_list_len(&expr1->el_exprs) !=
+		    lnet_get_list_len(&expr2->el_exprs))
+			return false;
+
+		range2 = list_first_entry(&expr2->el_exprs,
+					  struct cfs_range_expr,
+					  re_link);
+
+		list_for_each_entry(range1, &expr1->el_exprs, re_link) {
+			if (range1->re_lo != range2->re_lo ||
+			    range1->re_hi != range2->re_hi ||
+			    range1->re_stride != range2->re_stride)
+				return false;
+			range2 = list_next_entry(range2, re_link);
+		}
+		expr2 = list_next_entry(expr2, el_link);
+	}
+
+	return true;
+}
+
+static bool
+lnet_udsp_nid_descr_equal(struct lnet_ud_nid_descr *e1,
+			  struct lnet_ud_nid_descr *e2)
+{
+	if (e1->ud_net_id.udn_net_type != e2->ud_net_id.udn_net_type ||
+	    !lnet_udsp_expr_list_equal(&e1->ud_net_id.udn_net_num_range,
+				       &e2->ud_net_id.udn_net_num_range) ||
+	    !lnet_udsp_expr_list_equal(&e1->ud_addr_range, &e2->ud_addr_range))
+		return false;
+
+	return true;
+}
+
+static bool
+lnet_udsp_action_equal(struct lnet_udsp *e1, struct lnet_udsp *e2)
+{
+	if (e1->udsp_action_type != e2->udsp_action_type)
+		return false;
+
+	if (e1->udsp_action_type == EN_LNET_UDSP_ACTION_PRIORITY &&
+	    e1->udsp_action.udsp_priority != e2->udsp_action.udsp_priority)
+		return false;
+
+	return true;
+}
+
+static bool
+lnet_udsp_equal(struct lnet_udsp *e1, struct lnet_udsp *e2)
+{
+	/* check each NID descr */
+	if (!lnet_udsp_nid_descr_equal(&e1->udsp_src, &e2->udsp_src) ||
+	    !lnet_udsp_nid_descr_equal(&e1->udsp_dst, &e2->udsp_dst) ||
+	    !lnet_udsp_nid_descr_equal(&e1->udsp_rte, &e2->udsp_rte))
+		return false;
+
+	return true;
+}
+
+/* it is enough to look at the net type of the descriptor. If the criteria
+ * is present the net must be specified
+ */
+static inline bool
+lnet_udsp_criteria_present(struct lnet_ud_nid_descr *descr)
+{
+	return (descr->ud_net_id.udn_net_type != 0);
+}
+
+static int
+lnet_udsp_apply_rule_on_ni(struct udsp_info *udi)
+{
+	int rc;
+	struct lnet_ni *ni = udi->udi_ni;
+	struct lnet_ud_nid_descr *ni_match = udi->udi_match;
+	u32 priority = (udi->udi_revert) ? -1 : udi->udi_priority;
+
+	rc = cfs_match_nid_net(ni->ni_nid,
+			       ni_match->ud_net_id.udn_net_type,
+			       &ni_match->ud_net_id.udn_net_num_range,
+			       &ni_match->ud_addr_range);
+	if (!rc)
+		return 0;
+
+	CDEBUG(D_NET, "apply udsp on ni %s\n",
+	       libcfs_nid2str(ni->ni_nid));
+
+	/* Detected match. Set the NI's priority */
+	lnet_ni_set_sel_priority_locked(ni, priority);
+
+	return 0;
+}
+
+static int
+lnet_udsp_apply_rte_list_on_net(struct lnet_net *net,
+				struct lnet_ud_nid_descr *rte_action,
+				bool revert)
+{
+	struct lnet_remotenet *rnet;
+	struct list_head *rn_list;
+	struct lnet_route *route;
+	struct lnet_peer_ni *lpni;
+	bool cleared = false;
+	lnet_nid_t gw_nid, gw_prim_nid;
+	int rc = 0;
+	int i;
+
+	for (i = 0; i < LNET_REMOTE_NETS_HASH_SIZE; i++) {
+		rn_list = &the_lnet.ln_remote_nets_hash[i];
+		list_for_each_entry(rnet, rn_list, lrn_list) {
+			list_for_each_entry(route, &rnet->lrn_routes, lr_list) {
+				/* look if gw nid on the same net matches */
+				gw_prim_nid = route->lr_gateway->lp_primary_nid;
+				lpni = NULL;
+				while ((lpni = lnet_get_next_peer_ni_locked(route->lr_gateway,
+									    NULL,
+									    lpni)) != NULL) {
+					if (!lnet_get_net_locked(lpni->lpni_peer_net->lpn_net_id))
+						continue;
+					gw_nid = lpni->lpni_nid;
+					rc = cfs_match_nid_net(gw_nid,
+							       rte_action->ud_net_id.udn_net_type,
+							       &rte_action->ud_net_id.udn_net_num_range,
+							       &rte_action->ud_addr_range);
+					if (rc)
+						break;
+				}
+				/* match gw primary nid on a remote network */
+				if (!rc) {
+					gw_nid = gw_prim_nid;
+					rc = cfs_match_nid_net(gw_nid,
+							       rte_action->ud_net_id.udn_net_type,
+							       &rte_action->ud_net_id.udn_net_num_range,
+							       &rte_action->ud_addr_range);
+				}
+				if (!rc)
+					continue;
+				lnet_net_unlock(LNET_LOCK_EX);
+				if (!cleared || revert) {
+					lnet_net_clr_pref_rtrs(net);
+					cleared = true;
+					if (revert) {
+						lnet_net_lock(LNET_LOCK_EX);
+						continue;
+					}
+				}
+				/* match. Add to pref NIDs */
+				CDEBUG(D_NET, "udsp net->gw: %s->%s\n",
+				       libcfs_net2str(net->net_id),
+				       libcfs_nid2str(gw_prim_nid));
+				rc = lnet_net_add_pref_rtr(net, gw_prim_nid);
+				lnet_net_lock(LNET_LOCK_EX);
+				/* -EEXIST is treated as success */
+				if (rc && rc != -EEXIST) {
+					CERROR("Failed to add %s to %s pref rtr list\n",
+					       libcfs_nid2str(gw_prim_nid),
+					       libcfs_net2str(net->net_id));
+					return rc;
+				}
+			}
+		}
+	}
+
+	return rc;
+}
+
+static int
+lnet_udsp_apply_rte_rule_on_nets(struct udsp_info *udi)
+{
+	int rc = 0;
+	int last_failure = 0;
+	struct lnet_net *net;
+	struct lnet_ud_nid_descr *match = udi->udi_match;
+	struct lnet_ud_nid_descr *rte_action = udi->udi_action;
+
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		if (LNET_NETTYP(net->net_id) != match->ud_net_id.udn_net_type)
+			continue;
+
+		rc = cfs_match_net(net->net_id,
+				   match->ud_net_id.udn_net_type,
+				   &match->ud_net_id.udn_net_num_range);
+		if (!rc)
+			continue;
+
+		CDEBUG(D_NET, "apply rule on %s\n",
+		       libcfs_net2str(net->net_id));
+		rc = lnet_udsp_apply_rte_list_on_net(net, rte_action,
+						     udi->udi_revert);
+		if (rc)
+			last_failure = rc;
+	}
+
+	return last_failure;
+}
+
+static int
+lnet_udsp_apply_rte_rule_on_net(struct udsp_info *udi)
+{
+	int rc = 0;
+	struct lnet_net *net = udi->udi_net;
+	struct lnet_ud_nid_descr *match = udi->udi_match;
+	struct lnet_ud_nid_descr *rte_action = udi->udi_action;
+
+	rc = cfs_match_net(net->net_id,
+			   match->ud_net_id.udn_net_type,
+			   &match->ud_net_id.udn_net_num_range);
+	if (!rc)
+		return 0;
+
+	CDEBUG(D_NET, "apply rule on %s\n",
+	       libcfs_net2str(net->net_id));
+	rc = lnet_udsp_apply_rte_list_on_net(net, rte_action,
+					     udi->udi_revert);
+
+	return rc;
+}
+
+static int
+lnet_udsp_apply_prio_rule_on_net(struct udsp_info *udi)
+{
+	int rc;
+	struct lnet_ud_nid_descr *match = udi->udi_match;
+	struct lnet_net *net = udi->udi_net;
+	u32 priority = (udi->udi_revert) ? -1 : udi->udi_priority;
+
+	if (!lnet_udsp_is_net_rule(match))
+		return RULE_NOT_APPLICABLE;
+
+	rc = cfs_match_net(net->net_id,
+			   match->ud_net_id.udn_net_type,
+			   &match->ud_net_id.udn_net_num_range);
+	if (!rc)
+		return 0;
+
+	CDEBUG(D_NET, "apply rule on %s\n",
+	       libcfs_net2str(net->net_id));
+
+	lnet_net_set_sel_priority_locked(net, priority);
+
+	return 0;
+}
+
+static int
+lnet_udsp_apply_rule_on_nis(struct udsp_info *udi)
+{
+	int rc = 0;
+	struct lnet_ni *ni;
+	struct lnet_net *net;
+	struct lnet_ud_nid_descr *ni_match = udi->udi_match;
+	int last_failure = 0;
+
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		if (LNET_NETTYP(net->net_id) !=
+		    ni_match->ud_net_id.udn_net_type)
+			continue;
+
+		udi->udi_net = net;
+		if (!lnet_udsp_apply_prio_rule_on_net(udi))
+			continue;
+
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+			udi->udi_ni = ni;
+			rc = lnet_udsp_apply_rule_on_ni(udi);
+			if (rc)
+				last_failure = rc;
+		}
+	}
+
+	return last_failure;
+}
+
+static int
+lnet_udsp_apply_rte_list_on_lpni(struct lnet_peer_ni *lpni,
+				 struct lnet_ud_nid_descr *rte_action,
+				 bool revert)
+{
+	struct lnet_remotenet *rnet;
+	struct list_head *rn_list;
+	struct lnet_route *route;
+	bool cleared = false;
+	lnet_nid_t gw_nid;
+	int rc = 0;
+	int i;
+
+	for (i = 0; i < LNET_REMOTE_NETS_HASH_SIZE; i++) {
+		rn_list = &the_lnet.ln_remote_nets_hash[i];
+		list_for_each_entry(rnet, rn_list, lrn_list) {
+			list_for_each_entry(route, &rnet->lrn_routes, lr_list) {
+				gw_nid = route->lr_gateway->lp_primary_nid;
+				rc = cfs_match_nid_net(gw_nid,
+						       rte_action->ud_net_id.udn_net_type,
+						       &rte_action->ud_net_id.udn_net_num_range,
+						       &rte_action->ud_addr_range);
+				if (!rc)
+					continue;
+				lnet_net_unlock(LNET_LOCK_EX);
+				if (!cleared || revert) {
+					CDEBUG(D_NET,
+					       "%spref rtr nids from lpni %s\n",
+					       (revert) ? "revert " : "clear ",
+					       libcfs_nid2str(lpni->lpni_nid));
+					lnet_peer_clr_pref_rtrs(lpni);
+					cleared = true;
+					if (revert) {
+						lnet_net_lock(LNET_LOCK_EX);
+						continue;
+					}
+				}
+				CDEBUG(D_NET,
+				       "add gw nid %s as preferred for peer %s\n",
+				       libcfs_nid2str(gw_nid),
+				       libcfs_nid2str(lpni->lpni_nid));
+				/* match. Add to pref NIDs */
+				rc = lnet_peer_add_pref_rtr(lpni, gw_nid);
+				lnet_net_lock(LNET_LOCK_EX);
+				/* -EEXIST is treated as success */
+				if (rc && rc != -EEXIST) {
+					CERROR("Failed to add %s to %s pref rtr list\n",
+					       libcfs_nid2str(gw_nid),
+					       libcfs_nid2str(lpni->lpni_nid));
+					return rc;
+				}
+			}
+		}
+	}
+
+	return rc;
+}
+
+static int
+lnet_udsp_apply_ni_list(struct lnet_peer_ni *lpni,
+			struct lnet_ud_nid_descr *ni_action,
+			bool revert)
+{
+	int rc = 0;
+	struct lnet_ni *ni;
+	struct lnet_net *net;
+	bool cleared = false;
+
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		if (LNET_NETTYP(net->net_id) !=
+		    ni_action->ud_net_id.udn_net_type)
+			continue;
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+			rc = cfs_match_nid_net(ni->ni_nid,
+					       ni_action->ud_net_id.udn_net_type,
+					       &ni_action->ud_net_id.udn_net_num_range,
+					       &ni_action->ud_addr_range);
+			if (!rc)
+				continue;
+			lnet_net_unlock(LNET_LOCK_EX);
+			if (!cleared || revert) {
+				lnet_peer_clr_pref_nids(lpni);
+				CDEBUG(D_NET, "%spref nids from lpni %s\n",
+				       (revert) ? "revert " : "clear ",
+				       libcfs_nid2str(lpni->lpni_nid));
+				cleared = true;
+				if (revert) {
+					lnet_net_lock(LNET_LOCK_EX);
+					continue;
+				}
+			}
+			CDEBUG(D_NET, "add nid %s as preferred for peer %s\n",
+			       libcfs_nid2str(ni->ni_nid),
+			       libcfs_nid2str(lpni->lpni_nid));
+			/* match. Add to pref NIDs */
+			rc = lnet_peer_add_pref_nid(lpni, ni->ni_nid);
+			lnet_net_lock(LNET_LOCK_EX);
+			/* -EEXIST is treated as success */
+			if (rc && rc != -EEXIST) {
+				CERROR("Failed to add %s to %s pref nid list\n",
+				       libcfs_nid2str(ni->ni_nid),
+				       libcfs_nid2str(lpni->lpni_nid));
+				return rc;
+			}
+		}
+	}
+
+	return rc;
+}
+
+static int
+lnet_udsp_apply_rule_on_lpni(struct udsp_info *udi)
+{
+	int rc;
+	struct lnet_peer_ni *lpni = udi->udi_lpni;
+	struct lnet_ud_nid_descr *lp_match = udi->udi_match;
+	struct lnet_ud_nid_descr *action = udi->udi_action;
+	u32 priority = (udi->udi_revert) ? -1 : udi->udi_priority;
+	bool local = udi->udi_local;
+	enum lnet_udsp_action_type type = udi->udi_type;
+
+	rc = cfs_match_nid_net(lpni->lpni_nid,
+			       lp_match->ud_net_id.udn_net_type,
+			       &lp_match->ud_net_id.udn_net_num_range,
+			       &lp_match->ud_addr_range);
+
+	/* check if looking for a net match */
+	if (!rc &&
+	    (lnet_get_list_len(&lp_match->ud_addr_range) ||
+	     !cfs_match_net(udi->udi_lpn->lpn_net_id,
+			    lp_match->ud_net_id.udn_net_type,
+			    &lp_match->ud_net_id.udn_net_num_range))) {
+		return 0;
+	}
+
+	if (type == EN_LNET_UDSP_ACTION_PREFERRED_LIST && local) {
+		rc = lnet_udsp_apply_ni_list(lpni, action,
+					     udi->udi_revert);
+		if (rc)
+			return rc;
+	} else if (type == EN_LNET_UDSP_ACTION_PREFERRED_LIST &&
+			!local) {
+		rc = lnet_udsp_apply_rte_list_on_lpni(lpni, action,
+						      udi->udi_revert);
+		if (rc)
+			return rc;
+	} else {
+		lnet_peer_ni_set_selection_priority(lpni, priority);
+	}
+
+	return 0;
+}
+
+static int
+lnet_udsp_apply_rule_on_lpn(struct udsp_info *udi)
+{
+	int rc;
+	struct lnet_ud_nid_descr *match = udi->udi_match;
+	struct lnet_peer_net *lpn = udi->udi_lpn;
+	u32 priority = (udi->udi_revert) ? -1 : udi->udi_priority;
+
+	if (udi->udi_type == EN_LNET_UDSP_ACTION_PREFERRED_LIST ||
+	    !lnet_udsp_is_net_rule(match))
+		return RULE_NOT_APPLICABLE;
+
+	rc = cfs_match_net(lpn->lpn_net_id,
+			   match->ud_net_id.udn_net_type,
+			   &match->ud_net_id.udn_net_num_range);
+	if (!rc)
+		return 0;
+
+	CDEBUG(D_NET, "apply rule on lpn %s\n",
+	       libcfs_net2str(lpn->lpn_net_id));
+	lnet_peer_net_set_sel_priority_locked(lpn, priority);
+
+	return 0;
+}
+
+static int
+lnet_udsp_apply_rule_on_lpnis(struct udsp_info *udi)
+{
+	/* iterate over all the peers in the system and find if any of the
+	 * peers match the criteria. If they do, clear the preferred list
+	 * and add the new list
+	 */
+	int lncpt = cfs_percpt_number(the_lnet.ln_peer_tables);
+	struct lnet_ud_nid_descr *lp_match = udi->udi_match;
+	struct lnet_peer_table *ptable;
+	struct lnet_peer_net *lpn;
+	struct lnet_peer_ni *lpni;
+	struct lnet_peer *lp;
+	int last_failure = 0;
+	int cpt;
+	int rc;
+
+	for (cpt = 0; cpt < lncpt; cpt++) {
+		ptable = the_lnet.ln_peer_tables[cpt];
+		list_for_each_entry(lp, &ptable->pt_peer_list, lp_peer_list) {
+			CDEBUG(D_NET, "udsp examining lp %s\n",
+			       libcfs_nid2str(lp->lp_primary_nid));
+			list_for_each_entry(lpn,
+					    &lp->lp_peer_nets,
+					    lpn_peer_nets) {
+				CDEBUG(D_NET, "udsp examining lpn %s\n",
+				       libcfs_net2str(lpn->lpn_net_id));
+
+				if (LNET_NETTYP(lpn->lpn_net_id) !=
+				    lp_match->ud_net_id.udn_net_type)
+					continue;
+
+				udi->udi_lpn = lpn;
+
+				if (!lnet_udsp_apply_rule_on_lpn(udi))
+					continue;
+
+				list_for_each_entry(lpni,
+						    &lpn->lpn_peer_nis,
+						    lpni_peer_nis) {
+					CDEBUG(D_NET,
+					       "udsp examining lpni %s\n",
+					       libcfs_nid2str(lpni->lpni_nid));
+					udi->udi_lpni = lpni;
+					rc = lnet_udsp_apply_rule_on_lpni(udi);
+					if (rc)
+						last_failure = rc;
+				}
+			}
+		}
+	}
+
+	return last_failure;
+}
+
+static int
+lnet_udsp_apply_single_policy(struct lnet_udsp *udsp, struct udsp_info *udi,
+			      udsp_apply_rule *cbs)
+{
+	int rc;
+
+	if (lnet_udsp_criteria_present(&udsp->udsp_dst) &&
+	    lnet_udsp_criteria_present(&udsp->udsp_src)) {
+		/* NID Pair rule */
+		if (!cbs[UDSP_APPLY_ON_PEERS])
+			return 0;
+
+		if (udsp->udsp_action_type !=
+			EN_LNET_UDSP_ACTION_PREFERRED_LIST) {
+			CERROR("Bad action type. Expected %d got %d\n",
+			       EN_LNET_UDSP_ACTION_PREFERRED_LIST,
+			       udsp->udsp_action_type);
+			return 0;
+		}
+		udi->udi_match = &udsp->udsp_dst;
+		udi->udi_action = &udsp->udsp_src;
+		udi->udi_type = EN_LNET_UDSP_ACTION_PREFERRED_LIST;
+		udi->udi_local = true;
+
+		CDEBUG(D_NET, "applying udsp (%p) dst->src\n",
+		       udsp);
+		rc = cbs[UDSP_APPLY_ON_PEERS](udi);
+		if (rc)
+			return rc;
+	} else if (lnet_udsp_criteria_present(&udsp->udsp_dst) &&
+		   lnet_udsp_criteria_present(&udsp->udsp_rte)) {
+		/* Router rule */
+		if (!cbs[UDSP_APPLY_ON_PEERS])
+			return 0;
+
+		if (udsp->udsp_action_type !=
+			EN_LNET_UDSP_ACTION_PREFERRED_LIST) {
+			CERROR("Bad action type. Expected %d got %d\n",
+			       EN_LNET_UDSP_ACTION_PREFERRED_LIST,
+			       udsp->udsp_action_type);
+			return 0;
+		}
+
+		if (lnet_udsp_criteria_present(&udsp->udsp_src)) {
+			CERROR("only one of src or dst can be specified\n");
+			return 0;
+		}
+		udi->udi_match = &udsp->udsp_dst;
+		udi->udi_action = &udsp->udsp_rte;
+		udi->udi_type = EN_LNET_UDSP_ACTION_PREFERRED_LIST;
+		udi->udi_local = false;
+
+		CDEBUG(D_NET, "applying udsp (%p) dst->rte\n",
+		       udsp);
+		rc = cbs[UDSP_APPLY_ON_PEERS](udi);
+		if (rc)
+			return rc;
+	} else if (lnet_udsp_criteria_present(&udsp->udsp_dst)) {
+		/* destination priority rule */
+		if (!cbs[UDSP_APPLY_ON_PEERS])
+			return 0;
+
+		if (udsp->udsp_action_type !=
+			EN_LNET_UDSP_ACTION_PRIORITY) {
+			CERROR("Bad action type. Expected %d got %d\n",
+			       EN_LNET_UDSP_ACTION_PRIORITY,
+			       udsp->udsp_action_type);
+			return 0;
+		}
+		udi->udi_match = &udsp->udsp_dst;
+		udi->udi_type = EN_LNET_UDSP_ACTION_PRIORITY;
+		if (udsp->udsp_action_type !=
+		    EN_LNET_UDSP_ACTION_PRIORITY) {
+			udi->udi_priority = 0;
+		} else {
+			udi->udi_priority = udsp->udsp_action.udsp_priority;
+		}
+		udi->udi_local = true;
+
+		CDEBUG(D_NET, "applying udsp (%p) on destination\n",
+		       udsp);
+		rc = cbs[UDSP_APPLY_ON_PEERS](udi);
+		if (rc)
+			return rc;
+	} else if (lnet_udsp_criteria_present(&udsp->udsp_src)) {
+		/* source priority rule */
+		if (!cbs[UDSP_APPLY_PRIO_ON_NIS])
+			return 0;
+
+		if (udsp->udsp_action_type !=
+			EN_LNET_UDSP_ACTION_PRIORITY) {
+			CERROR("Bad action type. Expected %d got %d\n",
+			       EN_LNET_UDSP_ACTION_PRIORITY,
+			       udsp->udsp_action_type);
+			return 0;
+		}
+		udi->udi_match = &udsp->udsp_src;
+		udi->udi_type = EN_LNET_UDSP_ACTION_PRIORITY;
+		if (udsp->udsp_action_type !=
+		    EN_LNET_UDSP_ACTION_PRIORITY) {
+			udi->udi_priority = 0;
+		} else {
+			udi->udi_priority = udsp->udsp_action.udsp_priority;
+		}
+		udi->udi_local = true;
+
+		CDEBUG(D_NET, "applying udsp (%p) on source\n",
+		       udsp);
+		rc = cbs[UDSP_APPLY_PRIO_ON_NIS](udi);
+	} else {
+		CERROR("Bad UDSP policy\n");
+		return 0;
+	}
+
+	return 0;
+}
+
+static int
+lnet_udsp_apply_policies_helper(struct lnet_udsp *udsp, struct udsp_info *udi,
+				udsp_apply_rule *cbs)
+{
+	int rc;
+	int last_failure = 0;
+
+	if (udsp)
+		return lnet_udsp_apply_single_policy(udsp, udi, cbs);
+
+	list_for_each_entry_reverse(udsp,
+				    &the_lnet.ln_udsp_list,
+				    udsp_on_list) {
+		rc = lnet_udsp_apply_single_policy(udsp, udi, cbs);
+		if (rc)
+			last_failure = rc;
+	}
+
+	return last_failure;
+}
+
+int
+lnet_udsp_apply_policies_on_ni(struct lnet_ni *ni)
+{
+	struct udsp_info udi;
+	udsp_apply_rule cbs[UDSP_APPLY_MAX_ENUM] = {NULL};
+
+	memset(&udi, 0, sizeof(udi));
+
+	udi.udi_ni = ni;
+
+	cbs[UDSP_APPLY_PRIO_ON_NIS] = lnet_udsp_apply_rule_on_ni;
+
+	return lnet_udsp_apply_policies_helper(NULL, &udi, cbs);
+}
+
+int
+lnet_udsp_apply_policies_on_net(struct lnet_net *net)
+{
+	struct udsp_info udi;
+	udsp_apply_rule cbs[UDSP_APPLY_MAX_ENUM] = {NULL};
+
+	memset(&udi, 0, sizeof(udi));
+
+	udi.udi_net = net;
+
+	cbs[UDSP_APPLY_PRIO_ON_NIS] = lnet_udsp_apply_prio_rule_on_net;
+	cbs[UDSP_APPLY_RTE_ON_NETS] = lnet_udsp_apply_rte_rule_on_net;
+
+	return lnet_udsp_apply_policies_helper(NULL, &udi, cbs);
+}
+
+int
+lnet_udsp_apply_policies_on_lpni(struct lnet_peer_ni *lpni)
+{
+	struct udsp_info udi;
+	udsp_apply_rule cbs[UDSP_APPLY_MAX_ENUM] = {NULL};
+
+	memset(&udi, 0, sizeof(udi));
+
+	udi.udi_lpni = lpni;
+
+	cbs[UDSP_APPLY_ON_PEERS] = lnet_udsp_apply_rule_on_lpni;
+
+	return lnet_udsp_apply_policies_helper(NULL, &udi, cbs);
+}
+
+int
+lnet_udsp_apply_policies_on_lpn(struct lnet_peer_net *lpn)
+{
+	struct udsp_info udi;
+	udsp_apply_rule cbs[UDSP_APPLY_MAX_ENUM] = {NULL};
+
+	memset(&udi, 0, sizeof(udi));
+
+	udi.udi_lpn = lpn;
+
+	cbs[UDSP_APPLY_ON_PEERS] = lnet_udsp_apply_rule_on_lpn;
+
+	return lnet_udsp_apply_policies_helper(NULL, &udi, cbs);
+}
+
+int
+lnet_udsp_apply_policies(struct lnet_udsp *udsp, bool revert)
+{
+	int rc;
+	struct udsp_info udi;
+	udsp_apply_rule cbs[UDSP_APPLY_MAX_ENUM] = {NULL};
+
+	memset(&udi, 0, sizeof(udi));
+
+	cbs[UDSP_APPLY_ON_PEERS] = lnet_udsp_apply_rule_on_lpnis;
+	cbs[UDSP_APPLY_PRIO_ON_NIS] = lnet_udsp_apply_rule_on_nis;
+	cbs[UDSP_APPLY_RTE_ON_NETS] = lnet_udsp_apply_rte_rule_on_nets;
+
+	udi.udi_revert = revert;
+
+	lnet_net_lock(LNET_LOCK_EX);
+	rc = lnet_udsp_apply_policies_helper(udsp, &udi, cbs);
+	lnet_net_unlock(LNET_LOCK_EX);
+
+	return rc;
+}
+
+struct lnet_udsp *
+lnet_udsp_get_policy(int idx)
+{
+	int i = 0;
+	struct lnet_udsp *udsp = NULL;
+	bool found = false;
+
+	CDEBUG(D_NET, "Get UDSP at idx = %d\n", idx);
+
+	if (idx < 0)
+		return NULL;
+
+	list_for_each_entry(udsp, &the_lnet.ln_udsp_list, udsp_on_list) {
+		CDEBUG(D_NET, "iterating over udsp %d:%d:%d\n",
+		       udsp->udsp_idx, i, idx);
+		if (i == idx) {
+			found = true;
+			break;
+		}
+		i++;
+	}
+
+	CDEBUG(D_NET, "Found UDSP (%p)\n", udsp);
+
+	if (!found)
+		return NULL;
+
+	return udsp;
+}
+
+int
+lnet_udsp_add_policy(struct lnet_udsp *new, int idx)
+{
+	struct lnet_udsp *udsp;
+	struct lnet_udsp *insert = NULL;
+	int i = 0;
+
+	list_for_each_entry(udsp, &the_lnet.ln_udsp_list, udsp_on_list) {
+		CDEBUG(D_NET, "found udsp i = %d:%d, idx = %d\n",
+		       i, udsp->udsp_idx, idx);
+		if (i == idx) {
+			insert = udsp;
+			new->udsp_idx = idx;
+		}
+		i++;
+		if (lnet_udsp_equal(udsp, new)) {
+			if (!lnet_udsp_action_equal(udsp, new) &&
+			    udsp->udsp_action_type == EN_LNET_UDSP_ACTION_PRIORITY &&
+			    new->udsp_action_type == EN_LNET_UDSP_ACTION_PRIORITY) {
+				udsp->udsp_action.udsp_priority = new->udsp_action.udsp_priority;
+				CDEBUG(D_NET,
+				       "udsp: %p index %d updated priority to %d\n",
+				       udsp,
+				       udsp->udsp_idx,
+				       udsp->udsp_action.udsp_priority);
+				return 0;
+			}
+			return -EALREADY;
+		}
+	}
+
+	if (insert) {
+		list_add(&new->udsp_on_list, insert->udsp_on_list.prev);
+		i = 0;
+		list_for_each_entry(udsp,
+				    &the_lnet.ln_udsp_list,
+				    udsp_on_list) {
+			if (i <= idx) {
+				i++;
+				continue;
+			}
+			udsp->udsp_idx++;
+		}
+	} else {
+		list_add_tail(&new->udsp_on_list, &the_lnet.ln_udsp_list);
+		new->udsp_idx = i;
+	}
+
+	CDEBUG(D_NET, "udsp: %p added at index %d\n", new, new->udsp_idx);
+
+	CDEBUG(D_NET, "udsp list:\n");
+	list_for_each_entry(udsp, &the_lnet.ln_udsp_list, udsp_on_list)
+		CDEBUG(D_NET, "udsp %p:%d\n", udsp, udsp->udsp_idx);
+
+	return 0;
+}
+
+int
+lnet_udsp_del_policy(int idx)
+{
+	struct lnet_udsp *udsp;
+	struct lnet_udsp *tmp;
+	bool removed = false;
+
+	if (idx < 0) {
+		lnet_udsp_destroy(false);
+		return 0;
+	}
+
+	CDEBUG(D_NET, "del udsp at idx = %d\n", idx);
+
+	list_for_each_entry_safe(udsp,
+				 tmp,
+				 &the_lnet.ln_udsp_list,
+				 udsp_on_list) {
+		if (removed)
+			udsp->udsp_idx--;
+		if (udsp->udsp_idx == idx && !removed) {
+			list_del_init(&udsp->udsp_on_list);
+			lnet_udsp_apply_policies(udsp, true);
+			lnet_udsp_free(udsp);
+			removed = true;
+		}
+	}
+
+	return 0;
+}
+
+struct lnet_udsp *
+lnet_udsp_alloc(void)
+{
+	struct lnet_udsp *udsp;
+
+	udsp = kmem_cache_alloc(lnet_udsp_cachep, GFP_NOFS | __GFP_ZERO);
+
+	if (!udsp)
+		return NULL;
+
+	INIT_LIST_HEAD(&udsp->udsp_on_list);
+	INIT_LIST_HEAD(&udsp->udsp_src.ud_addr_range);
+	INIT_LIST_HEAD(&udsp->udsp_src.ud_net_id.udn_net_num_range);
+	INIT_LIST_HEAD(&udsp->udsp_dst.ud_addr_range);
+	INIT_LIST_HEAD(&udsp->udsp_dst.ud_net_id.udn_net_num_range);
+	INIT_LIST_HEAD(&udsp->udsp_rte.ud_addr_range);
+	INIT_LIST_HEAD(&udsp->udsp_rte.ud_net_id.udn_net_num_range);
+
+	CDEBUG(D_MALLOC, "udsp alloc %p\n", udsp);
+	return udsp;
+}
+
+static void
+lnet_udsp_nid_descr_free(struct lnet_ud_nid_descr *nid_descr)
+{
+	struct list_head *net_range = &nid_descr->ud_net_id.udn_net_num_range;
+
+	if (!lnet_udsp_criteria_present(nid_descr))
+		return;
+
+	/* memory management is a bit tricky here. When we allocate the
+	 * memory to store the NID descriptor we allocate a large buffer
+	 * for all the data, so we need to free the entire buffer at
+	 * once. If the net is present the net_range->next points to that
+	 * buffer otherwise if the ud_addr_range is present then it's the
+	 * ud_addr_range.next
+	 */
+	if (!list_empty(net_range))
+		kfree(net_range->next);
+	else if (!list_empty(&nid_descr->ud_addr_range))
+		kfree(nid_descr->ud_addr_range.next);
+}
+
+void
+lnet_udsp_free(struct lnet_udsp *udsp)
+{
+	lnet_udsp_nid_descr_free(&udsp->udsp_src);
+	lnet_udsp_nid_descr_free(&udsp->udsp_dst);
+	lnet_udsp_nid_descr_free(&udsp->udsp_rte);
+
+	CDEBUG(D_MALLOC, "udsp free %p\n", udsp);
+	kmem_cache_free(lnet_udsp_cachep, udsp);
+}
+
+void
+lnet_udsp_destroy(bool shutdown)
+{
+	struct lnet_udsp *udsp, *tmp;
+
+	CDEBUG(D_NET, "Destroying UDSPs in the system\n");
+
+	list_for_each_entry_safe(udsp, tmp, &the_lnet.ln_udsp_list,
+				 udsp_on_list) {
+		list_del(&udsp->udsp_on_list);
+		if (!shutdown)
+			lnet_udsp_apply_policies(udsp, true);
+		lnet_udsp_free(udsp);
+	}
+}
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 13/41] lnet: Apply UDSP on local and remote NIs
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (11 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 12/41] lnet: UDSP handling James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 14/41] lnet: Add the kernel level Marshalling API James Simmons
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

When a peer net, peer NI, local net or local NI is created,
apply the UDSPs in the system to that construct.

WC-bug-id: https://jira.whamcloud.com/browse/LU-9121
Lustre-commit: a29151e30d8b89d ("LU-9121 lnet: Apply UDSP on local and remote NIs")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34355
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/api-ni.c | 26 +++++++++++++++++++++-----
 net/lnet/lnet/peer.c   | 16 ++++++++++++++++
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 4809c76..9ff2776 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -3164,12 +3164,13 @@ int lnet_get_ni_stats(struct lnet_ioctl_element_msg_stats *msg_stats)
 static int lnet_add_net_common(struct lnet_net *net,
 			       struct lnet_ioctl_config_lnd_tunables *tun)
 {
-	u32 net_id;
-	struct lnet_ping_buffer *pbuf;
 	struct lnet_handle_md ping_mdh;
-	int rc;
+	struct lnet_ping_buffer *pbuf;
 	struct lnet_remotenet *rnet;
+	struct lnet_ni *ni;
 	int net_ni_count;
+	u32 net_id;
+	int rc;
 
 	lnet_net_lock(LNET_LOCK_EX);
 	rnet = lnet_find_rnet_locked(net->net_id);
@@ -3219,10 +3220,25 @@ static int lnet_add_net_common(struct lnet_net *net,
 
 	lnet_net_lock(LNET_LOCK_EX);
 	net = lnet_get_net_locked(net_id);
-	lnet_net_unlock(LNET_LOCK_EX);
-
 	LASSERT(net);
 
+	/* apply the UDSPs */
+	rc = lnet_udsp_apply_policies_on_net(net);
+	if (rc)
+		CERROR("Failed to apply UDSPs on local net %s\n",
+		       libcfs_net2str(net->net_id));
+
+	/* At this point we lost track of which NI was just added, so we
+	 * just re-apply the policies on all of the NIs on this net
+	 */
+	list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+		rc = lnet_udsp_apply_policies_on_ni(ni);
+		if (rc)
+			CERROR("Failed to apply UDSPs on ni %s\n",
+			       libcfs_nid2str(ni->ni_nid));
+	}
+	lnet_net_unlock(LNET_LOCK_EX);
+
 	/*
 	 * Start the acceptor thread if this is the first network
 	 * being added that requires the thread.
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index b4b8edd..8ee5ec3 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -35,6 +35,7 @@
 
 #define DEBUG_SUBSYSTEM S_LNET
 
+#include <linux/lnet/udsp.h>
 #include <linux/lnet/lib-lnet.h>
 #include <uapi/linux/lnet/lnet-dlc.h>
 
@@ -1357,6 +1358,8 @@ struct lnet_peer_net *
 			 unsigned int flags)
 {
 	struct lnet_peer_table *ptable;
+	bool new_lpn = false;
+	int rc;
 
 	/* Install the new peer_ni */
 	lnet_net_lock(LNET_LOCK_EX);
@@ -1387,6 +1390,7 @@ struct lnet_peer_net *
 
 	/* Add peer_net to peer */
 	if (!lpn->lpn_peer) {
+		new_lpn = true;
 		lpn->lpn_peer = lp;
 		list_add_tail(&lpn->lpn_peer_nets, &lp->lp_peer_nets);
 		lnet_peer_addref_locked(lp);
@@ -1416,6 +1420,18 @@ struct lnet_peer_net *
 
 	lp->lp_nnis++;
 
+	/* apply UDSPs */
+	if (new_lpn) {
+		rc = lnet_udsp_apply_policies_on_lpn(lpn);
+		if (rc)
+			CERROR("Failed to apply UDSPs on lpn %s\n",
+			       libcfs_net2str(lpn->lpn_net_id));
+	}
+	rc = lnet_udsp_apply_policies_on_lpni(lpni);
+	if (rc)
+		CERROR("Failed to apply UDSPs on lpni %s\n",
+		       libcfs_nid2str(lpni->lpni_nid));
+
 	CDEBUG(D_NET, "peer %s NID %s flags %#x\n",
 	       libcfs_nid2str(lp->lp_primary_nid),
 	       libcfs_nid2str(lpni->lpni_nid), flags);
-- 
1.8.3.1

* [lustre-devel] [PATCH 14/41] lnet: Add the kernel level Marshalling API
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (12 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 13/41] lnet: Apply UDSP on local and remote NIs James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 15/41] lnet: Add the kernel level De-Marshalling API James Simmons
                   ` (26 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

Given a UDSP, marshal the UDSP pointed to by udsp
into the memory block allocated from userspace.
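
As a rough userspace illustration of the size-then-marshal idea
described above, the sketch below flattens a policy into a
caller-supplied buffer and rejects any buffer whose size does not match
exactly. The names struct policy, struct wire_hdr, policy_wire_size()
and policy_marshal() are hypothetical stand-ins, not part of this
patch, and memcpy() takes the place of copy_to_user().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the structures in the patch. */
struct range { uint32_t lo, hi, stride; };

struct policy {
	uint32_t idx;
	uint32_t priority;
	uint32_t nranges;
	const struct range *ranges;
};

/* Fixed-size header that precedes the variable-length payload. */
struct wire_hdr { uint32_t idx, priority, nranges; };

/* Size needed to hold the marshalled policy. */
static size_t policy_wire_size(const struct policy *p)
{
	return sizeof(struct wire_hdr) +
	       (size_t)p->nranges * sizeof(struct range);
}

/* Marshal p into buf; the caller must have sized buf exactly. */
static int policy_marshal(const struct policy *p, void *buf, size_t len)
{
	struct wire_hdr hdr = { p->idx, p->priority, p->nranges };
	unsigned char *cur = buf;

	if (len != policy_wire_size(p))
		return -ENOSPC;

	/* header first, then the ranges back to back */
	memcpy(cur, &hdr, sizeof(hdr));
	cur += sizeof(hdr);
	if (p->nranges)
		memcpy(cur, p->ranges, p->nranges * sizeof(struct range));

	return 0;
}
```

A caller would first ask policy_wire_size() how much space is needed,
allocate that much, and only then call policy_marshal().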

WC-bug-id: https://jira.whamcloud.com/browse/LU-9121
Lustre-commit: cd0ef3165e1d1b5f ("LU-9121 lnet: Add the kernel level Marshalling API")
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34403
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/udsp.h |  13 +++
 net/lnet/lnet/udsp.c      | 214 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 227 insertions(+)

diff --git a/include/linux/lnet/udsp.h b/include/linux/lnet/udsp.h
index 265cb42..0cf630f 100644
--- a/include/linux/lnet/udsp.h
+++ b/include/linux/lnet/udsp.h
@@ -114,4 +114,17 @@
  */
 void lnet_udsp_destroy(bool shutdown);
 
+/**
+ * lnet_get_udsp_size
+ *	Return the size needed to store the marshalled UDSP
+ */
+size_t lnet_get_udsp_size(struct lnet_udsp *udsp);
+
+/**
+ * lnet_udsp_marshal
+ *	Marshal the udsp into the bulk memory provided.
+ *	Return success/failure.
+ */
+int lnet_udsp_marshal(struct lnet_udsp *udsp,
+		      struct lnet_ioctl_udsp *ioc_udsp);
 #endif /* UDSP_H */
diff --git a/net/lnet/lnet/udsp.c b/net/lnet/lnet/udsp.c
index 85e31fe..499035d 100644
--- a/net/lnet/lnet/udsp.c
+++ b/net/lnet/lnet/udsp.c
@@ -1049,3 +1049,217 @@ struct lnet_udsp *
 		lnet_udsp_free(udsp);
 	}
 }
+
+static size_t
+lnet_size_marshaled_nid_descr(struct lnet_ud_nid_descr *descr)
+{
+	struct cfs_expr_list *expr;
+	int expr_count = 0;
+	int range_count = 0;
+	size_t size = sizeof(struct lnet_ioctl_udsp_descr);
+
+	if (!lnet_udsp_criteria_present(descr))
+		return size;
+
+	/* we always have one net expression */
+	if (!list_empty(&descr->ud_net_id.udn_net_num_range)) {
+		expr = list_first_entry(&descr->ud_net_id.udn_net_num_range,
+					struct cfs_expr_list, el_link);
+
+		/* count the number of cfs_range_expr in the net expression */
+		range_count = lnet_get_list_len(&expr->el_exprs);
+	}
+
+	/* count the number of cfs_range_expr in the address expressions */
+	list_for_each_entry(expr, &descr->ud_addr_range, el_link) {
+		expr_count++;
+		range_count += lnet_get_list_len(&expr->el_exprs);
+	}
+
+	size += (sizeof(struct lnet_expressions) * expr_count);
+	size += (sizeof(struct lnet_range_expr) * range_count);
+
+	return size;
+}
+
+size_t
+lnet_get_udsp_size(struct lnet_udsp *udsp)
+{
+	size_t size = sizeof(struct lnet_ioctl_udsp);
+
+	size += lnet_size_marshaled_nid_descr(&udsp->udsp_src);
+	size += lnet_size_marshaled_nid_descr(&udsp->udsp_dst);
+	size += lnet_size_marshaled_nid_descr(&udsp->udsp_rte);
+
+	CDEBUG(D_NET, "get udsp (%p) size: %d\n", udsp, (int)size);
+
+	return size;
+}
+
+static int
+copy_exprs(struct cfs_expr_list *expr, void __user **bulk,
+	   u32 *bulk_size)
+{
+	struct cfs_range_expr *range;
+	struct lnet_range_expr range_expr;
+
+	/* copy over the net range expressions to the bulk */
+	list_for_each_entry(range, &expr->el_exprs, re_link) {
+		range_expr.re_lo = range->re_lo;
+		range_expr.re_hi = range->re_hi;
+		range_expr.re_stride = range->re_stride;
+		CDEBUG(D_NET, "Copy Range %u:%u:%u\n",
+		       range_expr.re_lo, range_expr.re_hi,
+		       range_expr.re_stride);
+		if (copy_to_user(*bulk, &range_expr, sizeof(range_expr))) {
+			CDEBUG(D_NET, "Failed to copy range_expr\n");
+			return -EFAULT;
+		}
+		*bulk += sizeof(range_expr);
+		*bulk_size -= sizeof(range_expr);
+	}
+
+	return 0;
+}
+
+static int
+copy_nid_range(struct lnet_ud_nid_descr *nid_descr, char *type,
+	       void **bulk, u32 *bulk_size)
+{
+	struct lnet_ioctl_udsp_descr ioc_udsp_descr;
+	struct cfs_expr_list *expr;
+	struct lnet_expressions ioc_expr;
+	int expr_count;
+	int net_expr_count;
+	int rc;
+
+	memset(&ioc_udsp_descr, 0, sizeof(ioc_udsp_descr));
+	ioc_udsp_descr.iud_src_hdr.ud_descr_type = *(u32 *)type;
+
+	/* if criteria not present, copy over the static part of the NID
+	 * descriptor
+	 */
+	if (!lnet_udsp_criteria_present(nid_descr)) {
+		CDEBUG(D_NET, "Descriptor %u:%u:%u:%u\n",
+		       ioc_udsp_descr.iud_src_hdr.ud_descr_type,
+		       ioc_udsp_descr.iud_src_hdr.ud_descr_count,
+		       ioc_udsp_descr.iud_net.ud_net_type,
+		       ioc_udsp_descr.iud_net.ud_net_num_expr.le_count);
+		if (copy_to_user(*bulk, &ioc_udsp_descr,
+				 sizeof(ioc_udsp_descr))) {
+			CDEBUG(D_NET, "failed to copy ioc_udsp_descr\n");
+			return -EFAULT;
+		}
+		*bulk += sizeof(ioc_udsp_descr);
+		*bulk_size -= sizeof(ioc_udsp_descr);
+		return 0;
+	}
+
+	expr_count = lnet_get_list_len(&nid_descr->ud_addr_range);
+
+	/* copy the net information */
+	if (!list_empty(&nid_descr->ud_net_id.udn_net_num_range)) {
+		expr = list_first_entry(&nid_descr->ud_net_id.udn_net_num_range,
+					struct cfs_expr_list, el_link);
+		net_expr_count = lnet_get_list_len(&expr->el_exprs);
+	} else {
+		net_expr_count = 0;
+	}
+
+	/* set the total expression count */
+	ioc_udsp_descr.iud_src_hdr.ud_descr_count = expr_count;
+	ioc_udsp_descr.iud_net.ud_net_type =
+		nid_descr->ud_net_id.udn_net_type;
+	ioc_udsp_descr.iud_net.ud_net_num_expr.le_count = net_expr_count;
+
+	CDEBUG(D_NET, "Descriptor %u:%u:%u:%u\n",
+	       ioc_udsp_descr.iud_src_hdr.ud_descr_type,
+	       ioc_udsp_descr.iud_src_hdr.ud_descr_count,
+	       ioc_udsp_descr.iud_net.ud_net_type,
+	       ioc_udsp_descr.iud_net.ud_net_num_expr.le_count);
+
+	/* copy over the header info to the bulk */
+	if (copy_to_user(*bulk, &ioc_udsp_descr, sizeof(ioc_udsp_descr))) {
+		CDEBUG(D_NET, "Failed to copy data\n");
+		return -EFAULT;
+	}
+	*bulk += sizeof(ioc_udsp_descr);
+	*bulk_size -= sizeof(ioc_udsp_descr);
+
+	/* copy over the net num expression if it exists */
+	if (net_expr_count) {
+		rc = copy_exprs(expr, bulk, bulk_size);
+		if (rc)
+			return rc;
+	}
+
+	/* copy the address range */
+	list_for_each_entry(expr, &nid_descr->ud_addr_range, el_link) {
+		ioc_expr.le_count = lnet_get_list_len(&expr->el_exprs);
+		if (copy_to_user(*bulk, &ioc_expr, sizeof(ioc_expr))) {
+			CDEBUG(D_NET, "failed to copy ioc_expr\n");
+			return -EFAULT;
+		}
+		*bulk += sizeof(ioc_expr);
+		*bulk_size -= sizeof(ioc_expr);
+
+		rc = copy_exprs(expr, bulk, bulk_size);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
+int
+lnet_udsp_marshal(struct lnet_udsp *udsp, struct lnet_ioctl_udsp *ioc_udsp)
+{
+	int rc = -ENOMEM;
+	void __user *bulk;
+	u32 bulk_size;
+
+	if (!ioc_udsp)
+		return -EINVAL;
+
+	bulk = ioc_udsp->iou_bulk;
+	bulk_size = ioc_udsp->iou_hdr.ioc_len +
+	  ioc_udsp->iou_bulk_size;
+
+	CDEBUG(D_NET, "marshal udsp (%p)\n", udsp);
+	CDEBUG(D_NET, "MEM -----> bulk: %p:0x%x\n", bulk, bulk_size);
+	/* make sure user space allocated enough buffer to marshal the
+	 * udsp
+	 */
+	if (bulk_size != lnet_get_udsp_size(udsp)) {
+		rc = -ENOSPC;
+		goto fail;
+	}
+
+	ioc_udsp->iou_idx = udsp->udsp_idx;
+	ioc_udsp->iou_action_type = udsp->udsp_action_type;
+	ioc_udsp->iou_action.priority = udsp->udsp_action.udsp_priority;
+
+	bulk_size -= sizeof(*ioc_udsp);
+
+	rc = copy_nid_range(&udsp->udsp_src, "SRC", &bulk, &bulk_size);
+	if (rc)
+		goto fail;
+
+	rc = copy_nid_range(&udsp->udsp_dst, "DST", &bulk, &bulk_size);
+	if (rc)
+		goto fail;
+
+	rc = copy_nid_range(&udsp->udsp_rte, "RTE", &bulk, &bulk_size);
+	if (rc)
+		goto fail;
+
+	CDEBUG(D_NET, "MEM <----- bulk: %p\n", bulk);
+
+	/* we should've consumed the entire buffer */
+	LASSERT(bulk_size == 0);
+	return 0;
+
+fail:
+	CERROR("Failed to marshal udsp: %d\n", rc);
+	return rc;
+}
-- 
1.8.3.1

* [lustre-devel] [PATCH 15/41] lnet: Add the kernel level De-Marshalling API
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (13 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 14/41] lnet: Add the kernel level Marshalling API James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 16/41] lnet: Add the ioctl handler for "add policy" James Simmons
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Lustre Development List

From: Sonia Sharma <sharmaso@whamcloud.com>

Given a bulk buffer allocated from userspace that contains
a single UDSP, the De-Marshalling API demarshals it
and populates the provided udsp structure.
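
The key concern when demarshalling a buffer handed in from userspace is
to check the remaining length before every read. A minimal userspace
sketch of that pattern follows; struct wire_hdr, struct range and
policy_demarshal() are hypothetical names used only for illustration
and do not appear in this patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat layout: a header followed by nranges ranges. */
struct wire_hdr { uint32_t idx, priority, nranges; };
struct range { uint32_t lo, hi, stride; };

/*
 * Validate buf and copy its ranges into out (capacity max).
 * Returns the number of ranges copied or a negative errno.
 */
static int policy_demarshal(const void *buf, size_t len,
			    struct wire_hdr *hdr,
			    struct range *out, size_t max)
{
	const unsigned char *cur = buf;
	size_t need;

	/* the fixed header must fit before it is read */
	if (len < sizeof(*hdr))
		return -EINVAL;
	memcpy(hdr, cur, sizeof(*hdr));
	cur += sizeof(*hdr);
	len -= sizeof(*hdr);

	/* never trust a count taken from the buffer itself */
	if (hdr->nranges > max)
		return -E2BIG;
	need = (size_t)hdr->nranges * sizeof(struct range);
	if (len < need)
		return -EINVAL;	/* truncated buffer */

	memcpy(out, cur, need);
	return (int)hdr->nranges;
}
```

The count is bounded against the destination capacity before it is
multiplied, so the size computation cannot overflow for small max.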

WC-bug-id: https://jira.whamcloud.com/browse/LU-9121
Lustre-commit: 764d16bf7803908 ("LU-9121 lnet: Add the kernel level De-Marshalling API")
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34488
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/udsp.h |   7 ++
 net/lnet/lnet/udsp.c      | 202 +++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/include/linux/lnet/udsp.h b/include/linux/lnet/udsp.h
index 0cf630f..3683d43 100644
--- a/include/linux/lnet/udsp.h
+++ b/include/linux/lnet/udsp.h
@@ -127,4 +127,11 @@
  */
 int lnet_udsp_marshal(struct lnet_udsp *udsp,
 		      struct lnet_ioctl_udsp *ioc_udsp);
+/**
+ * lnet_udsp_demarshal_add
+ *	Given a bulk containing a single UDSP,
+ *	demarshal and populate a udsp structure then add policy
+ */
+int lnet_udsp_demarshal_add(void *bulk, u32 bulk_size);
+
 #endif /* UDSP_H */
diff --git a/net/lnet/lnet/udsp.c b/net/lnet/lnet/udsp.c
index 499035d..f686ff2 100644
--- a/net/lnet/lnet/udsp.c
+++ b/net/lnet/lnet/udsp.c
@@ -1124,7 +1124,7 @@ struct lnet_udsp *
 
 static int
 copy_nid_range(struct lnet_ud_nid_descr *nid_descr, char *type,
-	       void **bulk, u32 *bulk_size)
+	       void __user **bulk, u32 *bulk_size)
 {
 	struct lnet_ioctl_udsp_descr ioc_udsp_descr;
 	struct cfs_expr_list *expr;
@@ -1263,3 +1263,203 @@ struct lnet_udsp *
 	CERROR("Failed to marshal udsp: %d\n", rc);
 	return rc;
 }
+
+static void
+copy_range_info(void **bulk, void **buf, struct list_head *list,
+		int count)
+{
+	struct lnet_range_expr *range_expr;
+	struct cfs_range_expr *range;
+	struct cfs_expr_list *exprs;
+	int range_count = count;
+	int i;
+
+	if (range_count == 0)
+		return;
+
+	if (range_count == -1) {
+		struct lnet_expressions *e;
+
+		e = *bulk;
+		range_count = e->le_count;
+		*bulk += sizeof(*e);
+	}
+
+	exprs = *buf;
+	INIT_LIST_HEAD(&exprs->el_link);
+	INIT_LIST_HEAD(&exprs->el_exprs);
+	list_add_tail(&exprs->el_link, list);
+	*buf += sizeof(*exprs);
+
+	for (i = 0; i < range_count; i++) {
+		range_expr = *bulk;
+		range = *buf;
+		INIT_LIST_HEAD(&range->re_link);
+		range->re_lo = range_expr->re_lo;
+		range->re_hi = range_expr->re_hi;
+		range->re_stride = range_expr->re_stride;
+		CDEBUG(D_NET, "Copy Range %u:%u:%u\n",
+		       range->re_lo,
+		       range->re_hi,
+		       range->re_stride);
+		list_add_tail(&range->re_link, &exprs->el_exprs);
+		*bulk += sizeof(*range_expr);
+		*buf += sizeof(*range);
+	}
+}
+
+static int
+copy_ioc_udsp_descr(struct lnet_ud_nid_descr *nid_descr, char *type,
+		    void **bulk, u32 *bulk_size)
+{
+	struct lnet_ioctl_udsp_descr *ioc_nid = *bulk;
+	struct lnet_expressions *exprs;
+	u32 descr_type;
+	int expr_count = 0;
+	int range_count = 0;
+	int i;
+	u32 size;
+	int remaining_size = *bulk_size;
+	void *tmp = *bulk;
+	u32 alloc_size;
+	void *buf;
+	size_t range_expr_s = sizeof(struct lnet_range_expr);
+	size_t lnet_exprs_s = sizeof(struct lnet_expressions);
+
+	CDEBUG(D_NET, "%s: bulk = %p:%u\n", type, *bulk, *bulk_size);
+
+	/* criteria not present, skip over the static part of the
+	 * bulk, which is included for each NID descriptor
+	 */
+	if (ioc_nid->iud_net.ud_net_type == 0) {
+		remaining_size -= sizeof(*ioc_nid);
+		if (remaining_size < 0) {
+			CERROR("Truncated userspace udsp buffer given\n");
+			return -EINVAL;
+		}
+		*bulk += sizeof(*ioc_nid);
+		*bulk_size = remaining_size;
+		return 0;
+	}
+
+	descr_type = ioc_nid->iud_src_hdr.ud_descr_type;
+	if (descr_type != *(u32 *)type) {
+		CERROR("Bad NID descriptor type. Expected %s, given %c%c%c\n",
+		       type, (u8)descr_type, (u8)(descr_type >> 8),
+		       (u8)(descr_type >> 16));
+		return -EINVAL;
+	}
+
+	/* calculate the total size to verify we have enough buffer.
+	 * Start off by finding how many ranges there are for the net
+	 * expression.
+	 */
+	range_count = ioc_nid->iud_net.ud_net_num_expr.le_count;
+	size = sizeof(*ioc_nid) + (range_count * range_expr_s);
+	remaining_size -= size;
+	if (remaining_size < 0) {
+		CERROR("Truncated userspace udsp buffer given\n");
+		return -EINVAL;
+	}
+
+	CDEBUG(D_NET, "Total net num ranges in %s: %d:%u\n", type,
+	       range_count, size);
+	/* the number of expressions for the NID, i.e. 4 for IP, 1 for GNI */
+	expr_count = ioc_nid->iud_src_hdr.ud_descr_count;
+	CDEBUG(D_NET, "addr has %d exprs\n", expr_count);
+	/* point tmp to the beginning of the NID expressions */
+	tmp += size;
+	for (i = 0; i < expr_count; i++) {
+		/* get the number of ranges per expression */
+		exprs = tmp;
+		range_count += exprs->le_count;
+		size = (range_expr_s * exprs->le_count) + lnet_exprs_s;
+		remaining_size -= size;
+		CDEBUG(D_NET, "expr %d:%d:%u:%d:%d\n", i, exprs->le_count,
+		       size, remaining_size, range_count);
+		if (remaining_size < 0) {
+			CERROR("Truncated userspace udsp buffer given\n");
+			return -EINVAL;
+		}
+		tmp += size;
+	}
+
+	*bulk_size = remaining_size;
+
+	/* copy over the net type */
+	nid_descr->ud_net_id.udn_net_type = ioc_nid->iud_net.ud_net_type;
+
+	CDEBUG(D_NET, "%u\n", nid_descr->ud_net_id.udn_net_type);
+
+	/* allocate the total memory required to copy this NID descriptor */
+	alloc_size = (sizeof(struct cfs_expr_list) * (expr_count + 1)) +
+		     (sizeof(struct cfs_range_expr) * (range_count));
+	buf = kzalloc(alloc_size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	/* store the amount of memory allocated so we can free it later on */
+	nid_descr->ud_mem_size = alloc_size;
+
+	/* copy over the net number range */
+	range_count = ioc_nid->iud_net.ud_net_num_expr.le_count;
+	*bulk += sizeof(*ioc_nid);
+	CDEBUG(D_NET, "bulk = %p\n", *bulk);
+	copy_range_info(bulk, &buf, &nid_descr->ud_net_id.udn_net_num_range,
+			range_count);
+	CDEBUG(D_NET, "bulk = %p\n", *bulk);
+
+	/* copy over the NID descriptor */
+	for (i = 0; i < expr_count; i++) {
+		copy_range_info(bulk, &buf, &nid_descr->ud_addr_range, -1);
+		CDEBUG(D_NET, "bulk = %p\n", *bulk);
+	}
+
+	return 0;
+}
+
+int
+lnet_udsp_demarshal_add(void *bulk, u32 bulk_size)
+{
+	struct lnet_ioctl_udsp *ioc_udsp;
+	struct lnet_udsp *udsp;
+	int rc = -ENOMEM;
+	int idx;
+
+	if (bulk_size < sizeof(*ioc_udsp))
+		return -ENOSPC;
+
+	udsp = lnet_udsp_alloc();
+	if (!udsp)
+		return rc;
+
+	ioc_udsp = bulk;
+
+	udsp->udsp_action_type = ioc_udsp->iou_action_type;
+	udsp->udsp_action.udsp_priority = ioc_udsp->iou_action.priority;
+	idx = ioc_udsp->iou_idx;
+
+	CDEBUG(D_NET, "demarshal descr %u:%u:%d:%u\n", udsp->udsp_action_type,
+	       udsp->udsp_action.udsp_priority, idx, bulk_size);
+
+	bulk += sizeof(*ioc_udsp);
+	bulk_size -= sizeof(*ioc_udsp);
+
+	rc = copy_ioc_udsp_descr(&udsp->udsp_src, "SRC", &bulk, &bulk_size);
+	if (rc < 0)
+		goto free_udsp;
+
+	rc = copy_ioc_udsp_descr(&udsp->udsp_dst, "DST", &bulk, &bulk_size);
+	if (rc < 0)
+		goto free_udsp;
+
+	rc = copy_ioc_udsp_descr(&udsp->udsp_rte, "RTE", &bulk, &bulk_size);
+	if (rc < 0)
+		goto free_udsp;
+
+	return lnet_udsp_add_policy(udsp, idx);
+
+free_udsp:
+	lnet_udsp_free(udsp);
+	return rc;
+}
-- 
1.8.3.1

* [lustre-devel] [PATCH 16/41] lnet: Add the ioctl handler for "add policy"
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (14 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 15/41] lnet: Add the kernel level De-Marshalling API James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 17/41] lnet: ioctl handler for "delete policy" James Simmons
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Lustre Development List

From: Sonia Sharma <sharmaso@whamcloud.com>

The ioctl handler for "add policy" de-marshals the
UDSP rules passed from userspace and adds them unless
an identical copy has already been added. It then
applies the rules to the existing LNet constructs.
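
The add semantics described above (reject an identical rule, update the
priority of a matching rule, otherwise insert at the requested index and
renumber) can be sketched in userspace with a plain array. Everything in
this sketch, including struct policy_list, policy_add() and the use of a
single net number as the match criterion, is a hypothetical
simplification and not the data structure used by LNet.

```c
#include <errno.h>
#include <string.h>

#define MAX_POLICIES 8

/* Hypothetical policy: matched on net, carries a priority. */
struct policy { int idx; unsigned int net; unsigned int priority; };
struct policy_list { struct policy p[MAX_POLICIES]; int count; };

static int policy_add(struct policy_list *l, unsigned int net,
		      unsigned int priority, int idx)
{
	int i;

	for (i = 0; i < l->count; i++) {
		if (l->p[i].net != net)
			continue;
		if (l->p[i].priority == priority)
			return -EALREADY;	/* exact duplicate */
		l->p[i].priority = priority;	/* same rule, new priority */
		return 0;
	}

	if (l->count == MAX_POLICIES)
		return -ENOSPC;

	/* an out-of-range index means "append" */
	if (idx < 0 || idx > l->count)
		idx = l->count;

	/* open a slot at idx, then renumber every entry */
	memmove(&l->p[idx + 1], &l->p[idx],
		(size_t)(l->count - idx) * sizeof(l->p[0]));
	l->p[idx].net = net;
	l->p[idx].priority = priority;
	l->count++;
	for (i = 0; i < l->count; i++)
		l->p[i].idx = i;

	return 0;
}
```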

WC-bug-id: https://jira.whamcloud.com/browse/LU-9121
Lustre-commit: 1fdb21ac697c368 ("LU-9121 lnet: Add the ioctl handler for "add policy"")
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34514
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lnet/libcfs_ioctl.h |  3 ++-
 net/lnet/lnet/api-ni.c                 | 16 ++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h
index c9d42361..6cffa01 100644
--- a/include/uapi/linux/lnet/libcfs_ioctl.h
+++ b/include/uapi/linux/lnet/libcfs_ioctl.h
@@ -150,6 +150,7 @@ struct libcfs_ioctl_data {
 #define IOC_LIBCFS_SET_HEALHV		_IOWR(IOC_LIBCFS_TYPE, 102, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_GET_LOCAL_HSTATS	_IOWR(IOC_LIBCFS_TYPE, 103, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_GET_RECOVERY_QUEUE	_IOWR(IOC_LIBCFS_TYPE, 104, IOCTL_CONFIG_SIZE)
-#define IOC_LIBCFS_MAX_NR		104
+#define IOC_LIBCFS_ADD_UDSP		_IOWR(IOC_LIBCFS_TYPE, 105, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_MAX_NR				       105
 
 #endif /* __LIBCFS_IOCTL_H__ */
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 9ff2776..fe8da30 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -4126,6 +4126,22 @@ u32 lnet_get_dlc_seq_locked(void)
 		return 0;
 	}
 
+	case IOC_LIBCFS_ADD_UDSP: {
+		struct lnet_ioctl_udsp *ioc_udsp = arg;
+		u32 bulk_size = ioc_udsp->iou_hdr.ioc_len;
+
+		mutex_lock(&the_lnet.ln_api_mutex);
+		rc = lnet_udsp_demarshal_add(arg, bulk_size);
+		if (!rc) {
+			rc = lnet_udsp_apply_policies(NULL, false);
+			CDEBUG(D_NET, "policy application returned %d\n", rc);
+			rc = 0;
+		}
+		mutex_unlock(&the_lnet.ln_api_mutex);
+
+		return rc;
+	}
+
 	default:
 		ni = lnet_net2ni_addref(data->ioc_net);
 		if (!ni)
-- 
1.8.3.1

* [lustre-devel] [PATCH 17/41] lnet: ioctl handler for "delete policy"
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (15 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 16/41] lnet: Add the ioctl handler for "add policy" James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 18/41] lnet: ioctl handler for get policy info James Simmons
                   ` (23 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Lustre Development List

From: Sonia Sharma <sharmaso@whamcloud.com>

The ioctl handler for "delete policy" deletes
the policy with the given index value. It
returns 0 if a policy with that index is found;
otherwise it returns -EINVAL.
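
A small userspace sketch of delete-by-index with renumbering, returning
-EINVAL for a missing index as the description above states. The
array-backed struct policy_list and policy_del() are hypothetical
simplifications; treating a negative index as "delete everything" is an
assumption modelled on lnet_udsp_del_policy() from earlier in this
series.

```c
#include <errno.h>
#include <string.h>

#define MAX_POLICIES 8

struct policy { int idx; unsigned int priority; };
struct policy_list { struct policy p[MAX_POLICIES]; int count; };

static int policy_del(struct policy_list *l, int idx)
{
	int i;

	/* negative index: drop every policy */
	if (idx < 0) {
		l->count = 0;
		return 0;
	}

	if (idx >= l->count)
		return -EINVAL;	/* no policy at that index */

	/* close the gap, then renumber the entries that moved */
	memmove(&l->p[idx], &l->p[idx + 1],
		(size_t)(l->count - idx - 1) * sizeof(l->p[0]));
	l->count--;
	for (i = idx; i < l->count; i++)
		l->p[i].idx = i;

	return 0;
}
```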

WC-bug-id: https://jira.whamcloud.com/browse/LU-9121
Lustre-commit: 8258e7235c7047b ("LU-9121 lnet: ioctl handler for "delete policy"")
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34552
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lnet/libcfs_ioctl.h |  3 ++-
 net/lnet/lnet/api-ni.c                 | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h
index 6cffa01..9e3c427 100644
--- a/include/uapi/linux/lnet/libcfs_ioctl.h
+++ b/include/uapi/linux/lnet/libcfs_ioctl.h
@@ -151,6 +151,7 @@ struct libcfs_ioctl_data {
 #define IOC_LIBCFS_GET_LOCAL_HSTATS	_IOWR(IOC_LIBCFS_TYPE, 103, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_GET_RECOVERY_QUEUE	_IOWR(IOC_LIBCFS_TYPE, 104, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_ADD_UDSP		_IOWR(IOC_LIBCFS_TYPE, 105, IOCTL_CONFIG_SIZE)
-#define IOC_LIBCFS_MAX_NR				       105
+#define IOC_LIBCFS_DEL_UDSP		_IOWR(IOC_LIBCFS_TYPE, 106, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_MAX_NR				       106
 
 #endif /* __LIBCFS_IOCTL_H__ */
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index fe8da30..50f7b9e 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -4142,6 +4142,26 @@ u32 lnet_get_dlc_seq_locked(void)
 		return rc;
 	}
 
+	case IOC_LIBCFS_DEL_UDSP: {
+		struct lnet_ioctl_udsp *ioc_udsp = arg;
+		int idx = ioc_udsp->iou_idx;
+
+		if (ioc_udsp->iou_hdr.ioc_len < sizeof(*ioc_udsp))
+			return -EINVAL;
+
+		mutex_lock(&the_lnet.ln_api_mutex);
+		rc = lnet_udsp_del_policy(idx);
+		if (!rc) {
+			rc = lnet_udsp_apply_policies(NULL, false);
+			CDEBUG(D_NET, "policy re-application returned %d\n",
+			       rc);
+			rc = 0;
+		}
+		mutex_unlock(&the_lnet.ln_api_mutex);
+
+		return rc;
+	}
+
 	default:
 		ni = lnet_net2ni_addref(data->ioc_net);
 		if (!ni)
-- 
1.8.3.1

* [lustre-devel] [PATCH 18/41] lnet: ioctl handler for get policy info
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (16 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 17/41] lnet: ioctl handler for "delete policy" James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 19/41] lustre: update version to 2.14.50 James Simmons
                   ` (22 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

Add ioctl handler for GET_UDSP_SIZE and GET_UDSP
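
Having separate GET_UDSP_SIZE and GET_UDSP requests suggests the usual
two-step pattern: ask for the size, allocate, then fetch. The sketch
below shows that pattern in plain userspace C. The functions get_size(),
get_policy() and fetch_policy(), and the string used as the stored
policy, are hypothetical stand-ins and do not issue real ioctls.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical backing store standing in for a policy held elsewhere. */
static const char stored_policy[] = "SRC tcp0 priority 1";

/* Step 1: report how many bytes the caller must provide. */
static int get_size(size_t *size)
{
	*size = sizeof(stored_policy);
	return 0;
}

/* Step 2: fill a buffer whose size matches what step 1 reported. */
static int get_policy(void *buf, size_t len)
{
	if (len != sizeof(stored_policy))
		return -ENOSPC;
	memcpy(buf, stored_policy, len);
	return 0;
}

/* Query the size, allocate, fetch. The caller frees the result. */
static void *fetch_policy(size_t *len)
{
	void *buf;

	if (get_size(len))
		return NULL;

	buf = malloc(*len);
	if (!buf)
		return NULL;

	if (get_policy(buf, *len)) {
		free(buf);
		return NULL;
	}

	return buf;
}
```

If the stored object can change between the two steps, a real caller
would need to retry when the second step reports a size mismatch.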

WC-bug-id: https://jira.whamcloud.com/browse/LU-9121
Lustre-commit: 6248e1cd7fb70f4 ("LU-9121 lnet: ioctl handler for get policy info")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34579
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/udsp.h              |  7 +++
 include/uapi/linux/lnet/libcfs_ioctl.h |  5 +-
 net/lnet/lnet/api-ni.c                 | 64 +++++++++++++++++++++++++
 net/lnet/lnet/udsp.c                   | 88 ++++++++++++++++++++++++++++++++++
 4 files changed, 163 insertions(+), 1 deletion(-)

diff --git a/include/linux/lnet/udsp.h b/include/linux/lnet/udsp.h
index 3683d43..188dce4 100644
--- a/include/linux/lnet/udsp.h
+++ b/include/linux/lnet/udsp.h
@@ -134,4 +134,11 @@ int lnet_udsp_marshal(struct lnet_udsp *udsp,
  */
 int lnet_udsp_demarshal_add(void *bulk, u32 bulk_size);
 
+/**
+ * lnet_udsp_get_construct_info
+ *	get information of how the UDSP policies impacted the given
+ *	construct.
+ */
+void lnet_udsp_get_construct_info(struct lnet_ioctl_construct_udsp_info *info);
+
 #endif /* UDSP_H */
diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h
index 9e3c427..d0b29c52 100644
--- a/include/uapi/linux/lnet/libcfs_ioctl.h
+++ b/include/uapi/linux/lnet/libcfs_ioctl.h
@@ -152,6 +152,9 @@ struct libcfs_ioctl_data {
 #define IOC_LIBCFS_GET_RECOVERY_QUEUE	_IOWR(IOC_LIBCFS_TYPE, 104, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_ADD_UDSP		_IOWR(IOC_LIBCFS_TYPE, 105, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_DEL_UDSP		_IOWR(IOC_LIBCFS_TYPE, 106, IOCTL_CONFIG_SIZE)
-#define IOC_LIBCFS_MAX_NR				       106
+#define IOC_LIBCFS_GET_UDSP_SIZE	_IOWR(IOC_LIBCFS_TYPE, 107, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_GET_UDSP		_IOWR(IOC_LIBCFS_TYPE, 108, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_GET_CONST_UDSP_INFO	_IOWR(IOC_LIBCFS_TYPE, 109, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_MAX_NR				       109
 
 #endif /* __LIBCFS_IOCTL_H__ */
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 50f7b9e..f121d69 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -4162,6 +4162,70 @@ u32 lnet_get_dlc_seq_locked(void)
 		return rc;
 	}
 
+	case IOC_LIBCFS_GET_UDSP_SIZE: {
+		struct lnet_ioctl_udsp *ioc_udsp = arg;
+		struct lnet_udsp *udsp;
+
+		if (ioc_udsp->iou_hdr.ioc_len < sizeof(*ioc_udsp))
+			return -EINVAL;
+
+		rc = 0;
+
+		mutex_lock(&the_lnet.ln_api_mutex);
+		udsp = lnet_udsp_get_policy(ioc_udsp->iou_idx);
+		if (!udsp) {
+			rc = -ENOENT;
+		} else {
+			/* coming in iou_idx will hold the idx of the udsp
+			 * to get the size of. going out the iou_idx will
+			 * hold the size of the UDSP found at the passed
+			 * in index.
+			 */
+			ioc_udsp->iou_idx = lnet_get_udsp_size(udsp);
+			if (ioc_udsp->iou_idx < 0)
+				rc = -EINVAL;
+		}
+		mutex_unlock(&the_lnet.ln_api_mutex);
+
+		return rc;
+	}
+
+	case IOC_LIBCFS_GET_UDSP: {
+		struct lnet_ioctl_udsp *ioc_udsp = arg;
+		struct lnet_udsp *udsp;
+
+		if (ioc_udsp->iou_hdr.ioc_len < sizeof(*ioc_udsp))
+			return -EINVAL;
+
+		rc = 0;
+
+		mutex_lock(&the_lnet.ln_api_mutex);
+		udsp = lnet_udsp_get_policy(ioc_udsp->iou_idx);
+		if (!udsp)
+			rc = -ENOENT;
+		else
+			rc = lnet_udsp_marshal(udsp, ioc_udsp);
+		mutex_unlock(&the_lnet.ln_api_mutex);
+
+		return rc;
+	}
+
+	case IOC_LIBCFS_GET_CONST_UDSP_INFO: {
+		struct lnet_ioctl_construct_udsp_info *info = arg;
+
+		if (info->cud_hdr.ioc_len < sizeof(*info))
+			return -EINVAL;
+
+		CDEBUG(D_NET, "GET_UDSP_INFO for %s\n",
+		       libcfs_nid2str(info->cud_nid));
+
+		mutex_lock(&the_lnet.ln_api_mutex);
+		lnet_udsp_get_construct_info(info);
+		mutex_unlock(&the_lnet.ln_api_mutex);
+
+		return 0;
+	}
+
 	default:
 		ni = lnet_net2ni_addref(data->ioc_net);
 		if (!ni)
diff --git a/net/lnet/lnet/udsp.c b/net/lnet/lnet/udsp.c
index f686ff2..516db98 100644
--- a/net/lnet/lnet/udsp.c
+++ b/net/lnet/lnet/udsp.c
@@ -980,6 +980,94 @@ struct lnet_udsp *
 	return 0;
 }
 
+static void
+lnet_udsp_get_ni_info(struct lnet_ioctl_construct_udsp_info *info,
+		      struct lnet_ni *ni)
+{
+	struct lnet_nid_list *ne;
+	struct lnet_net *net = ni->ni_net;
+	int i = 0;
+
+	LASSERT(ni);
+
+	info->cud_nid_priority = ni->ni_sel_priority;
+	if (net) {
+		info->cud_net_priority = ni->ni_net->net_sel_priority;
+		list_for_each_entry(ne, &net->net_rtr_pref_nids, nl_list) {
+			if (i < LNET_MAX_SHOW_NUM_NID)
+				info->cud_pref_rtr_nid[i] = ne->nl_nid;
+			else
+				break;
+			i++;
+		}
+	}
+}
+
+static void
+lnet_udsp_get_peer_info(struct lnet_ioctl_construct_udsp_info *info,
+			struct lnet_peer_ni *lpni)
+{
+	struct lnet_nid_list *ne;
+	int i = 0;
+
+	/* peer tree structure needs to be in existence */
+	LASSERT(lpni && lpni->lpni_peer_net &&
+		lpni->lpni_peer_net->lpn_peer);
+
+	info->cud_nid_priority = lpni->lpni_sel_priority;
+	CDEBUG(D_NET, "lpni %s has %d pref nids\n",
+	       libcfs_nid2str(lpni->lpni_nid),
+	       lpni->lpni_pref_nnids);
+	if (lpni->lpni_pref_nnids == 1) {
+		info->cud_pref_nid[0] = lpni->lpni_pref.nid;
+	} else if (lpni->lpni_pref_nnids > 1) {
+		struct list_head *list = &lpni->lpni_pref.nids;
+
+		list_for_each_entry(ne, list, nl_list) {
+			if (i < LNET_MAX_SHOW_NUM_NID)
+				info->cud_pref_nid[i] = ne->nl_nid;
+			else
+				break;
+			i++;
+		}
+	}
+
+	i = 0;
+	list_for_each_entry(ne, &lpni->lpni_rtr_pref_nids, nl_list) {
+		if (i < LNET_MAX_SHOW_NUM_NID)
+			info->cud_pref_rtr_nid[i] = ne->nl_nid;
+		else
+			break;
+		i++;
+	}
+
+	info->cud_net_priority = lpni->lpni_peer_net->lpn_sel_priority;
+}
+
+void
+lnet_udsp_get_construct_info(struct lnet_ioctl_construct_udsp_info *info)
+{
+	struct lnet_ni *ni;
+	struct lnet_peer_ni *lpni;
+
+	lnet_net_lock(0);
+	if (!info->cud_peer) {
+		ni = lnet_nid2ni_locked(info->cud_nid, 0);
+		if (ni)
+			lnet_udsp_get_ni_info(info, ni);
+	} else {
+		lpni = lnet_find_peer_ni_locked(info->cud_nid);
+		if (!lpni) {
+			CDEBUG(D_NET, "nid %s is not found\n",
+			       libcfs_nid2str(info->cud_nid));
+		} else {
+			lnet_udsp_get_peer_info(info, lpni);
+			lnet_peer_ni_decref_locked(lpni);
+		}
+	}
+	lnet_net_unlock(0);
+}
+
 struct lnet_udsp *
 lnet_udsp_alloc(void)
 {
-- 
1.8.3.1

* [lustre-devel] [PATCH 19/41] lustre: update version to 2.14.50
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (17 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 18/41] lnet: ioctl handler for get policy info James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 20/41] lustre: gss: handle empty reqmsg in sptlrpc_req_ctx_switch James Simmons
                   ` (21 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

New tag 2.14.50
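
For reference, a small sketch of how the new tag packs into a single
integer with the OBD_OCD_VERSION() macro visible in the context lines of
this patch. The helper name demo_packed_version() is hypothetical and only
there for illustration.

```c
#include <stdint.h>

/* Copied from the lustre_ver.h context lines in this patch. */
#define OBD_OCD_VERSION(major, minor, patch, fix) \
	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))

/* Hypothetical helper: pack the new tag 2.14.50.0 into one integer. */
static uint32_t demo_packed_version(void)
{
	return OBD_OCD_VERSION(2, 14, 50, 0);
}
```

Each field gets one byte, so 2.14.50.0 packs to 0x020E3200, which compares
greater than 2.14.0.0 (0x020E0000).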

Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h
index c02a322..1e2b148 100644
--- a/include/uapi/linux/lustre/lustre_ver.h
+++ b/include/uapi/linux/lustre/lustre_ver.h
@@ -3,9 +3,9 @@
 
 #define LUSTRE_MAJOR 2
 #define LUSTRE_MINOR 14
-#define LUSTRE_PATCH 0
+#define LUSTRE_PATCH 50
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.14.0"
+#define LUSTRE_VERSION_STRING "2.14.50"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
1.8.3.1

* [lustre-devel] [PATCH 20/41] lustre: gss: handle empty reqmsg in sptlrpc_req_ctx_switch
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (18 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 19/41] lustre: update version to 2.14.50 James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 21/41] lustre: sec: file ioctls to handle encryption policies James Simmons
                   ` (20 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

sptlrpc_req_ctx_switch() already handles a ptlrpc_request that has
an empty rq_reqmsg, but assertions left over at the beginning of the
function still fire in that case. Remove them, and only assert that
rq_reqmsg is set when rq_reqlen is non-zero.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14444
Lustre-commit: dfe87b089b66266 ("LU-14444 gss: handle empty reqmsg in sptlrpc_req_ctx_switch")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/41685
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/sec.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/lustre/ptlrpc/sec.c b/fs/lustre/ptlrpc/sec.c
index 5b5faac..1822dbd 100644
--- a/fs/lustre/ptlrpc/sec.c
+++ b/fs/lustre/ptlrpc/sec.c
@@ -431,10 +431,6 @@ int sptlrpc_req_ctx_switch(struct ptlrpc_request *req,
 	int reqmsg_size;
 	int rc = 0;
 
-	LASSERT(req->rq_reqmsg);
-	LASSERT(req->rq_reqlen);
-	LASSERT(req->rq_replen);
-
 	CDEBUG(D_SEC,
 	       "req %p: switch ctx %p(%u->%s) -> %p(%u->%s), switch sec %p(%s) -> %p(%s)\n",
 	       req,
@@ -449,6 +445,7 @@ int sptlrpc_req_ctx_switch(struct ptlrpc_request *req,
 	/* save request message */
 	reqmsg_size = req->rq_reqlen;
 	if (reqmsg_size != 0) {
+		LASSERT(req->rq_reqmsg);
 		reqmsg = kvzalloc(reqmsg_size, GFP_NOFS);
 		if (!reqmsg)
 			return -ENOMEM;
-- 
1.8.3.1

* [lustre-devel] [PATCH 21/41] lustre: sec: file ioctls to handle encryption policies
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (19 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 20/41] lustre: gss: handle empty reqmsg in sptlrpc_req_ctx_switch James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 22/41] lustre: obdclass: try to skip corrupted llog records James Simmons
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

Introduce support for the fscrypt ioctls that handle v2 encryption
policies. They allow encryption policies to be set and read on
individual directories, letting users decide how they want to
encrypt specific directories. Add the file ioctls this time.

fscrypt encryption policies v2 are supported from Linux 5.4.

Fixes: b8b71993 ("lustre: sec: ioctls to handle encryption policies")
WC-bug-id: https://jira.whamcloud.com/browse/LU-12275
Lustre-commit: 3973cf8dc955c77 ("LU-12275 sec: ioctls to handle encryption policies")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/37673
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index fd01e14..60b6ac4 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -3989,6 +3989,33 @@ static int ll_heat_set(struct inode *inode, enum lu_heat_flag flags)
 		kfree(state);
 		return rc;
 	}
+#ifdef CONFIG_FS_ENCRYPTION
+	case FS_IOC_SET_ENCRYPTION_POLICY:
+		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
+			return -EOPNOTSUPP;
+		return llcrypt_ioctl_set_policy(file, (const void __user *)arg);
+	case FS_IOC_GET_ENCRYPTION_POLICY_EX:
+		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
+			return -EOPNOTSUPP;
+		return llcrypt_ioctl_get_policy_ex(file, (void __user *)arg);
+	case FS_IOC_ADD_ENCRYPTION_KEY:
+		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
+			return -EOPNOTSUPP;
+		return llcrypt_ioctl_add_key(file, (void __user *)arg);
+	case FS_IOC_REMOVE_ENCRYPTION_KEY:
+		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
+			return -EOPNOTSUPP;
+		return llcrypt_ioctl_remove_key(file, (void __user *)arg);
+	case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS:
+		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
+			return -EOPNOTSUPP;
+		return llcrypt_ioctl_remove_key_all_users(file,
+							  (void __user *)arg);
+	case FS_IOC_GET_ENCRYPTION_KEY_STATUS:
+		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
+			return -EOPNOTSUPP;
+		return llcrypt_ioctl_get_key_status(file, (void __user *)arg);
+#endif
 	default:
 		return obd_iocontrol(cmd, ll_i2dtexp(inode), 0, NULL,
 				     (void __user *)arg);
-- 
1.8.3.1

* [lustre-devel] [PATCH 22/41] lustre: obdclass: try to skip corrupted llog records
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (20 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 21/41] lustre: sec: file ioctls to handle encryption policies James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 23/41] lustre: lov: fix layout generation inc for mirror split James Simmons
                   ` (18 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Alex Zhuravlev <bzzz@whamcloud.com>

If an llog header or record is found to be corrupted, ignore the
remaining records in the current chunk and try the next chunk.
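
The idea can be sketched in user space roughly as follows. This is a
simplified model, not the llog code: struct demo_rec, the DEMO_ constants
and the fixed number of records per chunk are all assumptions made for the
example, and the three checks only mirror the shape of llog_verify_record().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic values for the sketch. */
#define DEMO_MAGIC_MASK	0xfff00000u
#define DEMO_MAGIC	0x10600000u

/* Hypothetical record header, loosely shaped like llog_rec_hdr. */
struct demo_rec {
	uint32_t len;	/* record length in bytes, 0 is invalid */
	uint32_t index;	/* position of the record in the log */
	uint32_t type;	/* upper bits carry the magic */
};

/* Reject records with a bad length, an out-of-range index or bad magic. */
static int demo_verify(const struct demo_rec *rec, uint32_t chunk_size,
		       uint32_t max_index)
{
	if (rec->len == 0 || rec->len > chunk_size)
		return -EINVAL;
	if (rec->index >= max_index)
		return -EINVAL;
	if ((rec->type & DEMO_MAGIC_MASK) != DEMO_MAGIC)
		return -EINVAL;
	return 0;
}

/* Walk the log chunk by chunk. On the first bad record, drop the rest
 * of that chunk and carry on with the next one instead of failing.
 * Returns the number of records accepted.
 */
static size_t demo_process(const struct demo_rec *recs, size_t nr_chunks,
			   size_t recs_per_chunk, uint32_t chunk_size,
			   uint32_t max_index)
{
	size_t accepted = 0;

	for (size_t c = 0; c < nr_chunks; c++) {
		for (size_t r = 0; r < recs_per_chunk; r++) {
			const struct demo_rec *rec =
				&recs[c * recs_per_chunk + r];

			if (demo_verify(rec, chunk_size, max_index) != 0)
				break; /* skip the rest of this chunk */
			accepted++;
		}
	}
	return accepted;
}
```

The trade-off is that valid records following a corrupted one in the same
chunk are lost, but later chunks are still processed.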

WC-bug-id: https://jira.whamcloud.com/browse/LU-14098
Lustre-commit: 910eb97c1b43a44 ("LU-14098 obdclass: try to skip corrupted llog records")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40754
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdclass/llog.c          | 76 ++++++++++++++++++++++++++++++--------
 fs/lustre/obdclass/llog_cat.c      | 14 +++----
 fs/lustre/obdclass/llog_internal.h |  5 +++
 3 files changed, 72 insertions(+), 23 deletions(-)

diff --git a/fs/lustre/obdclass/llog.c b/fs/lustre/obdclass/llog.c
index e172ebc..7668d51 100644
--- a/fs/lustre/obdclass/llog.c
+++ b/fs/lustre/obdclass/llog.c
@@ -184,7 +184,7 @@ int llog_init_handle(const struct lu_env *env, struct llog_handle *handle,
 			     (llh->llh_flags & LLOG_F_IS_CAT &&
 			      flags & LLOG_F_IS_PLAIN))) {
 			CERROR("%s: llog type is %s but initializing %s\n",
-			       handle->lgh_ctxt->loc_obd->obd_name,
+			       loghandle2name(handle),
 			       llh->llh_flags & LLOG_F_IS_CAT ?
 			       "catalog" : "plain",
 			       flags & LLOG_F_IS_CAT ? "catalog" : "plain");
@@ -206,7 +206,7 @@ int llog_init_handle(const struct lu_env *env, struct llog_handle *handle,
 		if (unlikely(uuid &&
 			     !obd_uuid_equals(uuid, &llh->llh_tgtuuid))) {
 			CERROR("%s: llog uuid mismatch: %s/%s\n",
-			       handle->lgh_ctxt->loc_obd->obd_name,
+			       loghandle2name(handle),
 			       (char *)uuid->uuid,
 			       (char *)llh->llh_tgtuuid.uuid);
 			rc = -EEXIST;
@@ -220,8 +220,8 @@ int llog_init_handle(const struct lu_env *env, struct llog_handle *handle,
 		llh->llh_flags |= LLOG_F_IS_FIXSIZE;
 	} else if (!(flags & LLOG_F_IS_PLAIN)) {
 		CERROR("%s: unknown flags: %#x (expected %#x or %#x)\n",
-		       handle->lgh_ctxt->loc_obd->obd_name,
-		       flags, LLOG_F_IS_CAT, LLOG_F_IS_PLAIN);
+		       loghandle2name(handle), flags, LLOG_F_IS_CAT,
+		       LLOG_F_IS_PLAIN);
 		rc = -EINVAL;
 	}
 	llh->llh_flags |= fmt;
@@ -234,6 +234,29 @@ int llog_init_handle(const struct lu_env *env, struct llog_handle *handle,
 }
 EXPORT_SYMBOL(llog_init_handle);
 
+int llog_verify_record(const struct llog_handle *llh, struct llog_rec_hdr *rec)
+{
+	int chunk_size = llh->lgh_hdr->llh_hdr.lrh_len;
+
+	if (rec->lrh_len == 0 || rec->lrh_len > chunk_size) {
+		CERROR("%s: record is too large: %d > %d\n",
+		       loghandle2name(llh), rec->lrh_len, chunk_size);
+		return -EINVAL;
+	}
+	if (rec->lrh_index >= LLOG_HDR_BITMAP_SIZE(llh->lgh_hdr)) {
+		CERROR("%s: index is too high: %d\n",
+		       loghandle2name(llh), rec->lrh_index);
+		return -EINVAL;
+	}
+	if ((rec->lrh_type & LLOG_OP_MASK) != LLOG_OP_MAGIC) {
+		CERROR("%s: magic %x is bad\n",
+		       loghandle2name(llh), rec->lrh_type);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int llog_process_thread(void *arg)
 {
 	struct llog_process_info *lpi = arg;
@@ -247,6 +270,7 @@ static int llog_process_thread(void *arg)
 	int saved_index = 0;
 	int last_called_index = 0;
 	bool repeated = false;
+	bool refresh_idx = false;
 
 	if (!llh)
 		return -EINVAL;
@@ -380,12 +404,21 @@ static int llog_process_thread(void *arg)
 
 			repeated = false;
 
-			if (!rec->lrh_len || rec->lrh_len > chunk_size) {
-				CWARN("invalid length %d in llog record for index %d/%d\n",
-				      rec->lrh_len,
-				      rec->lrh_index, index);
-				rc = -EINVAL;
-				goto out;
+			rc = llog_verify_record(loghandle, rec);
+			if (rc) {
+				CERROR("%s: invalid record in llog "DFID" record for index %d/%d: rc = %d\n",
+				       loghandle2name(loghandle),
+				       PFID(&loghandle->lgh_id.lgl_oi.oi_fid),
+				       rec->lrh_len, index, rc);
+				/*
+				 * the block seems to be corrupted, let's try
+				 * the next one. reset rc to go to the
+				 * next chunk.
+				 */
+				refresh_idx = true;
+				index = 0;
+				rc = 0;
+				goto repeat;
 			}
 
 			if (rec->lrh_index < index) {
@@ -395,11 +428,22 @@ static int llog_process_thread(void *arg)
 			}
 
 			if (rec->lrh_index != index) {
-				CERROR("%s: Invalid record: index %u but expected %u\n",
-				       loghandle->lgh_ctxt->loc_obd->obd_name,
-				       rec->lrh_index, index);
-				rc = -ERANGE;
-				goto out;
+				/*
+				 * the last time we couldn't parse the block due
+				 * to corruption, so we have no idea about the
+				 * next index; take it from the block, once.
+				 */
+				if (refresh_idx) {
+					refresh_idx = false;
+					index = rec->lrh_index;
+				} else {
+					CERROR("%s: "DFID" Invalid record: index %u but expected %u\n",
+					       loghandle2name(loghandle),
+					       PFID(&loghandle->lgh_id.lgl_oi.oi_fid),
+					       rec->lrh_index, index);
+					rc = -ERANGE;
+					goto out;
+				}
 			}
 
 			CDEBUG(D_OTHER,
@@ -501,7 +545,7 @@ int llog_process_or_fork(const struct lu_env *env,
 		if (IS_ERR(task)) {
 			rc = PTR_ERR(task);
 			CERROR("%s: cannot start thread: rc = %d\n",
-			       loghandle->lgh_ctxt->loc_obd->obd_name, rc);
+			       loghandle2name(loghandle), rc);
 			goto out_lpi;
 		}
 		wait_for_completion(&lpi->lpi_completion);
diff --git a/fs/lustre/obdclass/llog_cat.c b/fs/lustre/obdclass/llog_cat.c
index 9298808..b67e7a2b 100644
--- a/fs/lustre/obdclass/llog_cat.c
+++ b/fs/lustre/obdclass/llog_cat.c
@@ -80,7 +80,7 @@ static int llog_cat_id2handle(const struct lu_env *env,
 		    ostid_seq(&cgl->lgl_oi) == ostid_seq(&logid->lgl_oi)) {
 			if (cgl->lgl_ogen != logid->lgl_ogen) {
 				CWARN("%s: log " DFID " generation %x != %x\n",
-				      loghandle->lgh_ctxt->loc_obd->obd_name,
+				      loghandle2name(loghandle),
 				      PFID(&logid->lgl_oi.oi_fid),
 				      cgl->lgl_ogen, logid->lgl_ogen);
 				continue;
@@ -88,7 +88,7 @@ static int llog_cat_id2handle(const struct lu_env *env,
 			*res = llog_handle_get(loghandle);
 			if (!*res) {
 				CERROR("%s: log "DFID" refcount is zero!\n",
-				       loghandle->lgh_ctxt->loc_obd->obd_name,
+				       loghandle2name(loghandle),
 				       PFID(&logid->lgl_oi.oi_fid));
 				continue;
 			}
@@ -103,8 +103,8 @@ static int llog_cat_id2handle(const struct lu_env *env,
 		       LLOG_OPEN_EXISTS);
 	if (rc < 0) {
 		CERROR("%s: error opening log id " DFID ":%x: rc = %d\n",
-		       cathandle->lgh_ctxt->loc_obd->obd_name,
-		       PFID(&logid->lgl_oi.oi_fid), logid->lgl_ogen, rc);
+		       loghandle2name(cathandle), PFID(&logid->lgl_oi.oi_fid),
+		       logid->lgl_ogen, rc);
 		return rc;
 	}
 
@@ -155,7 +155,7 @@ static int llog_cat_process_common(const struct lu_env *env,
 	if (rec->lrh_type != le32_to_cpu(LLOG_LOGID_MAGIC)) {
 		rc = -EINVAL;
 		CWARN("%s: invalid record in catalog " DFID ":%x: rc = %d\n",
-		      cat_llh->lgh_ctxt->loc_obd->obd_name,
+		      loghandle2name(cat_llh),
 		      PFID(&cat_llh->lgh_id.lgl_oi.oi_fid),
 		      cat_llh->lgh_id.lgl_ogen, rc);
 
@@ -170,7 +170,7 @@ static int llog_cat_process_common(const struct lu_env *env,
 	rc = llog_cat_id2handle(env, cat_llh, llhp, &lir->lid_id);
 	if (rc) {
 		CWARN("%s: can't find llog handle " DFID ":%x: rc = %d\n",
-		      cat_llh->lgh_ctxt->loc_obd->obd_name,
+		      loghandle2name(cat_llh),
 		      PFID(&lir->lid_id.lgl_oi.oi_fid),
 		      lir->lid_id.lgl_ogen, rc);
 
@@ -235,7 +235,7 @@ static int llog_cat_process_or_fork(const struct lu_env *env,
 		struct llog_process_cat_data cd;
 
 		CWARN("%s: catlog " DFID " crosses index zero\n",
-		      cat_llh->lgh_ctxt->loc_obd->obd_name,
+		      loghandle2name(cat_llh),
 		      PFID(&cat_llh->lgh_id.lgl_oi.oi_fid));
 		/*startcat = 0 is default value for general processing */
 		if ((startcat != LLOG_CAT_FIRST &&
diff --git a/fs/lustre/obdclass/llog_internal.h b/fs/lustre/obdclass/llog_internal.h
index c34adfe..41ac4f0 100644
--- a/fs/lustre/obdclass/llog_internal.h
+++ b/fs/lustre/obdclass/llog_internal.h
@@ -74,4 +74,9 @@ static inline struct llog_rec_hdr *llog_rec_hdr_next(struct llog_rec_hdr *rec)
 {
 	return (struct llog_rec_hdr *)((char *)rec + rec->lrh_len);
 }
+
+static inline char *loghandle2name(const struct llog_handle *lgh)
+{
+	return lgh->lgh_ctxt->loc_obd->obd_name;
+}
 #endif
-- 
1.8.3.1

* [lustre-devel] [PATCH 23/41] lustre: lov: fix layout generation inc for mirror split
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (21 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 22/41] lustre: obdclass: try to skip corrupted llog records James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 24/41] lnet: modify assertion in lnet_post_send_locked James Simmons
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Bobi Jam <bobijam@whamcloud.com>

Mirror split does not increase the layout generation properly.

Mirror split also does not change the FLR state of the file, even
when only one mirror remains afterwards; in that case the FLR state
should be LCM_FL_NONE.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14268
Lustre-commit: ffa858b1657145c ("LU-14268 lod: fix layout generation inc for mirror split")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41068
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_ea.c     | 4 ++--
 fs/lustre/lov/lov_object.c | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/lov/lov_ea.c b/fs/lustre/lov/lov_ea.c
index 75a19a4..f6b3df0 100644
--- a/fs/lustre/lov/lov_ea.c
+++ b/fs/lustre/lov/lov_ea.c
@@ -635,10 +635,10 @@ void dump_lsm(unsigned int level, const struct lov_stripe_md *lsm)
 	int i, j;
 
 	CDEBUG_LIMIT(level,
-		     "lsm %p, objid " DOSTID ", maxbytes %#llx, magic 0x%08X, refc: %d, entry: %u, layout_gen %u\n",
+		     "lsm %p, objid " DOSTID ", maxbytes %#llx, magic 0x%08X, refc: %d, entry: %u, mirror: %u, flags: %u,layout_gen %u\n",
 	       lsm, POSTID(&lsm->lsm_oi), lsm->lsm_maxbytes, lsm->lsm_magic,
 	       atomic_read(&lsm->lsm_refc), lsm->lsm_entry_count,
-	       lsm->lsm_layout_gen);
+	       lsm->lsm_mirror_count, lsm->lsm_flags, lsm->lsm_layout_gen);
 
 	if (lsm->lsm_magic == LOV_MAGIC_FOREIGN) {
 		struct lov_foreign_md *lfm = (void *)lsm_foreign(lsm);
diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c
index 5d618c1..abe1cee 100644
--- a/fs/lustre/lov/lov_object.c
+++ b/fs/lustre/lov/lov_object.c
@@ -1374,6 +1374,7 @@ static int lov_conf_set(const struct lu_env *env, struct cl_object *obj,
 	if ((!lsm && !lov->lo_lsm) ||
 	    ((lsm && lov->lo_lsm) &&
 	     (lov->lo_lsm->lsm_layout_gen == lsm->lsm_layout_gen) &&
+	     (lov->lo_lsm->lsm_flags == lsm->lsm_flags) &&
 	     (lov->lo_lsm->lsm_entries[0]->lsme_pattern ==
 	      lsm->lsm_entries[0]->lsme_pattern))) {
 		/* same version of layout */
-- 
1.8.3.1

* [lustre-devel] [PATCH 24/41] lnet: modify assertion in lnet_post_send_locked
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (22 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 23/41] lustre: lov: fix layout generation inc for mirror split James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 25/41] lustre: lov: fixes bitfield in lod qos code James Simmons
                   ` (16 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

Check that the pointer to the local interface is not NULL
before asserting. While checking whether the local NI is the
destination, the assertion may dereference the pointer to the local
interface after it has already been cleaned up on shutdown.
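
A minimal user-space sketch of the two guards, assuming a much simplified
model of the LNet state. All demo_ names are hypothetical; only the shape
of the checks (NULL test before the assertion, state test before touching
the loopback NI) follows the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum demo_state { DEMO_SHUTDOWN, DEMO_RUNNING };

struct demo_ni {
	uint64_t nid;
};

/* Hypothetical stand-in for the_lnet: loni is NULL after shutdown. */
struct demo_lnet {
	enum demo_state state;
	struct demo_ni *loni;
};

/* Only assert against the loopback NID while the loopback NI exists. */
static void demo_check_not_loopback(const struct demo_lnet *ln,
				    uint64_t next_hop_nid)
{
	if (ln->loni)
		assert(next_hop_nid != ln->loni->nid);
}

/* Refuse to use the loopback NI once the stack is no longer running. */
static int demo_send_to_loopback(const struct demo_lnet *ln,
				 uint64_t *dest_nid)
{
	if (ln->state != DEMO_RUNNING)
		return -ESHUTDOWN;

	*dest_nid = ln->loni->nid;
	return 0;
}
```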

WC-bug-id: https://jira.whamcloud.com/browse/LU-13929
Lustre-commit: e5a8f3fc12840ae ("LU-13929 lnet: modify assertion in lnet_post_send_locked")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40749
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 4dcc68a..de17de4b 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -646,8 +646,10 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	LASSERT(!do_send || msg->msg_tx_delayed);
 	LASSERT(!msg->msg_receiving);
 	LASSERT(msg->msg_tx_committed);
+
 	/* can't get here if we're sending to the loopback interface */
-	LASSERT(lp->lpni_nid != the_lnet.ln_loni->ni_nid);
+	if (the_lnet.ln_loni)
+		LASSERT(lp->lpni_nid != the_lnet.ln_loni->ni_nid);
 
 	/* NB 'lp' is always the next hop */
 	if (!(msg->msg_target.pid & LNET_PID_USERFLAG) &&
@@ -1576,6 +1578,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	struct lnet_msg *msg = sd->sd_msg;
 	int cpt = sd->sd_cpt;
 
+	if (the_lnet.ln_state != LNET_STATE_RUNNING)
+		return -ESHUTDOWN;
+
 	/* No send credit hassles with LOLND */
 	lnet_ni_addref_locked(the_lnet.ln_loni, cpt);
 	msg->msg_hdr.dest_nid = cpu_to_le64(the_lnet.ln_loni->ni_nid);
-- 
1.8.3.1

* [lustre-devel] [PATCH 25/41] lustre: lov: fixes bitfield in lod qos code
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (23 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 24/41] lnet: modify assertion in lnet_post_send_locked James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 26/41] lustre: lov: grant deadlock if same OSC in two components James Simmons
                   ` (15 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Alexander Zarochentsev, Lustre Development List

From: Rahul Deshmukh <rahul.deshmukh@seagate.com>

Updates to the bitfields in struct lod_qos are protected by
lq_rw_sem in most places, but an update can be lost due to
unprotected bitfield access from
lod_qos_thresholdrr_seq_write() and qos_prio_free_store().
This patch fixes it by replacing the bitfields with named bits
and atomic bitops.
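
Adjacent bitfields share one word, so a plain store to one of them is a
read-modify-write of the whole word and can overwrite a concurrent update
to its neighbour. The sketch below shows the named-bits approach in user
space, using C11 atomics as a stand-in for the kernel's set_bit() family.
The demo_ helpers are hypothetical and are not the kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Named bit numbers, in the spirit of enum lq_flag from the patch. */
enum demo_lq_flag {
	DEMO_LQ_DIRTY = 0,	/* recalc qos data */
	DEMO_LQ_SAME_SPACE,	/* targets have about the same space */
	DEMO_LQ_RESET,		/* zero current penalties */
};

/* Atomically set one bit without disturbing the others. */
static void demo_set_bit(int nr, atomic_ulong *flags)
{
	atomic_fetch_or(flags, 1UL << nr);
}

/* Atomically clear one bit without disturbing the others. */
static void demo_clear_bit(int nr, atomic_ulong *flags)
{
	atomic_fetch_and(flags, ~(1UL << nr));
}

/* Read the current value of one bit. */
static bool demo_test_bit(int nr, atomic_ulong *flags)
{
	return (atomic_load(flags) >> nr) & 1UL;
}
```

Because each helper is a single atomic operation on the flags word, two
writers touching different bits can no longer lose each other's update,
even when neither holds a lock.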

Cray-bug-id: LUS-4651
WC-bug-id: https://jira.whamcloud.com/browse/LU-7853
Lustre-commit: 3bae39f0a5b98a2 ("LU-7853 lod: fixes bitfield in lod qos code")
Signed-off-by: Rahul Deshmukh <rahul.deshmukh@seagate.com>
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-on: https://review.whamcloud.com/18812
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lu_object.h     | 17 +++++++++++------
 fs/lustre/lmv/lmv_obd.c           |  2 +-
 fs/lustre/lmv/lproc_lmv.c         |  6 +++---
 fs/lustre/obdclass/lu_tgt_descs.c | 38 ++++++++++++++++++++------------------
 4 files changed, 35 insertions(+), 28 deletions(-)

diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h
index 6c47f43..34610d4 100644
--- a/fs/lustre/include/lu_object.h
+++ b/fs/lustre/include/lu_object.h
@@ -1405,6 +1405,15 @@ struct lu_kmem_descr {
 extern u32 lu_context_tags_default;
 extern u32 lu_session_tags_default;
 
+/* bitflags used in rr / qos allocation */
+enum lq_flag {
+	LQ_DIRTY	= 0,	/* recalc qos data */
+	LQ_SAME_SPACE,		/* the OSTs all have approx.
+				 * the same space avail
+				 */
+	LQ_RESET,		/* zero current penalties */
+};
+
 /* round-robin QoS data for LOD/LMV */
 struct lu_qos_rr {
 	spinlock_t		 lqr_alloc;	/* protect allocation index */
@@ -1412,7 +1421,7 @@ struct lu_qos_rr {
 	u32			 lqr_offset_idx;/* aliasing for start_idx */
 	int			 lqr_start_count;/* reseed counter */
 	struct lu_tgt_pool	 lqr_pool;	/* round-robin optimized list */
-	unsigned long		 lqr_dirty:1;	/* recalc round-robin list */
+	unsigned long		 lqr_flags;
 };
 
 /* QoS data per MDS/OSS */
@@ -1482,11 +1491,7 @@ struct lu_qos {
 	unsigned int		 lq_prio_free;	 /* priority for free space */
 	unsigned int		 lq_threshold_rr;/* priority for rr */
 	struct lu_qos_rr	 lq_rr;		 /* round robin qos data */
-	unsigned long		 lq_dirty:1,	 /* recalc qos data */
-				 lq_same_space:1,/* the servers all have approx.
-						  * the same space avail
-						  */
-				 lq_reset:1;	 /* zero current penalties */
+	unsigned long		lq_flags;
 };
 
 struct lu_tgt_descs {
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index d845118..747786e 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1303,7 +1303,7 @@ static int lmv_statfs_update(void *cookie, int rc)
 		tgt->ltd_statfs = *osfs;
 		tgt->ltd_statfs_age = ktime_get_seconds();
 		spin_unlock(&lmv->lmv_lock);
-		lmv->lmv_qos.lq_dirty = 1;
+		set_bit(LQ_DIRTY, &lmv->lmv_qos.lq_flags);
 	}
 
 	return rc;
diff --git a/fs/lustre/lmv/lproc_lmv.c b/fs/lustre/lmv/lproc_lmv.c
index 59922b8..85963d2 100644
--- a/fs/lustre/lmv/lproc_lmv.c
+++ b/fs/lustre/lmv/lproc_lmv.c
@@ -133,8 +133,8 @@ static ssize_t qos_prio_free_store(struct kobject *kobj,
 		return -EINVAL;
 
 	lmv->lmv_qos.lq_prio_free = (val << 8) / 100;
-	lmv->lmv_qos.lq_dirty = 1;
-	lmv->lmv_qos.lq_reset = 1;
+	set_bit(LQ_DIRTY, &lmv->lmv_qos.lq_flags);
+	set_bit(LQ_RESET, &lmv->lmv_qos.lq_flags);
 
 	return count;
 }
@@ -170,7 +170,7 @@ static ssize_t qos_threshold_rr_store(struct kobject *kobj,
 		return -EINVAL;
 
 	lmv->lmv_qos.lq_threshold_rr = (val << 8) / 100;
-	lmv->lmv_qos.lq_dirty = 1;
+	set_bit(LQ_DIRTY, &lmv->lmv_qos.lq_flags);
 
 	return count;
 }
diff --git a/fs/lustre/obdclass/lu_tgt_descs.c b/fs/lustre/obdclass/lu_tgt_descs.c
index 469c935..a77ce20 100644
--- a/fs/lustre/obdclass/lu_tgt_descs.c
+++ b/fs/lustre/obdclass/lu_tgt_descs.c
@@ -80,7 +80,7 @@ u64 lu_prandom_u64_max(u64 ep_ro)
 void lu_qos_rr_init(struct lu_qos_rr *lqr)
 {
 	spin_lock_init(&lqr->lqr_alloc);
-	lqr->lqr_dirty = 1;
+	set_bit(LQ_DIRTY, &lqr->lqr_flags);
 }
 EXPORT_SYMBOL(lu_qos_rr_init);
 
@@ -158,9 +158,8 @@ int lu_qos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *tgt)
 	 */
 	list_add_tail(&svr->lsq_svr_list, &tempsvr->lsq_svr_list);
 
-	qos->lq_dirty = 1;
-	qos->lq_rr.lqr_dirty = 1;
-
+	set_bit(LQ_DIRTY, &qos->lq_flags);
+	set_bit(LQ_DIRTY, &qos->lq_rr.lqr_flags);
 out:
 	up_write(&qos->lq_rw_sem);
 	return rc;
@@ -200,8 +199,8 @@ static int lu_qos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd)
 		kfree(svr);
 	}
 
-	qos->lq_dirty = 1;
-	qos->lq_rr.lqr_dirty = 1;
+	set_bit(LQ_DIRTY, &qos->lq_flags);
+	set_bit(LQ_DIRTY, &qos->lq_rr.lqr_flags);
 out:
 	up_write(&qos->lq_rw_sem);
 	return rc;
@@ -273,8 +272,8 @@ int lu_tgt_descs_init(struct lu_tgt_descs *ltd, bool is_mdt)
 	/* Set up allocation policy (QoS and RR) */
 	INIT_LIST_HEAD(&ltd->ltd_qos.lq_svr_list);
 	init_rwsem(&ltd->ltd_qos.lq_rw_sem);
-	ltd->ltd_qos.lq_dirty = 1;
-	ltd->ltd_qos.lq_reset = 1;
+	set_bit(LQ_DIRTY, &ltd->ltd_qos.lq_flags);
+	set_bit(LQ_RESET, &ltd->ltd_qos.lq_flags);
 	/* Default priority is toward free space balance */
 	ltd->ltd_qos.lq_prio_free = 232;
 	/* Default threshold for rr (roughly 17%) */
@@ -416,7 +415,8 @@ void ltd_del_tgt(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt)
  */
 bool ltd_qos_is_usable(struct lu_tgt_descs *ltd)
 {
-	if (!ltd->ltd_qos.lq_dirty && ltd->ltd_qos.lq_same_space)
+	if (!test_bit(LQ_DIRTY, &ltd->ltd_qos.lq_flags) &&
+	    test_bit(LQ_SAME_SPACE, &ltd->ltd_qos.lq_flags))
 		return false;
 
 	if (ltd->ltd_lov_desc.ld_active_tgt_count < 2)
@@ -456,7 +456,7 @@ int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd)
 	time64_t now, age;
 	int rc;
 
-	if (!qos->lq_dirty) {
+	if (!test_bit(LQ_DIRTY, &qos->lq_flags)) {
 		rc = 0;
 		goto out;
 	}
@@ -531,7 +531,8 @@ int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd)
 		tgt->ltd_qos.ltq_penalty_per_obj >>= 1;
 
 		age = (now - tgt->ltd_qos.ltq_used) >> 3;
-		if (qos->lq_reset || age > 32 * desc->ld_qos_maxage)
+		if (test_bit(LQ_RESET, &qos->lq_flags) ||
+		    age > 32 * desc->ld_qos_maxage)
 			tgt->ltd_qos.ltq_penalty = 0;
 		else if (age > desc->ld_qos_maxage)
 			/* Decay tgt penalty. */
@@ -566,31 +567,32 @@ int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd)
 		svr->lsq_penalty_per_obj >>= 1;
 
 		age = (now - svr->lsq_used) >> 3;
-		if (qos->lq_reset || age > 32 * desc->ld_qos_maxage)
+		if (test_bit(LQ_RESET, &qos->lq_flags) ||
+		    age > 32 * desc->ld_qos_maxage)
 			svr->lsq_penalty = 0;
 		else if (age > desc->ld_qos_maxage)
 			/* Decay server penalty. */
 			svr->lsq_penalty >>= age / desc->ld_qos_maxage;
 	}
 
-	qos->lq_dirty = 0;
-	qos->lq_reset = 0;
+	clear_bit(LQ_DIRTY, &qos->lq_flags);
+	clear_bit(LQ_RESET, &qos->lq_flags);
 
 	/*
 	 * If each tgt has almost same free space, do rr allocation for better
 	 * creation performance
 	 */
-	qos->lq_same_space = 0;
+	clear_bit(LQ_SAME_SPACE, &qos->lq_flags);
 	if ((ba_max * (256 - qos->lq_threshold_rr)) >> 8 < ba_min &&
 	    (ia_max * (256 - qos->lq_threshold_rr)) >> 8 < ia_min) {
-		qos->lq_same_space = 1;
+		set_bit(LQ_SAME_SPACE, &qos->lq_flags);
 		/* Reset weights for the next time we enter qos mode */
-		qos->lq_reset = 1;
+		set_bit(LQ_RESET, &qos->lq_flags);
 	}
 	rc = 0;
 
 out:
-	if (!rc && qos->lq_same_space)
+	if (!rc && test_bit(LQ_SAME_SPACE, &qos->lq_flags))
 		return -EAGAIN;
 
 	return rc;
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 26/41] lustre: lov: grant deadlock if same OSC in two components
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (24 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 25/41] lustre: lov: fixes bitfield in lod qos code James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 27/41] lustre: change EWOULDBLOCK to EAGAIN James Simmons
                   ` (14 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Andriy Skulysh, Lustre Development List

From: Andriy Skulysh <c17819@cray.com>

The same OSC can be involved in several components, but the osc
layer leaves the last used extent active, so an RPC can't be sent
if grants are required from the same OST for another component.

Add cl_io_extent_release() to release the active extent before
switching to the next component.
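
The dispatch pattern used by cl_io_extent_release() can be sketched in
userspace as follows. This is a simplified illustration only: the demo_*
types and the singly linked list are assumptions made for the example and
stand in for struct cl_io_slice, struct cl_io_operations and the kernel list
helpers.

```c
#include <assert.h>
#include <stddef.h>

struct demo_slice;

struct demo_io_ops {
	/* Optional method: NULL for layers that hold no active extent. */
	void (*extent_release)(struct demo_slice *slice);
};

struct demo_slice {
	const struct demo_io_ops *ops;
	void *active;			/* stand-in for oio->oi_active */
	struct demo_slice *next;	/* next layer of the same io */
};

/* Layer method: drop the cached extent so its grant can be reused. */
static void demo_osc_extent_release(struct demo_slice *slice)
{
	if (slice->active) {
		/* the real code calls osc_extent_release() here */
		slice->active = NULL;
	}
}

/* Walk every layer and call the method only where it is implemented. */
static void demo_io_extent_release(struct demo_slice *head)
{
	for (struct demo_slice *scan = head; scan; scan = scan->next) {
		if (!scan->ops->extent_release)
			continue;
		scan->ops->extent_release(scan);
	}
}
```

Making the method optional means only the layers that cache an extent need
to implement it; every other layer is skipped by the NULL check.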

Cray-bug-id: LUS-8038
WC-bug-id: https://jira.whamcloud.com/browse/LU-13100
Lustre-commit: 2070e9bcc0c1bd2 ("LU-13100 lov: grant deadlock if same OSC in two components")
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-on: https://review.whamcloud.com/37095
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h  |  6 ++++++
 fs/lustre/include/lustre_osc.h |  2 ++
 fs/lustre/lov/lov_io.c         |  4 ++++
 fs/lustre/mdc/mdc_dev.c        |  1 +
 fs/lustre/obdclass/cl_io.c     | 12 ++++++++++++
 fs/lustre/osc/osc_io.c         | 15 ++++++++++++++-
 6 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 739fe5b..2d08ddd 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -1611,6 +1611,11 @@ struct cl_io_operations {
 				 struct cl_page_list *queue, int from, int to,
 				 cl_commit_cbt cb);
 	/**
+	 * Release active extent.
+	 */
+	void  (*cio_extent_release)(const struct lu_env *env,
+				    const struct cl_io_slice *slice);
+	/**
 	 * Decide maximum read ahead extent
 	 *
 	 * \pre io->ci_type == CIT_READ
@@ -2439,6 +2444,7 @@ int cl_io_submit_sync(const struct lu_env *env, struct cl_io *io,
 int cl_io_commit_async(const struct lu_env *env, struct cl_io *io,
 		       struct cl_page_list *queue, int from, int to,
 		       cl_commit_cbt cb);
+void cl_io_extent_release(const struct lu_env *env, struct cl_io *io);
 int cl_io_read_ahead(const struct lu_env *env, struct cl_io *io,
 		     pgoff_t start, struct cl_read_ahead *ra);
 
diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index e32723c..4575956 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -689,6 +689,8 @@ int osc_io_commit_async(const struct lu_env *env,
 			const struct cl_io_slice *ios,
 			struct cl_page_list *qin, int from, int to,
 			cl_commit_cbt cb);
+void osc_io_extent_release(const struct lu_env *env,
+			   const struct cl_io_slice *ios);
 int osc_io_iter_init(const struct lu_env *env, const struct cl_io_slice *ios);
 void osc_io_iter_fini(const struct lu_env *env,
 		      const struct cl_io_slice *ios);
diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c
index 2297e53..a8bba1c 100644
--- a/fs/lustre/lov/lov_io.c
+++ b/fs/lustre/lov/lov_io.c
@@ -1318,6 +1318,10 @@ static int lov_io_commit_async(const struct lu_env *env,
 			break;
 
 		from = 0;
+
+		if (lov_comp_entry(index) !=
+		    lov_comp_entry(page->cp_lov_index))
+			cl_io_extent_release(sub->sub_env, &sub->sub_io);
 	}
 
 	/* for error case, add the page back into the qin list */
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index e86e69d..68088ef 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -1325,6 +1325,7 @@ static void mdc_io_data_version_end(const struct lu_env *env,
 	.cio_read_ahead		= mdc_io_read_ahead,
 	.cio_submit		= osc_io_submit,
 	.cio_commit_async	= osc_io_commit_async,
+	.cio_extent_release	= osc_io_extent_release,
 };
 
 int mdc_io_init(const struct lu_env *env, struct cl_object *obj,
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index c57a3766..cc5a503 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -597,6 +597,18 @@ int cl_io_commit_async(const struct lu_env *env, struct cl_io *io,
 }
 EXPORT_SYMBOL(cl_io_commit_async);
 
+void cl_io_extent_release(const struct lu_env *env, struct cl_io *io)
+{
+	const struct cl_io_slice *scan;
+
+	list_for_each_entry(scan, &io->ci_layers, cis_linkage) {
+		if (!scan->cis_iop->cio_extent_release)
+			continue;
+		scan->cis_iop->cio_extent_release(env, scan);
+	}
+}
+EXPORT_SYMBOL(cl_io_extent_release);
+
 /**
  * Submits a list of pages for immediate io.
  *
diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c
index ce0f7ec..9ec2734 100644
--- a/fs/lustre/osc/osc_io.c
+++ b/fs/lustre/osc/osc_io.c
@@ -373,6 +373,18 @@ int osc_io_commit_async(const struct lu_env *env,
 }
 EXPORT_SYMBOL(osc_io_commit_async);
 
+void osc_io_extent_release(const struct lu_env *env,
+			   const struct cl_io_slice *ios)
+{
+	struct osc_io *oio = cl2osc_io(env, ios);
+
+	if (oio->oi_active) {
+		osc_extent_release(env, oio->oi_active);
+		oio->oi_active = NULL;
+	}
+}
+EXPORT_SYMBOL(osc_io_extent_release);
+
 static bool osc_import_not_healthy(struct obd_import *imp)
 {
 	return imp->imp_invalid || imp->imp_deactive ||
@@ -1218,7 +1230,8 @@ void osc_io_lseek_end(const struct lu_env *env,
 	},
 	.cio_read_ahead			= osc_io_read_ahead,
 	.cio_submit			= osc_io_submit,
-	.cio_commit_async		= osc_io_commit_async
+	.cio_commit_async		= osc_io_commit_async,
+	.cio_extent_release		= osc_io_extent_release
 };
 
 /*****************************************************************************
-- 
1.8.3.1

* [lustre-devel] [PATCH 27/41] lustre: change EWOULDBLOCK to EAGAIN
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (25 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 26/41] lustre: lov: grant deadlock if same OSC in two components James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 28/41] lustre: ldlm: return error from ldlm_namespace_new() James Simmons
                   ` (13 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: "John L. Hammond" <jhammond@whamcloud.com>

On Linux, EWOULDBLOCK has always been defined as an alias for
EAGAIN. In the interest of readability we should not use two names for
the same thing. So change the remaining uses of EWOULDBLOCK to EAGAIN.
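
A small userspace sketch of what the aliasing means for callers. The helper
names are hypothetical; the only assumption about the platform is that
<errno.h> defines both constants, which POSIX requires.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* True when the platform defines both names with the same value. */
static bool ewouldblock_is_eagain(void)
{
	return EWOULDBLOCK == EAGAIN;
}

/*
 * A caller that tests a single name. Where the two are aliases this
 * also catches -EWOULDBLOCK, so a second comparison adds nothing.
 */
static bool should_retry(int rc)
{
	return rc == -EAGAIN;
}
```

Where ewouldblock_is_eagain() is true, should_retry(-EWOULDBLOCK) and
should_retry(-EAGAIN) are the same check, which is why one spelling is enough.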

WC-bug-id: https://jira.whamcloud.com/browse/LU-14047
Lustre-commit: a7f48e6c15e28617 ("LU-14047 lustre: change EWOULDBLOCK to EAGAIN")
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40307
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/fld/fld_request.c   | 2 +-
 fs/lustre/include/cl_object.h | 2 +-
 fs/lustre/llite/glimpse.c     | 2 +-
 fs/lustre/llite/vvp_page.c    | 2 +-
 fs/lustre/obdclass/cl_io.c    | 2 +-
 fs/lustre/osc/osc_io.c        | 2 +-
 fs/lustre/osc/osc_lock.c      | 2 +-
 fs/lustre/osc/osc_request.c   | 2 +-
 fs/lustre/ptlrpc/client.c     | 2 +-
 fs/lustre/ptlrpc/errno.c      | 4 ++--
 fs/lustre/ptlrpc/sec.c        | 2 +-
 11 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/lustre/fld/fld_request.c b/fs/lustre/fld/fld_request.c
index 52c148a..2e8d0b8 100644
--- a/fs/lustre/fld/fld_request.c
+++ b/fs/lustre/fld/fld_request.c
@@ -364,7 +364,7 @@ int fld_client_rpc(struct obd_export *exp,
 
 	if (OBD_FAIL_CHECK(OBD_FAIL_FLD_QUERY_REQ && req->rq_no_delay)) {
 		/* the same error returned by ptlrpc_import_delay_req */
-		rc = -EWOULDBLOCK;
+		rc = -EAGAIN;
 		req->rq_status = rc;
 	} else {
 		obd_get_request_slot(&exp->exp_obd->u.cli);
diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 2d08ddd..b36942a 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -1637,7 +1637,7 @@ struct cl_io_operations {
 enum cl_enq_flags {
 	/**
 	 * instruct server to not block, if conflicting lock is found. Instead
-	 * -EWOULDBLOCK is returned immediately.
+	 * -EAGAIN is returned immediately.
 	 */
 	CEF_NONBLOCK		= 0x00000001,
 	/**
diff --git a/fs/lustre/llite/glimpse.c b/fs/lustre/llite/glimpse.c
index 3441904..3d23612 100644
--- a/fs/lustre/llite/glimpse.c
+++ b/fs/lustre/llite/glimpse.c
@@ -207,7 +207,7 @@ int __cl_glimpse_size(struct inode *inode, int agl)
 		} else if (result == 0) {
 			result = cl_glimpse_lock(env, io, inode, io->ci_obj,
 						 agl);
-			if (!agl && result == -EWOULDBLOCK)
+			if (!agl && result == -EAGAIN)
 				io->ci_need_restart = 1;
 		}
 
diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c
index 0403f00..b0a119e 100644
--- a/fs/lustre/llite/vvp_page.c
+++ b/fs/lustre/llite/vvp_page.c
@@ -272,7 +272,7 @@ static void vvp_page_completion_read(const struct lu_env *env,
 			cl_page_export(env, page, 1);
 	} else if (vpg->vpg_defer_uptodate) {
 		vpg->vpg_defer_uptodate = 0;
-		if (ioret == -EWOULDBLOCK) {
+		if (ioret == -EAGAIN) {
 			/* mirror read failed, it needs to destroy the page
 			 * because subpage would be from wrong osc when trying
 			 * to read from a new mirror
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index cc5a503..138ff27 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -749,7 +749,7 @@ int cl_io_loop(const struct lu_env *env, struct cl_io *io)
 	if (rc && !result)
 		result = rc;
 
-	if (result == -EWOULDBLOCK && io->ci_ndelay) {
+	if (result == -EAGAIN && io->ci_ndelay) {
 		io->ci_need_restart = 1;
 		result = 0;
 	}
diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c
index 9ec2734..b965608 100644
--- a/fs/lustre/osc/osc_io.c
+++ b/fs/lustre/osc/osc_io.c
@@ -406,7 +406,7 @@ int osc_io_iter_init(const struct lu_env *env, const struct cl_io_slice *ios)
 	 */
 	if (ios->cis_io->ci_type == CIT_READ && ios->cis_io->ci_ndelay &&
 	    !ios->cis_io->ci_tried_all_mirrors && osc_import_not_healthy(imp)) {
-		rc = -EWOULDBLOCK;
+		rc = -EAGAIN;
 	} else if (likely(!imp->imp_invalid)) {
 		atomic_inc(&osc->oo_nr_ios);
 		oio->oi_is_active = 1;
diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c
index 536142f2..6ff3fb6 100644
--- a/fs/lustre/osc/osc_lock.c
+++ b/fs/lustre/osc/osc_lock.c
@@ -306,7 +306,7 @@ static int osc_lock_upcall(void *cookie, struct lustre_handle *lockh,
 		/* Hide the error. */
 		rc = 0;
 	} else if (rc < 0 && oscl->ols_flags & LDLM_FL_NDELAY) {
-		rc = -EWOULDBLOCK;
+		rc = -EAGAIN;
 	}
 
 	if (oscl->ols_owner)
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index be722c9..066ecdb 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -2430,7 +2430,7 @@ static int brw_interpret(const struct lu_env *env,
 	list_for_each_entry_safe(ext, tmp, &aa->aa_exts, oe_link) {
 		list_del_init(&ext->oe_link);
 		osc_extent_finish(env, ext, 1,
-				  rc && req->rq_no_delay ? -EWOULDBLOCK : rc);
+				  rc && req->rq_no_delay ? -EAGAIN : rc);
 	}
 	LASSERT(list_empty(&aa->aa_exts));
 	LASSERT(list_empty(&aa->aa_oaps));
diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index cec4da99..20c00ad 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -1236,7 +1236,7 @@ static int ptlrpc_import_delay_req(struct obd_import *imp,
 		} else if (req->rq_no_delay &&
 			   imp->imp_generation != imp->imp_initiated_at) {
 			/* ignore nodelay for requests initiating connections */
-			*status = -EWOULDBLOCK;
+			*status = -EAGAIN;
 		} else if (req->rq_allow_replay &&
 			  (imp->imp_state == LUSTRE_IMP_REPLAY ||
 			   imp->imp_state == LUSTRE_IMP_REPLAY_LOCKS ||
diff --git a/fs/lustre/ptlrpc/errno.c b/fs/lustre/ptlrpc/errno.c
index 2975010..415967e 100644
--- a/fs/lustre/ptlrpc/errno.c
+++ b/fs/lustre/ptlrpc/errno.c
@@ -36,8 +36,8 @@
  * The two translation tables below must define a one-to-one mapping between
  * host and network errnos.
  *
- * EWOULDBLOCK is equal to EAGAIN on all architectures except for parisc, which
- * appears irrelevant.  Thus, existing references to EWOULDBLOCK are fine.
+ * EWOULDBLOCK equals EAGAIN on all architectures except for parisc, which
+ * appears irrelevant.  Thus, EAGAIN is used in place of EWOULDBLOCK throughout.
  *
  * EDEADLOCK is equal to EDEADLK on x86 but not on sparc, at least.  A sparc
  * host has no context-free way to determine if a LUSTRE_EDEADLK represents an
diff --git a/fs/lustre/ptlrpc/sec.c b/fs/lustre/ptlrpc/sec.c
index 1822dbd..ea1dafe 100644
--- a/fs/lustre/ptlrpc/sec.c
+++ b/fs/lustre/ptlrpc/sec.c
@@ -720,7 +720,7 @@ int sptlrpc_req_refresh_ctx(struct ptlrpc_request *req, long timeout)
 	spin_unlock(&ctx->cc_lock);
 
 	if (timeout == 0)
-		return -EWOULDBLOCK;
+		return -EAGAIN;
 
 	/* Clear any flags that may be present from previous sends */
 	LASSERT(req->rq_receiving_reply == 0);
-- 
1.8.3.1

* [lustre-devel] [PATCH 28/41] lustre: ldlm: return error from ldlm_namespace_new()
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (26 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 27/41] lustre: change EWOULDBLOCK to EAGAIN James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 29/41] lustre: llite: remove unused ll_teardown_mmaps() James Simmons
                   ` (12 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Return the underlying error in ldlm_namespace_new() from
ldlm_namespace_sysfs_register() to the caller instead of NULL.
Otherwise, the callers convert the NULL to -ENOMEM and this
is incorrectly reported as an allocation error to the user.

  sysfs: cannot create duplicate filename
     '/fs/lustre/ldlm/namespaces/lustre-OST0002-osc-ffff89f33be70000'
  mount.lustre: mount mgs:/lfs at /lfs failed: Cannot allocate memory

Change ldlm_namespace_new() to return errors via ERR_PTR() and
change the callers to use IS_ERR() and PTR_ERR().
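
For readers unfamiliar with the convention, the following userspace sketch
shows how an errno can travel inside a pointer return value. The demo_*
helpers are hypothetical approximations of the kernel's ERR_PTR(), PTR_ERR()
and IS_ERR(), and demo_namespace_new() is an invented constructor, not the
Lustre function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value (approximates ERR_PTR()). */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno from an encoded pointer (approximates PTR_ERR()). */
static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Only the top DEMO_MAX_ERRNO addresses count as errors (IS_ERR()). */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

struct demo_ns {
	int type;
};

/* Hypothetical constructor that reports why it failed. */
static struct demo_ns *demo_namespace_new(int type)
{
	struct demo_ns *ns;

	if (type < 0)
		return demo_err_ptr(-EINVAL);

	ns = calloc(1, sizeof(*ns));
	if (!ns)
		return demo_err_ptr(-ENOMEM);

	ns->type = type;
	return ns;
}
```

A caller checks demo_is_err() rather than comparing against NULL, and can then
report the real cause (for example -EINVAL) instead of assuming -ENOMEM.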

Fix associated CERROR() messages to follow proper code style.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14178
Lustre-commit: e9c3b89bdacdb90 ("LU-14178 ldlm: return error from ldlm_namespace_new()")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40851
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_lib.c      | 10 ++++++----
 fs/lustre/ldlm/ldlm_resource.c | 32 +++++++++++++++++++++-----------
 2 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c
index 2965395..9499995 100644
--- a/fs/lustre/ldlm/ldlm_lib.c
+++ b/fs/lustre/ldlm/ldlm_lib.c
@@ -524,10 +524,11 @@ int client_obd_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 						LDLM_NAMESPACE_CLIENT,
 						LDLM_NAMESPACE_GREEDY,
 						ns_type);
-	if (!obd->obd_namespace) {
-		CERROR("Unable to create client namespace - %s\n",
-		       obd->obd_name);
-		rc = -ENOMEM;
+	if (IS_ERR(obd->obd_namespace)) {
+		rc = PTR_ERR(obd->obd_namespace);
+		CERROR("%s: unable to create client namespace: rc = %d\n",
+		       obd->obd_name, rc);
+		obd->obd_namespace = NULL;
 		goto err_import;
 	}
 
@@ -540,6 +541,7 @@ int client_obd_setup(struct obd_device *obd, struct lustre_cfg *lcfg)
 err:
 	kfree(cli->cl_mod_tag_bitmap);
 	cli->cl_mod_tag_bitmap = NULL;
+
 	return rc;
 }
 EXPORT_SYMBOL(client_obd_setup);
diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c
index dab837d..481f14e 100644
--- a/fs/lustre/ldlm/ldlm_resource.c
+++ b/fs/lustre/ldlm/ldlm_resource.c
@@ -630,19 +630,23 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
 
 	rc = ldlm_get_ref();
 	if (rc) {
-		CERROR("ldlm_get_ref failed: %d\n", rc);
-		return NULL;
+		CERROR("%s: ldlm_get_ref failed: rc = %d\n", name, rc);
+		return ERR_PTR(rc);
 	}
 
 	if (ns_type >= ARRAY_SIZE(ldlm_ns_hash_defs) ||
 	    ldlm_ns_hash_defs[ns_type].nsd_bkt_bits == 0) {
-		CERROR("Unknown type %d for ns %s\n", ns_type, name);
+		rc = -EINVAL;
+		CERROR("%s: unknown namespace type %d: rc = %d\n",
+		       name, ns_type, rc);
 		goto out_ref;
 	}
 
 	ns = kzalloc(sizeof(*ns), GFP_NOFS);
-	if (!ns)
+	if (!ns) {
+		rc = -ENOMEM;
 		goto out_ref;
+	}
 
 	ns->ns_rs_hash = cfs_hash_create(name,
 					 ldlm_ns_hash_defs[ns_type].nsd_all_bits,
@@ -656,8 +660,10 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
 					 CFS_HASH_BIGNAME |
 					 CFS_HASH_SPIN_BKTLOCK |
 					 CFS_HASH_NO_ITEMREF);
-	if (!ns->ns_rs_hash)
+	if (!ns->ns_rs_hash) {
+		rc = -ENOMEM;
 		goto out_ns;
+	}
 
 	ns->ns_bucket_bits = ldlm_ns_hash_defs[ns_type].nsd_all_bits -
 			     ldlm_ns_hash_defs[ns_type].nsd_bkt_bits;
@@ -665,8 +671,10 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
 	ns->ns_rs_buckets = kvzalloc((1 << ns->ns_bucket_bits) *
 				     sizeof(*ns->ns_rs_buckets),
 				     GFP_KERNEL);
-	if (!ns->ns_rs_buckets)
+	if (!ns->ns_rs_buckets) {
+		rc = -ENOMEM;
 		goto out_hash;
+	}
 
 	for (idx = 0; idx < (1 << ns->ns_bucket_bits); idx++) {
 		struct ldlm_ns_bucket *nsb = &ns->ns_rs_buckets[idx];
@@ -680,8 +688,10 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
 	ns->ns_appetite = apt;
 	ns->ns_client = client;
 	ns->ns_name = kstrdup(name, GFP_KERNEL);
-	if (!ns->ns_name)
+	if (!ns->ns_name) {
+		rc = -ENOMEM;
 		goto out_hash;
+	}
 
 	INIT_LIST_HEAD(&ns->ns_list_chain);
 	INIT_LIST_HEAD(&ns->ns_unused_list);
@@ -704,20 +714,20 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
 
 	rc = ldlm_namespace_sysfs_register(ns);
 	if (rc != 0) {
-		CERROR("Can't initialize ns sysfs, rc %d\n", rc);
+		CERROR("%s: cannot initialize ns sysfs: rc = %d\n", name, rc);
 		goto out_hash;
 	}
 
 	rc = ldlm_namespace_debugfs_register(ns);
 	if (rc != 0) {
-		CERROR("Can't initialize ns proc, rc %d\n", rc);
+		CERROR("%s: cannot initialize ns proc: rc = %d\n", name, rc);
 		goto out_sysfs;
 	}
 
 	idx = ldlm_namespace_nr_read(client);
 	rc = ldlm_pool_init(&ns->ns_pool, ns, idx, client);
 	if (rc) {
-		CERROR("Can't initialize lock pool, rc %d\n", rc);
+		CERROR("%s: cannot initialize lock pool: rc = %d\n", name, rc);
 		goto out_proc;
 	}
 
@@ -736,7 +746,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
 	kfree(ns);
 out_ref:
 	ldlm_put_ref();
-	return NULL;
+	return ERR_PTR(rc);
 }
 EXPORT_SYMBOL(ldlm_namespace_new);
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 29/41] lustre: llite: remove unused ll_teardown_mmaps()
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (27 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 28/41] lustre: ldlm: return error from ldlm_namespace_new() James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:50 ` [lustre-devel] [PATCH 30/41] lustre: lov: style cleanups in lov_set_osc_active() James Simmons
                   ` (11 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

The ll_teardown_mmaps() function is no longer used and can be
removed.

Fixes: d5419b40599b ("lustre: use generic_error_remove_page()")
WC-bug-id: https://jira.whamcloud.com/browse/LU-12477
Lustre-commit: 647c96562b27cef ("LU-12477 llite: remove unused ll_teardown_mmaps()")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41086
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_internal.h |  1 -
 fs/lustre/llite/llite_mmap.c     | 17 -----------------
 2 files changed, 18 deletions(-)

diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 0fe0b562..3bd774b 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1265,7 +1265,6 @@ void ll_io_init(struct cl_io *io, const struct file *file, int write,
 
 /* llite/llite_mmap.c */
 
-int ll_teardown_mmaps(struct address_space *mapping, u64 first, u64 last);
 int ll_file_mmap(struct file *file, struct vm_area_struct *vma);
 void policy_from_vma(union ldlm_policy_data *policy, struct vm_area_struct *vma,
 		     unsigned long addr, size_t count);
diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c
index b9a73e0..0963757 100644
--- a/fs/lustre/llite/llite_mmap.c
+++ b/fs/lustre/llite/llite_mmap.c
@@ -511,23 +511,6 @@ static void ll_vm_close(struct vm_area_struct *vma)
 	pcc_vm_close(vma);
 }
 
-/* XXX put nice comment here.  talk about __free_pte -> dirty pages and
- * nopage's reference passing to the pte
- */
-int ll_teardown_mmaps(struct address_space *mapping, u64 first, u64 last)
-{
-	int rc = -ENOENT;
-
-	LASSERTF(last > first, "last %llu first %llu\n", last, first);
-	if (mapping_mapped(mapping)) {
-		rc = 0;
-		unmap_mapping_range(mapping, first + PAGE_SIZE - 1,
-				    last - first + 1, 0);
-	}
-
-	return rc;
-}
-
 static const struct vm_operations_struct ll_file_vm_ops = {
 	.fault			= ll_fault,
 	.page_mkwrite		= ll_page_mkwrite,
-- 
1.8.3.1

* [lustre-devel] [PATCH 30/41] lustre: lov: style cleanups in lov_set_osc_active()
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (28 preceding siblings ...)
  2021-04-05  0:50 ` [lustre-devel] [PATCH 29/41] lustre: llite: remove unused ll_teardown_mmaps() James Simmons
@ 2021-04-05  0:50 ` James Simmons
  2021-04-05  0:51 ` [lustre-devel] [PATCH 31/41] lustre: change various operations structs to const James Simmons
                   ` (10 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

1/ don't pre-declare the function as it is not used before it
   is defined.
2/ " TEST ? 1 : 0" is identical to "TEST", so remove the "? 1 : 0".
3/ When the 'then' part of an 'if' ends with a goto, there is
   no point in having an 'else', particularly if there is also
   code in the block following the "if".  The 'else' code and
   the following code should be together.

4/ other minor changes, and conversion of spaces to tabs.
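
Points 2/ and 3/ can be illustrated with a small standalone sketch. The
demo_* names and the return values are hypothetical and chosen only for the
example; they are not taken from lov_set_osc_active().

```c
#include <assert.h>
#include <stdbool.h>

enum demo_event { DEMO_DEACTIVATE, DEMO_ACTIVATE };

/* Point 2/: a comparison already yields 0 or 1, so "? 1 : 0" is noise. */
static bool demo_is_activate(enum demo_event ev)
{
	return ev == DEMO_ACTIVATE;
}

/*
 * Point 3/: the 'then' branch ends in a goto, so the code that would
 * have been the 'else' simply follows the 'if' at the same indent.
 * Returns 1 if the state was already as requested, 0 if it changed.
 */
static int demo_set_active(bool *state, enum demo_event ev)
{
	bool activate = demo_is_activate(ev);
	int rc = 0;

	if (*state == activate) {
		rc = 1;
		goto out;
	}

	*state = activate;
out:
	return rc;
}
```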

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: cab152600ebe8a0 ("LU-6142 lov: style cleanups in lov_set_osc_active()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39382
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_obd.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c
index c8654bd..9554d85 100644
--- a/fs/lustre/lov/lov_obd.c
+++ b/fs/lustre/lov/lov_obd.c
@@ -114,8 +114,6 @@ void lov_tgts_putref(struct obd_device *obd)
 	}
 }
 
-static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid,
-			      enum obd_notify_event ev);
 static int lov_notify(struct obd_device *obd, struct obd_device *watched,
 		      enum obd_notify_event ev);
 
@@ -354,7 +352,8 @@ static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid,
 {
 	struct lov_obd *lov = &obd->u.lov;
 	struct lov_tgt_desc *tgt;
-	int index, activate, active;
+	int index;
+	bool activate, active;
 
 	CDEBUG(D_INFO, "Searching in lov %p for uuid %s event(%d)\n",
 	       lov, uuid->uuid, ev);
@@ -362,9 +361,7 @@ static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid,
 	lov_tgts_getref(obd);
 	for (index = 0; index < lov->desc.ld_tgt_count; index++) {
 		tgt = lov->lov_tgts[index];
-		if (!tgt)
-			continue;
-		if (obd_uuid_equals(uuid, &tgt->ltd_uuid))
+		if (tgt && obd_uuid_equals(uuid, &tgt->ltd_uuid))
 			break;
 	}
 
@@ -374,7 +371,7 @@ static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid,
 	}
 
 	if (ev == OBD_NOTIFY_DEACTIVATE || ev == OBD_NOTIFY_ACTIVATE) {
-		activate = (ev == OBD_NOTIFY_ACTIVATE) ? 1 : 0;
+		activate = (ev == OBD_NOTIFY_ACTIVATE);
 
 		/*
 		 * LU-642, initially inactive OSC could miss the obd_connect,
@@ -401,9 +398,8 @@ static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid,
 			CDEBUG(D_CONFIG, "%sactivate OSC %s\n",
 			       activate ? "" : "de", obd_uuid2str(uuid));
 		}
-
 	} else if (ev == OBD_NOTIFY_INACTIVE || ev == OBD_NOTIFY_ACTIVE) {
-		active = (ev == OBD_NOTIFY_ACTIVE) ? 1 : 0;
+		active = (ev == OBD_NOTIFY_ACTIVE);
 
 		if (lov->lov_tgts[index]->ltd_active == active) {
 			CDEBUG(D_INFO, "OSC %s already %sactive!\n",
@@ -422,7 +418,8 @@ static int lov_set_osc_active(struct obd_device *obd, struct obd_uuid *uuid,
 			lov->lov_tgts[index]->ltd_exp->exp_obd->obd_inactive = 1;
 		}
 	} else {
-		CERROR("Unknown event(%d) for uuid %s", ev, uuid->uuid);
+		CERROR("%s: unknown event %d for uuid %s\n", obd->obd_name,
+		       ev, uuid->uuid);
 	}
 
 	if (tgt->ltd_exp)
-- 
1.8.3.1

* [lustre-devel] [PATCH 31/41] lustre: change various operations structs to const
From: James Simmons @ 2021-04-05  0:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Nearly all of
  struct cl_io_operations
  struct cl_lock_operations
  struct super_operations
  struct llog_operations

are now const.  The one exception is changelog_orig_logops.
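
A minimal sketch of the pattern, assuming a made-up operations table
(demo_ops, demo_client_ops and demo_handle2ops() are hypothetical names,
loosely modelled on struct llog_operations and llog_handle2ops()):
once the table is const, every pointer that refers to it has to be
const-qualified as well, which is why the diff below touches struct
members, prototypes and local variables.

```c
#include <stddef.h>

/* Hypothetical operations table, loosely modelled on struct llog_operations. */
struct demo_ops {
	int (*op_next)(int cur);
};

static int demo_next(int cur)
{
	return cur + 1;
}

/* const lets the table live in read-only data; op_next cannot be reassigned. */
static const struct demo_ops demo_client_ops = {
	.op_next = demo_next,
};

struct demo_handle {
	const struct demo_ops *h_ops;	/* pointer to a const table */
};

/* Hand the table back through @ops; the out-parameter must be const too. */
static int demo_handle2ops(const struct demo_handle *h,
			   const struct demo_ops **ops)
{
	if (!h || !h->h_ops)
		return -1;	/* the real helper returns -EINVAL */
	*ops = h->h_ops;
	return 0;
}
```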

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: c20b866ba374ea3 ("LU-6142 lustre: change various operations structs to const")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39400
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_log.h   | 10 +++++-----
 fs/lustre/include/lustre_net.h   |  2 +-
 fs/lustre/llite/llite_internal.h |  2 +-
 fs/lustre/llite/super25.c        |  2 +-
 fs/lustre/mdc/mdc_dev.c          |  2 +-
 fs/lustre/obdclass/llog.c        |  4 ++--
 fs/lustre/obdclass/llog_obd.c    |  2 +-
 fs/lustre/ptlrpc/llog_client.c   |  2 +-
 8 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/lustre/include/lustre_log.h b/fs/lustre/include/lustre_log.h
index 6995414..24e6609 100644
--- a/fs/lustre/include/lustre_log.h
+++ b/fs/lustre/include/lustre_log.h
@@ -134,7 +134,7 @@ int llog_cat_process(const struct lu_env *env, struct llog_handle *cat_llh,
 /* llog_obd.c */
 int llog_setup(const struct lu_env *env, struct obd_device *obd,
 	       struct obd_llog_group *olg, int index,
-	       struct obd_device *disk_obd, struct llog_operations *op);
+	       struct obd_device *disk_obd, const struct llog_operations *op);
 int __llog_ctxt_put(const struct lu_env *env, struct llog_ctxt *ctxt);
 int llog_cleanup(const struct lu_env *env, struct llog_ctxt *);
 
@@ -225,7 +225,7 @@ struct llog_handle {
 	} u;
 	char			*lgh_name;
 	void			*private_data;
-	struct llog_operations	*lgh_logops;
+	const struct llog_operations	*lgh_logops;
 	refcount_t		 lgh_refcount;
 	bool			 lgh_destroyed;
 };
@@ -241,7 +241,7 @@ struct llog_ctxt {
 	struct obd_import	*loc_imp; /* to use in RPC's: can be backward
 					   * pointing import
 					   */
-	struct llog_operations  *loc_logops;
+	const struct llog_operations  *loc_logops;
 	struct llog_handle      *loc_handle;
 	struct mutex		 loc_mutex; /* protect loc_imp */
 	refcount_t		 loc_refcount;
@@ -257,7 +257,7 @@ struct llog_ctxt {
 #define LLOG_DEL_RECORD 0x0002
 
 static inline int llog_handle2ops(struct llog_handle *loghandle,
-				  struct llog_operations **lop)
+				  const struct llog_operations **lop)
 {
 	if (!loghandle || !loghandle->lgh_logops)
 		return -EINVAL;
@@ -351,7 +351,7 @@ static inline int llog_next_block(const struct lu_env *env,
 				  int next_idx, u64 *cur_offset, void *buf,
 				  int len)
 {
-	struct llog_operations *lop;
+	const struct llog_operations *lop;
 	int rc;
 
 	rc = llog_handle2ops(loghandle, &lop);
diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index f16c935..a9aa363 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -2409,7 +2409,7 @@ enum timeout_event {
 /** @} */
 
 /* ptlrpc/llog_client.c */
-extern struct llog_operations llog_client_ops;
+extern const struct llog_operations llog_client_ops;
 /** @} net */
 
 #endif
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 3bd774b..677106d 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1131,7 +1131,7 @@ int ll_revalidate_it_finish(struct ptlrpc_request *request,
 			    struct lookup_intent *it, struct inode *inode);
 
 /* llite/llite_lib.c */
-extern struct super_operations lustre_super_operations;
+extern const struct super_operations lustre_super_operations;
 
 void ll_common_put_super(struct super_block *sb);
 void ll_lli_init(struct ll_inode_info *lli);
diff --git a/fs/lustre/llite/super25.c b/fs/lustre/llite/super25.c
index d02c8cf..e3194a5 100644
--- a/fs/lustre/llite/super25.c
+++ b/fs/lustre/llite/super25.c
@@ -83,7 +83,7 @@ static int ll_drop_inode(struct inode *inode)
 }
 
 /* exported operations */
-struct super_operations lustre_super_operations = {
+const struct super_operations lustre_super_operations = {
 	.alloc_inode		= ll_alloc_inode,
 	.destroy_inode		= ll_destroy_inode,
 	.drop_inode		= ll_drop_inode,
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index 68088ef..39c1213 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -1284,7 +1284,7 @@ static void mdc_io_data_version_end(const struct lu_env *env,
 	}
 }
 
-static struct cl_io_operations mdc_io_ops = {
+static const struct cl_io_operations mdc_io_ops = {
 	.op = {
 		[CIT_READ] = {
 			.cio_iter_init	= osc_io_rw_iter_init,
diff --git a/fs/lustre/obdclass/llog.c b/fs/lustre/obdclass/llog.c
index 7668d51..b431087 100644
--- a/fs/lustre/obdclass/llog.c
+++ b/fs/lustre/obdclass/llog.c
@@ -100,7 +100,7 @@ int llog_handle_put(const struct lu_env *env, struct llog_handle *loghandle)
 	int rc = 0;
 
 	if (refcount_dec_and_test(&loghandle->lgh_refcount)) {
-		struct llog_operations *lop;
+		const struct llog_operations *lop;
 
 		rc = llog_handle2ops(loghandle, &lop);
 		if (!rc) {
@@ -118,7 +118,7 @@ static int llog_read_header(const struct lu_env *env,
 			    struct llog_handle *handle,
 			    struct obd_uuid *uuid)
 {
-	struct llog_operations *lop;
+	const struct llog_operations *lop;
 	int rc;
 
 	rc = llog_handle2ops(handle, &lop);
diff --git a/fs/lustre/obdclass/llog_obd.c b/fs/lustre/obdclass/llog_obd.c
index f652eed..82f96ed 100644
--- a/fs/lustre/obdclass/llog_obd.c
+++ b/fs/lustre/obdclass/llog_obd.c
@@ -135,7 +135,7 @@ int llog_cleanup(const struct lu_env *env, struct llog_ctxt *ctxt)
 
 int llog_setup(const struct lu_env *env, struct obd_device *obd,
 	       struct obd_llog_group *olg, int index,
-	       struct obd_device *disk_obd, struct llog_operations *op)
+	       struct obd_device *disk_obd, const struct llog_operations *op)
 {
 	struct llog_ctxt *ctxt;
 	int rc = 0;
diff --git a/fs/lustre/ptlrpc/llog_client.c b/fs/lustre/ptlrpc/llog_client.c
index aeefa8f..79764cf 100644
--- a/fs/lustre/ptlrpc/llog_client.c
+++ b/fs/lustre/ptlrpc/llog_client.c
@@ -352,7 +352,7 @@ static int llog_client_close(const struct lu_env *env,
 	return 0;
 }
 
-struct llog_operations llog_client_ops = {
+const struct llog_operations llog_client_ops = {
 	.lop_next_block		= llog_client_next_block,
 	.lop_prev_block		= llog_client_prev_block,
 	.lop_read_header	= llog_client_read_header,
-- 
1.8.3.1

* [lustre-devel] [PATCH 32/41] lustre: mark strings in char arrays as const
From: James Simmons @ 2021-04-05  0:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Most arrays of strings declare the characters 'const' (const char *),
but the arrays of pointers themselves often aren't const.
This patch marks the pointer arrays const as well (const char *const).
This allows the arrays to be placed in read-only memory.
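
For illustration only (demo_names and demo_name() are hypothetical),
the second const in "const char *const" makes the pointers stored in
the array read-only in addition to the characters they point to:

```c
#include <stddef.h>

/* Hypothetical name table; both the pointers and the characters are const. */
static const char *const demo_names[] = { "crc32", "adler", "crc32c" };

/* Return the name for @idx, or "unknown" when @idx is out of range. */
static const char *demo_name(size_t idx)
{
	if (idx >= sizeof(demo_names) / sizeof(demo_names[0]))
		return "unknown";
	return demo_names[idx];
}
```

With only the first const, an assignment such as demo_names[0] = "x"
would still compile; with both, the compiler rejects it.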

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: 734d6eb11b572a9a ("LU-6142 lustre: mark strings in char arrays as const")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39742
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_cksum.h           | 2 +-
 fs/lustre/llite/lproc_llite.c           | 2 +-
 include/uapi/linux/lustre/lustre_user.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/include/obd_cksum.h b/fs/lustre/include/obd_cksum.h
index f7d316b..08c1cb9 100644
--- a/fs/lustre/include/obd_cksum.h
+++ b/fs/lustre/include/obd_cksum.h
@@ -127,7 +127,7 @@ enum cksum_types obd_cksum_type_select(const char *obd_name,
 /* Checksum algorithm names. Must be defined in the same order as the
  * OBD_CKSUM_* flags.
  */
-#define DECLARE_CKSUM_NAME const char *cksum_name[] = {"crc32", "adler", \
+#define DECLARE_CKSUM_NAME const char *const cksum_name[] = {"crc32", "adler", \
 	"crc32c", "reserved", "t10ip512", "t10ip4K", "t10crc512", "t10crc4K"}
 
 typedef u16 (obd_dif_csum_fn) (void *, unsigned int);
diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index 5d1e2f4..a2e61e1 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -995,7 +995,7 @@ static ssize_t default_easize_store(struct kobject *kobj,
 
 static int ll_sbi_flags_seq_show(struct seq_file *m, void *v)
 {
-	const char *str[] = LL_SBI_FLAGS;
+	const char *const str[] = LL_SBI_FLAGS;
 	struct super_block *sb = m->private;
 	int flags = ll_s2sbi(sb)->ll_flags;
 	int i = 0;
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 0f195a4..7d4a9e9 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -1213,7 +1213,7 @@ enum changelog_rec_type {
 
 static inline const char *changelog_type2str(int type)
 {
-	static const char *changelog_str[] = {
+	static const char *const changelog_str[] = {
 		"MARK",  "CREAT", "MKDIR", "HLINK", "SLINK", "MKNOD", "UNLNK",
 		"RMDIR", "RENME", "RNMTO", "OPEN",  "CLOSE", "LYOUT", "TRUNC",
 		"SATTR", "XATTR", "HSM",   "MTIME", "CTIME", "ATIME", "",
-- 
1.8.3.1

* [lustre-devel] [PATCH 33/41] lustre: convert snprintf to scnprintf as appropriate
From: James Simmons @ 2021-04-05  0:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The return value of snprintf() is the number of bytes that would have
been copied into the buffer if it were large enough.
Many places in the code use it as though it were the number of bytes
actually copied.  In practice this (almost?) never makes a difference.
However it is poor style to use the wrong interfaces as it might one
day be copied to somewhere that it does make a difference.

So change these instances of snprintf to scnprintf which DOES return
the number of bytes actually copied.
This covers all places where the return value is simply returned to
the caller, and a couple of others.

Also change the declared buffer size in a couple of places to the
actual buffer size (PAGE_SIZE in these cases).
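
The difference in return values can be seen in userspace with a small
sketch.  scnprintf() is a kernel function, so demo_scnprintf() below is
a hypothetical stand-in built on vsnprintf() that is assumed to follow
the same convention (characters actually stored, not counting the
trailing NUL).

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf(): returns the
 * number of characters actually stored in @buf, excluding the trailing NUL.
 */
static int demo_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;		/* treat an output error as nothing written */
	if ((size_t)n < size)
		return n;		/* everything fit */
	return (int)(size - 1);		/* output was truncated */
}
```

With a 4-byte buffer and a 10-character string, snprintf() reports 10
while the stand-in reports 3, which is the value a sysfs 'show' method
should return.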

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: a03765b2da70fb92 ("LU-6142 lustre: convert snprintf to scnprintf as appropriate")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39744
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/lproc_llite.c  | 30 +++++++++++++++---------------
 fs/lustre/llite/pcc.c          | 38 +++++++++++++++++++-------------------
 fs/lustre/obdclass/obd_sysfs.c |  6 +++---
 3 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index a2e61e1..ec241a4 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -726,7 +726,7 @@ static ssize_t statahead_running_max_show(struct kobject *kobj,
 	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
 					      ll_kset.kobj);
 
-	return snprintf(buf, 16, "%u\n", sbi->ll_sa_running_max);
+	return scnprintf(buf, 16, "%u\n", sbi->ll_sa_running_max);
 }
 
 static ssize_t statahead_running_max_store(struct kobject *kobj,
@@ -882,7 +882,7 @@ static ssize_t statfs_max_age_show(struct kobject *kobj, struct attribute *attr,
 	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
 					      ll_kset.kobj);
 
-	return snprintf(buf, PAGE_SIZE, "%u\n", sbi->ll_statfs_max_age);
+	return scnprintf(buf, PAGE_SIZE, "%u\n", sbi->ll_statfs_max_age);
 }
 
 static ssize_t statfs_max_age_store(struct kobject *kobj,
@@ -920,8 +920,8 @@ static ssize_t max_easize_show(struct kobject *kobj,
 		return rc;
 
 	/* Limit xattr size returned to userspace based on kernel maximum */
-	return snprintf(buf, PAGE_SIZE, "%u\n",
-			ealen > XATTR_SIZE_MAX ? XATTR_SIZE_MAX : ealen);
+	return scnprintf(buf, PAGE_SIZE, "%u\n",
+			 ealen > XATTR_SIZE_MAX ? XATTR_SIZE_MAX : ealen);
 }
 LUSTRE_RO_ATTR(max_easize);
 
@@ -951,8 +951,8 @@ static ssize_t default_easize_show(struct kobject *kobj,
 		return rc;
 
 	/* Limit xattr size returned to userspace based on kernel maximum */
-	return snprintf(buf, PAGE_SIZE, "%u\n",
-			ealen > XATTR_SIZE_MAX ? XATTR_SIZE_MAX : ealen);
+	return scnprintf(buf, PAGE_SIZE, "%u\n",
+			 ealen > XATTR_SIZE_MAX ? XATTR_SIZE_MAX : ealen);
 }
 
 /**
@@ -1094,8 +1094,8 @@ static ssize_t max_read_ahead_async_active_show(struct kobject *kobj,
 	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
 					      ll_kset.kobj);
 
-	return snprintf(buf, PAGE_SIZE, "%u\n",
-			sbi->ll_ra_info.ra_async_max_active);
+	return scnprintf(buf, PAGE_SIZE, "%u\n",
+			 sbi->ll_ra_info.ra_async_max_active);
 }
 
 static ssize_t max_read_ahead_async_active_store(struct kobject *kobj,
@@ -1139,8 +1139,8 @@ static ssize_t read_ahead_async_file_threshold_mb_show(struct kobject *kobj,
 	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
 					      ll_kset.kobj);
 
-	return snprintf(buf, PAGE_SIZE, "%lu\n",
-	     PAGES_TO_MiB(sbi->ll_ra_info.ra_async_pages_per_file_threshold));
+	return scnprintf(buf, PAGE_SIZE, "%lu\n",
+			 PAGES_TO_MiB(sbi->ll_ra_info.ra_async_pages_per_file_threshold));
 }
 
 static ssize_t
@@ -1260,8 +1260,8 @@ static ssize_t file_heat_show(struct kobject *kobj,
 	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
 					      ll_kset.kobj);
 
-	return snprintf(buf, PAGE_SIZE, "%u\n",
-			!!(sbi->ll_flags & LL_SBI_FILE_HEAT));
+	return scnprintf(buf, PAGE_SIZE, "%u\n",
+			 !!(sbi->ll_flags & LL_SBI_FILE_HEAT));
 }
 
 static ssize_t file_heat_store(struct kobject *kobj,
@@ -1296,8 +1296,8 @@ static ssize_t heat_decay_percentage_show(struct kobject *kobj,
 	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
 					      ll_kset.kobj);
 
-	return snprintf(buf, PAGE_SIZE, "%u\n",
-		       (sbi->ll_heat_decay_weight * 100 + 128) / 256);
+	return scnprintf(buf, PAGE_SIZE, "%u\n",
+			 (sbi->ll_heat_decay_weight * 100 + 128) / 256);
 }
 
 static ssize_t heat_decay_percentage_store(struct kobject *kobj,
@@ -1330,7 +1330,7 @@ static ssize_t heat_period_second_show(struct kobject *kobj,
 	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
 					      ll_kset.kobj);
 
-	return snprintf(buf, PAGE_SIZE, "%u\n", sbi->ll_heat_period_second);
+	return scnprintf(buf, PAGE_SIZE, "%u\n", sbi->ll_heat_period_second);
 }
 
 static ssize_t heat_period_second_store(struct kobject *kobj,
diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c
index 28ca9cb..297189c9 100644
--- a/fs/lustre/llite/pcc.c
+++ b/fs/lustre/llite/pcc.c
@@ -1088,15 +1088,15 @@ void pcc_inode_free(struct inode *inode)
 #define MAX_PCC_DATABASE_PATH (6 * 5 + FID_NOBRACE_LEN + 1)
 static int pcc_fid2dataset_path(char *buf, int sz, struct lu_fid *fid)
 {
-	return snprintf(buf, sz, "%04x/%04x/%04x/%04x/%04x/%04x/"
-			DFID_NOBRACE,
-			(fid)->f_oid       & 0xFFFF,
-			(fid)->f_oid >> 16 & 0xFFFF,
-			(unsigned int)((fid)->f_seq       & 0xFFFF),
-			(unsigned int)((fid)->f_seq >> 16 & 0xFFFF),
-			(unsigned int)((fid)->f_seq >> 32 & 0xFFFF),
-			(unsigned int)((fid)->f_seq >> 48 & 0xFFFF),
-			PFID(fid));
+	return scnprintf(buf, sz, "%04x/%04x/%04x/%04x/%04x/%04x/"
+			 DFID_NOBRACE,
+			 (fid)->f_oid       & 0xFFFF,
+			 (fid)->f_oid >> 16 & 0xFFFF,
+			 (unsigned int)((fid)->f_seq       & 0xFFFF),
+			 (unsigned int)((fid)->f_seq >> 16 & 0xFFFF),
+			 (unsigned int)((fid)->f_seq >> 32 & 0xFFFF),
+			 (unsigned int)((fid)->f_seq >> 48 & 0xFFFF),
+			 PFID(fid));
 }
 
 static inline const struct cred *pcc_super_cred(struct super_block *sb)
@@ -1163,16 +1163,16 @@ static int pcc_get_layout_info(struct inode *inode, struct cl_layout *clt)
 static int pcc_fid2dataset_fullpath(char *buf, int sz, struct lu_fid *fid,
 				    struct pcc_dataset *dataset)
 {
-	return snprintf(buf, sz, "%s/%04x/%04x/%04x/%04x/%04x/%04x/"
-			DFID_NOBRACE,
-			dataset->pccd_pathname,
-			(fid)->f_oid       & 0xFFFF,
-			(fid)->f_oid >> 16 & 0xFFFF,
-			(unsigned int)((fid)->f_seq       & 0xFFFF),
-			(unsigned int)((fid)->f_seq >> 16 & 0xFFFF),
-			(unsigned int)((fid)->f_seq >> 32 & 0xFFFF),
-			(unsigned int)((fid)->f_seq >> 48 & 0xFFFF),
-			PFID(fid));
+	return scnprintf(buf, sz, "%s/%04x/%04x/%04x/%04x/%04x/%04x/"
+			 DFID_NOBRACE,
+			 dataset->pccd_pathname,
+			 (fid)->f_oid       & 0xFFFF,
+			 (fid)->f_oid >> 16 & 0xFFFF,
+			 (unsigned int)((fid)->f_seq       & 0xFFFF),
+			 (unsigned int)((fid)->f_seq >> 16 & 0xFFFF),
+			 (unsigned int)((fid)->f_seq >> 32 & 0xFFFF),
+			 (unsigned int)((fid)->f_seq >> 48 & 0xFFFF),
+			 PFID(fid));
 }
 
 /* Must be called with pcci->pcci_lock held */
diff --git a/fs/lustre/obdclass/obd_sysfs.c b/fs/lustre/obdclass/obd_sysfs.c
index 5fc638f..e6fb1b9 100644
--- a/fs/lustre/obdclass/obd_sysfs.c
+++ b/fs/lustre/obdclass/obd_sysfs.c
@@ -218,7 +218,7 @@ static ssize_t pinger_show(struct kobject *kobj, struct attribute *attr,
 static ssize_t jobid_var_show(struct kobject *kobj, struct attribute *attr,
 			      char *buf)
 {
-	return snprintf(buf, PAGE_SIZE, "%s\n", obd_jobid_var);
+	return scnprintf(buf, PAGE_SIZE, "%s\n", obd_jobid_var);
 }
 
 static ssize_t jobid_var_store(struct kobject *kobj, struct attribute *attr,
@@ -252,7 +252,7 @@ static ssize_t jobid_var_store(struct kobject *kobj, struct attribute *attr,
 static ssize_t jobid_name_show(struct kobject *kobj, struct attribute *attr,
 			       char *buf)
 {
-	return snprintf(buf, PAGE_SIZE, "%s\n", obd_jobid_name);
+	return scnprintf(buf, PAGE_SIZE, "%s\n", obd_jobid_name);
 }
 
 static ssize_t jobid_name_store(struct kobject *kobj, struct attribute *attr,
@@ -283,7 +283,7 @@ static ssize_t jobid_this_session_show(struct kobject *kobj,
 	rcu_read_lock();
 	jid = jobid_current();
 	if (jid)
-		ret = snprintf(buf, PAGE_SIZE, "%s\n", jid);
+		ret = scnprintf(buf, PAGE_SIZE, "%s\n", jid);
 	rcu_read_unlock();
 	return ret;
 }
-- 
1.8.3.1

* [lustre-devel] [PATCH 34/41] lustre: remove non-static 'inline' markings.
From: James Simmons @ 2021-04-05  0:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

There is rarely any point in marking a non-static function as
'inline'.  The result is to compile a stand-alone function that other
files can refer to, and also to inline the code where it is used in
the same file.

In many cases the non-static inline functions are not used in the same
file, so the 'inline' marking has no effect.  In other cases it may
have an effect, but it can only be needed in highly performance
critical situations where a function call must be avoided, and that
doesn't seem likely in any of these cases.

So just remove the "inline".

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: f0736a6a52ed9581 ("LU-6142 lustre: remove non-static 'inline' markings.")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/40289
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/crypto.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/llite/crypto.c b/fs/lustre/llite/crypto.c
index d37f0a9..3d34b3b 100644
--- a/fs/lustre/llite/crypto.c
+++ b/fs/lustre/llite/crypto.c
@@ -114,7 +114,7 @@ static int ll_set_context(struct inode *inode, const void *ctx, size_t len,
 
 #define llcrypto_free_ctx	kfree
 
-inline bool ll_sbi_has_test_dummy_encryption(struct ll_sb_info *sbi)
+bool ll_sbi_has_test_dummy_encryption(struct ll_sb_info *sbi)
 {
 	return unlikely(sbi->ll_flags & LL_SBI_TEST_DUMMY_ENCRYPTION);
 }
@@ -126,12 +126,12 @@ static bool ll_dummy_context(struct inode *inode)
 	return sbi ? ll_sbi_has_test_dummy_encryption(sbi) : false;
 }
 
-inline bool ll_sbi_has_encrypt(struct ll_sb_info *sbi)
+bool ll_sbi_has_encrypt(struct ll_sb_info *sbi)
 {
 	return sbi->ll_flags & LL_SBI_ENCRYPT;
 }
 
-inline void ll_sbi_set_encrypt(struct ll_sb_info *sbi, bool set)
+void ll_sbi_set_encrypt(struct ll_sb_info *sbi, bool set)
 {
 	if (set)
 		sbi->ll_flags |= LL_SBI_ENCRYPT;
-- 
1.8.3.1

* [lustre-devel] [PATCH 35/41] lustre: llite: use is_root_inode()
From: James Simmons @ 2021-04-05  0:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Lustre has multiple tests to see if a given inode is the root of the
filesystem.  Linux has (since 3.19) a helper function is_root_inode().
Use that throughout.
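
A userspace mock-up of what the helper checks, assuming it compares the
inode against the one behind sb->s_root, as the open-coded test in
ll_migrate() did.  The demo_* structures are hypothetical and carry
only the fields needed here.

```c
#include <stdbool.h>

/* Minimal mock-ups of the kernel structures; only the fields used here. */
struct demo_inode;
struct demo_dentry { struct demo_inode *d_inode; };
struct demo_super_block { struct demo_dentry *s_root; };
struct demo_inode { struct demo_super_block *i_sb; };

/* The root inode is the one attached to the superblock's root dentry. */
static bool demo_is_root_inode(const struct demo_inode *inode)
{
	return inode == inode->i_sb->s_root->d_inode;
}
```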

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: fca56be02b8fe074 ("LU-6142 lustre: use is_root_inode()")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/40293
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/crypto.c | 2 +-
 fs/lustre/llite/file.c   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/llite/crypto.c b/fs/lustre/llite/crypto.c
index 3d34b3b..0598b3c 100644
--- a/fs/lustre/llite/crypto.c
+++ b/fs/lustre/llite/crypto.c
@@ -100,7 +100,7 @@ static int ll_set_context(struct inode *inode, const void *ctx, size_t len,
 	}
 
 	/* Encrypting the root directory is not allowed */
-	if (inode->i_ino == inode->i_sb->s_root->d_inode->i_ino)
+	if (is_root_inode(inode))
 		return -EPERM;
 
 	dentry = (struct dentry *)fs_data;
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 60b6ac4..c6d53b1 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -421,7 +421,7 @@ int ll_file_release(struct inode *inode, struct file *file)
 		libcfs_debug_dumplog();
 
 out:
-	if (!rc && inode->i_sb->s_root != file_dentry(file))
+	if (!rc && !is_root_inode(inode))
 		ll_stats_ops_tally(sbi, LPROC_LL_RELEASE,
 				   ktime_us_delta(ktime_get(), kstart));
 	return rc;
@@ -4455,7 +4455,7 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum,
 	 * by checking the migrate FID against the FID of the
 	 * filesystem root.
 	 */
-	if (child_inode == parent->i_sb->s_root->d_inode) {
+	if (is_root_inode(child_inode)) {
 		rc = -EINVAL;
 		goto out_iput;
 	}
-- 
1.8.3.1

* [lustre-devel] [PATCH 36/41] lnet: libcfs: discard cfs_firststr
From: James Simmons @ 2021-04-05  0:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The effect of cfs_firststr() can easily be achieved with
skip_spaces() and strsep().

So use those instead.
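
A userspace sketch of the two-step replacement.  The kernel helpers are
not available outside the kernel, so the hypothetical
demo_first_word() below uses strspn() and strcspn() to play their
roles; unlike cfs_firststr() it takes no size bound and relies on the
string being NUL-terminated.

```c
#include <string.h>

/*
 * Hypothetical userspace helper: skip leading whitespace, cut @str at the
 * next whitespace character and return the first word (possibly empty).
 * Kernel code would use skip_spaces() and strsep() for the two steps.
 */
static char *demo_first_word(char *str)
{
	static const char ws[] = " \t\n\v\f\r";
	char *end;

	str += strspn(str, ws);		/* step 1: skip leading whitespace */
	end = str + strcspn(str, ws);	/* step 2: find the end of the word */
	if (*end != '\0')
		*end = '\0';		/* terminate the word in place */
	return str;
}
```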

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: ee5eb07d2f41ac60 ("LU-6142 libcfs: discard cfs_firststr")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/40860
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/libcfs/libcfs_string.c | 28 ----------------------------
 1 file changed, 28 deletions(-)

diff --git a/net/lnet/libcfs/libcfs_string.c b/net/lnet/libcfs/libcfs_string.c
index b042de5..66a108c 100644
--- a/net/lnet/libcfs/libcfs_string.c
+++ b/net/lnet/libcfs/libcfs_string.c
@@ -115,34 +115,6 @@ int cfs_str2mask(const char *str, const char *(*bit2str)(int bit),
 	return 0;
 }
 
-/* get the first string out of @str */
-char *cfs_firststr(char *str, size_t size)
-{
-	size_t i = 0;
-	char *end;
-
-	/* trim leading spaces */
-	while (i < size && *str && isspace(*str)) {
-		++i;
-		++str;
-	}
-
-	/* string with all spaces */
-	if (*str == '\0')
-		goto out;
-
-	end = str;
-	while (i < size && *end != '\0' && !isspace(*end)) {
-		++i;
-		++end;
-	}
-
-	*end = '\0';
-out:
-	return str;
-}
-EXPORT_SYMBOL(cfs_firststr);
-
 /**
  * Extracts tokens from strings.
  *
-- 
1.8.3.1

* [lustre-devel] [PATCH 37/41] lnet: place wire protocol data int own headers
From: James Simmons @ 2021-04-05  0:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

The WIRE_ATTR macro adds nothing of value and makes the code harder
to read for new readers, so it was removed for the Linux client.
We still want to keep track of which data structures are
transmitted over the wire and ensure the protocol does not get
broken. Move the wire protocol structures to their own header
files and add wire checking.
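
The wire checking described above can be pictured with a small
user-space sketch. It is only an illustration, not the code added by
this patch: the demo_* names are hypothetical, and C11 static_assert
stands in for the kernel's BUILD_BUG_ON(). The field layout copies
struct lnet_handle_wire and struct lnet_ack from lnet-idl.h.

```c
/* Illustrative sketch only; demo_* names are hypothetical. */
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof, size_t */
#include <stdint.h>	/* fixed-width integer types */

/* Same layout as struct lnet_handle_wire: two 64-bit cookies. */
struct demo_handle_wire {
	uint64_t wh_interface_cookie;
	uint64_t wh_object_cookie;
} __attribute__((packed));

/* Same layout as struct lnet_ack: smaller field placed last. */
struct demo_ack {
	struct demo_handle_wire	dst_wmd;
	uint64_t		match_bits;
	uint32_t		mlength;
} __attribute__((packed));

/* Compile-time checks: the build fails if the layout ever changes. */
static_assert(sizeof(struct demo_handle_wire) == 16,
	      "wire handle must be 16 bytes");
static_assert(offsetof(struct demo_ack, dst_wmd) == 0,
	      "dst_wmd must start the ACK");
static_assert(offsetof(struct demo_ack, match_bits) == 16,
	      "match_bits must follow the wire handle");
static_assert(offsetof(struct demo_ack, mlength) == 24,
	      "mlength must follow match_bits");
static_assert(sizeof(struct demo_ack) == 28,
	      "packed ACK must have no tail padding");

/* Runtime view of the same fact, handy for a quick sanity test. */
size_t demo_ack_wire_size(void)
{
	return sizeof(struct demo_ack);
}
```

Because the checks are evaluated at compile time they cost nothing at
run time, which is presumably why the patch collects them in dedicated
assert functions instead of testing sizes during module init.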

WC-bug-id: https://jira.whamcloud.com/browse/LU-12678
Lustre-commit: 6ae187404a8d7f8a ("LU-12678 lnet: discard WIRE_ATTR")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/37914
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 include/uapi/linux/lnet/libcfs_debug.h |   1 +
 include/uapi/linux/lnet/libcfs_ioctl.h |   1 +
 include/uapi/linux/lnet/lnet-dlc.h     |   1 +
 include/uapi/linux/lnet/lnet-idl.h     | 241 +++++++++++++++++++++++++++++++++
 include/uapi/linux/lnet/lnet-types.h   | 203 +--------------------------
 include/uapi/linux/lnet/lnetctl.h      |   1 +
 include/uapi/linux/lnet/lnetst.h       |   2 +
 include/uapi/linux/lnet/nidstr.h       |   1 +
 include/uapi/linux/lnet/socklnd.h      |   1 +
 net/lnet/klnds/o2iblnd/o2iblnd-idl.h   | 157 +++++++++++++++++++++
 net/lnet/klnds/o2iblnd/o2iblnd.c       | 166 ++++++++++++++++++++++-
 net/lnet/klnds/o2iblnd/o2iblnd.h       | 114 +---------------
 net/lnet/lnet/api-ni.c                 |  49 ++++++-
 13 files changed, 616 insertions(+), 322 deletions(-)
 create mode 100644 include/uapi/linux/lnet/lnet-idl.h
 create mode 100644 net/lnet/klnds/o2iblnd/o2iblnd-idl.h

diff --git a/include/uapi/linux/lnet/libcfs_debug.h b/include/uapi/linux/lnet/libcfs_debug.h
index 6255331..b720e06 100644
--- a/include/uapi/linux/lnet/libcfs_debug.h
+++ b/include/uapi/linux/lnet/libcfs_debug.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * GPL HEADER START
  *
diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h
index d0b29c52..5a46a48 100644
--- a/include/uapi/linux/lnet/libcfs_ioctl.h
+++ b/include/uapi/linux/lnet/libcfs_ioctl.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * GPL HEADER START
  *
diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h
index ca1f8ae..e775dfe 100644
--- a/include/uapi/linux/lnet/lnet-dlc.h
+++ b/include/uapi/linux/lnet/lnet-dlc.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: LGPL-2.0+ WITH Linux-syscall-note */
 /*
  * LGPL HEADER START
  *
diff --git a/include/uapi/linux/lnet/lnet-idl.h b/include/uapi/linux/lnet/lnet-idl.h
new file mode 100644
index 0000000..a5b1414
--- /dev/null
+++ b/include/uapi/linux/lnet/lnet-idl.h
@@ -0,0 +1,241 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * Copyright (c) 2012, 2017, Intel Corporation.
+ */
+/*
+ * This file is part of Lustre, http://www.lustre.org/
+ * Lustre is a trademark of Sun Microsystems, Inc.
+ */
+
+#ifndef __UAPI_LNET_IDL_H__
+#define __UAPI_LNET_IDL_H__
+
+#include <linux/types.h>
+
+/************************************************************************
+ * Core LNet wire message format.
+ * These are sent in sender's byte order (i.e. receiver flips).
+ */
+
+/**
+ * Address of an end-point in an LNet network.
+ *
+ * A node can have multiple end-points and hence multiple addresses.
+ * An LNet network can be a simple network (e.g. tcp0) or a network of
+ * LNet networks connected by LNet routers. Therefore an end-point address
+ * has two parts: network ID, and address within a network.
+ *
+ * \see LNET_NIDNET, LNET_NIDADDR, and LNET_MKNID.
+ */
+typedef __u64 lnet_nid_t;
+
+/**
+ * ID of a process in a node. Shortened as PID to distinguish from
+ * lnet_process_id, the global process ID.
+ */
+typedef __u32 lnet_pid_t;
+
+/* Packed version of struct lnet_process_id to transfer via network */
+struct lnet_process_id_packed {
+	lnet_nid_t nid;
+	lnet_pid_t pid;	/* node id / process id */
+} __attribute__((packed));
+
+/* The wire handle's interface cookie only matches one network interface in
+ * one epoch (i.e. new cookie when the interface restarts or the node
+ * reboots).  The object cookie only matches one object on that interface
+ * during that object's lifetime (i.e. no cookie re-use).
+ */
+struct lnet_handle_wire {
+	__u64 wh_interface_cookie;
+	__u64 wh_object_cookie;
+} __attribute__((packed));
+
+enum lnet_msg_type {
+	LNET_MSG_ACK = 0,
+	LNET_MSG_PUT,
+	LNET_MSG_GET,
+	LNET_MSG_REPLY,
+	LNET_MSG_HELLO,
+};
+
+/* The variant fields of the portals message header are aligned on an 8
+ * byte boundary in the message header.  Note that all types used in these
+ * wire structs MUST be fixed size and the smaller types are placed at the
+ * end.
+ */
+struct lnet_ack {
+	struct lnet_handle_wire	dst_wmd;
+	__u64			match_bits;
+	__u32			mlength;
+} __attribute__((packed));
+
+struct lnet_put {
+	struct lnet_handle_wire	ack_wmd;
+	__u64			match_bits;
+	__u64			hdr_data;
+	__u32			ptl_index;
+	__u32			offset;
+} __attribute__((packed));
+
+struct lnet_get {
+	struct lnet_handle_wire	return_wmd;
+	__u64			match_bits;
+	__u32			ptl_index;
+	__u32			src_offset;
+	__u32			sink_length;
+} __attribute__((packed));
+
+struct lnet_reply {
+	struct lnet_handle_wire	dst_wmd;
+} __attribute__((packed));
+
+struct lnet_hello {
+	__u64			incarnation;
+	__u32			type;
+} __attribute__((packed));
+
+struct lnet_hdr {
+	lnet_nid_t	dest_nid;
+	lnet_nid_t	src_nid;
+	lnet_pid_t	dest_pid;
+	lnet_pid_t	src_pid;
+	__u32		type;		/* enum lnet_msg_type */
+	__u32		payload_length;	/* payload data to follow */
+	/*<------__u64 aligned------->*/
+	union {
+		struct lnet_ack		ack;
+		struct lnet_put		put;
+		struct lnet_get		get;
+		struct lnet_reply	reply;
+		struct lnet_hello	hello;
+	} msg;
+} __attribute__((packed));
+
+/* A HELLO message contains a magic number and protocol version
+ * code in the header's dest_nid, the peer's NID in the src_nid, and
+ * LNET_MSG_HELLO in the type field.  All other common fields are zero
+ * (including payload_size; i.e. no payload).
+ * This is for use by byte-stream LNDs (e.g. TCP/IP) to check the peer is
+ * running the same protocol and to find out its NID. These LNDs should
+ * exchange HELLO messages when a connection is first established.  Individual
+ * LNDs can put whatever else they fancy in lnet_hdr::msg.
+ */
+struct lnet_magicversion {
+	__u32	magic;		/* LNET_PROTO_TCP_MAGIC */
+	__u16	version_major;	/* increment on incompatible change */
+	__u16	version_minor;	/* increment on compatible change */
+} __attribute__((packed));
+
+/* PROTO MAGIC for LNDs */
+#define LNET_PROTO_IB_MAGIC		0x0be91b91
+#define LNET_PROTO_GNI_MAGIC		0xb00fbabe /* ask Kim */
+#define LNET_PROTO_TCP_MAGIC		0xeebc0ded
+#define LNET_PROTO_ACCEPTOR_MAGIC	0xacce7100
+#define LNET_PROTO_PING_MAGIC		0x70696E67 /* 'ping' */
+
+/* Placeholder for a future "unified" protocol across all LNDs */
+/* Current LNDs that receive a request with this magic will respond
+ * with a "stub" reply using their current protocol */
+#define LNET_PROTO_MAGIC		0x45726963 /* ! */
+
+#define LNET_PROTO_TCP_VERSION_MAJOR	1
+#define LNET_PROTO_TCP_VERSION_MINOR	0
+
+/* Acceptor connection request */
+struct lnet_acceptor_connreq {
+	__u32	acr_magic;	/* PTL_ACCEPTOR_PROTO_MAGIC */
+	__u32	acr_version;	/* protocol version */
+	__u64	acr_nid;	/* target NID */
+} __attribute__((packed));
+
+#define LNET_PROTO_ACCEPTOR_VERSION	1
+
+struct lnet_counters_common {
+	__u32	lcc_msgs_alloc;
+	__u32	lcc_msgs_max;
+	__u32	lcc_errors;
+	__u32	lcc_send_count;
+	__u32	lcc_recv_count;
+	__u32	lcc_route_count;
+	__u32	lcc_drop_count;
+	__u64	lcc_send_length;
+	__u64	lcc_recv_length;
+	__u64	lcc_route_length;
+	__u64	lcc_drop_length;
+} __attribute__((packed));
+
+
+#define LNET_NI_STATUS_UP	0x15aac0de
+#define LNET_NI_STATUS_DOWN	0xdeadface
+#define LNET_NI_STATUS_INVALID	0x00000000
+
+struct lnet_ni_status {
+	lnet_nid_t ns_nid;
+	__u32      ns_status;
+	__u32      ns_unused;
+} __attribute__((packed));
+
+/*
+ * NB: value of these features equal to LNET_PROTO_PING_VERSION_x
+ * of old LNet, so there shouldn't be any compatibility issue
+ */
+#define LNET_PING_FEAT_INVAL		(0)		/* no feature */
+#define LNET_PING_FEAT_BASE		(1 << 0)	/* just a ping */
+#define LNET_PING_FEAT_NI_STATUS	(1 << 1)	/* return NI status */
+#define LNET_PING_FEAT_RTE_DISABLED	(1 << 2)        /* Routing enabled */
+#define LNET_PING_FEAT_MULTI_RAIL	(1 << 3)        /* Multi-Rail aware */
+#define LNET_PING_FEAT_DISCOVERY	(1 << 4)	/* Supports Discovery */
+
+/*
+ * All ping feature bits fit to hit the wire.
+ * In lnet_assert_wire_constants() this is compared against its open-coded
+ * value, and in lnet_ping_target_update() it is used to verify that no
+ * unknown bits have been set.
+ * New feature bits can be added, just be aware that this does change the
+ * over-the-wire protocol.
+ */
+#define LNET_PING_FEAT_BITS		(LNET_PING_FEAT_BASE | \
+					 LNET_PING_FEAT_NI_STATUS | \
+					 LNET_PING_FEAT_RTE_DISABLED | \
+					 LNET_PING_FEAT_MULTI_RAIL | \
+					 LNET_PING_FEAT_DISCOVERY)
+
+struct lnet_ping_info {
+	__u32			pi_magic;
+	__u32			pi_features;
+	lnet_pid_t		pi_pid;
+	__u32			pi_nnis;
+	struct lnet_ni_status	pi_ni[0];
+} __attribute__((packed));
+
+#define LNET_PING_INFO_SIZE(NNIDS) \
+	offsetof(struct lnet_ping_info, pi_ni[NNIDS])
+#define LNET_PING_INFO_LONI(PINFO)      ((PINFO)->pi_ni[0].ns_nid)
+#define LNET_PING_INFO_SEQNO(PINFO)     ((PINFO)->pi_ni[0].ns_status)
+
+#endif
diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h
index 5bf9917..4d0b6f9 100644
--- a/include/uapi/linux/lnet/lnet-types.h
+++ b/include/uapi/linux/lnet/lnet-types.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * GPL HEADER START
  *
@@ -34,6 +35,7 @@
 #define __LNET_TYPES_H__
 
 #include <linux/types.h>
+#include <linux/lnet/lnet-idl.h>
 
 /** \addtogroup lnet
  * @{
@@ -50,23 +52,6 @@
  */
 #define LNET_RESERVED_PORTAL	0
 
-/**
- * Address of an end-point in an LNet network.
- *
- * A node can have multiple end-points and hence multiple addresses.
- * An LNet network can be a simple network (e.g. tcp0) or a network of
- * LNet networks connected by LNet routers. Therefore an end-point address
- * has two parts: network ID, and address within a network.
- *
- * \see LNET_NIDNET, LNET_NIDADDR, and LNET_MKNID.
- */
-typedef __u64 lnet_nid_t;
-/**
- * ID of a process in a node. Shortened as PID to distinguish from
- * lnet_process_id, the global process ID.
- */
-typedef __u32 lnet_pid_t;
-
 /** wildcard NID that matches any end-point address */
 #define LNET_NID_ANY	((lnet_nid_t)(-1))
 /** wildcard PID that matches any lnet_pid_t */
@@ -114,186 +99,6 @@ static inline __u32 LNET_MKNET(__u32 type, __u32 num)
 
 #define LNET_NET_ANY LNET_NIDNET(LNET_NID_ANY)
 
-/* Packed version of lnet_process_id to transfer via network */
-struct lnet_process_id_packed {
-	/* node id / process id */
-	lnet_nid_t	nid;
-	lnet_pid_t	pid;
-} __packed;
-
-/*
- * The wire handle's interface cookie only matches one network interface in
- * one epoch (i.e. new cookie when the interface restarts or the node
- * reboots).  The object cookie only matches one object on that interface
- * during that object's lifetime (i.e. no cookie re-use).
- */
-struct lnet_handle_wire {
-	__u64	wh_interface_cookie;
-	__u64	wh_object_cookie;
-} __packed;
-
-enum lnet_msg_type {
-	LNET_MSG_ACK = 0,
-	LNET_MSG_PUT,
-	LNET_MSG_GET,
-	LNET_MSG_REPLY,
-	LNET_MSG_HELLO,
-};
-
-/*
- * The variant fields of the portals message header are aligned on an 8
- * byte boundary in the message header.  Note that all types used in these
- * wire structs MUST be fixed size and the smaller types are placed at the
- * end.
- */
-struct lnet_ack {
-	struct lnet_handle_wire	dst_wmd;
-	__u64			match_bits;
-	__u32			mlength;
-} __packed;
-
-struct lnet_put {
-	struct lnet_handle_wire	ack_wmd;
-	__u64			match_bits;
-	__u64			hdr_data;
-	__u32			ptl_index;
-	__u32			offset;
-} __packed;
-
-struct lnet_get {
-	struct lnet_handle_wire	return_wmd;
-	__u64			match_bits;
-	__u32			ptl_index;
-	__u32			src_offset;
-	__u32			sink_length;
-} __packed;
-
-struct lnet_reply {
-	struct lnet_handle_wire	dst_wmd;
-} __packed;
-
-struct lnet_hello {
-	__u64			incarnation;
-	__u32			type;
-} __packed;
-
-struct lnet_hdr {
-	lnet_nid_t	dest_nid;
-	lnet_nid_t	src_nid;
-	lnet_pid_t	dest_pid;
-	lnet_pid_t	src_pid;
-	__u32		type;		/* enum lnet_msg_type */
-	__u32		payload_length;	/* payload data to follow */
-	/*<------__u64 aligned------->*/
-	union {
-		struct lnet_ack		ack;
-		struct lnet_put		put;
-		struct lnet_get		get;
-		struct lnet_reply	reply;
-		struct lnet_hello	hello;
-	} msg;
-} __packed;
-
-/*
- * NB: value of these features equal to LNET_PROTO_PING_VERSION_x
- * of old LNet, so there shouldn't be any compatibility issue
- */
-#define LNET_PING_FEAT_INVAL		(0)		/* no feature */
-#define LNET_PING_FEAT_BASE		(1 << 0)	/* just a ping */
-#define LNET_PING_FEAT_NI_STATUS	(1 << 1)	/* return NI status */
-#define LNET_PING_FEAT_RTE_DISABLED	(1 << 2)	/* Routing enabled */
-#define LNET_PING_FEAT_MULTI_RAIL	(1 << 3)	/* Multi-Rail aware */
-#define LNET_PING_FEAT_DISCOVERY	(1 << 4)	/* Supports Discovery */
-
-/*
- * All ping feature bits fit to hit the wire.
- * In lnet_assert_wire_constants() this is compared against its open-coded
- * value, and in lnet_ping_target_update() it is used to verify that no
- * unknown bits have been set.
- * New feature bits can be added, just be aware that this does change the
- * over-the-wire protocol.
- */
-#define LNET_PING_FEAT_BITS		(LNET_PING_FEAT_BASE | \
-					 LNET_PING_FEAT_NI_STATUS | \
-					 LNET_PING_FEAT_RTE_DISABLED | \
-					 LNET_PING_FEAT_MULTI_RAIL | \
-					 LNET_PING_FEAT_DISCOVERY)
-
-/*
- * A HELLO message contains a magic number and protocol version
- * code in the header's dest_nid, the peer's NID in the src_nid, and
- * LNET_MSG_HELLO in the type field.  All other common fields are zero
- * (including payload_size; i.e. no payload).
- * This is for use by byte-stream LNDs (e.g. TCP/IP) to check the peer is
- * running the same protocol and to find out its NID. These LNDs should
- * exchange HELLO messages when a connection is first established.  Individual
- * LNDs can put whatever else they fancy in struct lnet_hdr::msg.
- */
-struct lnet_magicversion {
-	__u32	magic;		/* LNET_PROTO_TCP_MAGIC */
-	__u16	version_major;	/* increment on incompatible change */
-	__u16	version_minor;	/* increment on compatible change */
-} __packed;
-
-/* PROTO MAGIC for LNDs */
-#define LNET_PROTO_IB_MAGIC		0x0be91b91
-#define LNET_PROTO_GNI_MAGIC		0xb00fbabe /* ask Kim */
-#define LNET_PROTO_TCP_MAGIC		0xeebc0ded
-#define LNET_PROTO_ACCEPTOR_MAGIC	0xacce7100
-#define LNET_PROTO_PING_MAGIC		0x70696E67 /* 'ping' */
-
-/* Placeholder for a future "unified" protocol across all LNDs */
-/*
- * Current LNDs that receive a request with this magic will respond with a
- * "stub" reply using their current protocol
- */
-#define LNET_PROTO_MAGIC		0x45726963 /* ! */
-
-#define LNET_PROTO_TCP_VERSION_MAJOR	1
-#define LNET_PROTO_TCP_VERSION_MINOR	0
-
-/* Acceptor connection request */
-struct lnet_acceptor_connreq {
-	__u32	acr_magic;		/* PTL_ACCEPTOR_PROTO_MAGIC */
-	__u32	acr_version;		/* protocol version */
-	__u64	acr_nid;		/* target NID */
-} __packed;
-
-#define LNET_PROTO_ACCEPTOR_VERSION	1
-
-struct lnet_ni_status {
-	lnet_nid_t	ns_nid;
-	__u32		ns_status;
-	__u32		ns_unused;
-} __packed;
-
-struct lnet_ping_info {
-	__u32			pi_magic;
-	__u32			pi_features;
-	lnet_pid_t		pi_pid;
-	__u32			pi_nnis;
-	struct lnet_ni_status	pi_ni[0];
-} __packed;
-
-#define LNET_PING_INFO_SIZE(NNIDS) \
-	offsetof(struct lnet_ping_info, pi_ni[NNIDS])
-#define LNET_PING_INFO_LONI(PINFO)	((PINFO)->pi_ni[0].ns_nid)
-#define LNET_PING_INFO_SEQNO(PINFO)	((PINFO)->pi_ni[0].ns_status)
-
-struct lnet_counters_common {
-	__u32	lcc_msgs_alloc;
-	__u32	lcc_msgs_max;
-	__u32	lcc_errors;
-	__u32	lcc_send_count;
-	__u32	lcc_recv_count;
-	__u32	lcc_route_count;
-	__u32	lcc_drop_count;
-	__u64	lcc_send_length;
-	__u64	lcc_recv_length;
-	__u64	lcc_route_length;
-	__u64	lcc_drop_length;
-} __packed;
-
 struct lnet_counters_health {
 	__u32	lch_rst_alloc;
 	__u32	lch_resend_count;
@@ -315,10 +120,6 @@ struct lnet_counters {
 	struct lnet_counters_health lct_health;
 };
 
-#define LNET_NI_STATUS_UP	0x15aac0de
-#define LNET_NI_STATUS_DOWN	0xdeadface
-#define LNET_NI_STATUS_INVALID	0x00000000
-
 #define LNET_INTERFACES_NUM		16
 
 /* The minimum number of interfaces per node supported by LNet. */
diff --git a/include/uapi/linux/lnet/lnetctl.h b/include/uapi/linux/lnet/lnetctl.h
index 3c280f1..3b66ce3 100644
--- a/include/uapi/linux/lnet/lnetctl.h
+++ b/include/uapi/linux/lnet/lnetctl.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  *   This file is part of Portals, http://www.sf.net/projects/lustre/
  *
diff --git a/include/uapi/linux/lnet/lnetst.h b/include/uapi/linux/lnet/lnetst.h
index dd38a90..af0435f1 100644
--- a/include/uapi/linux/lnet/lnetst.h
+++ b/include/uapi/linux/lnet/lnetst.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * GPL HEADER START
  *
@@ -556,6 +557,7 @@ struct lst_test_ping_param {
 	int	png_flags;	/* reserved flags */
 };
 
+/* Both struct srpc_counters and struct sfw_counters are sent over the wire */
 struct srpc_counters {
 	__u32 errors;
 	__u32 rpcs_sent;
diff --git a/include/uapi/linux/lnet/nidstr.h b/include/uapi/linux/lnet/nidstr.h
index 021ee0e..d402a6a 100644
--- a/include/uapi/linux/lnet/nidstr.h
+++ b/include/uapi/linux/lnet/nidstr.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * GPL HEADER START
  *
diff --git a/include/uapi/linux/lnet/socklnd.h b/include/uapi/linux/lnet/socklnd.h
index 6453e05..50f2a13 100644
--- a/include/uapi/linux/lnet/socklnd.h
+++ b/include/uapi/linux/lnet/socklnd.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * GPL HEADER START
  *
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd-idl.h b/net/lnet/klnds/o2iblnd/o2iblnd-idl.h
new file mode 100644
index 0000000..660440c
--- /dev/null
+++ b/net/lnet/klnds/o2iblnd/o2iblnd-idl.h
@@ -0,0 +1,157 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * Copyright (c) 2011, 2017, Intel Corporation.
+ */
+/*
+ * This file is part of Lustre, http://www.lustre.org/
+ * Lustre is a trademark of Sun Microsystems, Inc.
+ *
+ * lnet/klnds/o2iblnd/o2iblnd-idl.h
+ *
+ * Author: Eric Barton <eric@bartonsoftware.com>
+ */
+#ifndef __LNET_O2IBLND_IDL_H__
+#define __LNET_O2IBLND_IDL_H__
+
+#include <uapi/linux/lnet/lnet-idl.h>
+
+/************************************************************************
+ * IB Wire message format.
+ * These are sent in sender's byte order (i.e. receiver flips).
+ */
+
+struct kib_connparams {
+	u16			ibcp_queue_depth;
+	u16			ibcp_max_frags;
+	u32			ibcp_max_msg_size;
+} __packed;
+
+struct kib_immediate_msg {
+	struct lnet_hdr		ibim_hdr;	/* portals header */
+	char			ibim_payload[0];/* piggy-backed payload */
+} __packed;
+
+struct kib_rdma_frag {
+	u32			rf_nob;		/* # bytes this frag */
+	u64			rf_addr;	/* CAVEAT EMPTOR: misaligned!! */
+} __packed;
+
+struct kib_rdma_desc {
+	u32			rd_key;		/* local/remote key */
+	u32			rd_nfrags;	/* # fragments */
+	struct kib_rdma_frag	rd_frags[0];	/* buffer frags */
+} __packed;
+
+struct kib_putreq_msg {
+	struct lnet_hdr		ibprm_hdr;	/* portals header */
+	u64			ibprm_cookie;	/* opaque completion cookie */
+} __packed;
+
+struct kib_putack_msg {
+	u64			ibpam_src_cookie;/* reflected completion cookie */
+	u64			ibpam_dst_cookie;/* opaque completion cookie */
+	struct kib_rdma_desc	ibpam_rd;	/* sender's sink buffer */
+} __packed;
+
+struct kib_get_msg {
+	struct lnet_hdr		ibgm_hdr;	/* portals header */
+	u64			ibgm_cookie;	/* opaque completion cookie */
+	struct kib_rdma_desc	ibgm_rd;	/* rdma descriptor */
+} __packed;
+
+struct kib_completion_msg {
+	u64			ibcm_cookie;	/* opaque completion cookie */
+	s32			ibcm_status;    /* < 0 failure: >= 0 length */
+} __packed;
+
+struct kib_msg {
+	/* First 2 fields fixed FOR ALL TIME */
+	u32			ibm_magic;	/* I'm an ibnal message */
+	u16			ibm_version;	/* this is my version number */
+
+	u8			ibm_type;	/* msg type */
+	u8			ibm_credits;	/* returned credits */
+	u32			ibm_nob;	/* # bytes in whole message */
+	u32			ibm_cksum;	/* checksum (0 == no checksum) */
+	u64			ibm_srcnid;	/* sender's NID */
+	u64			ibm_srcstamp;	/* sender's incarnation */
+	u64			ibm_dstnid;	/* destination's NID */
+	u64			ibm_dststamp;	/* destination's incarnation */
+
+	union {
+		struct kib_connparams		connparams;
+		struct kib_immediate_msg	immediate;
+		struct kib_putreq_msg		putreq;
+		struct kib_putack_msg		putack;
+		struct kib_get_msg		get;
+		struct kib_completion_msg	completion;
+	} __packed ibm_u;
+} __packed;
+
+#define IBLND_MSG_MAGIC LNET_PROTO_IB_MAGIC     /* unique magic */
+
+#define IBLND_MSG_VERSION_1	0x11
+#define IBLND_MSG_VERSION_2	0x12
+#define IBLND_MSG_VERSION	IBLND_MSG_VERSION_2
+
+#define IBLND_MSG_CONNREQ	0xc0	/* connection request */
+#define IBLND_MSG_CONNACK	0xc1	/* connection acknowledge */
+#define IBLND_MSG_NOOP		0xd0	/* nothing (just credits) */
+#define IBLND_MSG_IMMEDIATE	0xd1	/* immediate */
+#define IBLND_MSG_PUT_REQ	0xd2	/* putreq (src->sink) */
+#define IBLND_MSG_PUT_NAK	0xd3	/* completion (sink->src) */
+#define IBLND_MSG_PUT_ACK	0xd4	/* putack (sink->src) */
+#define IBLND_MSG_PUT_DONE	0xd5	/* completion (src->sink) */
+#define IBLND_MSG_GET_REQ	0xd6	/* getreq (sink->src) */
+#define IBLND_MSG_GET_DONE	0xd7	/* completion (src->sink: all OK) */
+
+struct kib_rej {
+	u32			ibr_magic;	/* sender's magic */
+	u16			ibr_version;	/* sender's version */
+	u8			ibr_why;	/* reject reason */
+	u8			ibr_padding;	/* padding */
+	u64			ibr_incarnation;/* incarnation of peer_ni */
+	struct kib_connparams	ibr_cp;		/* connection parameters */
+} __packed;
+
+/* connection rejection reasons */
+#define IBLND_REJECT_CONN_RACE       1          /* You lost connection race */
+#define IBLND_REJECT_NO_RESOURCES    2          /* Out of memory/conns etc */
+#define IBLND_REJECT_FATAL           3          /* Anything else */
+
+#define IBLND_REJECT_CONN_UNCOMPAT   4          /* incompatible version peer_ni */
+#define IBLND_REJECT_CONN_STALE      5          /* stale peer_ni */
+
+/* peer_ni's rdma frags doesn't match mine */
+#define IBLND_REJECT_RDMA_FRAGS      6
+/* peer_ni's msg queue size doesn't match mine */
+#define IBLND_REJECT_MSG_QUEUE_SIZE  7
+#define IBLND_REJECT_INVALID_SRV_ID  8
+
+/***********************************************************************/
+
+#endif /* __LNET_O2IBLND_IDL_H__ */
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 9147d17..f6865ad3 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -2958,6 +2958,164 @@ static int kiblnd_startup(struct lnet_ni *ni)
 	.lnd_recv	= kiblnd_recv,
 };
 
+static void ko2inlnd_assert_wire_constants(void)
+{
+	BUILD_BUG_ON(IBLND_MSG_MAGIC != 0x0be91b91);
+	BUILD_BUG_ON(IBLND_MSG_VERSION_1 != 0x11);
+	BUILD_BUG_ON(IBLND_MSG_VERSION_2 != 0x12);
+	BUILD_BUG_ON(IBLND_MSG_VERSION != IBLND_MSG_VERSION_2);
+
+	BUILD_BUG_ON(IBLND_MSG_CONNREQ != 0xc0);
+	BUILD_BUG_ON(IBLND_MSG_CONNACK != 0xc1);
+	BUILD_BUG_ON(IBLND_MSG_NOOP != 0xd0);
+	BUILD_BUG_ON(IBLND_MSG_IMMEDIATE != 0xd1);
+	BUILD_BUG_ON(IBLND_MSG_PUT_REQ != 0xd2);
+	BUILD_BUG_ON(IBLND_MSG_PUT_NAK != 0xd3);
+	BUILD_BUG_ON(IBLND_MSG_PUT_ACK != 0xd4);
+	BUILD_BUG_ON(IBLND_MSG_PUT_DONE != 0xd5);
+	BUILD_BUG_ON(IBLND_MSG_GET_REQ != 0xd6);
+	BUILD_BUG_ON(IBLND_MSG_GET_DONE != 0xd7);
+
+	BUILD_BUG_ON(IBLND_REJECT_CONN_RACE != 1);
+	BUILD_BUG_ON(IBLND_REJECT_NO_RESOURCES != 2);
+	BUILD_BUG_ON(IBLND_REJECT_FATAL != 3);
+	BUILD_BUG_ON(IBLND_REJECT_CONN_UNCOMPAT != 4);
+	BUILD_BUG_ON(IBLND_REJECT_CONN_STALE != 5);
+	BUILD_BUG_ON(IBLND_REJECT_RDMA_FRAGS != 6);
+	BUILD_BUG_ON(IBLND_REJECT_MSG_QUEUE_SIZE != 7);
+	BUILD_BUG_ON(IBLND_REJECT_INVALID_SRV_ID != 8);
+
+	BUILD_BUG_ON((int)sizeof(struct kib_connparams) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_connparams, ibcp_queue_depth) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct kib_connparams *)0)->ibcp_queue_depth) != 2);
+	BUILD_BUG_ON((int)offsetof(struct kib_connparams, ibcp_max_frags) != 2);
+	BUILD_BUG_ON((int)sizeof(((struct kib_connparams *)0)->ibcp_max_frags) != 2);
+	BUILD_BUG_ON((int)offsetof(struct kib_connparams, ibcp_max_msg_size) != 4);
+	BUILD_BUG_ON((int)sizeof(((struct kib_connparams *)0)->ibcp_max_msg_size) != 4);
+
+	BUILD_BUG_ON((int)sizeof(struct kib_immediate_msg) != 72);
+	BUILD_BUG_ON((int)offsetof(struct kib_immediate_msg, ibim_hdr) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct kib_immediate_msg *)0)->ibim_hdr) != 72);
+	BUILD_BUG_ON((int)offsetof(struct kib_immediate_msg, ibim_payload) != 72);
+	BUILD_BUG_ON((int)sizeof(((struct kib_immediate_msg *)0)->ibim_payload) != 0);
+
+	BUILD_BUG_ON((int)sizeof(struct kib_rdma_frag) != 12);
+	BUILD_BUG_ON((int)offsetof(struct kib_rdma_frag, rf_nob) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct kib_rdma_frag *)0)->rf_nob) != 4);
+	BUILD_BUG_ON((int)offsetof(struct kib_rdma_frag, rf_addr) != 4);
+	BUILD_BUG_ON((int)sizeof(((struct kib_rdma_frag *)0)->rf_addr) != 8);
+
+	BUILD_BUG_ON((int)sizeof(struct kib_rdma_desc) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_rdma_desc, rd_key) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct kib_rdma_desc *)0)->rd_key) != 4);
+	BUILD_BUG_ON((int)offsetof(struct kib_rdma_desc, rd_nfrags) != 4);
+	BUILD_BUG_ON((int)sizeof(((struct kib_rdma_desc *)0)->rd_nfrags) != 4);
+	BUILD_BUG_ON((int)offsetof(struct kib_rdma_desc, rd_frags) != 8);
+	BUILD_BUG_ON((int)sizeof(((struct kib_rdma_desc *)0)->rd_frags) != 0);
+
+	BUILD_BUG_ON((int)sizeof(struct kib_putreq_msg) != 80);
+	BUILD_BUG_ON((int)offsetof(struct kib_putreq_msg, ibprm_hdr) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct kib_putreq_msg *)0)->ibprm_hdr) != 72);
+	BUILD_BUG_ON((int)offsetof(struct kib_putreq_msg, ibprm_cookie) != 72);
+	BUILD_BUG_ON((int)sizeof(((struct kib_putreq_msg *)0)->ibprm_cookie) != 8);
+
+	BUILD_BUG_ON((int)sizeof(struct kib_putack_msg) != 24);
+	BUILD_BUG_ON((int)offsetof(struct kib_putack_msg, ibpam_src_cookie) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct kib_putack_msg *)0)->ibpam_src_cookie) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_putack_msg, ibpam_dst_cookie) != 8);
+	BUILD_BUG_ON((int)sizeof(((struct kib_putack_msg *)0)->ibpam_dst_cookie) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_putack_msg, ibpam_rd) != 16);
+	BUILD_BUG_ON((int)sizeof(((struct kib_putack_msg *)0)->ibpam_rd) != 8);
+
+	BUILD_BUG_ON((int)sizeof(struct kib_get_msg) != 88);
+	BUILD_BUG_ON((int)offsetof(struct kib_get_msg, ibgm_hdr) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct kib_get_msg *)0)->ibgm_hdr) != 72);
+	BUILD_BUG_ON((int)offsetof(struct kib_get_msg, ibgm_cookie) != 72);
+	BUILD_BUG_ON((int)sizeof(((struct kib_get_msg *)0)->ibgm_cookie) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_get_msg, ibgm_rd) != 80);
+	BUILD_BUG_ON((int)sizeof(((struct kib_get_msg *)0)->ibgm_rd) != 8);
+
+	BUILD_BUG_ON((int)sizeof(struct kib_completion_msg) != 12);
+	BUILD_BUG_ON((int)offsetof(struct kib_completion_msg, ibcm_cookie) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct kib_completion_msg *)0)->ibcm_cookie) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_completion_msg, ibcm_status) != 8);
+	BUILD_BUG_ON((int)sizeof(((struct kib_completion_msg *)0)->ibcm_status) != 4);
+
+	/* Checks for struct kib_msg */
+	//BUILD_BUG_ON((int)sizeof(struct kib_msg) != 12);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_magic) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_magic) != 4);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_version) != 4);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_version) != 2);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_type) != 6);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_type) != 1);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_credits) != 7);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_credits) != 1);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_nob) != 8);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_nob) != 4);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_cksum) != 12);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_cksum) != 4);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_srcnid) != 16);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_srcnid) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_srcstamp) != 24);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_srcstamp) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_dstnid) != 32);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_dstnid) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_dststamp) != 40);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_dststamp) != 8);
+
+	/* Connparams */
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.connparams.ibcp_queue_depth) != 48);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.connparams.ibcp_queue_depth) != 2);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.connparams.ibcp_max_frags) != 50);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.connparams.ibcp_max_frags) != 2);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.connparams.ibcp_max_msg_size) != 52);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.connparams.ibcp_max_msg_size) != 4);
+
+	/* Immediate message */
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.immediate.ibim_hdr) != 48);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.immediate.ibim_hdr) != 72);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.immediate.ibim_payload) != 120);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.immediate.ibim_payload) != 0);
+
+	/* PUT req message */
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.putreq.ibprm_hdr) != 48);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.putreq.ibprm_hdr) != 72);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.putreq.ibprm_cookie) != 120);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.putreq.ibprm_cookie) != 8);
+
+	/* Put ACK */
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.putack.ibpam_src_cookie) != 48);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.putack.ibpam_src_cookie) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.putack.ibpam_dst_cookie) != 56);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.putack.ibpam_dst_cookie) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.putack.ibpam_rd) != 64);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.putack.ibpam_rd) != 8);
+
+	/* GET message */
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.get.ibgm_hdr) != 48);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.get.ibgm_hdr) != 72);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.get.ibgm_cookie) != 120);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.get.ibgm_cookie) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.get.ibgm_rd) != 128);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.get.ibgm_rd) != 8);
+
+	/* Completion message */
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.completion.ibcm_cookie) != 48);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.completion.ibcm_cookie) != 8);
+	BUILD_BUG_ON((int)offsetof(struct kib_msg, ibm_u.completion.ibcm_status) != 56);
+	BUILD_BUG_ON((int)sizeof(((struct kib_msg *)0)->ibm_u.completion.ibcm_status) != 4);
+
+	/* Sanity checks */
+	BUILD_BUG_ON(sizeof(struct kib_msg) > IBLND_MSG_SIZE);
+	BUILD_BUG_ON(offsetof(struct kib_msg,
+		     ibm_u.get.ibgm_rd.rd_frags[IBLND_MAX_RDMA_FRAGS]) >
+		     IBLND_MSG_SIZE);
+	BUILD_BUG_ON(offsetof(struct kib_msg,
+		     ibm_u.putack.ibpam_rd.rd_frags[IBLND_MAX_RDMA_FRAGS]) >
+		     IBLND_MSG_SIZE);
+}
+
 static void __exit ko2iblnd_exit(void)
 {
 	lnet_unregister_lnd(&the_o2iblnd);
@@ -2967,13 +3125,7 @@ static int __init ko2iblnd_init(void)
 {
 	int rc;
 
-	BUILD_BUG_ON(sizeof(struct kib_msg) > IBLND_MSG_SIZE);
-	BUILD_BUG_ON(offsetof(struct kib_msg,
-			  ibm_u.get.ibgm_rd.rd_frags[IBLND_MAX_RDMA_FRAGS])
-			  > IBLND_MSG_SIZE);
-	BUILD_BUG_ON(offsetof(struct kib_msg,
-			  ibm_u.putack.ibpam_rd.rd_frags[IBLND_MAX_RDMA_FRAGS])
-			  > IBLND_MSG_SIZE);
+	ko2inlnd_assert_wire_constants();
 
 	kiblnd_tunables_init();
 
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 12d220c..8db03bd 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -64,6 +64,7 @@
 #define DEBUG_SUBSYSTEM S_LND
 
 #include <linux/lnet/lib-lnet.h>
+#include "o2iblnd-idl.h"
 
 #define IBLND_PEER_HASH_SIZE		101	/* # peer_ni lists */
 
@@ -376,119 +377,6 @@ struct kib_data {
 #define IBLND_INIT_DATA		1
 #define IBLND_INIT_ALL		2
 
-/************************************************************************
- * IB Wire message format.
- * These are sent in sender's byte order (i.e. receiver flips).
- */
-
-struct kib_connparams {
-	u16			ibcp_queue_depth;
-	u16			ibcp_max_frags;
-	u32			ibcp_max_msg_size;
-} __packed;
-
-struct kib_immediate_msg {
-	struct lnet_hdr		ibim_hdr;	 /* portals header */
-	char			ibim_payload[0]; /* piggy-backed payload */
-} __packed;
-
-struct kib_rdma_frag {
-	u32			rf_nob;		/* # bytes this frag */
-	u64			rf_addr;	/* CAVEAT EMPTOR: misaligned!! */
-} __packed;
-
-struct kib_rdma_desc {
-	u32			rd_key;		/* local/remote key */
-	u32			rd_nfrags;	/* # fragments */
-	struct kib_rdma_frag	rd_frags[0];	/* buffer frags */
-} __packed;
-
-struct kib_putreq_msg {
-	struct lnet_hdr		ibprm_hdr;	/* portals header */
-	u64			ibprm_cookie;	/* opaque completion cookie */
-} __packed;
-
-struct kib_putack_msg {
-	u64			ibpam_src_cookie; /* reflected completion cookie */
-	u64			ibpam_dst_cookie; /* opaque completion cookie */
-	struct kib_rdma_desc	ibpam_rd;	  /* sender's sink buffer */
-} __packed;
-
-struct kib_get_msg {
-	struct lnet_hdr		ibgm_hdr;	/* portals header */
-	u64			ibgm_cookie;	/* opaque completion cookie */
-	struct kib_rdma_desc	ibgm_rd;	/* rdma descriptor */
-} __packed;
-
-struct kib_completion_msg {
-	u64			ibcm_cookie;	/* opaque completion cookie */
-	s32			ibcm_status;	/* < 0 failure: >= 0 length */
-} __packed;
-
-struct kib_msg {
-	/* First 2 fields fixed FOR ALL TIME */
-	u32			ibm_magic;	/* I'm an ibnal message */
-	u16			ibm_version;	/* this is my version number */
-
-	u8			ibm_type;	/* msg type */
-	u8			ibm_credits;	/* returned credits */
-	u32			ibm_nob;	/* # bytes in whole message */
-	u32			ibm_cksum;	/* checksum (0 == no checksum) */
-	u64			ibm_srcnid;	/* sender's NID */
-	u64			ibm_srcstamp;	/* sender's incarnation */
-	u64			ibm_dstnid;	/* destination's NID */
-	u64			ibm_dststamp;	/* destination's incarnation */
-
-	union {
-		struct kib_connparams		connparams;
-		struct kib_immediate_msg	immediate;
-		struct kib_putreq_msg		putreq;
-		struct kib_putack_msg		putack;
-		struct kib_get_msg		get;
-		struct kib_completion_msg	completion;
-	} __packed ibm_u;
-} __packed;
-
-#define IBLND_MSG_MAGIC		LNET_PROTO_IB_MAGIC /* unique magic */
-
-#define IBLND_MSG_VERSION_1	0x11
-#define IBLND_MSG_VERSION_2	0x12
-#define IBLND_MSG_VERSION	IBLND_MSG_VERSION_2
-
-#define IBLND_MSG_CONNREQ	0xc0	/* connection request */
-#define IBLND_MSG_CONNACK	0xc1	/* connection acknowledge */
-#define IBLND_MSG_NOOP		0xd0	/* nothing (just credits) */
-#define IBLND_MSG_IMMEDIATE	0xd1	/* immediate */
-#define IBLND_MSG_PUT_REQ	0xd2	/* putreq (src->sink) */
-#define IBLND_MSG_PUT_NAK	0xd3	/* completion (sink->src) */
-#define IBLND_MSG_PUT_ACK	0xd4	/* putack (sink->src) */
-#define IBLND_MSG_PUT_DONE	0xd5	/* completion (src->sink) */
-#define IBLND_MSG_GET_REQ	0xd6	/* getreq (sink->src) */
-#define IBLND_MSG_GET_DONE	0xd7	/* completion (src->sink: all OK) */
-
-struct kib_rej {
-	u32			ibr_magic;	/* sender's magic */
-	u16			ibr_version;	/* sender's version */
-	u8			ibr_why;	/* reject reason */
-	u8			ibr_padding;	/* padding */
-	u64			ibr_incarnation;/* incarnation of peer_ni */
-	struct kib_connparams	ibr_cp;		/* connection parameters */
-} __packed;
-
-/* connection rejection reasons */
-#define IBLND_REJECT_CONN_RACE		1 /* You lost connection race */
-#define IBLND_REJECT_NO_RESOURCES	2 /* Out of memory/conns etc */
-#define IBLND_REJECT_FATAL		3 /* Anything else */
-#define IBLND_REJECT_CONN_UNCOMPAT	4 /* incompatible version peer_ni */
-#define IBLND_REJECT_CONN_STALE		5 /* stale peer_ni */
-/* peer_ni's rdma frags doesn't match mine */
-#define IBLND_REJECT_RDMA_FRAGS		6
-/* peer_ni's msg queue size doesn't match mine */
-#define IBLND_REJECT_MSG_QUEUE_SIZE	7
-#define IBLND_REJECT_INVALID_SRV_ID	8
-
-/***********************************************************************/
-
 struct kib_rx {					/* receive message */
 	struct list_head	rx_list;	/* queue for attention */
 	struct kib_conn        *rx_conn;	/* owning conn */
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index f121d69..542cc2e 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -689,7 +689,17 @@ static void lnet_assert_wire_constants(void)
 	BUILD_BUG_ON(LNET_MSG_REPLY != 3);
 	BUILD_BUG_ON(LNET_MSG_HELLO != 4);
 
-	/* Checks for struct ptl_handle_wire_t */
+	BUILD_BUG_ON((int)sizeof(lnet_nid_t) != 8);
+	BUILD_BUG_ON((int)sizeof(lnet_pid_t) != 4);
+
+	/* Checks for struct lnet_process_id_packed */
+	BUILD_BUG_ON((int)sizeof(struct lnet_process_id_packed) != 12);
+	BUILD_BUG_ON((int)offsetof(struct lnet_process_id_packed, nid) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_process_id_packed *)0)->nid) != 8);
+	BUILD_BUG_ON((int)offsetof(struct lnet_process_id_packed, pid) != 8);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_process_id_packed *)0)->pid) != 4);
+
+	/* Checks for struct lnet_handle_wire */
 	BUILD_BUG_ON((int)sizeof(struct lnet_handle_wire) != 16);
 	BUILD_BUG_ON((int)offsetof(struct lnet_handle_wire, wh_interface_cookie) != 0);
 	BUILD_BUG_ON((int)sizeof(((struct lnet_handle_wire *)0)->wh_interface_cookie) != 8);
@@ -801,6 +811,43 @@ static void lnet_assert_wire_constants(void)
 	BUILD_BUG_ON((int)sizeof(((struct lnet_ping_info *)0)->pi_nnis) != 4);
 	BUILD_BUG_ON((int)offsetof(struct lnet_ping_info, pi_ni) != 16);
 	BUILD_BUG_ON((int)sizeof(((struct lnet_ping_info *)0)->pi_ni) != 0);
+
+	/* Acceptor connection request */
+	BUILD_BUG_ON(LNET_PROTO_ACCEPTOR_VERSION != 1);
+
+	/* Checks for struct lnet_acceptor_connreq */
+	BUILD_BUG_ON((int)sizeof(struct lnet_acceptor_connreq) != 16);
+	BUILD_BUG_ON((int)offsetof(struct lnet_acceptor_connreq, acr_magic) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_acceptor_connreq *)0)->acr_magic) != 4);
+	BUILD_BUG_ON((int)offsetof(struct lnet_acceptor_connreq, acr_version) != 4);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_acceptor_connreq *)0)->acr_version) != 4);
+	BUILD_BUG_ON((int)offsetof(struct lnet_acceptor_connreq, acr_nid) != 8);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_acceptor_connreq *)0)->acr_nid) != 8);
+
+	/* Checks for struct lnet_counters_common */
+	BUILD_BUG_ON((int)sizeof(struct lnet_counters_common) != 60);
+	BUILD_BUG_ON((int)offsetof(struct lnet_counters_common, lcc_msgs_alloc) != 0);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_counters_common *)0)->lcc_msgs_alloc) != 4);
+	BUILD_BUG_ON((int)offsetof(struct lnet_counters_common, lcc_msgs_max) != 4);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_counters_common *)0)->lcc_msgs_max) != 4);
+	BUILD_BUG_ON((int)offsetof(struct lnet_counters_common, lcc_errors) != 8);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_counters_common *)0)->lcc_errors) != 4);
+	BUILD_BUG_ON((int)offsetof(struct lnet_counters_common, lcc_send_count) != 12);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_counters_common *)0)->lcc_send_count) != 4);
+	BUILD_BUG_ON((int)offsetof(struct lnet_counters_common, lcc_recv_count) != 16);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_counters_common *)0)->lcc_recv_count) != 4);
+	BUILD_BUG_ON((int)offsetof(struct lnet_counters_common, lcc_route_count) != 20);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_counters_common *)0)->lcc_route_count) != 4);
+	BUILD_BUG_ON((int)offsetof(struct lnet_counters_common, lcc_drop_count) != 24);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_counters_common *)0)->lcc_drop_count) != 4);
+	BUILD_BUG_ON((int)offsetof(struct lnet_counters_common, lcc_send_length) != 28);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_counters_common *)0)->lcc_send_length) != 8);
+	BUILD_BUG_ON((int)offsetof(struct lnet_counters_common, lcc_recv_length) != 36);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_counters_common *)0)->lcc_recv_length) != 8);
+	BUILD_BUG_ON((int)offsetof(struct lnet_counters_common, lcc_route_length) != 44);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_counters_common *)0)->lcc_route_length) != 8);
+	BUILD_BUG_ON((int)offsetof(struct lnet_counters_common, lcc_drop_length) != 52);
+	BUILD_BUG_ON((int)sizeof(((struct lnet_counters_common *)0)->lcc_drop_length) != 8);
 }
 
 static struct lnet_lnd *
-- 
1.8.3.1
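
A note on the wire-layout checks in the patch above: each check pairs an
offsetof() with a sizeof(), so any change to a packed wire struct fails
the build. Below is a minimal userspace sketch of the same idea. It
assumes a GCC or Clang compiler (for __attribute__((packed))) and uses
C11 _Static_assert in place of the kernel's BUILD_BUG_ON. The ex_ names
are hypothetical copies of kib_connparams and kib_rdma_frag from the
removed header, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace copy of struct kib_connparams. */
struct ex_connparams {
	uint16_t queue_depth;
	uint16_t max_frags;
	uint32_t max_msg_size;
} __attribute__((packed));

/*
 * Hypothetical userspace copy of struct kib_rdma_frag. The 64-bit
 * address sits at offset 4, which only holds because of packing.
 */
struct ex_rdma_frag {
	uint32_t nob;
	uint64_t addr;
} __attribute__((packed));

/* Compile-time layout checks, analogous to BUILD_BUG_ON(). */
_Static_assert(sizeof(struct ex_connparams) == 8, "connparams size");
_Static_assert(offsetof(struct ex_connparams, max_frags) == 2,
	       "max_frags offset");
_Static_assert(offsetof(struct ex_connparams, max_msg_size) == 4,
	       "max_msg_size offset");
_Static_assert(sizeof(struct ex_rdma_frag) == 12, "rdma_frag size");
_Static_assert(offsetof(struct ex_rdma_frag, addr) == 4, "addr offset");
```

The connparams offsets 0, 2 and 4 here correspond to 48, 50 and 52 in
the patch, because the union starts 48 bytes into struct kib_msg.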


* [lustre-devel] [PATCH 38/41] lnet: libcfs: use wait_event_timeout() in tracefiled().
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (36 preceding siblings ...)
  2021-04-05  0:51 ` [lustre-devel] [PATCH 37/41] lnet: place wire protocol data int own headers James Simmons
@ 2021-04-05  0:51 ` James Simmons
  2021-04-05  0:51 ` [lustre-devel] [PATCH 39/41] lnet: use init_wait() rather than init_waitqueue_entry() James Simmons
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

By using wait_event_timeout() we can make it clearer what is being
waited for, and when the loop terminates.
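
The resulting control flow can be modelled in userspace. The sketch
below assumes nothing about the kernel beyond the shape of the new
loop: ex_wait_for_work() is a hypothetical stand-in for
wait_event_timeout() plus collect_pages(), and struct ex_daemon is
invented for illustration. It shows that the pass which observes
shutdown still writes out whatever was collected before the loop ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the daemon state; not the kernel structs. */
struct ex_daemon {
	const int *pending;	/* pages collected at each wakeup */
	size_t nwakeups;	/* number of entries in pending[] */
	size_t shutdown_at;	/* wakeup index at which shutdown is seen */
	size_t tick;		/* current wakeup index */
};

/* Models one return from the wait: pages collected, shutdown flag. */
static int ex_wait_for_work(struct ex_daemon *d, bool *shutdown)
{
	int pages = d->tick < d->nwakeups ? d->pending[d->tick] : 0;

	*shutdown = d->tick >= d->shutdown_at;
	d->tick++;
	return pages;
}

/* Same shape as the reworked loop: stop after the pass seeing shutdown. */
int ex_run_daemon(struct ex_daemon *d)
{
	bool last_loop = false;
	int written = 0;

	while (!last_loop) {
		bool shutdown;
		int pages = ex_wait_for_work(d, &shutdown);

		if (shutdown)
			last_loop = true;
		if (pages == 0)
			continue;	/* woke up with nothing to write */
		written += pages;	/* stands in for the write loop */
	}
	return written;
}
```

With pending = { 0, 3, 0, 2 } and shutdown_at = 3, the model wakes four
times and writes five pages, including the two collected on the final
pass.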

WC-bug-id: https://jira.whamcloud.com/browse/LU-9859
Lustre-commit: 0269ac4a0069b0e4 ("LU-9859 libcfs: use wait_event_timeout() in tracefiled().")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39293
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/libcfs/tracefile.c | 37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c
index 14fcb2a..4e1900d 100644
--- a/net/lnet/libcfs/tracefile.c
+++ b/net/lnet/libcfs/tracefile.c
@@ -1085,13 +1085,18 @@ static int tracefiled(void *arg)
 
 	complete(&tctl->tctl_start);
 
-	while (1) {
-		wait_queue_entry_t __wait;
-
-		pc.pc_want_daemon_pages = 0;
-		collect_pages(&pc);
+	pc.pc_want_daemon_pages = 0;
+
+	while (!last_loop) {
+		wait_event_timeout(tctl->tctl_waitq,
+				   ({ collect_pages(&pc);
+				     !list_empty(&pc.pc_pages); }) ||
+				   atomic_read(&tctl->tctl_shutdown),
+				   HZ);
+		if (atomic_read(&tctl->tctl_shutdown))
+			last_loop = 1;
 		if (list_empty(&pc.pc_pages))
-			goto end_loop;
+			continue;
 
 		filp = NULL;
 		down_read(&cfs_tracefile_sem);
@@ -1110,18 +1115,19 @@ static int tracefiled(void *arg)
 		if (!filp) {
 			put_pages_on_daemon_list(&pc);
 			__LASSERT(list_empty(&pc.pc_pages));
-			goto end_loop;
+			continue;
 		}
 
 		list_for_each_entry_safe(tage, tmp, &pc.pc_pages, linkage) {
+			struct dentry *de = file_dentry(filp);
 			static loff_t f_pos;
 
 			__LASSERT_TAGE_INVARIANT(tage);
 
 			if (f_pos >= (off_t)cfs_tracefile_size)
 				f_pos = 0;
-			else if (f_pos > i_size_read(file_inode(filp)))
-				f_pos = i_size_read(file_inode(filp));
+			else if (f_pos > i_size_read(de->d_inode))
+				f_pos = i_size_read(de->d_inode);
 
 			buf = kmap(tage->page);
 			rc = kernel_write(filp, buf, tage->used, &f_pos);
@@ -1158,19 +1164,6 @@ static int tracefiled(void *arg)
 			pr_err("Lustre: There are %d pages unwritten\n", i);
 		}
 		__LASSERT(list_empty(&pc.pc_pages));
-end_loop:
-		if (atomic_read(&tctl->tctl_shutdown)) {
-			if (!last_loop) {
-				last_loop = 1;
-				continue;
-			} else {
-				break;
-			}
-		}
-		init_wait(&__wait);
-		add_wait_queue(&tctl->tctl_waitq, &__wait);
-		schedule_timeout_interruptible(HZ);
-		remove_wait_queue(&tctl->tctl_waitq, &__wait);
 	}
 	complete(&tctl->tctl_stop);
 	return 0;
-- 
1.8.3.1


* [lustre-devel] [PATCH 39/41] lnet: use init_wait() rather than init_waitqueue_entry()
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (37 preceding siblings ...)
  2021-04-05  0:51 ` [lustre-devel] [PATCH 38/41] lnet: libcfs: use wait_event_timeout() in tracefiled() James Simmons
@ 2021-04-05  0:51 ` James Simmons
  2021-04-05  0:51 ` [lustre-devel] [PATCH 40/41] lnet: discard LNET_MD_PHYS James Simmons
  2021-04-05  0:51 ` [lustre-devel] [PATCH 41/41] lnet: o2iblnd: convert peers hash table to hashtable.h James Simmons
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

init_waitqueue_entry(foo, current)

is equivalent to

  init_wait(foo)

So use the shorter version.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12678
Lustre-commit: dd0e7523e10202d2 ("LU-12678 lnet: use init_wait() rather than init_waitqueue_entry()")
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39295
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 6 +++---
 net/lnet/klnds/socklnd/socklnd_cb.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index e29cb4b..9f9afce 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -3389,7 +3389,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	int peer_index = 0;
 	unsigned long deadline = jiffies;
 
-	init_waitqueue_entry(&wait, current);
+	init_wait(&wait);
 	kiblnd_data.kib_connd = current;
 
 	spin_lock_irqsave(lock, flags);
@@ -3680,7 +3680,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	int did_something;
 	int rc;
 
-	init_waitqueue_entry(&wait, current);
+	init_wait(&wait);
 
 	sched = kiblnd_data.kib_scheds[KIB_THREAD_CPT(id)];
 
@@ -3812,7 +3812,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 	LASSERT(*kiblnd_tunables.kib_dev_failover);
 
-	init_waitqueue_entry(&wait, current);
+	init_wait(&wait);
 	write_lock_irqsave(glock, flags);
 
 	while (!kiblnd_data.kib_shutdown) {
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index a1c0c3d..7fa2d58 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -2075,7 +2075,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 	wait_queue_entry_t wait;
 	int cons_retry = 0;
 
-	init_waitqueue_entry(&wait, current);
+	init_wait(&wait);
 
 	spin_lock_bh(connd_lock);
 
@@ -2458,7 +2458,7 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 	int peer_index = 0;
 	time64_t deadline = ktime_get_seconds();
 
-	init_waitqueue_entry(&wait, current);
+	init_wait(&wait);
 
 	spin_lock_bh(&ksocknal_data.ksnd_reaper_lock);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 40/41] lnet: discard LNET_MD_PHYS
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (38 preceding siblings ...)
  2021-04-05  0:51 ` [lustre-devel] [PATCH 39/41] lnet: use init_wait() rather than init_waitqueue_entry() James Simmons
@ 2021-04-05  0:51 ` James Simmons
  2021-04-05  0:51 ` [lustre-devel] [PATCH 41/41] lnet: o2iblnd: convert peers hash table to hashtable.h James Simmons
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

This macro has no value and is never set.
It claims "compatibility with Cray Portals", yet cray-dvs
   git://github.com/glennklockwood/cray-dvs.git
does not use it in any non-trivial way.

Much has changed in lnet and lib-md since 2007 when this
value was added - it seems likely that this really
is dead.

So remove it.  If/when this results in problems, it can
easily be re-added and more details can be provided at
that time.
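
To illustrate why a flag defined as 0 carries no information, here is a
small userspace sketch. The EX_ names are hypothetical, and the value
chosen for EX_MD_KIOV is an assumption made for the example only. OR-ing
a zero flag into a mask leaves the mask unchanged, and testing for the
zero flag alone can never succeed.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_MD_KIOV	(1u << 8)	/* hypothetical flag bit */
#define EX_MD_PHYS	0u		/* zero-valued "flag", as removed */

/* True if any bit of mask is set in options. */
bool ex_any_set(unsigned int options, unsigned int mask)
{
	return (options & mask) != 0;
}
```

So a test against (EX_MD_KIOV | EX_MD_PHYS) behaves exactly like a test
against EX_MD_KIOV alone.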

WC-bug-id: https://jira.whamcloud.com/browse/LU-12678
Lustre-commit: aa57e829867caa9 ("LU-12678 lnet: discard LNET_MD_PHYS")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39301
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/pers.c              | 1 -
 include/uapi/linux/lnet/lnet-types.h | 3 ---
 2 files changed, 4 deletions(-)

diff --git a/fs/lustre/ptlrpc/pers.c b/fs/lustre/ptlrpc/pers.c
index ecbc9d3..78b8ce2 100644
--- a/fs/lustre/ptlrpc/pers.c
+++ b/fs/lustre/ptlrpc/pers.c
@@ -50,7 +50,6 @@ void ptlrpc_fill_bulk_md(struct lnet_md *md, struct ptlrpc_bulk_desc *desc,
 
 	LASSERT(mdidx < desc->bd_md_max_brw);
 	LASSERT(desc->bd_iov_count <= PTLRPC_MAX_BRW_PAGES);
-	LASSERT(!(md->options & (LNET_MD_KIOV | LNET_MD_PHYS)));
 
 	/* just send a lnet header */
 	if (mdidx >= desc->bd_md_count) {
diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h
index 4d0b6f9..43800ae 100644
--- a/include/uapi/linux/lnet/lnet-types.h
+++ b/include/uapi/linux/lnet/lnet-types.h
@@ -357,9 +357,6 @@ struct lnet_md {
 /** See struct lnet_md::options. */
 #define LNET_MD_NO_TRACK_RESPONSE	(1 << 11)
 
-/* For compatibility with Cray Portals */
-#define LNET_MD_PHYS		0
-
 /** Infinite threshold on MD operations. See lnet_md::threshold */
 #define LNET_MD_THRESH_INF	(-1)
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 41/41] lnet: o2iblnd: convert peers hash table to hashtable.h
  2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
                   ` (39 preceding siblings ...)
  2021-04-05  0:51 ` [lustre-devel] [PATCH 40/41] lnet: discard LNET_MD_PHYS James Simmons
@ 2021-04-05  0:51 ` James Simmons
  40 siblings, 0 replies; 42+ messages in thread
From: James Simmons @ 2021-04-05  0:51 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Using a hashtable.h hashtable, rather than bespoke code, has several
advantages:

  - the table is composed of hlist_head, rather than list_head, so
    it consumes less memory (though we need to make it a little bigger
    as it must be a power-of-2)
  - there are existing macros for easily walking the whole table
  - it uses a "real" hash function rather than "mod a prime number".

In some ways, rhashtable might be even better, but it can change the
ordering of objects in the table at arbitrary moments, and that could
hurt the user-space API.  It also does not support the partitioned
walking that ksocknal_check_peer_timeouts() depends on.

Note that new peers are inserted at the top of a hash chain, rather
than appended at the end.  I don't think that should be a problem.

Also various white-space cleanups etc.
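
The points above can be sketched in userspace. This is a minimal model,
not the kernel's hashtable.h: the ex_ types and functions are
hypothetical, and the multiplier in ex_hash() is an illustrative
constant. It shows a single-pointer bucket head, a power-of-2 table
indexed by the top bits of a multiplicative hash, and insertion at the
head of the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_HASH_BITS	7			/* cf. IBLND_PEER_HASH_BITS */
#define EX_HASH_SIZE	(1u << EX_HASH_BITS)	/* 128 buckets */

struct ex_hnode { struct ex_hnode *next, **pprev; };
struct ex_hhead { struct ex_hnode *first; };	/* one pointer per bucket */

struct ex_peer {
	struct ex_hnode link;	/* first member: node ptr == peer ptr */
	uint64_t nid;
};

/* Multiplicative hash keeping the top EX_HASH_BITS bits of the product. */
unsigned int ex_hash(uint64_t key)
{
	return (unsigned int)((key * 0x61C8864680B583EBull) >>
			      (64 - EX_HASH_BITS));
}

/* Insert at the head of the chain; new peers come first. */
void ex_add(struct ex_hhead *table, struct ex_peer *p)
{
	struct ex_hhead *h = &table[ex_hash(p->nid)];

	p->link.next = h->first;
	if (h->first)
		h->first->pprev = &p->link.next;
	h->first = &p->link;
	p->link.pprev = &h->first;
}

/* Walk only the one bucket that nid hashes to. */
struct ex_peer *ex_find(struct ex_hhead *table, uint64_t nid)
{
	struct ex_hnode *n;

	for (n = table[ex_hash(nid)].first; n; n = n->next) {
		struct ex_peer *p = (struct ex_peer *)n;

		if (p->nid == nid)
			return p;
	}
	return NULL;
}
```

Because each bucket head is a single pointer, 128 buckets of this kind
take less memory than 101 two-pointer list heads.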

WC-bug-id: https://jira.whamcloud.com/browse/LU-12678
Lustre-commit: c66668387a11492e ("LU-12678 o2iblnd: convert peers hash table to hashtable.h")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/39303
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c    | 101 +++++++++++++++---------------------
 net/lnet/klnds/o2iblnd/o2iblnd.h    |  19 ++-----
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c |  12 ++---
 3 files changed, 52 insertions(+), 80 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index f6865ad3..c8cebf6 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -339,7 +339,7 @@ int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer_ni **peerp,
 	peer_ni->ibp_queue_depth_mod = 0;	/* try to use the default */
 	atomic_set(&peer_ni->ibp_refcount, 1);  /* 1 ref for caller */
 
-	INIT_LIST_HEAD(&peer_ni->ibp_list);
+	INIT_HLIST_NODE(&peer_ni->ibp_list);
 	INIT_LIST_HEAD(&peer_ni->ibp_conns);
 	INIT_LIST_HEAD(&peer_ni->ibp_tx_queue);
 
@@ -385,10 +385,10 @@ struct kib_peer_ni *kiblnd_find_peer_locked(struct lnet_ni *ni, lnet_nid_t nid)
 	 * the caller is responsible for accounting the additional reference
 	 * that this creates
 	 */
-	struct list_head *peer_list = kiblnd_nid2peerlist(nid);
 	struct kib_peer_ni *peer_ni;
 
-	list_for_each_entry(peer_ni, peer_list, ibp_list) {
+	hash_for_each_possible(kiblnd_data.kib_peers, peer_ni,
+			       ibp_list, nid) {
 		LASSERT(!kiblnd_peer_idle(peer_ni));
 
 		/*
@@ -415,7 +415,7 @@ void kiblnd_unlink_peer_locked(struct kib_peer_ni *peer_ni)
 	LASSERT(list_empty(&peer_ni->ibp_conns));
 
 	LASSERT(kiblnd_peer_active(peer_ni));
-	list_del_init(&peer_ni->ibp_list);
+	hlist_del_init(&peer_ni->ibp_list);
 	/* lose peerlist's ref */
 	kiblnd_peer_decref(peer_ni);
 }
@@ -429,24 +429,20 @@ static int kiblnd_get_peer_info(struct lnet_ni *ni, int index,
 
 	read_lock_irqsave(&kiblnd_data.kib_global_lock, flags);
 
-	for (i = 0; i < kiblnd_data.kib_peer_hash_size; i++) {
-		list_for_each_entry(peer_ni, &kiblnd_data.kib_peers[i],
-				    ibp_list) {
-			LASSERT(!kiblnd_peer_idle(peer_ni));
+	hash_for_each(kiblnd_data.kib_peers, i, peer_ni, ibp_list) {
+		LASSERT(!kiblnd_peer_idle(peer_ni));
 
-			if (peer_ni->ibp_ni != ni)
-				continue;
+		if (peer_ni->ibp_ni != ni)
+			continue;
 
-			if (index-- > 0)
-				continue;
+		if (index-- > 0)
+			continue;
 
-			*nidp = peer_ni->ibp_nid;
-			*count = atomic_read(&peer_ni->ibp_refcount);
+		*nidp = peer_ni->ibp_nid;
+		*count = atomic_read(&peer_ni->ibp_refcount);
 
-			read_unlock_irqrestore(&kiblnd_data.kib_global_lock,
-					       flags);
-			return 0;
-		}
+		read_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags);
+		return 0;
 	}
 
 	read_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags);
@@ -476,7 +472,7 @@ static void kiblnd_del_peer_locked(struct kib_peer_ni *peer_ni)
 static int kiblnd_del_peer(struct lnet_ni *ni, lnet_nid_t nid)
 {
 	LIST_HEAD(zombies);
-	struct kib_peer_ni *pnxt;
+	struct hlist_node *pnxt;
 	struct kib_peer_ni *peer_ni;
 	int lo;
 	int hi;
@@ -487,16 +483,16 @@ static int kiblnd_del_peer(struct lnet_ni *ni, lnet_nid_t nid)
 	write_lock_irqsave(&kiblnd_data.kib_global_lock, flags);
 
 	if (nid != LNET_NID_ANY) {
-		lo = kiblnd_nid2peerlist(nid) - kiblnd_data.kib_peers;
-		hi = kiblnd_nid2peerlist(nid) - kiblnd_data.kib_peers;
+		lo = hash_min(nid, HASH_BITS(kiblnd_data.kib_peers));
+		hi = lo;
 	} else {
 		lo = 0;
-		hi = kiblnd_data.kib_peer_hash_size - 1;
+		hi = HASH_SIZE(kiblnd_data.kib_peers) - 1;
 	}
 
 	for (i = lo; i <= hi; i++) {
-		list_for_each_entry_safe(peer_ni, pnxt,
-					 &kiblnd_data.kib_peers[i], ibp_list) {
+		hlist_for_each_entry_safe(peer_ni, pnxt,
+					  &kiblnd_data.kib_peers[i], ibp_list) {
 			LASSERT(!kiblnd_peer_idle(peer_ni));
 
 			if (peer_ni->ibp_ni != ni)
@@ -533,25 +529,21 @@ static struct kib_conn *kiblnd_get_conn_by_idx(struct lnet_ni *ni, int index)
 
 	read_lock_irqsave(&kiblnd_data.kib_global_lock, flags);
 
-	for (i = 0; i < kiblnd_data.kib_peer_hash_size; i++) {
-		list_for_each_entry(peer_ni, &kiblnd_data.kib_peers[i],
-				    ibp_list) {
-			LASSERT(!kiblnd_peer_idle(peer_ni));
+	hash_for_each(kiblnd_data.kib_peers, i, peer_ni, ibp_list) {
+		LASSERT(!kiblnd_peer_idle(peer_ni));
 
-			if (peer_ni->ibp_ni != ni)
-				continue;
+		if (peer_ni->ibp_ni != ni)
+			continue;
 
-			list_for_each_entry(conn, &peer_ni->ibp_conns,
-					    ibc_list) {
-				if (index-- > 0)
-					continue;
+		list_for_each_entry(conn, &peer_ni->ibp_conns,
+				    ibc_list) {
+			if (index-- > 0)
+				continue;
 
-				kiblnd_conn_addref(conn);
-				read_unlock_irqrestore(
-					&kiblnd_data.kib_global_lock,
-					flags);
-				return conn;
-			}
+			kiblnd_conn_addref(conn);
+			read_unlock_irqrestore(&kiblnd_data.kib_global_lock,
+					       flags);
+			return conn;
 		}
 	}
 
@@ -1014,7 +1006,7 @@ int kiblnd_close_stale_conns_locked(struct kib_peer_ni *peer_ni,
 static int kiblnd_close_matching_conns(struct lnet_ni *ni, lnet_nid_t nid)
 {
 	struct kib_peer_ni *peer_ni;
-	struct kib_peer_ni *pnxt;
+	struct hlist_node *pnxt;
 	int lo;
 	int hi;
 	int i;
@@ -1024,16 +1016,16 @@ static int kiblnd_close_matching_conns(struct lnet_ni *ni, lnet_nid_t nid)
 	write_lock_irqsave(&kiblnd_data.kib_global_lock, flags);
 
 	if (nid != LNET_NID_ANY) {
-		lo = kiblnd_nid2peerlist(nid) - kiblnd_data.kib_peers;
-		hi = kiblnd_nid2peerlist(nid) - kiblnd_data.kib_peers;
+		lo = hash_min(nid, HASH_BITS(kiblnd_data.kib_peers));
+		hi = lo;
 	} else {
 		lo = 0;
-		hi = kiblnd_data.kib_peer_hash_size - 1;
+		hi = HASH_SIZE(kiblnd_data.kib_peers) - 1;
 	}
 
 	for (i = lo; i <= hi; i++) {
-		list_for_each_entry_safe(peer_ni, pnxt,
-					 &kiblnd_data.kib_peers[i], ibp_list) {
+		hlist_for_each_entry_safe(peer_ni, pnxt,
+					  &kiblnd_data.kib_peers[i], ibp_list) {
 			LASSERT(!kiblnd_peer_idle(peer_ni));
 
 			if (peer_ni->ibp_ni != ni)
@@ -2499,6 +2491,7 @@ void kiblnd_destroy_dev(struct kib_dev *dev)
 static void kiblnd_base_shutdown(void)
 {
 	struct kib_sched_info *sched;
+	struct kib_peer_ni *peer_ni;
 	int i;
 
 	LASSERT(list_empty(&kiblnd_data.kib_devs));
@@ -2509,9 +2502,8 @@ static void kiblnd_base_shutdown(void)
 
 	case IBLND_INIT_ALL:
 	case IBLND_INIT_DATA:
-		LASSERT(kiblnd_data.kib_peers);
-		for (i = 0; i < kiblnd_data.kib_peer_hash_size; i++)
-			LASSERT(list_empty(&kiblnd_data.kib_peers[i]));
+		hash_for_each(kiblnd_data.kib_peers, i, peer_ni, ibp_list)
+			LASSERT(0);
 		LASSERT(list_empty(&kiblnd_data.kib_connd_zombies));
 		LASSERT(list_empty(&kiblnd_data.kib_connd_conns));
 		LASSERT(list_empty(&kiblnd_data.kib_reconn_list));
@@ -2541,8 +2533,6 @@ static void kiblnd_base_shutdown(void)
 		break;
 	}
 
-	kvfree(kiblnd_data.kib_peers);
-
 	if (kiblnd_data.kib_scheds)
 		cfs_percpt_free(kiblnd_data.kib_scheds);
 
@@ -2628,14 +2618,7 @@ static int kiblnd_base_startup(struct net *ns)
 	INIT_LIST_HEAD(&kiblnd_data.kib_devs);
 	INIT_LIST_HEAD(&kiblnd_data.kib_failed_devs);
 
-	kiblnd_data.kib_peer_hash_size = IBLND_PEER_HASH_SIZE;
-	kiblnd_data.kib_peers = kvmalloc_array(kiblnd_data.kib_peer_hash_size,
-					       sizeof(struct list_head),
-					       GFP_KERNEL);
-	if (!kiblnd_data.kib_peers)
-		goto failed;
-	for (i = 0; i < kiblnd_data.kib_peer_hash_size; i++)
-		INIT_LIST_HEAD(&kiblnd_data.kib_peers[i]);
+	hash_init(kiblnd_data.kib_peers);
 
 	spin_lock_init(&kiblnd_data.kib_connd_lock);
 	INIT_LIST_HEAD(&kiblnd_data.kib_connd_conns);
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 8db03bd..a5a66ee 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -66,7 +66,7 @@
 #include <linux/lnet/lib-lnet.h>
 #include "o2iblnd-idl.h"
 
-#define IBLND_PEER_HASH_SIZE		101	/* # peer_ni lists */
+#define IBLND_PEER_HASH_BITS		7	/* log2 of # peer_ni lists */
 
 #define IBLND_N_SCHED			2
 #define IBLND_N_SCHED_HIGH		4
@@ -342,9 +342,7 @@ struct kib_data {
 	/* stabilize net/dev/peer_ni/conn ops */
 	rwlock_t		kib_global_lock;
 	/* hash table of all my known peers */
-	struct list_head       *kib_peers;
-	/* size of kib_peers */
-	int			kib_peer_hash_size;
+	DECLARE_HASHTABLE(kib_peers, IBLND_PEER_HASH_BITS);
 	/* the connd task (serialisation assertions) */
 	void		       *kib_connd;
 	/* connections to setup/teardown */
@@ -488,7 +486,7 @@ struct kib_conn {
 #define IBLND_CONN_DISCONNECTED		5	/* disconnected */
 
 struct kib_peer_ni {
-	struct list_head	ibp_list;	/* stash on global peer_ni list */
+	struct hlist_node	ibp_list;	/* on peer_ni hash chain */
 	lnet_nid_t		ibp_nid;	/* who's on the other end(s) */
 	struct lnet_ni		*ibp_ni;	/* LNet interface */
 	struct list_head	ibp_conns;	/* all active connections */
@@ -642,20 +640,11 @@ static inline int kiblnd_timeout(void)
 		list_empty(&peer_ni->ibp_conns);
 }
 
-static inline struct list_head *
-kiblnd_nid2peerlist(lnet_nid_t nid)
-{
-	unsigned int hash =
-		((unsigned int)nid) % kiblnd_data.kib_peer_hash_size;
-
-	return &kiblnd_data.kib_peers[hash];
-}
-
 static inline int
 kiblnd_peer_active(struct kib_peer_ni *peer_ni)
 {
 	/* Am I in the peer_ni hash table? */
-	return !list_empty(&peer_ni->ibp_list);
+	return !hlist_unhashed(&peer_ni->ibp_list);
 }
 
 static inline struct kib_conn *
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 9f9afce..2ebda4e 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1494,7 +1494,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		list_add_tail(&tx->tx_list, &peer_ni->ibp_tx_queue);
 
 	kiblnd_peer_addref(peer_ni);
-	list_add_tail(&peer_ni->ibp_list, kiblnd_nid2peerlist(nid));
+	hash_add(kiblnd_data.kib_peers, &peer_ni->ibp_list, nid);
 
 	write_unlock_irqrestore(g_lock, flags);
 
@@ -2533,7 +2533,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		LASSERT(!net->ibn_shutdown);
 
 		kiblnd_peer_addref(peer_ni);
-		list_add_tail(&peer_ni->ibp_list, kiblnd_nid2peerlist(nid));
+		hash_add(kiblnd_data.kib_peers, &peer_ni->ibp_list, nid);
 
 		write_unlock_irqrestore(g_lock, flags);
 	}
@@ -3257,7 +3257,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	LIST_HEAD(closes);
 	LIST_HEAD(checksends);
 	LIST_HEAD(timedout_txs);
-	struct list_head *peers = &kiblnd_data.kib_peers[idx];
+	struct hlist_head *peers = &kiblnd_data.kib_peers[idx];
 	struct kib_peer_ni *peer_ni;
 	struct kib_tx *tx_tmp, *tx;
 	struct kib_conn *conn;
@@ -3270,7 +3270,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	 */
 	write_lock_irqsave(&kiblnd_data.kib_global_lock, flags);
 
-	list_for_each_entry(peer_ni, peers, ibp_list) {
+	hlist_for_each_entry(peer_ni, peers, ibp_list) {
 		/* Check tx_deadline */
 		list_for_each_entry_safe(tx, tx_tmp, &peer_ni->ibp_tx_queue, tx_list) {
 			if (ktime_compare(ktime_get(), tx->tx_deadline) >= 0) {
@@ -3499,7 +3499,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		if (timeout <= 0) {
 			const int n = 4;
 			const int p = 1;
-			int chunk = kiblnd_data.kib_peer_hash_size;
+			int chunk = HASH_SIZE(kiblnd_data.kib_peers);
 			unsigned int lnd_timeout;
 
 			spin_unlock_irqrestore(lock, flags);
@@ -3524,7 +3524,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			for (i = 0; i < chunk; i++) {
 				kiblnd_check_conns(peer_index);
 				peer_index = (peer_index + 1) %
-					     kiblnd_data.kib_peer_hash_size;
+					HASH_SIZE(kiblnd_data.kib_peers);
 			}
 
 			deadline += p * HZ;
-- 
1.8.3.1
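
For readers comparing the old and new bucket selection, the following is a minimal userspace sketch, not code from the patch. It assumes that hash_add() with a 64-bit key such as lnet_nid_t reduces to a multiplicative hash (multiply by a 64-bit golden-ratio constant, keep the top IBLND_PEER_HASH_BITS bits), as hash_64() in linux/hash.h does on 64-bit builds. Every sketch_* and SKETCH_* name is hypothetical, and the constant is quoted from memory of linux/hash.h, so it should be checked against the target kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the values in the patch. */
#define SKETCH_OLD_HASH_SIZE	101u	/* old IBLND_PEER_HASH_SIZE */
#define SKETCH_PEER_HASH_BITS	7	/* new IBLND_PEER_HASH_BITS */
#define SKETCH_PEER_HASH_SIZE	(1u << SKETCH_PEER_HASH_BITS)

/* Assumed to match GOLDEN_RATIO_64 in linux/hash.h; verify before relying on it. */
#define SKETCH_GOLDEN_RATIO_64	0x61C8864680B583EBull

/* 7 bits gives 128 buckets, up from the 101 used before the patch. */
static_assert(SKETCH_PEER_HASH_SIZE == 128, "2^7 buckets");

/* Old scheme: truncate the NID to 32 bits, then take it modulo the table size. */
static inline unsigned int sketch_old_bucket(uint64_t nid)
{
	return ((unsigned int)nid) % SKETCH_OLD_HASH_SIZE;
}

/* New scheme (assumed): multiplicative hash, keep the top HASH_BITS bits. */
static inline unsigned int sketch_new_bucket(uint64_t nid)
{
	return (unsigned int)((nid * SKETCH_GOLDEN_RATIO_64) >>
			      (64 - SKETCH_PEER_HASH_BITS));
}

/* Round-robin step used when scanning buckets, wrapping at the table size. */
static inline unsigned int sketch_next_index(unsigned int peer_index)
{
	return (peer_index + 1) % SKETCH_PEER_HASH_SIZE;
}
```

Calling sketch_next_index() repeatedly from any starting index visits all 128 buckets before wrapping, which mirrors how the connd loop advances peer_index by HASH_SIZE(kiblnd_data.kib_peers) after this patch.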



Thread overview: 42+ messages
2021-04-05  0:50 [lustre-devel] [PATCH 00/41] lustre: sync to OpenSFS branch as of March 1 James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 01/41] lustre: llite: data corruption due to RPC reordering James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 02/41] lustre: llite: make readahead aware of hints James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 03/41] lustre: lov: avoid NULL dereference in cleanup James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 04/41] lustre: llite: quiet spurious ioctl warning James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 05/41] lustre: ptlrpc: do not output error when imp_sec is freed James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 06/41] lustre: update version to 2.14.0 James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 07/41] lnet: UDSP storage and marshalled structs James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 08/41] lnet: foundation patch for selection mod James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 09/41] lnet: Preferred gateway selection James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 10/41] lnet: Select NI/peer NI with highest prio James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 11/41] lnet: select best peer and local net James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 12/41] lnet: UDSP handling James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 13/41] lnet: Apply UDSP on local and remote NIs James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 14/41] lnet: Add the kernel level Marshalling API James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 15/41] lnet: Add the kernel level De-Marshalling API James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 16/41] lnet: Add the ioctl handler for "add policy" James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 17/41] lnet: ioctl handler for "delete policy" James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 18/41] lnet: ioctl handler for get policy info James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 19/41] lustre: update version to 2.14.50 James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 20/41] lustre: gss: handle empty reqmsg in sptlrpc_req_ctx_switch James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 21/41] lustre: sec: file ioctls to handle encryption policies James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 22/41] lustre: obdclass: try to skip corrupted llog records James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 23/41] lustre: lov: fix layout generation inc for mirror split James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 24/41] lnet: modify assertion in lnet_post_send_locked James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 25/41] lustre: lov: fixes bitfield in lod qos code James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 26/41] lustre: lov: grant deadlock if same OSC in two components James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 27/41] lustre: change EWOULDBLOCK to EAGAIN James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 28/41] lsutre: ldlm: return error from ldlm_namespace_new() James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 29/41] lustre: llite: remove unused ll_teardown_mmaps() James Simmons
2021-04-05  0:50 ` [lustre-devel] [PATCH 30/41] lustre: lov: style cleanups in lov_set_osc_active() James Simmons
2021-04-05  0:51 ` [lustre-devel] [PATCH 31/41] lustre: change various operations structs to const James Simmons
2021-04-05  0:51 ` [lustre-devel] [PATCH 32/41] lustre: mark strings in char arrays as const James Simmons
2021-04-05  0:51 ` [lustre-devel] [PATCH 33/41] lustre: convert snprintf to scnprintf as appropriate James Simmons
2021-04-05  0:51 ` [lustre-devel] [PATCH 34/41] lustre: remove non-static 'inline' markings James Simmons
2021-04-05  0:51 ` [lustre-devel] [PATCH 35/41] lustre: llite: use is_root_inode() James Simmons
2021-04-05  0:51 ` [lustre-devel] [PATCH 36/41] lnet: libcfs: discard cfs_firststr James Simmons
2021-04-05  0:51 ` [lustre-devel] [PATCH 37/41] lnet: place wire protocol data int own headers James Simmons
2021-04-05  0:51 ` [lustre-devel] [PATCH 38/41] lnet: libcfs: use wait_event_timeout() in tracefiled() James Simmons
2021-04-05  0:51 ` [lustre-devel] [PATCH 39/41] lnet: use init_wait() rather than init_waitqueue_entry() James Simmons
2021-04-05  0:51 ` [lustre-devel] [PATCH 40/41] lnet: discard LNET_MD_PHYS James Simmons
2021-04-05  0:51 ` [lustre-devel] [PATCH 41/41] lnet: o2iblnd: convert peers hash table to hashtable.h James Simmons
