* [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022
@ 2022-08-04  1:37 James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 01/32] lustre: mdc: Remove entry from list before freeing James Simmons
                   ` (31 more replies)
  0 siblings, 32 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Sync to the latest OpenSFS tree, except for crush2 support,
which exposed several hidden bugs that still need to be
resolved. We also need to look at how Lustre handles fscrypt,
which was not fully implemented for 2.14.

Alexey Lyashkov (1):
  lnet: Replace msg_rdma_force with a new md_flag LNET_MD_FLAG_GPU.

Andreas Dilger (1):
  lustre: llite: add projid to debug logs

Chris Horn (4):
  lnet: socklnd: Duplicate ksock_conn_cb
  lnet: Ensure round robin across nets
  lnet: Define KFILND network type
  lnet: Adjust niov checks for large MD

Cyril Bordage (1):
  lnet: o2iblnd: add debug messages for IB

Etienne AUJAMES (1):
  lustre: llog: Add LLOG_SKIP_PLAIN to skip llog plain

Gian-Carlo DeFazio (1):
  lnet: asym route inconsistency warning

Hongchao Zhang (1):
  lustre: quota: skip non-exist or inact tgt for lfs_quota

James Simmons (1):
  lustre: ec: code to add support for M to N parity

John L. Hammond (4):
  lustre: echo: remove client operations from echo objects
  lustre: clio: remove cl_page_export() and cl_page_is_vmlocked()
  lustre: clio: remove cpo_own and cpo_disown
  lustre: clio: remove cpo_assume, cpo_unassume, cpo_fini

Lai Siyao (5):
  lustre: llite: enforce ROOT default on subdir mount
  lustre: lmv: support striped LMVs
  lustre: mdc: pack default LMV in open reply
  lustre: llite: use max default EA size to get default LMV
  lustre: llite: pass dmv inherit depth instead of dir depth

Mikhail Pershin (1):
  lustre: client: able to cleanup devices manually

Mr NeilBrown (3):
  lnet: discard some peer_ni lookup functions
  lnet: change lnet_*_peer_ni to take struct lnet_nid
  lnet: libcfs: debugfs file_operation should have an owner

Oleg Drokin (1):
  lustre: mdc: Remove entry from list before freeing

Patrick Farrell (2):
  lustre: flr: Don't assume RDONLY implies SOM
  lustre: ldlm: Prioritize blocking callbacks

Qian Yingjin (2):
  lustre: som: disabling xattr cache for LSOM on client
  lustre: llite: dont restart directIO with IOCB_NOWAIT

Sebastien Buisson (2):
  lustre: enc: enc-unaware clients get ENOKEY if file not found
  lustre: sec: handle read-only flag

Serguei Smirnov (1):
  lnet: o2iblnd: debug message is missing a newline

 fs/lustre/Makefile                      |   2 +-
 fs/lustre/ec/Makefile                   |   4 +
 fs/lustre/ec/ec_base.c                  | 352 +++++++++++++++++++++
 fs/lustre/include/cl_object.h           |  84 +----
 fs/lustre/include/erasure_code.h        | 142 +++++++++
 fs/lustre/include/lustre_lmv.h          |  29 +-
 fs/lustre/include/lustre_log.h          |  18 +-
 fs/lustre/include/lustre_net.h          |   4 +-
 fs/lustre/include/obd_support.h         |   8 +-
 fs/lustre/ldlm/ldlm_lockd.c             |  39 ++-
 fs/lustre/llite/crypto.c                |  35 ++-
 fs/lustre/llite/dir.c                   |  98 +++---
 fs/lustre/llite/file.c                  |  10 +-
 fs/lustre/llite/llite_internal.h        |   7 +
 fs/lustre/llite/llite_lib.c             | 101 +++++-
 fs/lustre/llite/namei.c                 |   4 +-
 fs/lustre/llite/rw.c                    |  53 ++--
 fs/lustre/llite/rw26.c                  |   6 +-
 fs/lustre/llite/vvp_dev.c               |   3 +-
 fs/lustre/llite/vvp_internal.h          |   3 -
 fs/lustre/llite/vvp_page.c              | 188 +----------
 fs/lustre/llite/xattr.c                 |   3 +-
 fs/lustre/llite/xattr_cache.c           |   4 +
 fs/lustre/lmv/lmv_obd.c                 |  29 +-
 fs/lustre/lov/lov_io.c                  |   7 -
 fs/lustre/lov/lov_obd.c                 |  21 +-
 fs/lustre/lov/lov_page.c                |   2 +-
 fs/lustre/mdc/mdc_changelog.c           |   8 +-
 fs/lustre/mdc/mdc_lib.c                 |   1 +
 fs/lustre/mdc/mdc_locks.c               |   2 +
 fs/lustre/mdc/mdc_request.c             |   7 +-
 fs/lustre/obdclass/cl_io.c              |  15 +-
 fs/lustre/obdclass/cl_page.c            | 264 ++++++++--------
 fs/lustre/obdecho/echo_client.c         | 542 +-------------------------------
 fs/lustre/osc/osc_lock.c                |   2 +-
 fs/lustre/osc/osc_request.c             |   3 +
 fs/lustre/ptlrpc/import.c               |   4 +-
 fs/lustre/ptlrpc/layout.c               |   1 +
 fs/lustre/ptlrpc/niobuf.c               |   3 +-
 fs/lustre/ptlrpc/pers.c                 |   3 +
 fs/lustre/ptlrpc/wiretest.c             |   2 +
 include/linux/libcfs/libcfs.h           |   4 +-
 include/linux/lnet/lib-lnet.h           |  12 +-
 include/linux/lnet/lib-types.h          |  14 +-
 include/uapi/linux/lnet/lnet-types.h    |   2 +
 include/uapi/linux/lnet/nidstr.h        |   1 +
 include/uapi/linux/lustre/lustre_user.h |   5 +-
 net/lnet/klnds/o2iblnd/o2iblnd.c        |   2 +-
 net/lnet/klnds/o2iblnd/o2iblnd.h        |  23 +-
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c     |  50 ++-
 net/lnet/klnds/socklnd/socklnd.c        |  19 +-
 net/lnet/klnds/socklnd/socklnd_cb.c     |   5 +-
 net/lnet/libcfs/module.c                |  43 ++-
 net/lnet/lnet/api-ni.c                  |  40 ++-
 net/lnet/lnet/lib-md.c                  |   3 +
 net/lnet/lnet/lib-move.c                | 114 ++++---
 net/lnet/lnet/module.c                  |   1 +
 net/lnet/lnet/nidstrings.c              |  10 +
 net/lnet/lnet/peer.c                    | 237 ++++++--------
 net/lnet/lnet/router.c                  |  21 +-
 net/lnet/lnet/router_proc.c             |  10 +-
 net/lnet/lnet/udsp.c                    |   8 +-
 62 files changed, 1399 insertions(+), 1338 deletions(-)
 create mode 100644 fs/lustre/ec/Makefile
 create mode 100644 fs/lustre/ec/ec_base.c
 create mode 100644 fs/lustre/include/erasure_code.h

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 01/32] lustre: mdc: Remove entry from list before freeing
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 02/32] lustre: flr: Don't assume RDONLY implies SOM James Simmons
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

mdc_changelog_cdev_init() forgot to remove entries from the
list if the chardev allocation failed.

Fixes: dcedf3009a71 ("lustre: changelog: support large number of MDT")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15901
Lustre-commit: 441ec2296a0938dd3 ("LU-15901 mdc: Remove entry from list before freeing")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47480
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/mdc_changelog.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/mdc/mdc_changelog.c b/fs/lustre/mdc/mdc_changelog.c
index d366720..36d7fdd 100644
--- a/fs/lustre/mdc/mdc_changelog.c
+++ b/fs/lustre/mdc/mdc_changelog.c
@@ -837,7 +837,7 @@ int mdc_changelog_cdev_init(struct obd_device *obd)
 
 	rc = chlg_minor_alloc(&minor);
 	if (rc)
-		goto out_unlock;
+		goto out_listrm;
 
 	device_initialize(&entry->ced_device);
 	entry->ced_device.devt = MKDEV(MAJOR(mdc_changelog_dev), minor);
@@ -866,6 +866,7 @@ int mdc_changelog_cdev_init(struct obd_device *obd)
 out_minor:
 	chlg_minor_free(minor);
 
+out_listrm:
 	list_del_init(&obd->u.cli.cl_chg_dev_linkage);
 	list_del(&entry->ced_link);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/32] lustre: flr: Don't assume RDONLY implies SOM
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 01/32] lustre: mdc: Remove entry from list before freeing James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 03/32] lustre: echo: remove client operations from echo objects James Simmons
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

In lov_io_slice_mirror_init, the client code assumes that
the LCM_FL_RDONLY flag in the layout implies SOM and skips
the glimpse if the flag is set.  The RDONLY flag means the
mirrors are in sync, which has historically implied that
SOM is valid.

To start with, using LCM_FL_RDONLY to imply SOM is sort of
a layering violation.  SOM is only communicated from the
MDS when it is valid, and the client already skips glimpse
in that case, so this duplicates functionality from the
higher layers.

More seriously, the patch "LU-14526 flr: mirror split
downgrade SOM" (https://review.whamcloud.com/43168/) made
it possible to have LCM_FL_RDONLY without strict SOM, so
this assumption is no longer correct.

The fix is to not look at LCM_FL_RDONLY when deciding
whether to glimpse a file for size.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15609
Lustre-commit: 250108ad754cfa932 ("LU-15609 flr: Don't assume RDONLY implies SOM")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46666
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_support.h | 7 +++----
 fs/lustre/lov/lov_io.c          | 7 -------
 2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index 0732fe9a..b6c8a72 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -515,10 +515,9 @@
 #define OBD_FAIL_UNKNOWN_LMV_STRIPE			0x1901
 
 /* FLR */
-#define OBD_FAIL_FLR_GLIMPSE_IMMUTABLE			0x1A00
-#define OBD_FAIL_FLR_LV_DELAY			0x1A01
-#define OBD_FAIL_FLR_LV_INC			0x1A02
-#define OBD_FAIL_FLR_RANDOM_PICK_MIRROR	0x1A03
+#define OBD_FAIL_FLR_LV_DELAY				0x1A01
+#define OBD_FAIL_FLR_LV_INC				0x1A02
+#define OBD_FAIL_FLR_RANDOM_PICK_MIRROR			0x1A03
 
 /* LNet is allocated failure locations 0xe000 to 0xffff */
 /* Assign references to moved code to reduce code changes */
diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c
index b535092..32f028b 100644
--- a/fs/lustre/lov/lov_io.c
+++ b/fs/lustre/lov/lov_io.c
@@ -540,13 +540,6 @@ static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj,
 	case CIT_GLIMPSE:
 		lio->lis_pos = 0;
 		lio->lis_endpos = OBD_OBJECT_EOF;
-
-		if (lov_flr_state(obj) == LCM_FL_RDONLY &&
-		    !OBD_FAIL_CHECK(OBD_FAIL_FLR_GLIMPSE_IMMUTABLE)) {
-			/* SoM is accurate, no need glimpse */
-			result = 1;
-			goto out;
-		}
 		break;
 
 	case CIT_MISC:
-- 
1.8.3.1


* [lustre-devel] [PATCH 03/32] lustre: echo: remove client operations from echo objects
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 01/32] lustre: mdc: Remove entry from list before freeing James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 02/32] lustre: flr: Don't assume RDONLY implies SOM James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 04/32] lustre: clio: remove cl_page_export() and cl_page_is_vmlocked() James Simmons
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: "John L. Hammond" <jhammond@whamcloud.com>

Remove the client (io, page, lock) operations from echo_client
objects. This will facilitate the simplification of CLIO.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10994
Lustre-commit: 6060ee55b194e37e8 ("LU-10994 echo: remove client operations from echo objects")
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47240
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdecho/echo_client.c | 542 +---------------------------------------
 1 file changed, 8 insertions(+), 534 deletions(-)

diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c
index 4cc046a..f25ea41 100644
--- a/fs/lustre/obdecho/echo_client.c
+++ b/fs/lustre/obdecho/echo_client.c
@@ -70,7 +70,6 @@ struct echo_object {
 	struct echo_device	       *eo_dev;
 	struct list_head		eo_obj_chain;
 	struct lov_oinfo	       *eo_oinfo;
-	atomic_t			eo_npages;
 	int				eo_deleted;
 };
 
@@ -79,19 +78,6 @@ struct echo_object_conf {
 	struct lov_oinfo	      **eoc_oinfo;
 };
 
-struct echo_page {
-	struct cl_page_slice		ep_cl;
-	unsigned long			ep_lock;
-};
-
-struct echo_lock {
-	struct cl_lock_slice		el_cl;
-	struct list_head		el_chain;
-	struct echo_object	       *el_object;
-	u64				el_cookie;
-	atomic_t			el_refcount;
-};
-
 static int echo_client_setup(const struct lu_env *env,
 			     struct obd_device *obd,
 			     struct lustre_cfg *lcfg);
@@ -100,48 +86,34 @@ static int echo_client_setup(const struct lu_env *env,
 /** \defgroup echo_helpers Helper functions
  * @{
  */
-static inline struct echo_device *cl2echo_dev(const struct cl_device *dev)
+static struct echo_device *cl2echo_dev(const struct cl_device *dev)
 {
 	return container_of_safe(dev, struct echo_device, ed_cl);
 }
 
-static inline struct cl_device *echo_dev2cl(struct echo_device *d)
+static struct cl_device *echo_dev2cl(struct echo_device *d)
 {
 	return &d->ed_cl;
 }
 
-static inline struct echo_device *obd2echo_dev(const struct obd_device *obd)
+static struct echo_device *obd2echo_dev(const struct obd_device *obd)
 {
 	return cl2echo_dev(lu2cl_dev(obd->obd_lu_dev));
 }
 
-static inline struct cl_object *echo_obj2cl(struct echo_object *eco)
+static struct cl_object *echo_obj2cl(struct echo_object *eco)
 {
 	return &eco->eo_cl;
 }
 
-static inline struct echo_object *cl2echo_obj(const struct cl_object *o)
+static struct echo_object *cl2echo_obj(const struct cl_object *o)
 {
 	return container_of(o, struct echo_object, eo_cl);
 }
 
-static inline struct echo_page *cl2echo_page(const struct cl_page_slice *s)
-{
-	return container_of(s, struct echo_page, ep_cl);
-}
-
-static inline struct echo_lock *cl2echo_lock(const struct cl_lock_slice *s)
-{
-	return container_of(s, struct echo_lock, el_cl);
-}
-
-static inline struct cl_lock *echo_lock2cl(const struct echo_lock *ecl)
-{
-	return ecl->el_cl.cls_lock;
-}
-
 static struct lu_context_key echo_thread_key;
-static inline struct echo_thread_info *echo_env_info(const struct lu_env *env)
+
+static struct echo_thread_info *echo_env_info(const struct lu_env *env)
 {
 	struct echo_thread_info *info;
 
@@ -158,16 +130,10 @@ struct echo_object_conf *cl2echo_conf(const struct cl_object_conf *c)
 
 /** @} echo_helpers */
 static int cl_echo_object_put(struct echo_object *eco);
-static int cl_echo_object_brw(struct echo_object *eco, int rw, u64 offset,
-			      struct page **pages, int npages, int async);
 
 struct echo_thread_info {
 	struct echo_object_conf		eti_conf;
 	struct lustre_md		eti_md;
-
-	struct cl_2queue		eti_queue;
-	struct cl_io			eti_io;
-	struct cl_lock			eti_lock;
 	struct lu_fid			eti_fid;
 	struct lu_fid			eti_fid2;
 };
@@ -177,18 +143,12 @@ struct echo_session_info {
 	unsigned long			dummy;
 };
 
-static struct kmem_cache *echo_lock_kmem;
 static struct kmem_cache *echo_object_kmem;
 static struct kmem_cache *echo_thread_kmem;
 static struct kmem_cache *echo_session_kmem;
 
 static struct lu_kmem_descr echo_caches[] = {
 	{
-		.ckd_cache = &echo_lock_kmem,
-		.ckd_name  = "echo_lock_kmem",
-		.ckd_size  = sizeof(struct echo_lock)
-	},
-	{
 		.ckd_cache = &echo_object_kmem,
 		.ckd_name  = "echo_object_kmem",
 		.ckd_size  = sizeof(struct echo_object)
@@ -208,191 +168,6 @@ struct echo_session_info {
 	}
 };
 
-/** \defgroup echo_page Page operations
- *
- * Echo page operations.
- *
- * @{
- */
-static int echo_page_own(const struct lu_env *env,
-			 const struct cl_page_slice *slice,
-			 struct cl_io *io, int nonblock)
-{
-	struct echo_page *ep = cl2echo_page(slice);
-
-	if (nonblock) {
-		if (test_and_set_bit(0, &ep->ep_lock))
-			return -EAGAIN;
-	} else {
-		while (test_and_set_bit(0, &ep->ep_lock))
-			wait_on_bit(&ep->ep_lock, 0, TASK_UNINTERRUPTIBLE);
-	}
-	return 0;
-}
-
-static void echo_page_disown(const struct lu_env *env,
-			     const struct cl_page_slice *slice,
-			     struct cl_io *io)
-{
-	struct echo_page *ep = cl2echo_page(slice);
-
-	LASSERT(test_bit(0, &ep->ep_lock));
-	clear_and_wake_up_bit(0, &ep->ep_lock);
-}
-
-static void echo_page_discard(const struct lu_env *env,
-			      const struct cl_page_slice *slice,
-			      struct cl_io *unused)
-{
-	cl_page_delete(env, slice->cpl_page);
-}
-
-static int echo_page_is_vmlocked(const struct lu_env *env,
-				 const struct cl_page_slice *slice)
-{
-	if (test_bit(0, &cl2echo_page(slice)->ep_lock))
-		return -EBUSY;
-	return -ENODATA;
-}
-
-static void echo_page_completion(const struct lu_env *env,
-				 const struct cl_page_slice *slice,
-				 int ioret)
-{
-	LASSERT(slice->cpl_page->cp_sync_io);
-}
-
-static void echo_page_fini(const struct lu_env *env,
-			   struct cl_page_slice *slice,
-			   struct pagevec *pvec)
-{
-	struct echo_object *eco = cl2echo_obj(slice->cpl_obj);
-
-	atomic_dec(&eco->eo_npages);
-	put_page(slice->cpl_page->cp_vmpage);
-}
-
-static int echo_page_prep(const struct lu_env *env,
-			  const struct cl_page_slice *slice,
-			  struct cl_io *unused)
-{
-	return 0;
-}
-
-static int echo_page_print(const struct lu_env *env,
-			   const struct cl_page_slice *slice,
-			   void *cookie, lu_printer_t printer)
-{
-	struct echo_page *ep = cl2echo_page(slice);
-
-	(*printer)(env, cookie, LUSTRE_ECHO_CLIENT_NAME "-page@%p %d vm@%p\n",
-		   ep, test_bit(0, &ep->ep_lock),
-		   slice->cpl_page->cp_vmpage);
-	return 0;
-}
-
-static const struct cl_page_operations echo_page_ops = {
-	.cpo_own		= echo_page_own,
-	.cpo_disown		= echo_page_disown,
-	.cpo_discard		= echo_page_discard,
-	.cpo_fini		= echo_page_fini,
-	.cpo_print		= echo_page_print,
-	.cpo_is_vmlocked	= echo_page_is_vmlocked,
-	.io = {
-		[CRT_READ] = {
-			.cpo_prep	= echo_page_prep,
-			.cpo_completion	= echo_page_completion,
-		},
-		[CRT_WRITE] = {
-			.cpo_prep	= echo_page_prep,
-			.cpo_completion	= echo_page_completion,
-		}
-	}
-};
-
-/** @} echo_page */
-
-/** \defgroup echo_lock Locking
- *
- * echo lock operations
- *
- * @{
- */
-static void echo_lock_fini(const struct lu_env *env,
-			   struct cl_lock_slice *slice)
-{
-	struct echo_lock *ecl = cl2echo_lock(slice);
-
-	LASSERT(list_empty(&ecl->el_chain));
-	kmem_cache_free(echo_lock_kmem, ecl);
-}
-
-static const struct cl_lock_operations echo_lock_ops = {
-	.clo_fini		= echo_lock_fini,
-};
-
-/** @} echo_lock */
-
-/** \defgroup echo_cl_ops cl_object operations
- *
- * operations for cl_object
- *
- * @{
- */
-static int echo_page_init(const struct lu_env *env, struct cl_object *obj,
-			  struct cl_page *page, pgoff_t index)
-{
-	struct echo_page *ep = cl_object_page_slice(obj, page);
-	struct echo_object *eco = cl2echo_obj(obj);
-
-	get_page(page->cp_vmpage);
-	/*
-	 * ep_lock is similar to the lock_page() lock, and
-	 * cannot usefully be monitored by lockdep.
-	 * So just a bit in an "unsigned long" and use the
-	 * wait_on_bit() interface to wait for the bit to be clera.
-	 */
-	ep->ep_lock = 0;
-	cl_page_slice_add(page, &ep->ep_cl, obj, &echo_page_ops);
-	atomic_inc(&eco->eo_npages);
-	return 0;
-}
-
-static int echo_io_init(const struct lu_env *env, struct cl_object *obj,
-			struct cl_io *io)
-{
-	return 0;
-}
-
-static int echo_lock_init(const struct lu_env *env,
-			  struct cl_object *obj, struct cl_lock *lock,
-			  const struct cl_io *unused)
-{
-	struct echo_lock *el;
-
-	el = kmem_cache_zalloc(echo_lock_kmem, GFP_NOFS);
-	if (el) {
-		cl_lock_slice_add(lock, &el->el_cl, obj, &echo_lock_ops);
-		el->el_object = cl2echo_obj(obj);
-		INIT_LIST_HEAD(&el->el_chain);
-		atomic_set(&el->el_refcount, 0);
-	}
-	return !el ? -ENOMEM : 0;
-}
-
-static int echo_conf_set(const struct lu_env *env, struct cl_object *obj,
-			 const struct cl_object_conf *conf)
-{
-	return 0;
-}
-
-static const struct cl_object_operations echo_cl_obj_ops = {
-	.coo_page_init		= echo_page_init,
-	.coo_lock_init		= echo_lock_init,
-	.coo_io_init		= echo_io_init,
-	.coo_conf_set		= echo_conf_set
-};
-
 /** @} echo_cl_ops */
 
 /** \defgroup echo_lu_ops lu_object operations
@@ -434,8 +209,7 @@ static int echo_object_init(const struct lu_env *env, struct lu_object *obj,
 	*econf->eoc_oinfo = NULL;
 
 	eco->eo_dev = ed;
-	atomic_set(&eco->eo_npages, 0);
-	cl_object_page_init(lu2cl(obj), sizeof(struct echo_page));
+	cl_object_page_init(lu2cl(obj), 0);
 
 	spin_lock(&ec->ec_lock);
 	list_add_tail(&eco->eo_obj_chain, &ec->ec_objects);
@@ -455,8 +229,6 @@ static void echo_object_delete(const struct lu_env *env, struct lu_object *obj)
 
 	ec = eco->eo_dev->ed_ec;
 
-	LASSERT(atomic_read(&eco->eo_npages) == 0);
-
 	spin_lock(&ec->ec_lock);
 	list_del_init(&eco->eo_obj_chain);
 	spin_unlock(&ec->ec_lock);
@@ -527,7 +299,6 @@ static struct lu_object *echo_object_alloc(const struct lu_env *env,
 		lu_object_init(obj, &hdr->coh_lu, dev);
 		lu_object_add_top(&hdr->coh_lu, obj);
 
-		eco->eo_cl.co_ops = &echo_cl_obj_ops;
 		obj->lo_ops = &echo_lu_obj_ops;
 	}
 	return obj;
@@ -741,15 +512,6 @@ static struct lu_device *echo_device_fini(const struct lu_env *env,
 	return NULL;
 }
 
-static void echo_lock_release(const struct lu_env *env,
-			      struct echo_lock *ecl,
-			      int still_used)
-{
-	struct cl_lock *clk = echo_lock2cl(ecl);
-
-	cl_lock_release(env, clk);
-}
-
 static struct lu_device *echo_device_free(const struct lu_env *env,
 					  struct lu_device *d)
 {
@@ -934,193 +696,6 @@ static int cl_echo_object_put(struct echo_object *eco)
 	return 0;
 }
 
-static int __cl_echo_enqueue(struct lu_env *env, struct echo_object *eco,
-			     u64 start, u64 end, int mode,
-			     u64 *cookie, u32 enqflags)
-{
-	struct cl_io *io;
-	struct cl_lock *lck;
-	struct cl_object *obj;
-	struct cl_lock_descr *descr;
-	struct echo_thread_info *info;
-	int rc = -ENOMEM;
-
-	info = echo_env_info(env);
-	io = &info->eti_io;
-	lck = &info->eti_lock;
-	obj = echo_obj2cl(eco);
-
-	memset(lck, 0, sizeof(*lck));
-	descr = &lck->cll_descr;
-	descr->cld_obj = obj;
-	descr->cld_start = cl_index(obj, start);
-	descr->cld_end = cl_index(obj, end);
-	descr->cld_mode = mode == LCK_PW ? CLM_WRITE : CLM_READ;
-	descr->cld_enq_flags = enqflags;
-	io->ci_obj = obj;
-
-	rc = cl_lock_request(env, io, lck);
-	if (rc == 0) {
-		struct echo_client_obd *ec = eco->eo_dev->ed_ec;
-		struct echo_lock *el;
-
-		el = cl2echo_lock(cl_lock_at(lck, &echo_device_type));
-		spin_lock(&ec->ec_lock);
-		if (list_empty(&el->el_chain)) {
-			list_add(&el->el_chain, &ec->ec_locks);
-			el->el_cookie = ++ec->ec_unique;
-		}
-		atomic_inc(&el->el_refcount);
-		*cookie = el->el_cookie;
-		spin_unlock(&ec->ec_lock);
-	}
-	return rc;
-}
-
-static int __cl_echo_cancel(struct lu_env *env, struct echo_device *ed,
-			    u64 cookie)
-{
-	struct echo_client_obd *ec = ed->ed_ec;
-	struct echo_lock *ecl = NULL;
-	int found = 0, still_used = 0;
-
-	spin_lock(&ec->ec_lock);
-	list_for_each_entry(ecl, &ec->ec_locks, el_chain) {
-		CDEBUG(D_INFO, "ecl: %p, cookie: %#llx\n", ecl, ecl->el_cookie);
-		found = (ecl->el_cookie == cookie);
-		if (found) {
-			if (atomic_dec_and_test(&ecl->el_refcount))
-				list_del_init(&ecl->el_chain);
-			else
-				still_used = 1;
-			break;
-		}
-	}
-	spin_unlock(&ec->ec_lock);
-
-	if (!found)
-		return -ENOENT;
-
-	echo_lock_release(env, ecl, still_used);
-	return 0;
-}
-
-static void echo_commit_callback(const struct lu_env *env, struct cl_io *io,
-				 struct pagevec *pvec)
-{
-	struct echo_thread_info *info;
-	struct cl_2queue *queue;
-	int i = 0;
-
-	info = echo_env_info(env);
-	LASSERT(io == &info->eti_io);
-
-	queue = &info->eti_queue;
-
-	for (i = 0; i < pagevec_count(pvec); i++) {
-		struct page *vmpage = pvec->pages[i];
-		struct cl_page *page = (struct cl_page *)vmpage->private;
-
-		cl_page_list_add(&queue->c2_qout, page, true);
-	}
-}
-
-static int cl_echo_object_brw(struct echo_object *eco, int rw, u64 offset,
-			      struct page **pages, int npages, int async)
-{
-	struct lu_env *env;
-	struct echo_thread_info *info;
-	struct cl_object *obj = echo_obj2cl(eco);
-	struct echo_device *ed  = eco->eo_dev;
-	struct cl_2queue *queue;
-	struct cl_io *io;
-	struct cl_page *clp;
-	struct lustre_handle lh = { 0 };
-	size_t page_size = cl_page_size(obj);
-	u16 refcheck;
-	int rc;
-	int i;
-
-	LASSERT((offset & ~PAGE_MASK) == 0);
-	LASSERT(ed->ed_next);
-	env = cl_env_get(&refcheck);
-	if (IS_ERR(env))
-		return PTR_ERR(env);
-
-	info = echo_env_info(env);
-	io = &info->eti_io;
-	queue = &info->eti_queue;
-
-	cl_2queue_init(queue);
-
-	io->ci_ignore_layout = 1;
-	rc = cl_io_init(env, io, CIT_MISC, obj);
-	if (rc < 0)
-		goto out;
-	LASSERT(rc == 0);
-
-	rc = __cl_echo_enqueue(env, eco, offset,
-			       offset + npages * PAGE_SIZE - 1,
-			       rw == READ ? LCK_PR : LCK_PW, &lh.cookie,
-			       CEF_NEVER);
-	if (rc < 0)
-		goto error_lock;
-
-	for (i = 0; i < npages; i++) {
-		LASSERT(pages[i]);
-		clp = cl_page_find(env, obj, cl_index(obj, offset),
-				   pages[i], CPT_TRANSIENT);
-		if (IS_ERR(clp)) {
-			rc = PTR_ERR(clp);
-			break;
-		}
-		LASSERT(clp->cp_type == CPT_TRANSIENT);
-
-		rc = cl_page_own(env, io, clp);
-		if (rc) {
-			LASSERT(clp->cp_state == CPS_FREEING);
-			cl_page_put(env, clp);
-			break;
-		}
-		/*
-		 * Add a page to the incoming page list of 2-queue.
-		 */
-		cl_page_list_add(&queue->c2_qin, clp, true);
-
-		/* drop the reference count for cl_page_find, so that the page
-		 * will be freed in cl_2queue_fini.
-		 */
-		cl_page_put(env, clp);
-		cl_page_clip(env, clp, 0, page_size);
-
-		offset += page_size;
-	}
-
-	if (rc == 0) {
-		enum cl_req_type typ = rw == READ ? CRT_READ : CRT_WRITE;
-
-		async = async && (typ == CRT_WRITE);
-		if (async)
-			rc = cl_io_commit_async(env, io, &queue->c2_qin,
-						0, PAGE_SIZE,
-						echo_commit_callback);
-		else
-			rc = cl_io_submit_sync(env, io, typ, queue, 0);
-		CDEBUG(D_INFO, "echo_client %s write returns %d\n",
-		       async ? "async" : "sync", rc);
-	}
-
-	__cl_echo_cancel(env, ed, lh.cookie);
-error_lock:
-	cl_2queue_discard(env, io, queue);
-	cl_2queue_disown(env, io, queue);
-	cl_2queue_fini(env, queue);
-	cl_io_fini(env, io);
-out:
-	cl_env_put(env, &refcheck);
-	return rc;
-}
-
 /** @} echo_exports */
 
 static u64 last_object_id;
@@ -1264,101 +839,6 @@ static int echo_client_page_debug_check(struct page *page, u64 id,
 	return rc;
 }
 
-static int echo_client_kbrw(struct echo_device *ed, int rw, struct obdo *oa,
-			    struct echo_object *eco, u64 offset,
-			    u64 count, int async)
-{
-	u32 npages;
-	struct brw_page	*pga;
-	struct brw_page	*pgp;
-	struct page **pages;
-	u64 off;
-	int i;
-	int rc;
-	int verify;
-	gfp_t gfp_mask;
-	int brw_flags = 0;
-
-	verify = (ostid_id(&oa->o_oi) != ECHO_PERSISTENT_OBJID &&
-		  (oa->o_valid & OBD_MD_FLFLAGS) != 0 &&
-		  (oa->o_flags & OBD_FL_DEBUG_CHECK) != 0);
-
-	gfp_mask = ((ostid_id(&oa->o_oi) & 2) == 0) ? GFP_KERNEL : GFP_HIGHUSER;
-
-	LASSERT(rw == OBD_BRW_WRITE || rw == OBD_BRW_READ);
-
-	if (count <= 0 ||
-	    (count & (~PAGE_MASK)) != 0)
-		return -EINVAL;
-
-	/* XXX think again with misaligned I/O */
-	npages = count >> PAGE_SHIFT;
-
-	if (rw == OBD_BRW_WRITE)
-		brw_flags = OBD_BRW_ASYNC;
-
-	pga = kcalloc(npages, sizeof(*pga), GFP_NOFS);
-	if (!pga)
-		return -ENOMEM;
-
-	pages = kvmalloc_array(npages, sizeof(*pages),
-			       GFP_KERNEL | __GFP_ZERO);
-	if (!pages) {
-		kfree(pga);
-		return -ENOMEM;
-	}
-
-	for (i = 0, pgp = pga, off = offset;
-	     i < npages;
-	     i++, pgp++, off += PAGE_SIZE) {
-		LASSERT(!pgp->pg);      /* for cleanup */
-
-		rc = -ENOMEM;
-		pgp->pg = alloc_page(gfp_mask);
-		if (!pgp->pg)
-			goto out;
-
-		/* set mapping so page is not considered encrypted */
-		pgp->pg->mapping = ECHO_MAPPING_UNENCRYPTED;
-		pages[i] = pgp->pg;
-		pgp->count = PAGE_SIZE;
-		pgp->off = off;
-		pgp->flag = brw_flags;
-
-		if (verify)
-			echo_client_page_debug_setup(pgp->pg, rw,
-						     ostid_id(&oa->o_oi), off,
-						     pgp->count);
-	}
-
-	/* brw mode can only be used at client */
-	LASSERT(ed->ed_next);
-	rc = cl_echo_object_brw(eco, rw, offset, pages, npages, async);
-
-out:
-	if (rc != 0 || rw != OBD_BRW_READ)
-		verify = 0;
-
-	for (i = 0, pgp = pga; i < npages; i++, pgp++) {
-		if (!pgp->pg)
-			continue;
-
-		if (verify) {
-			int vrc;
-
-			vrc = echo_client_page_debug_check(pgp->pg,
-							   ostid_id(&oa->o_oi),
-							   pgp->off, pgp->count);
-			if (vrc != 0 && rc == 0)
-				rc = vrc;
-		}
-		__free_page(pgp->pg);
-	}
-	kfree(pga);
-	kvfree(pages);
-	return rc;
-}
-
 static int echo_client_prep_commit(const struct lu_env *env,
 				   struct obd_export *exp, int rw,
 				   struct obdo *oa, struct echo_object *eco,
@@ -1491,12 +971,6 @@ static int echo_client_brw_ioctl(const struct lu_env *env, int rw,
 		data->ioc_plen1 = PTLRPC_MAX_BRW_SIZE;
 
 	switch (test_mode) {
-	case 1:
-		/* fall through */
-	case 2:
-		rc = echo_client_kbrw(ed, rw, oa, eco, data->ioc_offset,
-				      data->ioc_count, async);
-		break;
 	case 3:
 		rc = echo_client_prep_commit(env, ec->ec_exp, rw, oa, eco,
 					     data->ioc_offset, data->ioc_count,
-- 
1.8.3.1


* [lustre-devel] [PATCH 04/32] lustre: clio: remove cl_page_export() and cl_page_is_vmlocked()
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (2 preceding siblings ...)
  2022-08-04  1:37 ` [lustre-devel] [PATCH 03/32] lustre: echo: remove client operations from echo objects James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 05/32] lustre: clio: remove cpo_own and cpo_disown James Simmons
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: "John L. Hammond" <jhammond@whamcloud.com>

Remove cl_page_export() and cl_page_is_vmlocked(), replacing them with
direct calls to SetPageUptodate() and PageLocked().

WC-bug-id: https://jira.whamcloud.com/browse/LU-10994
Lustre-commit: 3d52a7c5753e80e78 ("LU-10994 clio: remove cl_page_export() and cl_page_is_vmlocked()")
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47241
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
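[Note: the refactoring above is mechanical: cl_page_export(env, pg, 1) call
sites become direct SetPageUptodate(pg->cp_vmpage) calls, because vvp was the
only layer that ever implemented cpo_export. A minimal userspace sketch of the
before/after shape, not Lustre code — all names below are invented stand-ins:]

```c
/* Illustrative userspace sketch (not Lustre code): the patch drops the
 * per-layer cpo_export callback walk and has callers set the VM page
 * flag directly. struct vm_page / struct cl_page are stand-ins. */
#include <stdbool.h>
#include <stddef.h>

struct vm_page { bool uptodate; };             /* stand-in for struct page */
struct cl_page { struct vm_page *cp_vmpage; }; /* stand-in for struct cl_page */

/* After the patch: the sole implementation is inlined at the call site. */
static void export_page(struct cl_page *cp, bool uptodate)
{
	/* was: cl_page_export(env, cp, uptodate) -> vvp_page_export() */
	cp->cp_vmpage->uptodate = uptodate;    /* SetPageUptodate()/Clear... */
}

static bool demo_export(void)
{
	struct vm_page vmpage = { .uptodate = false };
	struct cl_page cp = { .cp_vmpage = &vmpage };

	export_page(&cp, true);
	return vmpage.uptodate;
}
```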
 fs/lustre/include/cl_object.h | 22 ---------------------
 fs/lustre/llite/file.c        |  2 +-
 fs/lustre/llite/rw.c          |  4 ++--
 fs/lustre/llite/vvp_page.c    | 31 +----------------------------
 fs/lustre/lov/lov_page.c      |  2 +-
 fs/lustre/obdclass/cl_io.c    |  1 -
 fs/lustre/obdclass/cl_page.c  | 46 -------------------------------------------
 fs/lustre/osc/osc_lock.c      |  2 +-
 8 files changed, 6 insertions(+), 104 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 5be89d6..db5f610 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -872,26 +872,6 @@ struct cl_page_operations {
 			     const struct cl_page_slice *slice,
 			     struct cl_io *io);
 	/**
-	 * Announces whether the page contains valid data or not by @uptodate.
-	 *
-	 * \see cl_page_export()
-	 * \see vvp_page_export()
-	 */
-	void  (*cpo_export)(const struct lu_env *env,
-			    const struct cl_page_slice *slice, int uptodate);
-	/**
-	 * Checks whether underlying VM page is locked (in the suitable
-	 * sense). Used for assertions.
-	 *
-	 * Return:	-EBUSY means page is protected by a lock of a given
-	 *		mode;
-	 *		-ENODATA when page is not protected by a lock;
-	 *		0 this layer cannot decide. (Should never happen.)
-	 */
-	int (*cpo_is_vmlocked)(const struct lu_env *env,
-			       const struct cl_page_slice *slice);
-
-	/**
 	 * Update file attributes when all we have is this page.  Used for tiny
 	 * writes to update attributes when we don't have a full cl_io.
 	 */
@@ -2346,10 +2326,8 @@ int cl_page_flush(const struct lu_env *env, struct cl_io *io,
 void cl_page_discard(const struct lu_env *env, struct cl_io *io,
 		     struct cl_page *pg);
 void cl_page_delete(const struct lu_env *env, struct cl_page *pg);
-int cl_page_is_vmlocked(const struct lu_env *env, const struct cl_page *pg);
 void cl_page_touch(const struct lu_env *env, const struct cl_page *pg,
 		   size_t to);
-void cl_page_export(const struct lu_env *env, struct cl_page *pg, int uptodate);
 loff_t cl_offset(const struct cl_object *obj, pgoff_t idx);
 pgoff_t cl_index(const struct cl_object *obj, loff_t offset);
 size_t cl_page_size(const struct cl_object *obj);
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index efe117d..0e71b3a 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -587,7 +587,7 @@ void ll_dom_finish_open(struct inode *inode, struct ptlrpc_request *req)
 			put_page(vmpage);
 			break;
 		}
-		cl_page_export(env, page, 1);
+		SetPageUptodate(vmpage);
 		cl_page_put(env, page);
 		unlock_page(vmpage);
 		put_page(vmpage);
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index c807217..478ef1b 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -1664,7 +1664,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
 	cl_2queue_init(queue);
 	if (uptodate) {
 		vpg->vpg_ra_used = 1;
-		cl_page_export(env, page, 1);
+		SetPageUptodate(page->cp_vmpage);
 		cl_page_disown(env, io, page);
 	} else {
 		anchor = &vvp_env_info(env)->vti_anchor;
@@ -1908,7 +1908,7 @@ int ll_readpage(struct file *file, struct page *vmpage)
 		/* export the page and skip io stack */
 		if (result == 0) {
 			vpg->vpg_ra_used = 1;
-			cl_page_export(env, page, 1);
+			SetPageUptodate(vmpage);
 		} else {
 			ll_ra_stats_inc_sbi(sbi, RA_STAT_FAILED_FAST_READ);
 		}
diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c
index 7744e9b..82ce5ab 100644
--- a/fs/lustre/llite/vvp_page.c
+++ b/fs/lustre/llite/vvp_page.c
@@ -170,26 +170,6 @@ static void vvp_page_delete(const struct lu_env *env,
 	 */
 }
 
-static void vvp_page_export(const struct lu_env *env,
-			    const struct cl_page_slice *slice,
-			    int uptodate)
-{
-	struct page *vmpage = cl2vm_page(slice);
-
-	LASSERT(vmpage);
-	LASSERT(PageLocked(vmpage));
-	if (uptodate)
-		SetPageUptodate(vmpage);
-	else
-		ClearPageUptodate(vmpage);
-}
-
-static int vvp_page_is_vmlocked(const struct lu_env *env,
-				const struct cl_page_slice *slice)
-{
-	return PageLocked(cl2vm_page(slice)) ? -EBUSY : -ENODATA;
-}
-
 static int vvp_page_prep_read(const struct lu_env *env,
 			      const struct cl_page_slice *slice,
 			      struct cl_io *unused)
@@ -260,7 +240,7 @@ static void vvp_page_completion_read(const struct lu_env *env,
 
 	if (ioret == 0)  {
 		if (!vpg->vpg_defer_uptodate)
-			cl_page_export(env, page, 1);
+			SetPageUptodate(vmpage);
 	} else if (vpg->vpg_defer_uptodate) {
 		vpg->vpg_defer_uptodate = 0;
 		if (ioret == -EAGAIN) {
@@ -382,8 +362,6 @@ static int vvp_page_fail(const struct lu_env *env,
 	.cpo_disown		= vvp_page_disown,
 	.cpo_discard		= vvp_page_discard,
 	.cpo_delete		= vvp_page_delete,
-	.cpo_export		= vvp_page_export,
-	.cpo_is_vmlocked	= vvp_page_is_vmlocked,
 	.cpo_fini		= vvp_page_fini,
 	.cpo_print		= vvp_page_print,
 	.io = {
@@ -412,15 +390,8 @@ static void vvp_transient_page_discard(const struct lu_env *env,
 	cl_page_delete(env, page);
 }
 
-static int vvp_transient_page_is_vmlocked(const struct lu_env *env,
-					  const struct cl_page_slice *slice)
-{
-	return -EBUSY;
-}
-
 static const struct cl_page_operations vvp_transient_page_ops = {
 	.cpo_discard		= vvp_transient_page_discard,
-	.cpo_is_vmlocked	= vvp_transient_page_is_vmlocked,
 	.cpo_print		= vvp_page_print,
 };
 
diff --git a/fs/lustre/lov/lov_page.c b/fs/lustre/lov/lov_page.c
index bd6ba79..a22b71f 100644
--- a/fs/lustre/lov/lov_page.c
+++ b/fs/lustre/lov/lov_page.c
@@ -144,7 +144,7 @@ int lov_page_init_empty(const struct lu_env *env, struct cl_object *obj,
 	addr = kmap(page->cp_vmpage);
 	memset(addr, 0, cl_page_size(obj));
 	kunmap(page->cp_vmpage);
-	cl_page_export(env, page, 1);
+	SetPageUptodate(page->cp_vmpage);
 	return 0;
 }
 
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index 6dd029a..c97ac0f 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -849,7 +849,6 @@ void cl_page_list_del(const struct lu_env *env, struct cl_page_list *plist,
 		      struct cl_page *page)
 {
 	LASSERT(plist->pl_nr > 0);
-	LASSERT(cl_page_is_vmlocked(env, page));
 
 	list_del_init(&page->cp_batch);
 	--plist->pl_nr;
diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c
index 9326743..b5b5448 100644
--- a/fs/lustre/obdclass/cl_page.c
+++ b/fs/lustre/obdclass/cl_page.c
@@ -760,52 +760,6 @@ void cl_page_delete(const struct lu_env *env, struct cl_page *pg)
 }
 EXPORT_SYMBOL(cl_page_delete);
 
-/**
- * Marks page up-to-date.
- *
- * Call cl_page_operations::cpo_export() through all layers top-to-bottom. The
- * layer responsible for VM interaction has to mark/clear page as up-to-date
- * by the @uptodate argument.
- *
- * \see cl_page_operations::cpo_export()
- */
-void cl_page_export(const struct lu_env *env, struct cl_page *cl_page,
-		    int uptodate)
-{
-	const struct cl_page_slice *slice;
-	int i;
-
-	cl_page_slice_for_each(cl_page, slice, i) {
-		if (slice->cpl_ops->cpo_export)
-			(*slice->cpl_ops->cpo_export)(env, slice, uptodate);
-	}
-}
-EXPORT_SYMBOL(cl_page_export);
-
-/**
- * Returns true, if @cl_page is VM locked in a suitable sense by the calling
- * thread.
- */
-int cl_page_is_vmlocked(const struct lu_env *env,
-			const struct cl_page *cl_page)
-{
-	const struct cl_page_slice *slice;
-	int result;
-
-	slice = cl_page_slice_get(cl_page, 0);
-	PASSERT(env, cl_page, slice->cpl_ops->cpo_is_vmlocked);
-	/*
-	 * Call ->cpo_is_vmlocked() directly instead of going through
-	 * CL_PAGE_INVOKE(), because cl_page_is_vmlocked() is used by
-	 * cl_page_invariant().
-	 */
-	result = slice->cpl_ops->cpo_is_vmlocked(env, slice);
-	PASSERT(env, cl_page, result == -EBUSY || result == -ENODATA);
-
-	return result == -EBUSY;
-}
-EXPORT_SYMBOL(cl_page_is_vmlocked);
-
 void cl_page_touch(const struct lu_env *env,
 		   const struct cl_page *cl_page, size_t to)
 {
diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c
index eb3cb58..c8f8502 100644
--- a/fs/lustre/osc/osc_lock.c
+++ b/fs/lustre/osc/osc_lock.c
@@ -644,7 +644,7 @@ static bool weigh_cb(const struct lu_env *env, struct cl_io *io,
 		struct osc_page *ops = pvec[i];
 		struct cl_page *page = ops->ops_cl.cpl_page;
 
-		if (cl_page_is_vmlocked(env, page) ||
+		if (PageLocked(page->cp_vmpage) ||
 		    PageDirty(page->cp_vmpage) ||
 		    PageWriteback(page->cp_vmpage))
 			return false;
-- 
1.8.3.1


* [lustre-devel] [PATCH 05/32] lustre: clio: remove cpo_own and cpo_disown
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (3 preceding siblings ...)
  2022-08-04  1:37 ` [lustre-devel] [PATCH 04/32] lustre: clio: remove cl_page_export() and cl_page_is_vmlocked() James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 06/32] lustre: clio: remove cpo_assume, cpo_unassume, cpo_fini James Simmons
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: "John L. Hammond" <jhammond@whamcloud.com>

Remove the cpo_own and cpo_disown methods from struct
cl_page_operations. These methods were only implemented by the vvp
layer so they can be inlined into __cl_page_own() and
cl_page_disown(). Move most of vvp_page_discard() and all of
vvp_transient_page_discard() into cl_page_discard().

WC-bug-id: https://jira.whamcloud.com/browse/LU-10994
Lustre-commit: 81c6dc423ce4c62a6 ("LU-10994 clio: remove cpo_own and cpo_disown")
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47372
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
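[Note: the interesting part of the inlining is the ownership locking that
__cl_page_own() now carries directly: nonblocking callers try-lock the VM page
and back off with -EAGAIN if it is already locked or under writeback, while
blocking callers lock and wait. A minimal userspace sketch of that pattern,
not Lustre code — pthread_mutex and the names below are stand-ins:]

```c
/* Illustrative sketch (not Lustre code) of the nonblock vs. blocking
 * ownership paths. pthread_mutex stands in for the VM page lock. */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct demo_page {
	pthread_mutex_t lock;  /* stand-in for the VM page lock */
	bool writeback;        /* stand-in for PageWriteback() */
};

static int demo_own(struct demo_page *pg, int nonblock)
{
	if (nonblock) {
		if (pthread_mutex_trylock(&pg->lock) != 0)
			return -EAGAIN;        /* page already locked */
		if (pg->writeback) {
			pthread_mutex_unlock(&pg->lock);
			return -EAGAIN;        /* don't wait in nonblock mode */
		}
		return 0;
	}
	pthread_mutex_lock(&pg->lock);         /* blocking path waits */
	/* wait_on_page_writeback() would go here in the real code */
	return 0;
}

static int demo_nonblock_busy(void)
{
	struct demo_page pg = { PTHREAD_MUTEX_INITIALIZER, true };

	return demo_own(&pg, 1);               /* writeback => backs off */
}

static int demo_block_ok(void)
{
	struct demo_page pg = { PTHREAD_MUTEX_INITIALIZER, false };

	return demo_own(&pg, 0);
}
```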
 fs/lustre/include/cl_object.h  | 33 ++++----------
 fs/lustre/llite/llite_lib.c    |  2 +-
 fs/lustre/llite/rw.c           | 49 +++++++++++----------
 fs/lustre/llite/rw26.c         |  6 +--
 fs/lustre/llite/vvp_dev.c      |  3 +-
 fs/lustre/llite/vvp_internal.h |  3 --
 fs/lustre/llite/vvp_page.c     | 86 ++++++------------------------------
 fs/lustre/obdclass/cl_io.c     | 12 +++--
 fs/lustre/obdclass/cl_page.c   | 99 +++++++++++++++++++++++-------------------
 9 files changed, 112 insertions(+), 181 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index db5f610..4460ae1 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -764,6 +764,10 @@ struct cl_page {
 	 * creation.
 	 */
 	enum cl_page_type		 cp_type:CP_TYPE_BITS;
+	unsigned int			 cp_defer_uptodate:1,
+					 cp_ra_updated:1,
+					 cp_ra_used:1;
+
 	/* which slab kmem index this memory allocated from */
 	short int			 cp_kmem_index;
 
@@ -822,7 +826,7 @@ enum cl_req_type {
  *
  * Methods taking an @io argument are for the activity happening in the
  * context of given @io. Page is assumed to be owned by that io, except for
- * the obvious cases (like cl_page_operations::cpo_own()).
+ * the obvious cases.
  *
  * \see vvp_page_ops, lov_page_ops, osc_page_ops
  */
@@ -834,25 +838,6 @@ struct cl_page_operations {
 	 */
 
 	/**
-	 * Called when @io acquires this page into the exclusive
-	 * ownership. When this method returns, it is guaranteed that the is
-	 * not owned by other io, and no transfer is going on against
-	 * it. Optional.
-	 *
-	 * \see cl_page_own()
-	 * \see vvp_page_own(), lov_page_own()
-	 */
-	int  (*cpo_own)(const struct lu_env *env,
-			const struct cl_page_slice *slice,
-			struct cl_io *io, int nonblock);
-	/** Called when ownership it yielded. Optional.
-	 *
-	 * \see cl_page_disown()
-	 * \see vvp_page_disown()
-	 */
-	void (*cpo_disown)(const struct lu_env *env,
-			   const struct cl_page_slice *slice, struct cl_io *io);
-	/**
 	 * Called for a page that is already "owned" by @io from VM point of
 	 * view. Optional.
 	 *
@@ -2290,8 +2275,7 @@ void cl_page_unassume(const struct lu_env *env,
 		      struct cl_io *io, struct cl_page *pg);
 void cl_page_disown(const struct lu_env *env,
 		    struct cl_io *io, struct cl_page *page);
-void __cl_page_disown(const struct lu_env *env,
-		      struct cl_io *io, struct cl_page *pg);
+void __cl_page_disown(const struct lu_env *env, struct cl_page *pg);
 int cl_page_is_owned(const struct cl_page *pg, const struct cl_io *io);
 
 /** @} ownership */
@@ -2544,14 +2528,13 @@ void cl_page_list_splice(struct cl_page_list *list,
 void cl_page_list_del(const struct lu_env *env,
 		      struct cl_page_list *plist, struct cl_page *page);
 void cl_page_list_disown(const struct lu_env *env,
-			 struct cl_io *io, struct cl_page_list *plist);
+			 struct cl_page_list *plist);
 void cl_page_list_discard(const struct lu_env *env,
 			  struct cl_io *io, struct cl_page_list *plist);
 void cl_page_list_fini(const struct lu_env *env, struct cl_page_list *plist);
 
 void cl_2queue_init(struct cl_2queue *queue);
-void cl_2queue_disown(const struct lu_env *env, struct cl_io *io,
-		      struct cl_2queue *queue);
+void cl_2queue_disown(const struct lu_env *env, struct cl_2queue *queue);
 void cl_2queue_discard(const struct lu_env *env, struct cl_io *io,
 		       struct cl_2queue *queue);
 void cl_2queue_fini(const struct lu_env *env, struct cl_2queue *queue);
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 6adbf10..b55a30f 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -1955,7 +1955,7 @@ int ll_io_zero_page(struct inode *inode, pgoff_t index, pgoff_t offset,
 queuefini2:
 		cl_2queue_discard(env, io, queue);
 queuefini1:
-		cl_2queue_disown(env, io, queue);
+		cl_2queue_disown(env, queue);
 		cl_2queue_fini(env, queue);
 	}
 
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 478ef1b..7c4b8ec 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -195,10 +195,9 @@ static int ll_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 	enum ra_stat which = _NR_RA_STAT; /* keep gcc happy */
 	struct cl_object *clob = io->ci_obj;
 	struct inode *inode = vvp_object_inode(clob);
-	const char *msg = NULL;
-	struct cl_page *page;
-	struct vvp_page *vpg;
 	struct page *vmpage = NULL;
+	const char *msg = NULL;
+	struct cl_page *cp;
 	int rc = 0;
 
 	switch (hint) {
@@ -233,34 +232,35 @@ static int ll_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 		goto out;
 	}
 
-	page = cl_page_find(env, clob, vmpage->index, vmpage, CPT_CACHEABLE);
-	if (IS_ERR(page)) {
+	cp = cl_page_find(env, clob, vmpage->index, vmpage, CPT_CACHEABLE);
+	if (IS_ERR(cp)) {
 		which = RA_STAT_FAILED_GRAB_PAGE;
 		msg = "cl_page_find failed";
-		rc = PTR_ERR(page);
+		rc = PTR_ERR(cp);
 		goto out;
 	}
 
-	lu_ref_add(&page->cp_reference, "ra", current);
-	cl_page_assume(env, io, page);
-	vpg = cl2vvp_page(cl_object_page_slice(clob, page));
-	if (!vpg->vpg_defer_uptodate && !PageUptodate(vmpage)) {
+	lu_ref_add(&cp->cp_reference, "ra", current);
+	cl_page_assume(env, io, cp);
+
+	if (!cp->cp_defer_uptodate && !PageUptodate(vmpage)) {
 		if (hint == MAYNEED) {
-			vpg->vpg_defer_uptodate = 1;
-			vpg->vpg_ra_used = 0;
+			cp->cp_defer_uptodate = 1;
+			cp->cp_ra_used = 0;
 		}
-		cl_page_list_add(queue, page, true);
+
+		cl_page_list_add(queue, cp, true);
 	} else {
 		/* skip completed pages */
-		cl_page_unassume(env, io, page);
+		cl_page_unassume(env, io, cp);
 		/* This page is already uptodate, returning a positive number
 		 * to tell the callers about this
 		 */
 		rc = 1;
 	}
 
-	lu_ref_del(&page->cp_reference, "ra", current);
-	cl_page_put(env, page);
+	lu_ref_del(&cp->cp_reference, "ra", current);
+	cl_page_put(env, cp);
 out:
 	if (vmpage) {
 		if (rc)
@@ -695,7 +695,7 @@ static void ll_readahead_handle_work(struct work_struct *wq)
 	cl_page_list_discard(env, io, &queue->c2_qin);
 
 	/* Unlock unsent read pages in case of error. */
-	cl_page_list_disown(env, io, &queue->c2_qin);
+	cl_page_list_disown(env, &queue->c2_qin);
 
 	cl_2queue_fini(env, queue);
 out_io_fini:
@@ -1649,9 +1649,9 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
 		unlockpage = false;
 
 	vpg = cl2vvp_page(cl_object_page_slice(page->cp_obj, page));
-	uptodate = vpg->vpg_defer_uptodate;
+	uptodate = page->cp_defer_uptodate;
 
-	if (ll_readahead_enabled(sbi) && !vpg->vpg_ra_updated && ras) {
+	if (ll_readahead_enabled(sbi) && !page->cp_ra_updated && ras) {
 		enum ras_update_flags flags = 0;
 
 		if (uptodate)
@@ -1663,7 +1663,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
 
 	cl_2queue_init(queue);
 	if (uptodate) {
-		vpg->vpg_ra_used = 1;
+		page->cp_ra_used = 1;
 		SetPageUptodate(page->cp_vmpage);
 		cl_page_disown(env, io, page);
 	} else {
@@ -1740,7 +1740,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
 	cl_page_list_discard(env, io, &queue->c2_qin);
 
 	/* Unlock unsent read pages in case of error. */
-	cl_page_list_disown(env, io, &queue->c2_qin);
+	cl_page_list_disown(env, &queue->c2_qin);
 
 	cl_2queue_fini(env, queue);
 
@@ -1881,7 +1881,7 @@ int ll_readpage(struct file *file, struct page *vmpage)
 		}
 
 		vpg = cl2vvp_page(cl_object_page_slice(page->cp_obj, page));
-		if (vpg->vpg_defer_uptodate) {
+		if (page->cp_defer_uptodate) {
 			enum ras_update_flags flags = LL_RAS_HIT;
 
 			if (lcc && lcc->lcc_type == LCC_MMAP)
@@ -1894,7 +1894,7 @@ int ll_readpage(struct file *file, struct page *vmpage)
 			 */
 			ras_update(sbi, inode, ras, vvp_index(vpg), flags, io);
 			/* avoid duplicate ras_update() call */
-			vpg->vpg_ra_updated = 1;
+			page->cp_ra_updated = 1;
 
 			if (ll_use_fast_io(file, ras, vvp_index(vpg)))
 				result = 0;
@@ -1907,11 +1907,12 @@ int ll_readpage(struct file *file, struct page *vmpage)
 
 		/* export the page and skip io stack */
 		if (result == 0) {
-			vpg->vpg_ra_used = 1;
+			page->cp_ra_used = 1;
 			SetPageUptodate(vmpage);
 		} else {
 			ll_ra_stats_inc_sbi(sbi, RA_STAT_FAILED_FAST_READ);
 		}
+
 		/* release page refcount before unlocking the page to ensure
 		 * the object won't be destroyed in the calling path of
 		 * cl_page_put(). Please see comment in ll_releasepage().
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index 8b379ca..7147f0f 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -286,7 +286,7 @@ static unsigned long ll_iov_iter_alignment(struct iov_iter *i)
 	}
 
 	cl_2queue_discard(env, io, queue);
-	cl_2queue_disown(env, io, queue);
+	cl_2queue_disown(env, queue);
 	cl_2queue_fini(env, queue);
 	return rc;
 }
@@ -468,8 +468,8 @@ static int ll_prepare_partial_page(const struct lu_env *env, struct cl_io *io,
 		goto out;
 	}
 
-	if (vpg->vpg_defer_uptodate) {
-		vpg->vpg_ra_used = 1;
+	if (pg->cp_defer_uptodate) {
+		pg->cp_ra_used = 1;
 		result = 0;
 		goto out;
 	}
diff --git a/fs/lustre/llite/vvp_dev.c b/fs/lustre/llite/vvp_dev.c
index 0c417d8..99335bd 100644
--- a/fs/lustre/llite/vvp_dev.c
+++ b/fs/lustre/llite/vvp_dev.c
@@ -435,11 +435,10 @@ static void vvp_pgcache_page_show(const struct lu_env *env,
 
 	vpg = cl2vvp_page(cl_page_at(page, &vvp_device_type));
 	vmpage = vpg->vpg_page;
-	seq_printf(seq, " %5i | %p %p %s %s %s | %p " DFID "(%p) %lu %u [",
+	seq_printf(seq, " %5i | %p %p %s %s | %p " DFID "(%p) %lu %u [",
 		   0 /* gen */,
 		   vpg, page,
 		   "none",
-		   vpg->vpg_defer_uptodate ? "du" : "- ",
 		   PageWriteback(vmpage) ? "wb" : "-",
 		   vmpage, PFID(ll_inode2fid(vmpage->mapping->host)),
 		   vmpage->mapping->host, vmpage->index,
diff --git a/fs/lustre/llite/vvp_internal.h b/fs/lustre/llite/vvp_internal.h
index b5e1df2..0e0da76 100644
--- a/fs/lustre/llite/vvp_internal.h
+++ b/fs/lustre/llite/vvp_internal.h
@@ -213,9 +213,6 @@ struct vvp_object {
  */
 struct vvp_page {
 	struct cl_page_slice	vpg_cl;
-	unsigned int		vpg_defer_uptodate:1,
-				vpg_ra_updated:1,
-				vpg_ra_used:1;
 	/** VM page */
 	struct page		*vpg_page;
 };
diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c
index 82ce5ab..8875a62 100644
--- a/fs/lustre/llite/vvp_page.c
+++ b/fs/lustre/llite/vvp_page.c
@@ -73,32 +73,6 @@ static void vvp_page_fini(const struct lu_env *env,
 	}
 }
 
-static int vvp_page_own(const struct lu_env *env,
-			const struct cl_page_slice *slice, struct cl_io *io,
-			int nonblock)
-{
-	struct vvp_page *vpg = cl2vvp_page(slice);
-	struct page *vmpage = vpg->vpg_page;
-
-	LASSERT(vmpage);
-	if (nonblock) {
-		if (!trylock_page(vmpage))
-			return -EAGAIN;
-
-		if (unlikely(PageWriteback(vmpage))) {
-			unlock_page(vmpage);
-			return -EAGAIN;
-		}
-
-		return 0;
-	}
-
-	lock_page(vmpage);
-	wait_on_page_writeback(vmpage);
-
-	return 0;
-}
-
 static void vvp_page_assume(const struct lu_env *env,
 			    const struct cl_page_slice *slice,
 			    struct cl_io *unused)
@@ -120,31 +94,15 @@ static void vvp_page_unassume(const struct lu_env *env,
 	LASSERT(PageLocked(vmpage));
 }
 
-static void vvp_page_disown(const struct lu_env *env,
-			    const struct cl_page_slice *slice, struct cl_io *io)
-{
-	struct page *vmpage = cl2vm_page(slice);
-
-	LASSERT(vmpage);
-	LASSERT(PageLocked(vmpage));
-
-	unlock_page(cl2vm_page(slice));
-}
-
 static void vvp_page_discard(const struct lu_env *env,
 			     const struct cl_page_slice *slice,
 			     struct cl_io *unused)
 {
-	struct page *vmpage = cl2vm_page(slice);
-	struct vvp_page *vpg = cl2vvp_page(slice);
+	struct cl_page *cp = slice->cpl_page;
+	struct page *vmpage = cp->cp_vmpage;
 
-	LASSERT(vmpage);
-	LASSERT(PageLocked(vmpage));
-
-	if (vpg->vpg_defer_uptodate && !vpg->vpg_ra_used && vmpage->mapping)
+	if (cp->cp_defer_uptodate && !cp->cp_ra_used && vmpage->mapping)
 		ll_ra_stats_inc(vmpage->mapping->host, RA_STAT_DISCARDED);
-
-	generic_error_remove_page(vmpage->mapping, vmpage);
 }
 
 static void vvp_page_delete(const struct lu_env *env,
@@ -227,22 +185,21 @@ static void vvp_page_completion_read(const struct lu_env *env,
 				     const struct cl_page_slice *slice,
 				     int ioret)
 {
-	struct vvp_page *vpg = cl2vvp_page(slice);
-	struct page *vmpage = vpg->vpg_page;
-	struct cl_page *page = slice->cpl_page;
-	struct inode *inode = vvp_object_inode(page->cp_obj);
+	struct cl_page *cp = slice->cpl_page;
+	struct page *vmpage = cp->cp_vmpage;
+	struct inode *inode = vvp_object_inode(cp->cp_obj);
 
 	LASSERT(PageLocked(vmpage));
-	CL_PAGE_HEADER(D_PAGE, env, page, "completing READ with %d\n", ioret);
+	CL_PAGE_HEADER(D_PAGE, env, cp, "completing READ with %d\n", ioret);
 
-	if (vpg->vpg_defer_uptodate)
+	if (cp->cp_defer_uptodate)
 		ll_ra_count_put(ll_i2sbi(inode), 1);
 
 	if (ioret == 0)  {
-		if (!vpg->vpg_defer_uptodate)
+		if (!cp->cp_defer_uptodate)
 			SetPageUptodate(vmpage);
-	} else if (vpg->vpg_defer_uptodate) {
-		vpg->vpg_defer_uptodate = 0;
+	} else if (cp->cp_defer_uptodate) {
+		cp->cp_defer_uptodate = 0;
 		if (ioret == -EAGAIN) {
 			/* mirror read failed, it needs to destroy the page
 			 * because subpage would be from wrong osc when trying
@@ -252,7 +209,7 @@ static void vvp_page_completion_read(const struct lu_env *env,
 		}
 	}
 
-	if (!page->cp_sync_io)
+	if (!cp->cp_sync_io)
 		unlock_page(vmpage);
 }
 
@@ -329,8 +286,8 @@ static int vvp_page_print(const struct lu_env *env,
 	struct vvp_page *vpg = cl2vvp_page(slice);
 	struct page *vmpage = vpg->vpg_page;
 
-	(*printer)(env, cookie, LUSTRE_VVP_NAME "-page@%p(%d:%d) vm@%p ",
-		   vpg, vpg->vpg_defer_uptodate, vpg->vpg_ra_used, vmpage);
+	(*printer)(env, cookie,
+		   LUSTRE_VVP_NAME"-page@%p vm@%p ", vpg, vmpage);
 	if (vmpage) {
 		(*printer)(env, cookie, "%lx %d:%d %lx %lu %slru",
 			   (long)vmpage->flags, page_count(vmpage),
@@ -356,10 +313,8 @@ static int vvp_page_fail(const struct lu_env *env,
 }
 
 static const struct cl_page_operations vvp_page_ops = {
-	.cpo_own		= vvp_page_own,
 	.cpo_assume		= vvp_page_assume,
 	.cpo_unassume		= vvp_page_unassume,
-	.cpo_disown		= vvp_page_disown,
 	.cpo_discard		= vvp_page_discard,
 	.cpo_delete		= vvp_page_delete,
 	.cpo_fini		= vvp_page_fini,
@@ -378,20 +333,7 @@ static int vvp_page_fail(const struct lu_env *env,
 	},
 };
 
-static void vvp_transient_page_discard(const struct lu_env *env,
-				       const struct cl_page_slice *slice,
-				       struct cl_io *unused)
-{
-	struct cl_page *page = slice->cpl_page;
-
-	/*
-	 * For transient pages, remove it from the radix tree.
-	 */
-	cl_page_delete(env, page);
-}
-
 static const struct cl_page_operations vvp_transient_page_ops = {
-	.cpo_discard		= vvp_transient_page_discard,
 	.cpo_print		= vvp_page_print,
 };
 
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index c97ac0f..4246e17 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -911,8 +911,7 @@ void cl_page_list_splice(struct cl_page_list *src, struct cl_page_list *dst)
 /**
  * Disowns pages in a queue.
  */
-void cl_page_list_disown(const struct lu_env *env,
-			 struct cl_io *io, struct cl_page_list *plist)
+void cl_page_list_disown(const struct lu_env *env, struct cl_page_list *plist)
 {
 	struct cl_page *page;
 	struct cl_page *temp;
@@ -930,7 +929,7 @@ void cl_page_list_disown(const struct lu_env *env,
 		/*
 		 * XXX __cl_page_disown() will fail if page is not locked.
 		 */
-		__cl_page_disown(env, io, page);
+		__cl_page_disown(env, page);
 		lu_ref_del_at(&page->cp_reference, &page->cp_queue_ref, "queue",
 			      plist);
 		cl_page_put(env, page);
@@ -990,11 +989,10 @@ void cl_2queue_init(struct cl_2queue *queue)
 /**
  * Disown pages in both lists of a 2-queue.
  */
-void cl_2queue_disown(const struct lu_env *env,
-		      struct cl_io *io, struct cl_2queue *queue)
+void cl_2queue_disown(const struct lu_env *env, struct cl_2queue *queue)
 {
-	cl_page_list_disown(env, io, &queue->c2_qin);
-	cl_page_list_disown(env, io, &queue->c2_qout);
+	cl_page_list_disown(env, &queue->c2_qin);
+	cl_page_list_disown(env, &queue->c2_qout);
 }
 EXPORT_SYMBOL(cl_2queue_disown);
 
diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c
index b5b5448..cff2c54 100644
--- a/fs/lustre/obdclass/cl_page.c
+++ b/fs/lustre/obdclass/cl_page.c
@@ -40,6 +40,7 @@
 #include <obd_class.h>
 #include <obd_support.h>
 #include <linux/list.h>
+#include <linux/pagemap.h>
 
 #include <cl_object.h>
 #include "cl_internal.h"
@@ -487,26 +488,22 @@ static void cl_page_owner_set(struct cl_page *page)
 	page->cp_owner->ci_owned_nr++;
 }
 
-void __cl_page_disown(const struct lu_env *env,
-		      struct cl_io *io, struct cl_page *cl_page)
+void __cl_page_disown(const struct lu_env *env, struct cl_page *cp)
 {
-	const struct cl_page_slice *slice;
 	enum cl_page_state state;
-	int i;
+	struct page *vmpage;
 
-	state = cl_page->cp_state;
-	cl_page_owner_clear(cl_page);
+	state = cp->cp_state;
+	cl_page_owner_clear(cp);
 
 	if (state == CPS_OWNED)
-		cl_page_state_set(env, cl_page, CPS_CACHED);
-	/*
-	 * Completion call-backs are executed in the bottom-up order, so that
-	 * uppermost layer (llite), responsible for VFS/VM interaction runs
-	 * last and can release locks safely.
-	 */
-	cl_page_slice_for_each_reverse(cl_page, slice, i) {
-		if (slice->cpl_ops->cpo_disown)
-			(*slice->cpl_ops->cpo_disown)(env, slice, io);
+		cl_page_state_set(env, cp, CPS_CACHED);
+
+	if (cp->cp_type == CPT_CACHEABLE) {
+		vmpage = cp->cp_vmpage;
+		LASSERT(vmpage);
+		LASSERT(PageLocked(vmpage));
+		unlock_page(vmpage);
 	}
 }
 
@@ -539,45 +536,51 @@ int cl_page_is_owned(const struct cl_page *pg, const struct cl_io *io)
  *		another thread, or in IO.
  *
  * \see cl_page_disown()
- * \see cl_page_operations::cpo_own()
  * \see cl_page_own_try()
  * \see cl_page_own
  */
 static int __cl_page_own(const struct lu_env *env, struct cl_io *io,
 			 struct cl_page *cl_page, int nonblock)
 {
-	const struct cl_page_slice *slice;
+	struct page *vmpage = cl_page->cp_vmpage;
 	int result = 0;
-	int i;
-
-	io = cl_io_top(io);
 
 	if (cl_page->cp_state == CPS_FREEING) {
 		result = -ENOENT;
 		goto out;
 	}
 
-	cl_page_slice_for_each(cl_page, slice, i) {
-		if (slice->cpl_ops->cpo_own)
-			result = (*slice->cpl_ops->cpo_own)(env, slice,
-							    io, nonblock);
-		if (result != 0)
-			break;
-	}
-	if (result > 0)
-		result = 0;
+	LASSERT(vmpage);
 
-	if (result == 0) {
-		PASSERT(env, cl_page, !cl_page->cp_owner);
-		cl_page->cp_owner = cl_io_top(io);
-		cl_page_owner_set(cl_page);
-		if (cl_page->cp_state != CPS_FREEING) {
-			cl_page_state_set(env, cl_page, CPS_OWNED);
-		} else {
-			__cl_page_disown(env, io, cl_page);
-			result = -ENOENT;
+	if (cl_page->cp_type == CPT_TRANSIENT) {
+		/* OK */
+	} else if (nonblock) {
+		if (!trylock_page(vmpage)) {
+			result = -EAGAIN;
+			goto out;
 		}
+
+		if (unlikely(PageWriteback(vmpage))) {
+			unlock_page(vmpage);
+			result = -EAGAIN;
+			goto out;
+		}
+	} else {
+		lock_page(vmpage);
+		wait_on_page_writeback(vmpage);
 	}
+
+	PASSERT(env, cl_page, !cl_page->cp_owner);
+	cl_page->cp_owner = cl_io_top(io);
+	cl_page_owner_set(cl_page);
+
+	if (cl_page->cp_state == CPS_FREEING) {
+		__cl_page_disown(env, cl_page);
+		result = -ENOENT;
+		goto out;
+	}
+
+	cl_page_state_set(env, cl_page, CPS_OWNED);
 out:
 	return result;
 }
@@ -672,13 +675,11 @@ void cl_page_unassume(const struct lu_env *env,
  * \post !cl_page_is_owned(pg, io)
  *
  * \see cl_page_own()
- * \see cl_page_operations::cpo_disown()
  */
 void cl_page_disown(const struct lu_env *env,
 		    struct cl_io *io, struct cl_page *pg)
 {
-	io = cl_io_top(io);
-	__cl_page_disown(env, io, pg);
+	__cl_page_disown(env, pg);
 }
 EXPORT_SYMBOL(cl_page_disown);
 
@@ -693,15 +694,25 @@ void cl_page_disown(const struct lu_env *env,
  * \see cl_page_operations::cpo_discard()
  */
 void cl_page_discard(const struct lu_env *env,
-		     struct cl_io *io, struct cl_page *cl_page)
+		     struct cl_io *io, struct cl_page *cp)
 {
 	const struct cl_page_slice *slice;
+	struct page *vmpage;
 	int i;
 
-	cl_page_slice_for_each(cl_page, slice, i) {
+	cl_page_slice_for_each(cp, slice, i) {
 		if (slice->cpl_ops->cpo_discard)
 			(*slice->cpl_ops->cpo_discard)(env, slice, io);
 	}
+
+	if (cp->cp_type == CPT_CACHEABLE) {
+		vmpage = cp->cp_vmpage;
+		LASSERT(vmpage);
+		LASSERT(PageLocked(vmpage));
+		generic_error_remove_page(vmpage->mapping, vmpage);
+	} else {
+		cl_page_delete(env, cp);
+	}
 }
 EXPORT_SYMBOL(cl_page_discard);
 
@@ -813,7 +824,7 @@ int cl_page_prep(const struct lu_env *env, struct cl_io *io,
 
 	if (cl_page->cp_type != CPT_TRANSIENT) {
 		cl_page_slice_for_each(cl_page, slice, i) {
-			if (slice->cpl_ops->cpo_own)
+			if (slice->cpl_ops->io[crt].cpo_prep)
 				result = (*slice->cpl_ops->io[crt].cpo_prep)(env,
 									     slice,
 									     io);
-- 
1.8.3.1


* [lustre-devel] [PATCH 06/32] lustre: clio: remove cpo_assume, cpo_unassume, cpo_fini
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (4 preceding siblings ...)
  2022-08-04  1:37 ` [lustre-devel] [PATCH 05/32] lustre: clio: remove cpo_own and cpo_disown James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 07/32] lustre: enc: enc-unaware clients get ENOKEY if file not found James Simmons
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: "John L. Hammond" <jhammond@whamcloud.com>

Remove the cl_page methods cpo_assume, cpo_unassume, and
cpo_fini. These methods were only implemented by the vvp layer and so
they can be easily inlined into cl_page_assume() and
cl_page_unassume(). Remove vvp_page_delete() by inlining its contents
into __cl_page_delete().
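
The refactoring above follows a simple devirtualization pattern: an optional
per-layer callback with only one implementation is replaced by a direct check
on the page type. A minimal userspace sketch of that pattern (all types and
names below are simplified stand-ins, not the real Lustre structures):

```c
#include <assert.h>

/* Simplified stand-ins for the cl_page machinery; not the real types. */
enum cpt_type  { CPT_CACHEABLE, CPT_TRANSIENT };
enum cps_state { CPS_CACHED, CPS_OWNED };

struct cl_page {
	enum cpt_type  cp_type;
	enum cps_state cp_state;
	int            vm_locked;	/* models PageLocked(vmpage) */
};

/* Before the patch, cl_page_assume() walked every slice looking for an
 * optional cpo_assume() method. Only vvp implemented it, so the method
 * table indirection can be replaced with a direct type check. */
static void cl_page_assume(struct cl_page *cp)
{
	if (cp->cp_type == CPT_CACHEABLE) {
		/* vvp_page_assume() inlined: the VM page must already be
		 * locked by the caller (wait_on_page_writeback() elided). */
		assert(cp->vm_locked);
	}
	cp->cp_state = CPS_OWNED;
}
```

The same shape applies to cl_page_unassume() and to the cpo_fini body
inlined into cl_page_free() in this patch.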

WC-bug-id: https://jira.whamcloud.com/browse/LU-10994
Lustre-commit: 9045894fe0f503333 ("LU-10994 clio: remove cpo_assume, cpo_unassume, cpo_fini")
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47373
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h |  23 --------
 fs/lustre/llite/vvp_page.c    |  71 +------------------------
 fs/lustre/obdclass/cl_page.c  | 119 +++++++++++++++++++++++++-----------------
 3 files changed, 73 insertions(+), 140 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 4460ae1..c66e98c5 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -838,25 +838,6 @@ struct cl_page_operations {
 	 */
 
 	/**
-	 * Called for a page that is already "owned" by @io from VM point of
-	 * view. Optional.
-	 *
-	 * \see cl_page_assume()
-	 * \see vvp_page_assume(), lov_page_assume()
-	 */
-	void (*cpo_assume)(const struct lu_env *env,
-			   const struct cl_page_slice *slice, struct cl_io *io);
-	/** Dual to cl_page_operations::cpo_assume(). Optional. Called
-	 * bottom-to-top when IO releases a page without actually unlocking
-	 * it.
-	 *
-	 * \see cl_page_unassume()
-	 * \see vvp_page_unassume()
-	 */
-	void (*cpo_unassume)(const struct lu_env *env,
-			     const struct cl_page_slice *slice,
-			     struct cl_io *io);
-	/**
 	 * Update file attributes when all we have is this page.  Used for tiny
 	 * writes to update attributes when we don't have a full cl_io.
 	 */
@@ -884,10 +865,6 @@ struct cl_page_operations {
 	 */
 	void (*cpo_delete)(const struct lu_env *env,
 			   const struct cl_page_slice *slice);
-	/** Destructor. Frees resources and slice itself. */
-	void (*cpo_fini)(const struct lu_env *env,
-			 struct cl_page_slice *slice,
-			 struct pagevec *pvec);
 	/**
 	 * Optional debugging helper. Prints given page slice.
 	 *
diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c
index 8875a62..db1cd7c1 100644
--- a/fs/lustre/llite/vvp_page.c
+++ b/fs/lustre/llite/vvp_page.c
@@ -52,48 +52,6 @@
  * Page operations.
  *
  */
-static void vvp_page_fini(const struct lu_env *env,
-			  struct cl_page_slice *slice,
-			  struct pagevec *pvec)
-{
-	struct vvp_page *vpg = cl2vvp_page(slice);
-	struct page *vmpage = vpg->vpg_page;
-
-	/*
-	 * vmpage->private was already cleared when page was moved into
-	 * VPG_FREEING state.
-	 */
-	LASSERT((struct cl_page *)vmpage->private != slice->cpl_page);
-	LASSERT(vmpage);
-	if (pvec) {
-		if (!pagevec_add(pvec, vmpage))
-			pagevec_release(pvec);
-	} else {
-		put_page(vmpage);
-	}
-}
-
-static void vvp_page_assume(const struct lu_env *env,
-			    const struct cl_page_slice *slice,
-			    struct cl_io *unused)
-{
-	struct page *vmpage = cl2vm_page(slice);
-
-	LASSERT(vmpage);
-	LASSERT(PageLocked(vmpage));
-	wait_on_page_writeback(vmpage);
-}
-
-static void vvp_page_unassume(const struct lu_env *env,
-			      const struct cl_page_slice *slice,
-			      struct cl_io *unused)
-{
-	struct page *vmpage = cl2vm_page(slice);
-
-	LASSERT(vmpage);
-	LASSERT(PageLocked(vmpage));
-}
-
 static void vvp_page_discard(const struct lu_env *env,
 			     const struct cl_page_slice *slice,
 			     struct cl_io *unused)
@@ -105,29 +63,6 @@ static void vvp_page_discard(const struct lu_env *env,
 		ll_ra_stats_inc(vmpage->mapping->host, RA_STAT_DISCARDED);
 }
 
-static void vvp_page_delete(const struct lu_env *env,
-			    const struct cl_page_slice *slice)
-{
-	struct page *vmpage = cl2vm_page(slice);
-	struct cl_page *page = slice->cpl_page;
-
-	LASSERT(PageLocked(vmpage));
-	LASSERT((struct cl_page *)vmpage->private == page);
-
-	/* Drop the reference count held in vvp_page_init */
-	if (refcount_dec_and_test(&page->cp_ref)) {
-		/* It mustn't reach zero here! */
-		LASSERTF(0, "page = %p, refc reached zero\n", page);
-	}
-
-	ClearPagePrivate(vmpage);
-	vmpage->private = 0;
-	/*
-	 * Reference from vmpage to cl_page is removed, but the reference back
-	 * is still here. It is removed later in vvp_page_fini().
-	 */
-}
-
 static int vvp_page_prep_read(const struct lu_env *env,
 			      const struct cl_page_slice *slice,
 			      struct cl_io *unused)
@@ -313,11 +248,7 @@ static int vvp_page_fail(const struct lu_env *env,
 }
 
 static const struct cl_page_operations vvp_page_ops = {
-	.cpo_assume		= vvp_page_assume,
-	.cpo_unassume		= vvp_page_unassume,
 	.cpo_discard		= vvp_page_discard,
-	.cpo_delete		= vvp_page_delete,
-	.cpo_fini		= vvp_page_fini,
 	.cpo_print		= vvp_page_print,
 	.io = {
 		[CRT_READ] = {
@@ -355,7 +286,7 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 				  &vvp_transient_page_ops);
 	} else {
 		get_page(vmpage);
-		/* in cache, decref in vvp_page_delete */
+		/* in cache, decref in cl_page_delete */
 		refcount_inc(&page->cp_ref);
 		SetPagePrivate(vmpage);
 		vmpage->private = (unsigned long)page;
diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c
index cff2c54..6319c3d 100644
--- a/fs/lustre/obdclass/cl_page.c
+++ b/fs/lustre/obdclass/cl_page.c
@@ -129,29 +129,39 @@ static void __cl_page_free(struct cl_page *cl_page, unsigned short bufsize)
 	}
 }
 
-static void cl_page_free(const struct lu_env *env, struct cl_page *cl_page,
+static void cl_page_free(const struct lu_env *env, struct cl_page *cp,
 			 struct pagevec *pvec)
 {
-	struct cl_object *obj = cl_page->cp_obj;
+	struct cl_object *obj = cp->cp_obj;
 	unsigned short bufsize = cl_object_header(obj)->coh_page_bufsize;
-	struct cl_page_slice *slice;
-	int i;
+	struct page *vmpage;
 
-	PASSERT(env, cl_page, list_empty(&cl_page->cp_batch));
-	PASSERT(env, cl_page, !cl_page->cp_owner);
-	PASSERT(env, cl_page, cl_page->cp_state == CPS_FREEING);
+	PASSERT(env, cp, list_empty(&cp->cp_batch));
+	PASSERT(env, cp, !cp->cp_owner);
+	PASSERT(env, cp, cp->cp_state == CPS_FREEING);
 
-	cl_page_slice_for_each(cl_page, slice, i) {
-		if (unlikely(slice->cpl_ops->cpo_fini))
-			slice->cpl_ops->cpo_fini(env, slice, pvec);
+	if (cp->cp_type == CPT_CACHEABLE) {
+		/* vmpage->private was already cleared when page was
+		 * moved into CPS_FREEING state.
+		 */
+		vmpage = cp->cp_vmpage;
+		LASSERT(vmpage);
+		LASSERT((struct cl_page *)vmpage->private != cp);
+
+		if (pvec) {
+			if (!pagevec_add(pvec, vmpage))
+				pagevec_release(pvec);
+		} else {
+			put_page(vmpage);
+		}
 	}
-	cl_page->cp_layer_count = 0;
-	lu_object_ref_del_at(&obj->co_lu, &cl_page->cp_obj_ref,
-			     "cl_page", cl_page);
-	if (cl_page->cp_type != CPT_TRANSIENT)
+
+	cp->cp_layer_count = 0;
+	lu_object_ref_del_at(&obj->co_lu, &cp->cp_obj_ref, "cl_page", cp);
+	if (cp->cp_type != CPT_TRANSIENT)
 		cl_object_put(env, obj);
-	lu_ref_fini(&cl_page->cp_reference);
-	__cl_page_free(cl_page, bufsize);
+	lu_ref_fini(&cp->cp_reference);
+	__cl_page_free(cp, bufsize);
 }
 
 static struct cl_page *__cl_page_alloc(struct cl_object *o)
@@ -613,28 +623,27 @@ int cl_page_own_try(const struct lu_env *env, struct cl_io *io,
  *
  * Called when page is already locked by the hosting VM.
  *
- * \pre !cl_page_is_owned(cl_page, io)
- * \post cl_page_is_owned(cl_page, io)
+ * \pre !cl_page_is_owned(cp, io)
+ * \post cl_page_is_owned(cp, io)
  *
  * \see cl_page_operations::cpo_assume()
  */
 void cl_page_assume(const struct lu_env *env,
-		    struct cl_io *io, struct cl_page *cl_page)
+		    struct cl_io *io, struct cl_page *cp)
 {
-	const struct cl_page_slice *slice;
-	int i;
-
-	io = cl_io_top(io);
+	struct page *vmpage;
 
-	cl_page_slice_for_each(cl_page, slice, i) {
-		if (slice->cpl_ops->cpo_assume)
-			(*slice->cpl_ops->cpo_assume)(env, slice, io);
+	if (cp->cp_type == CPT_CACHEABLE) {
+		vmpage = cp->cp_vmpage;
+		LASSERT(vmpage);
+		LASSERT(PageLocked(vmpage));
+		wait_on_page_writeback(vmpage);
 	}
 
-	PASSERT(env, cl_page, !cl_page->cp_owner);
-	cl_page->cp_owner = cl_io_top(io);
-	cl_page_owner_set(cl_page);
-	cl_page_state_set(env, cl_page, CPS_OWNED);
+	PASSERT(env, cp, !cp->cp_owner);
+	cp->cp_owner = cl_io_top(io);
+	cl_page_owner_set(cp);
+	cl_page_state_set(env, cp, CPS_OWNED);
 }
 EXPORT_SYMBOL(cl_page_assume);
 
@@ -644,24 +653,23 @@ void cl_page_assume(const struct lu_env *env,
  * Moves cl_page into cl_page_state::CPS_CACHED without releasing a lock
  * on the underlying VM page (as VM is supposed to do this itself).
  *
- * \pre   cl_page_is_owned(cl_page, io)
- * \post !cl_page_is_owned(cl_page, io)
+ * \pre   cl_page_is_owned(cp, io)
+ * \post !cl_page_is_owned(cp, io)
  *
  * \see cl_page_assume()
  */
 void cl_page_unassume(const struct lu_env *env,
-		      struct cl_io *io, struct cl_page *cl_page)
+		      struct cl_io *io, struct cl_page *cp)
 {
-	const struct cl_page_slice *slice;
-	int i;
+	struct page *vmpage;
 
-	io = cl_io_top(io);
-	cl_page_owner_clear(cl_page);
-	cl_page_state_set(env, cl_page, CPS_CACHED);
+	cl_page_owner_clear(cp);
+	cl_page_state_set(env, cp, CPS_CACHED);
 
-	cl_page_slice_for_each_reverse(cl_page, slice, i) {
-		if (slice->cpl_ops->cpo_unassume)
-			(*slice->cpl_ops->cpo_unassume)(env, slice, io);
+	if (cp->cp_type == CPT_CACHEABLE) {
+		vmpage = cp->cp_vmpage;
+		LASSERT(vmpage);
+		LASSERT(PageLocked(vmpage));
 	}
 }
 EXPORT_SYMBOL(cl_page_unassume);
@@ -721,24 +729,41 @@ void cl_page_discard(const struct lu_env *env,
  * cl_pages, e.g,. in a error handling cl_page_find()->__cl_page_delete()
  * path. Doesn't check page invariant.
  */
-static void __cl_page_delete(const struct lu_env *env,
-			     struct cl_page *cl_page)
+static void __cl_page_delete(const struct lu_env *env, struct cl_page *cp)
 {
 	const struct cl_page_slice *slice;
+	struct page *vmpage;
 	int i;
 
-	PASSERT(env, cl_page, cl_page->cp_state != CPS_FREEING);
+	PASSERT(env, cp, cp->cp_state != CPS_FREEING);
 
 	/*
 	 * Sever all ways to obtain new pointers to @cl_page.
 	 */
-	cl_page_owner_clear(cl_page);
-	__cl_page_state_set(env, cl_page, CPS_FREEING);
+	cl_page_owner_clear(cp);
+	__cl_page_state_set(env, cp, CPS_FREEING);
 
-	cl_page_slice_for_each_reverse(cl_page, slice, i) {
+	cl_page_slice_for_each_reverse(cp, slice, i) {
 		if (slice->cpl_ops->cpo_delete)
 			(*slice->cpl_ops->cpo_delete)(env, slice);
 	}
+
+	if (cp->cp_type == CPT_CACHEABLE) {
+		vmpage = cp->cp_vmpage;
+		LASSERT(PageLocked(vmpage));
+		LASSERT((struct cl_page *)vmpage->private == cp);
+
+		/* Drop the reference count held in vvp_page_init */
+		refcount_dec(&cp->cp_ref);
+		ClearPagePrivate(vmpage);
+		vmpage->private = 0;
+
+		/*
+		 * The reference from vmpage to cl_page is removed,
+		 * but the reference back is still here. It is removed
+		 * later in cl_page_free().
+		 */
+	}
 }
 
 /**
-- 
1.8.3.1


* [lustre-devel] [PATCH 07/32] lustre: enc: enc-unaware clients get ENOKEY if file not found
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (5 preceding siblings ...)
  2022-08-04  1:37 ` [lustre-devel] [PATCH 06/32] lustre: clio: remove cpo_assume, cpo_unassume, cpo_fini James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 08/32] lnet: socklnd: Duplicate ksock_conn_cb James Simmons
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

Applications running on clients without encryption keys, or without
fscrypt support, may check for the existence of a file in an
encrypted directory. To reduce issues with such applications, return
-ENOKEY instead of -ENOENT.
For encryption-unaware clients, this is done on the server side in
the mdt layer, by checking whether the client has the
OBD_CONNECT2_ENCRYPT connection flag.
For clients without the key, this is done in llite when the searched
filename is not in encoded form.
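
The resulting error policy on the client can be condensed into a small
decision function (a hypothetical helper for illustration, not the real
ll_setup_filename() signature):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical condensation of the lookup error policy: when a lookup
 * in an encrypted directory fails, which errno should the caller see? */
static int lookup_errno(int client_has_key, int name_is_encoded, int found)
{
	if (found)
		return 0;
	/* Without the key, a plaintext name can never match an encrypted
	 * entry, so report the missing key rather than a missing file. */
	if (!client_has_key && !name_is_encoded)
		return -ENOKEY;
	return -ENOENT;
}
```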

WC-bug-id: https://jira.whamcloud.com/browse/LU-15855
Lustre-commit: 00898697f998c095e ("LU-15855 enc: enc-unaware clients get ENOKEY if file not found")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/47349
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/crypto.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/fs/lustre/llite/crypto.c b/fs/lustre/llite/crypto.c
index f075b9a..ad045c3 100644
--- a/fs/lustre/llite/crypto.c
+++ b/fs/lustre/llite/crypto.c
@@ -233,21 +233,26 @@ int ll_setup_filename(struct inode *dir, const struct qstr *iname,
 		fid->f_ver = 0;
 	}
 	rc = fscrypt_setup_filename(dir, &dname, lookup, fname);
-	if (rc == -ENOENT && lookup &&
-	    ((is_root_inode(dir) && iname->len == strlen(dot_fscrypt_name) &&
-	      strncmp(iname->name, dot_fscrypt_name, iname->len) == 0) ||
-	     (!fscrypt_has_encryption_key(dir) &&
-	      unlikely(filename_is_volatile(iname->name, iname->len, NULL))))) {
-		/* In case of subdir mount of an encrypted directory, we allow
-		 * lookup of /.fscrypt directory.
-		 */
-		/* For purpose of migration or mirroring without enc key, we
-		 * allow lookup of volatile file without enc context.
-		 */
-		memset(fname, 0, sizeof(struct fscrypt_name));
-		fname->disk_name.name = (unsigned char *)iname->name;
-		fname->disk_name.len = iname->len;
-		rc = 0;
+	if (rc == -ENOENT && lookup) {
+		if (((is_root_inode(dir) &&
+		     iname->len == strlen(dot_fscrypt_name) &&
+		     strncmp(iname->name, dot_fscrypt_name, iname->len) == 0) ||
+		     (!fscrypt_has_encryption_key(dir) &&
+		      unlikely(filename_is_volatile(iname->name,
+						    iname->len, NULL))))) {
+			/* In case of subdir mount of an encrypted directory,
+			 * we allow lookup of /.fscrypt directory.
+			 */
+			/* For purpose of migration or mirroring without enc key,
+			 * we allow lookup of volatile file without enc context.
+			 */
+			memset(fname, 0, sizeof(struct fscrypt_name));
+			fname->disk_name.name = (unsigned char *)iname->name;
+			fname->disk_name.len = iname->len;
+			rc = 0;
+		} else if (!fscrypt_has_encryption_key(dir)) {
+			rc = -ENOKEY;
+		}
 	}
 	if (rc)
 		return rc;
-- 
1.8.3.1


* [lustre-devel] [PATCH 08/32] lnet: socklnd: Duplicate ksock_conn_cb
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (6 preceding siblings ...)
  2022-08-04  1:37 ` [lustre-devel] [PATCH 07/32] lustre: enc: enc-unaware clients get ENOKEY if file not found James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 09/32] lustre: llite: enforce ROOT default on subdir mount James Simmons
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

If two threads enter ksocknal_add_peer(), the first one to acquire
the ksnd_global_lock will create a ksock_peer_ni and associate a
ksock_conn_cb with it.

When the second thread acquires the ksnd_global_lock it will find the
existing ksock_peer_ni, but it does not check for an existing
ksock_conn_cb. As a result, it overwrites the existing ksock_conn_cb
(ksock_peer_ni::ksnp_conn_cb) and the ksock_conn_cb from the first
thread becomes stranded.

Modify ksocknal_add_peer() to check whether the peer_ni has an
existing ksock_conn_cb associated with it.
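
The fix reduces to a check-before-attach performed under the same lock; a
minimal sketch with simplified refcounting (stand-in structures, not the
real socklnd types):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified refcounted connection callback; stand-ins for the real
 * ksock_conn_cb / ksock_peer_ni structures. */
struct conn_cb { int refs; };
struct peer_ni { struct conn_cb *conn_cb; };

/* Called with the (here implicit) ksnd_global_lock held: attach the new
 * conn_cb only if the peer does not already have one, otherwise drop our
 * reference so the first thread's conn_cb is not stranded. */
static void peer_attach_conn_cb(struct peer_ni *peer, struct conn_cb *cb)
{
	if (peer->conn_cb)
		cb->refs--;		/* ksocknal_conn_cb_decref() */
	else
		peer->conn_cb = cb;	/* ksocknal_add_conn_cb_locked() */
}
```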

Fixes: 3ffceb7502 ("lnet: socklnd: replace route construct")
HPE-bug-id: LUS-10956
WC-bug-id: https://jira.whamcloud.com/browse/LU-15860
Lustre-commit: 0c91d49a44e1214b5 ("LU-15860 socklnd: Duplicate ksock_conn_cb")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/47361
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/socklnd/socklnd.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index 01b434f..2b08501 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -645,14 +645,17 @@ struct ksock_peer_ni *
 			 nidhash(&id->nid));
 	}
 
-	ksocknal_add_conn_cb_locked(peer_ni, conn_cb);
-
-	/* Remember conns_per_peer setting at the time
-	 * of connection initiation. It will define the
-	 * max number of conns per type for this conn_cb
-	 * while it's in use.
-	 */
-	conn_cb->ksnr_max_conns = ksocknal_get_conns_per_peer(peer_ni);
+	if (peer_ni->ksnp_conn_cb) {
+		ksocknal_conn_cb_decref(conn_cb);
+	} else {
+		ksocknal_add_conn_cb_locked(peer_ni, conn_cb);
+		/* Remember conns_per_peer setting at the time
+		 * of connection initiation. It will define the
+		 * max number of conns per type for this conn_cb
+		 * while it's in use.
+		 */
+		conn_cb->ksnr_max_conns = ksocknal_get_conns_per_peer(peer_ni);
+	}
 
 	write_unlock_bh(&ksocknal_data.ksnd_global_lock);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 09/32] lustre: llite: enforce ROOT default on subdir mount
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (7 preceding siblings ...)
  2022-08-04  1:37 ` [lustre-devel] [PATCH 08/32] lnet: socklnd: Duplicate ksock_conn_cb James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 10/32] lnet: Replace msg_rdma_force with a new md_flag LNET_MD_FLAG_GPU James Simmons
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

On a subdirectory mount, the filesystem-wide default LMV doesn't take
effect. This fix includes the following changes:
* enforce the filesystem-wide default LMV on subdirectory mount if
  it's not set separately.
* "lfs getdirstripe -D <subdir_mount>" should print the
  filesystem-wide default LMV.
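
The new ll_prep_inode() flow can be sketched as the following decision
(hypothetical helpers for illustration; as in the patch, -ENODATA from the
ROOT fetch simply means "no default set"):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for fetching ROOT's default LMV: succeeds when
 * ROOT has a default, or returns -ENODATA when none is set. */
static int get_root_default_lmv(int root_has_default, int *has_default)
{
	if (!root_has_default)
		return -ENODATA;
	*has_default = 1;
	return 0;
}

/* Sketch of the fixup: only the root inode of a subdirectory mount that
 * got no default LMV from the MDS falls back to the filesystem-wide
 * default stored on ROOT. */
static int prep_default_lmv(int is_subdir_mount_root, int md_has_default,
			    int root_has_default, int *default_deleted)
{
	int has_default = md_has_default;

	if (is_subdir_mount_root && !md_has_default) {
		int rc = get_root_default_lmv(root_has_default, &has_default);

		if (rc && rc != -ENODATA)
			return rc;
	}
	*default_deleted = !has_default;
	return 0;
}
```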

WC-bug-id: https://jira.whamcloud.com/browse/LU-15910
Lustre-commit: a162e24d2da5e4bd6 ("LU-15910 llite: enforce ROOT default on subdir mount")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47518
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c            | 75 ++++++++++++++++++++++++----------------
 fs/lustre/llite/llite_internal.h |  3 ++
 fs/lustre/llite/llite_lib.c      | 46 ++++++++++++++++++++++--
 3 files changed, 93 insertions(+), 31 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 6eaac9a..2b63c48 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -655,10 +655,9 @@ int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 	return rc;
 }
 
-static int ll_dir_get_default_layout(struct inode *inode, void **plmm,
-				     int *plmm_size,
-				     struct ptlrpc_request **request, u64 valid,
-				     enum get_default_layout_type type)
+int ll_dir_get_default_layout(struct inode *inode, void **plmm, int *plmm_size,
+			      struct ptlrpc_request **request, u64 valid,
+			      enum get_default_layout_type type)
 {
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	struct mdt_body *body;
@@ -1627,35 +1626,53 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 				lum = (struct lmv_user_md *)lmm;
 				lli = ll_i2info(inode);
-				if (lum->lum_max_inherit == LMV_INHERIT_NONE ||
-				    (lum->lum_max_inherit > 0 &&
-				     lum->lum_max_inherit < lli->lli_dir_depth)) {
-					rc = -ENODATA;
-					goto finish_req;
-				}
+				if (lum->lum_max_inherit !=
+				    LMV_INHERIT_UNLIMITED) {
+					if (lum->lum_max_inherit ==
+						LMV_INHERIT_NONE ||
+					    lum->lum_max_inherit <
+						LMV_INHERIT_END ||
+					    lum->lum_max_inherit >
+						LMV_INHERIT_MAX ||
+					    lum->lum_max_inherit <
+						lli->lli_dir_depth) {
+						rc = -ENODATA;
+						goto finish_req;
+					}
+
+					if (lum->lum_max_inherit ==
+					    lli->lli_dir_depth) {
+						lum->lum_max_inherit =
+							LMV_INHERIT_NONE;
+						lum->lum_max_inherit_rr =
+							LMV_INHERIT_RR_NONE;
+						goto out_copy;
+					}
 
-				if (lum->lum_max_inherit ==
-				    lli->lli_dir_depth) {
-					lum->lum_max_inherit = LMV_INHERIT_NONE;
-					lum->lum_max_inherit_rr =
-						LMV_INHERIT_RR_NONE;
-					goto out_copy;
-				}
-				if (lum->lum_max_inherit > lli->lli_dir_depth &&
-				    lum->lum_max_inherit <= LMV_INHERIT_MAX)
 					lum->lum_max_inherit -=
 						lli->lli_dir_depth;
+				}
 
-				if (lum->lum_max_inherit_rr >
-					lli->lli_dir_depth &&
-				    lum->lum_max_inherit_rr <=
-					LMV_INHERIT_RR_MAX)
-					lum->lum_max_inherit_rr -=
-						lli->lli_dir_depth;
-				else if (lum->lum_max_inherit_rr ==
-						lli->lli_dir_depth)
-					lum->lum_max_inherit_rr =
-						LMV_INHERIT_RR_NONE;
+				if (lum->lum_max_inherit_rr !=
+				    LMV_INHERIT_RR_UNLIMITED) {
+					if (lum->lum_max_inherit_rr ==
+						LMV_INHERIT_NONE ||
+					    lum->lum_max_inherit_rr <
+						LMV_INHERIT_RR_END ||
+					    lum->lum_max_inherit_rr >
+						LMV_INHERIT_RR_MAX ||
+					    lum->lum_max_inherit_rr <=
+						lli->lli_dir_depth) {
+						lum->lum_max_inherit_rr =
+							LMV_INHERIT_RR_NONE;
+						goto out_copy;
+					}
+
+					if (lum->lum_max_inherit_rr >
+					    lli->lli_dir_depth)
+						lum->lum_max_inherit_rr -=
+							lli->lli_dir_depth;
+				}
 			}
 out_copy:
 			if (copy_to_user(ulmv, lmm, lmmsize))
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 70a42d4..c350440 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1155,6 +1155,9 @@ int ll_lov_getstripe_ea_info(struct inode *inode, const char *filename,
 			     struct ptlrpc_request **request);
 int ll_dir_setstripe(struct inode *inode, struct lov_user_md *lump,
 		     int set_default);
+int ll_dir_get_default_layout(struct inode *inode, void **plmm, int *plmm_size,
+			      struct ptlrpc_request **request, u64 valid,
+			      enum get_default_layout_type type);
 int ll_dir_getstripe_default(struct inode *inode, void **lmmp,
 			     int *lmm_size, struct ptlrpc_request **request,
 			     struct ptlrpc_request **root_request, u64 valid);
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index b55a30f..5b80722 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -2972,6 +2972,39 @@ void ll_open_cleanup(struct super_block *sb, struct req_capsule *pill)
 	ll_finish_md_op_data(op_data);
 }
 
+/* set filesystem-wide default LMV for subdir mount if it's enabled on ROOT. */
+static int ll_fileset_default_lmv_fixup(struct inode *inode,
+					struct lustre_md *md)
+{
+	struct ll_sb_info *sbi = ll_i2sbi(inode);
+	struct ptlrpc_request *req = NULL;
+	union lmv_mds_md *lmm = NULL;
+	int size = 0;
+	int rc;
+
+	LASSERT(is_root_inode(inode));
+	LASSERT(!fid_is_root(&sbi->ll_root_fid));
+	LASSERT(!md->default_lmv);
+
+	rc = ll_dir_get_default_layout(inode, (void **)&lmm, &size, &req,
+				       OBD_MD_DEFAULT_MEA,
+				       GET_DEFAULT_LAYOUT_ROOT);
+	if (rc && rc != -ENODATA)
+		goto out;
+
+	rc = 0;
+	if (lmm && size) {
+		rc = md_unpackmd(sbi->ll_md_exp, &md->default_lmv, lmm, size);
+		if (rc < 0)
+			goto out;
+		rc = 0;
+	}
+out:
+	if (req)
+		ptlrpc_req_finished(req);
+	return rc;
+}
+
 int ll_prep_inode(struct inode **inode, struct req_capsule *pill,
 		  struct super_block *sb, struct lookup_intent *it)
 {
@@ -2993,8 +3026,17 @@ int ll_prep_inode(struct inode **inode, struct req_capsule *pill,
 	 * ll_update_lsm_md() may change md.
 	 */
 	if (it && (it->it_op & (IT_LOOKUP | IT_GETATTR)) &&
-	    S_ISDIR(md.body->mbo_mode) && !md.default_lmv)
-		default_lmv_deleted = true;
+	    S_ISDIR(md.body->mbo_mode) && !md.default_lmv) {
+		if (unlikely(*inode && is_root_inode(*inode) &&
+			     !fid_is_root(&sbi->ll_root_fid))) {
+			rc = ll_fileset_default_lmv_fixup(*inode, &md);
+			if (rc)
+				goto out;
+		}
+
+		if (!md.default_lmv)
+			default_lmv_deleted = true;
+	}
 
 	if (*inode) {
 		rc = ll_update_inode(*inode, &md);
-- 
1.8.3.1


* [lustre-devel] [PATCH 10/32] lnet: Replace msg_rdma_force with a new md_flag LNET_MD_FLAG_GPU.
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (8 preceding siblings ...)
  2022-08-04  1:37 ` [lustre-devel] [PATCH 09/32] lustre: llite: enforce ROOT default on subdir mount James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 11/32] lustre: som: disabling xattr cache for LSOM on client James Simmons
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Alexey Lyashkov, Lustre Development List

From: Alexey Lyashkov <alexey.lyashkov@hpe.com>

HPE-bug-id: LUS-10520
WC-bug-id: https://jira.whamcloud.com/browse/LU-15189
Lustre-commit: 959304eac7ec5b156 ("LU-15189 lnet: fix memory mapping.")
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/45482
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
HPE-bug-id: LUS-10997
WC-bug-id: https://jira.whamcloud.com/browse/LU-15914
Lustre-commit: cb0220db3ce517b0e ("LU-15914 lnet: Fix null md deref for finalized message")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_net.h       |  4 +++-
 fs/lustre/osc/osc_request.c          |  3 +++
 fs/lustre/ptlrpc/pers.c              |  3 +++
 include/linux/lnet/lib-types.h       |  3 +--
 include/uapi/linux/lnet/lnet-types.h |  2 ++
 net/lnet/klnds/o2iblnd/o2iblnd.h     | 23 +++++++++++++++--------
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c  | 31 +++++++++++++++++++++----------
 net/lnet/lnet/lib-md.c               |  3 +++
 net/lnet/lnet/lib-move.c             | 10 ++++++----
 9 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index 7d29542..f70cc7c 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -1186,7 +1186,9 @@ struct ptlrpc_bulk_desc {
 	/** completed with failure */
 	unsigned long			bd_failure:1;
 	/** client side */
-	unsigned long			bd_registered:1;
+	unsigned long			bd_registered:1,
+	/* bulk request is RDMA transfer, use page->host as real address */
+					bd_is_rdma:1;
 	/** For serialization with callback */
 	spinlock_t			bd_lock;
 	/** {put,get}{source,sink}{kiov} */
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index d84884f..21e036e 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1416,6 +1416,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 	const char *obd_name = cli->cl_import->imp_obd->obd_name;
 	struct inode *inode = NULL;
 	bool directio = false;
+	bool gpu = 0;
 	bool enable_checksum = true;
 	struct cl_page *clpage;
 
@@ -1581,6 +1582,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 	if (brw_page2oap(pga[0])->oap_brw_flags & OBD_BRW_RDMA_ONLY) {
 		enable_checksum = false;
 		short_io_size = 0;
+		gpu = 1;
 	}
 
 	/* Check if read/write is small enough to be a short io. */
@@ -1632,6 +1634,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 		goto out;
 	}
 	/* NB request now owns desc and will free it when it gets freed */
+	desc->bd_is_rdma = gpu;
 no_bulk:
 	body = req_capsule_client_get(pill, &RMF_OST_BODY);
 	ioobj = req_capsule_client_get(pill, &RMF_OBD_IOOBJ);
diff --git a/fs/lustre/ptlrpc/pers.c b/fs/lustre/ptlrpc/pers.c
index e24c8e3..b35d2fe 100644
--- a/fs/lustre/ptlrpc/pers.c
+++ b/fs/lustre/ptlrpc/pers.c
@@ -58,6 +58,9 @@ void ptlrpc_fill_bulk_md(struct lnet_md *md, struct ptlrpc_bulk_desc *desc,
 		return;
 	}
 
+	if (desc->bd_is_rdma)
+		md->options |= LNET_MD_GPU_ADDR;
+
 	if (mdidx == (desc->bd_md_count - 1))
 		md->length = desc->bd_iov_count - start;
 	else
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index f7f0b0b..1827f4e 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -138,8 +138,6 @@ struct lnet_msg {
 	enum lnet_msg_hstatus	msg_health_status;
 	/* This is a recovery message */
 	bool			msg_recovery;
-	/* force an RDMA even if the message size is < 4K */
-	bool			msg_rdma_force;
 	/* the number of times a transmission has been retried */
 	int			msg_retry_count;
 	/* flag to indicate that we do not want to resend this message */
@@ -245,6 +243,7 @@ struct lnet_libmd {
  */
 #define LNET_MD_FLAG_HANDLING		BIT(3)
 #define LNET_MD_FLAG_DISCARD		BIT(4)
+#define LNET_MD_FLAG_GPU		BIT(5) /**< Special mapping needs */
 
 struct lnet_test_peer {
 	/* info about peers we are trying to fail */
diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h
index c5fca5c..5a2ea45 100644
--- a/include/uapi/linux/lnet/lnet-types.h
+++ b/include/uapi/linux/lnet/lnet-types.h
@@ -467,6 +467,8 @@ struct lnet_md {
 #define LNET_MD_TRACK_RESPONSE		(1 << 10)
 /** See struct lnet_md::options. */
 #define LNET_MD_NO_TRACK_RESPONSE	(1 << 11)
+/** Special page mapping handling */
+#define LNET_MD_GPU_ADDR		(1 << 13)
 
 /** Infinite threshold on MD operations. See lnet_md::threshold */
 #define LNET_MD_THRESH_INF	(-1)
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index e798695..0066e85 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -401,8 +401,9 @@ struct kib_tx {					/* transmit message */
 	struct kib_tx_pool     *tx_pool;	/* pool I'm from */
 	struct kib_conn	       *tx_conn;	/* owning conn */
 	short			tx_sending;	/* # tx callbacks outstanding */
-	short			tx_queued;	/* queued for sending */
-	short			tx_waiting;	/* waiting for peer_ni */
+	unsigned long		tx_queued:1,	/* queued for sending */
+				tx_waiting:1,	/* waiting for peer_ni */
+				tx_gpu:1;	/* force DMA */
 	int			tx_status;	/* LNET completion status */
 	enum lnet_msg_hstatus	tx_hstatus;	/* health status of the transmit */
 	ktime_t			tx_deadline;	/* completion deadline */
@@ -861,17 +862,23 @@ static inline void kiblnd_dma_unmap_single(struct ib_device *dev,
 #define KIBLND_UNMAP_ADDR_SET(p, m, a)	do {} while (0)
 #define KIBLND_UNMAP_ADDR(p, m, a)	(a)
 
-static inline int kiblnd_dma_map_sg(struct kib_hca_dev *hdev,
-				    struct scatterlist *sg, int nents,
-				    enum dma_data_direction direction)
+static inline
+int kiblnd_dma_map_sg(struct kib_hca_dev *hdev, struct kib_tx *tx)
 {
+	struct scatterlist *sg = tx->tx_frags;
+	int nents = tx->tx_nfrags;
+	enum dma_data_direction direction = tx->tx_dmadir;
+
 	return ib_dma_map_sg(hdev->ibh_ibdev, sg, nents, direction);
 }
 
-static inline void kiblnd_dma_unmap_sg(struct kib_hca_dev *hdev,
-				       struct scatterlist *sg, int nents,
-				       enum dma_data_direction direction)
+static inline
+void kiblnd_dma_unmap_sg(struct kib_hca_dev *hdev, struct kib_tx *tx)
 {
+	struct scatterlist *sg = tx->tx_frags;
+	int nents = tx->tx_nfrags;
+	enum dma_data_direction direction = tx->tx_dmadir;
+
 	ib_dma_unmap_sg(hdev->ibh_ibdev, sg, nents, direction);
 }
 
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index cb96282..01fa499 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -623,8 +623,7 @@ static void kiblnd_unmap_tx(struct kib_tx *tx)
 		kiblnd_fmr_pool_unmap(&tx->tx_fmr, tx->tx_status);
 
 	if (tx->tx_nfrags) {
-		kiblnd_dma_unmap_sg(tx->tx_pool->tpo_hdev,
-				    tx->tx_frags, tx->tx_nfrags, tx->tx_dmadir);
+		kiblnd_dma_unmap_sg(tx->tx_pool->tpo_hdev, tx);
 		tx->tx_nfrags = 0;
 	}
 }
@@ -644,9 +643,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	tx->tx_dmadir = (rd != tx->tx_rd) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
 	tx->tx_nfrags = nfrags;
 
-	rd->rd_nfrags = kiblnd_dma_map_sg(hdev, tx->tx_frags,
-					  tx->tx_nfrags, tx->tx_dmadir);
-
+	rd->rd_nfrags = kiblnd_dma_map_sg(hdev, tx);
 	for (i = 0, nob = 0; i < rd->rd_nfrags; i++) {
 		rd->rd_frags[i].rf_nob  = kiblnd_sg_dma_len(
 			hdev->ibh_ibdev, &tx->tx_frags[i]);
@@ -1076,7 +1073,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		int prev = dstidx;
 
 		if (srcidx >= srcrd->rd_nfrags) {
-			CERROR("Src buffer exhausted: %d frags\n", srcidx);
+			CERROR("Src buffer exhausted: %d frags %px\n",
+			       srcidx, tx);
 			rc = -EPROTO;
 			break;
 		}
@@ -1540,10 +1538,12 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	struct bio_vec *payload_kiov = lntmsg->msg_kiov;
 	unsigned int payload_offset = lntmsg->msg_offset;
 	unsigned int payload_nob = lntmsg->msg_len;
+	struct lnet_libmd *msg_md = lntmsg->msg_md;
 	struct iov_iter from;
 	struct kib_msg *ibmsg;
 	struct kib_rdma_desc *rd;
 	struct kib_tx *tx;
+	bool gpu;
 	int nob;
 	int rc;
 
@@ -1571,6 +1571,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		return -ENOMEM;
 	}
 	ibmsg = tx->tx_msg;
+	gpu = msg_md ? (msg_md->md_flags & LNET_MD_FLAG_GPU) : false;
 
 	switch (type) {
 	default:
@@ -1586,11 +1587,13 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			break;		/* send IMMEDIATE */
 
 		/* is the REPLY message too small for RDMA? */
-		nob = offsetof(struct kib_msg, ibm_u.immediate.ibim_payload[lntmsg->msg_md->md_length]);
-		if (nob <= IBLND_MSG_SIZE && !lntmsg->msg_rdma_force)
+		nob = offsetof(struct kib_msg,
+			       ibm_u.immediate.ibim_payload[lntmsg->msg_md->md_length]);
+		if (nob <= IBLND_MSG_SIZE && !gpu)
 			break;		/* send IMMEDIATE */
 
 		rd = &ibmsg->ibm_u.get.ibgm_rd;
+		tx->tx_gpu = gpu;
 		rc = kiblnd_setup_rd_kiov(ni, tx, rd,
 					  payload_niov, payload_kiov,
 					  payload_offset, payload_nob);
@@ -1626,9 +1629,11 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	case LNET_MSG_PUT:
 		/* Is the payload small enough not to need RDMA? */
 		nob = offsetof(struct kib_msg, ibm_u.immediate.ibim_payload[payload_nob]);
-		if (nob <= IBLND_MSG_SIZE && !lntmsg->msg_rdma_force)
+		if (nob <= IBLND_MSG_SIZE && !gpu)
 			break;			/* send IMMEDIATE */
 
+		tx->tx_gpu = gpu;
+
 		rc = kiblnd_setup_rd_kiov(ni, tx, tx->tx_rd,
 					  payload_niov, payload_kiov,
 					  payload_offset, payload_nob);
@@ -1712,6 +1717,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	struct bio_vec *kiov = lntmsg->msg_kiov;
 	unsigned int offset = lntmsg->msg_offset;
 	unsigned int nob = lntmsg->msg_len;
+	struct lnet_libmd *payload_md = lntmsg->msg_md;
 	struct kib_tx *tx;
 	int rc;
 
@@ -1722,6 +1728,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		goto failed_0;
 	}
 
+	tx->tx_gpu = !!(payload_md->md_flags & LNET_MD_FLAG_GPU);
 	if (!nob)
 		rc = 0;
 	else
@@ -1784,7 +1791,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	struct kib_tx *tx;
 	int nob;
 	int post_credit = IBLND_POSTRX_PEER_CREDIT;
-	u64 ibprm_cookie = rxmsg->ibm_u.putreq.ibprm_cookie;
+	u64 ibprm_cookie;
 	int rc = 0;
 
 	LASSERT(iov_iter_count(to) <= rlen);
@@ -1819,6 +1826,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	case IBLND_MSG_PUT_REQ: {
 		struct kib_msg *txmsg;
 		struct kib_rdma_desc *rd;
+		struct lnet_libmd *payload_md = lntmsg->msg_md;
+
+		ibprm_cookie = rxmsg->ibm_u.putreq.ibprm_cookie;
 
 		if (!iov_iter_count(to)) {
 			lnet_finalize(lntmsg, 0);
@@ -1836,6 +1846,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			break;
 		}
 
+		tx->tx_gpu = !!(payload_md->md_flags & LNET_MD_FLAG_GPU);
 		txmsg = tx->tx_msg;
 		rd = &txmsg->ibm_u.putack.ibpam_rd;
 		rc = kiblnd_setup_rd_kiov(ni, tx, rd,
diff --git a/net/lnet/lnet/lib-md.c b/net/lnet/lnet/lib-md.c
index affa921..05fb666 100644
--- a/net/lnet/lnet/lib-md.c
+++ b/net/lnet/lnet/lib-md.c
@@ -192,6 +192,9 @@ struct page *
 	lmd->md_flags = (unlink == LNET_UNLINK) ? LNET_MD_FLAG_AUTO_UNLINK : 0;
 	lmd->md_bulk_handle = umd->bulk_handle;
 
+	if (umd->options & LNET_MD_GPU_ADDR)
+		lmd->md_flags |= LNET_MD_FLAG_GPU;
+
 	if (umd->options & LNET_MD_KIOV) {
 		niov = umd->length;
 		lmd->md_niov = umd->length;
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 0c5bf82..53e953f 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1450,11 +1450,13 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	u32 best_sel_prio;
 	unsigned int best_dev_prio;
 	unsigned int dev_idx = UINT_MAX;
-	struct page *page = lnet_get_first_page(md, offset);
+	bool gpu = md ? (md->md_flags & LNET_MD_FLAG_GPU) : false;
+
+	if (gpu) {
+		struct page *page = lnet_get_first_page(md, offset);
 
-	msg->msg_rdma_force = lnet_is_rdma_only_page(page);
-	if (msg->msg_rdma_force)
 		dev_idx = lnet_get_dev_idx(page);
+	}
 
 	/* If there is no peer_ni that we can send to on this network,
 	 * then there is no point in looking for a new best_ni here.
@@ -1505,7 +1507,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		 * All distances smaller than the NUMA range
 		 * are treated equally.
 		 */
-		if (distance < lnet_numa_range)
+		if (!gpu && distance < lnet_numa_range)
 			distance = lnet_numa_range;
 
 		/* * Select on health, selection policy, direct dma prio,
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 11/32] lustre: som: disabling xattr cache for LSOM on client
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (9 preceding siblings ...)
  2022-08-04  1:37 ` [lustre-devel] [PATCH 10/32] lnet: Replace msg_rdma_force with a new md_flag LNET_MD_FLAG_GPU James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 12/32] lnet: discard some peer_ni lookup functions James Simmons
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Qian Yingjin <qian@ddn.com>

To obtain up-to-date LSOM data, a client currently needs to set
llite.*.xattr_cache=0 to disable the xattr cache on the client
completely. This prevents all other kinds of xattrs from being
cached on the client as well.
This patch introduces a lightweight solution that disables caching
only for the LSOM xattr data ("trusted.som") on the client.
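The cache-bypass condition added by this patch can be modeled as a small predicate: an xattr lookup consults the client-side cache only when the name is not one of the uncached entries. Below is a simplified user-space sketch of that decision, not the kernel code itself; the type constants and the `XATTR_NAME_SOM` macro are stand-ins redefined here for illustration, and the ACL case handled by the real `ll_xattr_list()` is omitted.

```c
#include <string.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel xattr type tags (illustrative only). */
enum xattr_type { XATTR_TRUSTED_T, XATTR_SECURITY_T, XATTR_USER_T };

#define XATTR_NAME_SOM "trusted.som"

/* Model of the cache-eligibility test: the xattr cache is consulted
 * unless the name is security.selinux (cached in slab instead) or
 * trusted.som (must always be fetched fresh for up-to-date LSOM). */
static bool xattr_is_cacheable(enum xattr_type type, const char *name)
{
	if (type == XATTR_SECURITY_T && strcmp(name, "security.selinux") == 0)
		return false;
	if (type == XATTR_TRUSTED_T && strcmp(name, XATTR_NAME_SOM) == 0)
		return false;
	return true;
}
```

The same predicate governs both sides of the patch: the lookup path skips the cache for these names, and the refill path declines to add them.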

WC-bug-id: https://jira.whamcloud.com/browse/LU-11695
Lustre-commit: 192902851d73ec246 ("LU-11695 som: disabling xattr cache for LSOM on client")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/33711
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/xattr.c       | 3 ++-
 fs/lustre/llite/xattr_cache.c | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c
index 3a342ad..11310f9 100644
--- a/fs/lustre/llite/xattr.c
+++ b/fs/lustre/llite/xattr.c
@@ -373,7 +373,8 @@ int ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer,
 	}
 
 	if (sbi->ll_xattr_cache_enabled && type != XATTR_ACL_ACCESS_T &&
-	    (type != XATTR_SECURITY_T || strcmp(name, "security.selinux"))) {
+	    (type != XATTR_SECURITY_T || strcmp(name, "security.selinux")) &&
+	    (type != XATTR_TRUSTED_T || strcmp(name, XATTR_NAME_SOM))) {
 		rc = ll_xattr_cache_get(inode, name, buffer, size, valid);
 		if (rc == -EAGAIN)
 			goto getxattr_nocache;
diff --git a/fs/lustre/llite/xattr_cache.c b/fs/lustre/llite/xattr_cache.c
index 723cc39..7e5b807 100644
--- a/fs/lustre/llite/xattr_cache.c
+++ b/fs/lustre/llite/xattr_cache.c
@@ -465,6 +465,10 @@ static int ll_xattr_cache_refill(struct inode *inode)
 			/* Filter out security.selinux, it is cached in slab */
 			CDEBUG(D_CACHE, "not caching security.selinux\n");
 			rc = 0;
+		} else if (!strcmp(xdata, XATTR_NAME_SOM)) {
+			/* Filter out trusted.som, it is not cached on client */
+			CDEBUG(D_CACHE, "not caching trusted.som\n");
+			rc = 0;
 		} else {
 			rc = ll_xattr_cache_add(&lli->lli_xattrs, xdata, xval,
 						*xsizes);
-- 
1.8.3.1



* [lustre-devel] [PATCH 12/32] lnet: discard some peer_ni lookup functions
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (10 preceding siblings ...)
  2022-08-04  1:37 ` [lustre-devel] [PATCH 11/32] lustre: som: disabling xattr cache for LSOM on client James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 13/32] lnet: change lnet_*_peer_ni to take struct lnet_nid James Simmons
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

lnet_nid2peerni_locked(), lnet_peer_get_ni_locked(),
lnet_find_peer4(), and lnet_find_peer_ni_locked() each have few users
left, and those callers can be changed to use the alternate versions
which take a 'struct lnet_nid' rather than an 'lnet_nid_t'.

So convert all those callers over, and discard the older functions.
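The pattern used throughout the patch is to widen the legacy 64-bit NID once at the API boundary (via lnet_nid4_to_nid()) and pass 'struct lnet_nid *' everywhere below, comparing with nid_same() instead of integer equality. The sketch below is a user-space model of that pattern; the field layout and bit packing are an assumption for illustration, not the kernel's actual 'struct lnet_nid' definition.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative stand-in: the legacy 64-bit NID, net part in the top
 * 32 bits (LND type in the top 16 of the net part), address below. */
typedef uint64_t lnet_nid_t;

/* Illustrative stand-in for the larger, address-size-aware NID type. */
struct lnet_nid {
	uint8_t  nid_size;    /* extra address bytes beyond 4 (0 for nid4) */
	uint8_t  nid_type;    /* network (LND) type */
	uint16_t nid_num;     /* network number */
	uint32_t nid_addr[4]; /* address words */
};

/* Widen a legacy NID once at the boundary; callees below this point
 * only ever see struct lnet_nid *.  (Packing is a simplification of
 * the kernel's lnet_nid4_to_nid().) */
static void nid4_to_nid(lnet_nid_t nid4, struct lnet_nid *nid)
{
	memset(nid, 0, sizeof(*nid));
	nid->nid_size = 0;
	nid->nid_type = (uint8_t)(nid4 >> 48);  /* LND type */
	nid->nid_num  = (uint16_t)(nid4 >> 32); /* network number */
	nid->nid_addr[0] = (uint32_t)nid4;      /* 32-bit address */
}

/* Replaces integer NID comparison such as
 * lnet_nid_to_nid4(&lp->lp_primary_nid) == nid4. */
static bool nid_same(const struct lnet_nid *a, const struct lnet_nid *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

This mirrors the diff's mechanical changes: each deleted *4 helper had done the widening internally, so hoisting the conversion to the caller lets the wide-NID versions (lnet_peer_ni_find_locked(), lnet_find_peer(), lnet_peerni_by_nid_locked()) be used everywhere.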

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 9768d8929a305588f ("LU-10391 lnet: discard some peer_ni lookup functions")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/44624
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |   6 --
 net/lnet/lnet/api-ni.c        |  26 +++---
 net/lnet/lnet/lib-move.c      |   8 +-
 net/lnet/lnet/peer.c          | 211 +++++++++++++++---------------------------
 net/lnet/lnet/router.c        |  18 ++--
 net/lnet/lnet/udsp.c          |   8 +-
 6 files changed, 110 insertions(+), 167 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index e21866b..3bdb49e 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -898,19 +898,13 @@ struct lnet_peer_net *lnet_get_next_peer_net_locked(struct lnet_peer *lp,
 struct lnet_peer_ni *lnet_get_next_peer_ni_locked(struct lnet_peer *peer,
 						  struct lnet_peer_net *peer_net,
 						  struct lnet_peer_ni *prev);
-struct lnet_peer_ni *lnet_nid2peerni_locked(lnet_nid_t nid, lnet_nid_t pref,
-					    int cpt);
 struct lnet_peer_ni *lnet_peerni_by_nid_locked(struct lnet_nid *nid,
 					       struct lnet_nid *pref,
 					       int cpt);
 struct lnet_peer_ni *lnet_nid2peerni_ex(struct lnet_nid *nid);
-struct lnet_peer_ni *lnet_peer_get_ni_locked(struct lnet_peer *lp,
-					     lnet_nid_t nid);
 struct lnet_peer_ni *lnet_peer_ni_get_locked(struct lnet_peer *lp,
 					     struct lnet_nid *nid);
-struct lnet_peer_ni *lnet_find_peer_ni_locked(lnet_nid_t nid);
 struct lnet_peer_ni *lnet_peer_ni_find_locked(struct lnet_nid *nid);
-struct lnet_peer *lnet_find_peer4(lnet_nid_t nid);
 struct lnet_peer *lnet_find_peer(struct lnet_nid *nid);
 void lnet_peer_net_added(struct lnet_net *net);
 void lnet_peer_primary_nid_locked(struct lnet_nid *nid,
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 165728d..124ec86 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -4381,7 +4381,8 @@ u32 lnet_get_dlc_seq_locked(void)
 			return rc;
 
 		mutex_lock(&the_lnet.ln_api_mutex);
-		lp = lnet_find_peer4(ping->ping_id.nid);
+		lnet_nid4_to_nid(ping->ping_id.nid, &nid);
+		lp = lnet_find_peer(&nid);
 		if (lp) {
 			ping->ping_id.nid =
 				lnet_nid_to_nid4(&lp->lp_primary_nid);
@@ -4405,7 +4406,8 @@ u32 lnet_get_dlc_seq_locked(void)
 			return rc;
 
 		mutex_lock(&the_lnet.ln_api_mutex);
-		lp = lnet_find_peer4(discover->ping_id.nid);
+		lnet_nid4_to_nid(discover->ping_id.nid, &nid);
+		lp = lnet_find_peer(&nid);
 		if (lp) {
 			discover->ping_id.nid =
 				lnet_nid_to_nid4(&lp->lp_primary_nid);
@@ -4687,7 +4689,7 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 
 	if (nob < 8) {
 		CERROR("%s: ping info too short %d\n",
-		       libcfs_id2str(id4), nob);
+		       libcfs_idstr(&id), nob);
 		goto fail_ping_buffer_decref;
 	}
 
@@ -4695,19 +4697,19 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 		lnet_swap_pinginfo(pbuf);
 	} else if (pbuf->pb_info.pi_magic != LNET_PROTO_PING_MAGIC) {
 		CERROR("%s: Unexpected magic %08x\n",
-		       libcfs_id2str(id4), pbuf->pb_info.pi_magic);
+		       libcfs_idstr(&id), pbuf->pb_info.pi_magic);
 		goto fail_ping_buffer_decref;
 	}
 
 	if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_NI_STATUS)) {
 		CERROR("%s: ping w/o NI status: 0x%x\n",
-		       libcfs_id2str(id4), pbuf->pb_info.pi_features);
+		       libcfs_idstr(&id), pbuf->pb_info.pi_features);
 		goto fail_ping_buffer_decref;
 	}
 
 	if (nob < LNET_PING_INFO_SIZE(0)) {
 		CERROR("%s: Short reply %d(%d min)\n",
-		       libcfs_id2str(id4),
+		       libcfs_idstr(&id),
 		       nob, (int)LNET_PING_INFO_SIZE(0));
 		goto fail_ping_buffer_decref;
 	}
@@ -4717,7 +4719,7 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 
 	if (nob < LNET_PING_INFO_SIZE(n_ids)) {
 		CERROR("%s: Short reply %d(%d expected)\n",
-		       libcfs_id2str(id4),
+		       libcfs_idstr(&id),
 		       nob, (int)LNET_PING_INFO_SIZE(n_ids));
 		goto fail_ping_buffer_decref;
 	}
@@ -4739,7 +4741,7 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 }
 
 static int
-lnet_discover(struct lnet_process_id id, u32 force,
+lnet_discover(struct lnet_process_id id4, u32 force,
 	      struct lnet_process_id __user *ids,
 	      int n_ids)
 {
@@ -4747,14 +4749,16 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 	struct lnet_peer_ni *p;
 	struct lnet_peer *lp;
 	struct lnet_process_id *buf;
+	struct lnet_processid id;
 	int cpt;
 	int i;
 	int rc;
 
 	if (n_ids <= 0 ||
-	    id.nid == LNET_NID_ANY)
+	    id4.nid == LNET_NID_ANY)
 		return -EINVAL;
 
+	lnet_pid4_to_pid(id4, &id);
 	if (id.pid == LNET_PID_ANY)
 		id.pid = LNET_PID_LUSTRE;
 
@@ -4769,7 +4773,7 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 		return -ENOMEM;
 
 	cpt = lnet_net_lock_current();
-	lpni = lnet_nid2peerni_locked(id.nid, LNET_NID_ANY, cpt);
+	lpni = lnet_peerni_by_nid_locked(&id.nid, NULL, cpt);
 	if (IS_ERR(lpni)) {
 		rc = PTR_ERR(lpni);
 		goto out;
@@ -4795,7 +4799,7 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid,
 	 * and lookup the lpni again
 	 */
 	lnet_peer_ni_decref_locked(lpni);
-	lpni = lnet_find_peer_ni_locked(id.nid);
+	lpni = lnet_peer_ni_find_locked(&id.nid);
 	if (!lpni) {
 		rc = -ENOENT;
 		goto out;
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 53e953f..a514472 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1907,7 +1907,7 @@ struct lnet_ni *
 		return rc;
 	}
 
-	new_lpni = lnet_find_peer_ni_locked(lnet_nid_to_nid4(&lpni->lpni_nid));
+	new_lpni = lnet_peer_ni_find_locked(&lpni->lpni_nid);
 	if (!new_lpni) {
 		lnet_peer_ni_decref_locked(lpni);
 		return -ENOENT;
@@ -2795,7 +2795,7 @@ struct lnet_ni *
 		 * try to send it via non-multi-rail criteria
 		 */
 		if (!IS_ERR(src_lpni)) {
-			/* Drop ref taken by lnet_nid2peerni_locked() */
+			/* Drop ref taken by lnet_peerni_by_nid_locked() */
 			lnet_peer_ni_decref_locked(src_lpni);
 			src_lp = lpni->lpni_peer_net->lpn_peer;
 			if (lnet_peer_is_multi_rail(src_lp) &&
@@ -3523,7 +3523,7 @@ struct lnet_mt_event_info {
 					    ev_info, the_lnet.ln_mt_handler,
 					    true);
 			lnet_net_lock(0);
-			/* lnet_find_peer_ni_locked() grabs a refcount for
+			/* lnet_peer_ni_find_locked() grabs a refcount for
 			 * us. No need to take it explicitly.
 			 */
 			lpni = lnet_peer_ni_find_locked(&nid);
@@ -3546,7 +3546,7 @@ struct lnet_mt_event_info {
 				spin_unlock(&lpni->lpni_lock);
 			}
 
-			/* Drop the ref taken by lnet_find_peer_ni_locked() */
+			/* Drop the ref taken by lnet_peer_ni_find_locked() */
 			lnet_peer_ni_decref_locked(lpni);
 			lnet_net_unlock(0);
 		} else {
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 3909c5d..7a96a2f 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -698,24 +698,6 @@ static void lnet_peer_cancel_discovery(struct lnet_peer *lp)
 }
 
 struct lnet_peer_ni *
-lnet_find_peer_ni_locked(lnet_nid_t nid4)
-{
-	struct lnet_peer_ni *lpni;
-	struct lnet_peer_table *ptable;
-	int cpt;
-	struct lnet_nid nid;
-
-	lnet_nid4_to_nid(nid4, &nid);
-
-	cpt = lnet_nid_cpt_hash(&nid, LNET_CPT_NUMBER);
-
-	ptable = the_lnet.ln_peer_tables[cpt];
-	lpni = lnet_get_peer_ni_locked(ptable, &nid);
-
-	return lpni;
-}
-
-struct lnet_peer_ni *
 lnet_peer_ni_find_locked(struct lnet_nid *nid)
 {
 	struct lnet_peer_ni *lpni;
@@ -731,24 +713,6 @@ struct lnet_peer_ni *
 }
 
 struct lnet_peer_ni *
-lnet_peer_get_ni_locked(struct lnet_peer *lp, lnet_nid_t nid)
-{
-	struct lnet_peer_net *lpn;
-	struct lnet_peer_ni *lpni;
-
-	lpn = lnet_peer_get_net_locked(lp, LNET_NIDNET(nid));
-	if (!lpn)
-		return NULL;
-
-	list_for_each_entry(lpni, &lpn->lpn_peer_nis, lpni_peer_nis) {
-		if (lnet_nid_to_nid4(&lpni->lpni_nid) == nid)
-			return lpni;
-	}
-
-	return NULL;
-}
-
-struct lnet_peer_ni *
 lnet_peer_ni_get_locked(struct lnet_peer *lp, struct lnet_nid *nid)
 {
 	struct lnet_peer_net *lpn;
@@ -767,25 +731,6 @@ struct lnet_peer_ni *
 }
 
 struct lnet_peer *
-lnet_find_peer4(lnet_nid_t nid)
-{
-	struct lnet_peer_ni *lpni;
-	struct lnet_peer *lp = NULL;
-	int cpt;
-
-	cpt = lnet_net_lock_current();
-	lpni = lnet_find_peer_ni_locked(nid);
-	if (lpni) {
-		lp = lpni->lpni_peer_net->lpn_peer;
-		lnet_peer_addref_locked(lp);
-		lnet_peer_ni_decref_locked(lpni);
-	}
-	lnet_net_unlock(cpt);
-
-	return lp;
-}
-
-struct lnet_peer *
 lnet_find_peer(struct lnet_nid *nid)
 {
 	struct lnet_peer_ni *lpni;
@@ -1620,21 +1565,20 @@ struct lnet_peer_net *
  * Call with the lnet_api_mutex held.
  */
 static int
-lnet_peer_add(lnet_nid_t nid4, unsigned int flags)
+lnet_peer_add(struct lnet_nid *nid, unsigned int flags)
 {
-	struct lnet_nid nid;
 	struct lnet_peer *lp;
 	struct lnet_peer_net *lpn;
 	struct lnet_peer_ni *lpni;
 	int rc = 0;
 
-	LASSERT(nid4 != LNET_NID_ANY);
+	LASSERT(nid);
 
 	/*
 	 * No need for the lnet_net_lock here, because the
 	 * lnet_api_mutex is held.
 	 */
-	lpni = lnet_find_peer_ni_locked(nid4);
+	lpni = lnet_peer_ni_find_locked(nid);
 	if (lpni) {
 		/* A peer with this NID already exists. */
 		lp = lpni->lpni_peer_net->lpn_peer;
@@ -1646,13 +1590,13 @@ struct lnet_peer_net *
 		 * that an existing peer is being modified.
 		 */
 		if (lp->lp_state & LNET_PEER_CONFIGURED) {
-			if (lnet_nid_to_nid4(&lp->lp_primary_nid) != nid4)
+			if (!nid_same(&lp->lp_primary_nid, nid))
 				rc = -EEXIST;
 			else if ((lp->lp_state ^ flags) & LNET_PEER_MULTI_RAIL)
 				rc = -EPERM;
 			goto out;
 		} else if (!(flags & LNET_PEER_CONFIGURED)) {
-			if (lnet_nid_to_nid4(&lp->lp_primary_nid) == nid4) {
+			if (nid_same(&lp->lp_primary_nid, nid)) {
 				rc = -EEXIST;
 				goto out;
 			}
@@ -1665,14 +1609,13 @@ struct lnet_peer_net *
 
 	/* Create peer, peer_net, and peer_ni. */
 	rc = -ENOMEM;
-	lnet_nid4_to_nid(nid4, &nid);
-	lp = lnet_peer_alloc(&nid);
+	lp = lnet_peer_alloc(nid);
 	if (!lp)
 		goto out;
-	lpn = lnet_peer_net_alloc(LNET_NID_NET(&nid));
+	lpn = lnet_peer_net_alloc(LNET_NID_NET(nid));
 	if (!lpn)
 		goto out_free_lp;
-	lpni = lnet_peer_ni_alloc(&nid);
+	lpni = lnet_peer_ni_alloc(nid);
 	if (!lpni)
 		goto out_free_lpn;
 
@@ -1684,7 +1627,7 @@ struct lnet_peer_net *
 	kfree(lp);
 out:
 	CDEBUG(D_NET, "peer %s NID flags %#x: %d\n",
-	       libcfs_nid2str(nid4), flags, rc);
+	       libcfs_nidstr(nid), flags, rc);
 	return rc;
 }
 
@@ -1699,17 +1642,15 @@ struct lnet_peer_net *
  *             non-multi-rail peer.
  */
 static int
-lnet_peer_add_nid(struct lnet_peer *lp, lnet_nid_t nid4, unsigned int flags)
+lnet_peer_add_nid(struct lnet_peer *lp, struct lnet_nid *nid,
+		  unsigned int flags)
 {
 	struct lnet_peer_net *lpn;
 	struct lnet_peer_ni *lpni;
-	struct lnet_nid nid;
 	int rc = 0;
 
 	LASSERT(lp);
-	LASSERT(nid4 != LNET_NID_ANY);
-
-	lnet_nid4_to_nid(nid4, &nid);
+	LASSERT(nid);
 
 	/* A configured peer can only be updated through configuration. */
 	if (!(flags & LNET_PEER_CONFIGURED)) {
@@ -1735,7 +1676,7 @@ struct lnet_peer_net *
 		goto out;
 	}
 
-	lpni = lnet_find_peer_ni_locked(nid4);
+	lpni = lnet_peer_ni_find_locked(nid);
 	if (lpni) {
 		/*
 		 * A peer_ni already exists. This is only a problem if
@@ -1764,14 +1705,14 @@ struct lnet_peer_net *
 			}
 			lnet_peer_del(lpni->lpni_peer_net->lpn_peer);
 			lnet_peer_ni_decref_locked(lpni);
-			lpni = lnet_peer_ni_alloc(&nid);
+			lpni = lnet_peer_ni_alloc(nid);
 			if (!lpni) {
 				rc = -ENOMEM;
 				goto out_free_lpni;
 			}
 		}
 	} else {
-		lpni = lnet_peer_ni_alloc(&nid);
+		lpni = lnet_peer_ni_alloc(nid);
 		if (!lpni) {
 			rc = -ENOMEM;
 			goto out_free_lpni;
@@ -1782,9 +1723,9 @@ struct lnet_peer_net *
 	 * Get the peer_net. Check that we're not adding a second
 	 * peer_ni on a peer_net of a non-multi-rail peer.
 	 */
-	lpn = lnet_peer_get_net_locked(lp, LNET_NIDNET(nid4));
+	lpn = lnet_peer_get_net_locked(lp, LNET_NID_NET(nid));
 	if (!lpn) {
-		lpn = lnet_peer_net_alloc(LNET_NIDNET(nid4));
+		lpn = lnet_peer_net_alloc(LNET_NID_NET(nid));
 		if (!lpn) {
 			rc = -ENOMEM;
 			goto out_free_lpni;
@@ -1800,7 +1741,7 @@ struct lnet_peer_net *
 	lnet_peer_ni_decref_locked(lpni);
 out:
 	CDEBUG(D_NET, "peer %s NID %s flags %#x: %d\n",
-	       libcfs_nidstr(&lp->lp_primary_nid), libcfs_nid2str(nid4),
+	       libcfs_nidstr(&lp->lp_primary_nid), libcfs_nidstr(nid),
 	       flags, rc);
 	return rc;
 }
@@ -1811,16 +1752,16 @@ struct lnet_peer_net *
  * Call with the lnet_api_mutex held.
  */
 static int
-lnet_peer_set_primary_nid(struct lnet_peer *lp, lnet_nid_t nid,
+lnet_peer_set_primary_nid(struct lnet_peer *lp, struct lnet_nid *nid,
 			  unsigned int flags)
 {
 	struct lnet_nid old = lp->lp_primary_nid;
 	int rc = 0;
 
-	if (lnet_nid_to_nid4(&lp->lp_primary_nid) == nid)
+	if (nid_same(&lp->lp_primary_nid, nid))
 		goto out;
 
-	lnet_nid4_to_nid(nid, &lp->lp_primary_nid);
+	lp->lp_primary_nid = *nid;
 
 	rc = lnet_peer_add_nid(lp, nid, flags);
 	if (rc) {
@@ -1829,7 +1770,7 @@ struct lnet_peer_net *
 	}
 out:
 	CDEBUG(D_NET, "peer %s NID %s: %d\n",
-	       libcfs_nidstr(&old), libcfs_nid2str(nid), rc);
+	       libcfs_nidstr(&old), libcfs_nidstr(nid), rc);
 
 	return rc;
 }
@@ -1908,16 +1849,20 @@ struct lnet_peer_net *
  * being created/modified/deleted by a different thread.
  */
 int
-lnet_add_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid, bool mr, bool temp)
+lnet_add_peer_ni(lnet_nid_t prim_nid4, lnet_nid_t nid4, bool mr, bool temp)
 {
+	struct lnet_nid prim_nid, nid;
 	struct lnet_peer *lp = NULL;
 	struct lnet_peer_ni *lpni;
 	unsigned int flags = 0;
 
 	/* The prim_nid must always be specified */
-	if (prim_nid == LNET_NID_ANY)
+	if (prim_nid4 == LNET_NID_ANY)
 		return -EINVAL;
 
+	lnet_nid4_to_nid(prim_nid4, &prim_nid);
+	lnet_nid4_to_nid(nid4, &nid);
+
 	if (!temp)
 		flags = LNET_PEER_CONFIGURED;
 
@@ -1928,11 +1873,11 @@ struct lnet_peer_net *
 	 * If nid isn't specified, we must create a new peer with
 	 * prim_nid as its primary nid.
 	 */
-	if (nid == LNET_NID_ANY)
-		return lnet_peer_add(prim_nid, flags);
+	if (nid4 == LNET_NID_ANY)
+		return lnet_peer_add(&prim_nid, flags);
 
 	/* Look up the prim_nid, which must exist. */
-	lpni = lnet_find_peer_ni_locked(prim_nid);
+	lpni = lnet_peer_ni_find_locked(&prim_nid);
 	if (!lpni)
 		return -ENOENT;
 	lnet_peer_ni_decref_locked(lpni);
@@ -1941,14 +1886,14 @@ struct lnet_peer_net *
 	/* Peer must have been configured. */
 	if (!temp && !(lp->lp_state & LNET_PEER_CONFIGURED)) {
 		CDEBUG(D_NET, "peer %s was not configured\n",
-		       libcfs_nid2str(prim_nid));
+		       libcfs_nidstr(&prim_nid));
 		return -ENOENT;
 	}
 
 	/* Primary NID must match */
-	if (lnet_nid_to_nid4(&lp->lp_primary_nid) != prim_nid) {
+	if (!nid_same(&lp->lp_primary_nid, &prim_nid)) {
 		CDEBUG(D_NET, "prim_nid %s is not primary for peer %s\n",
-		       libcfs_nid2str(prim_nid),
+		       libcfs_nidstr(&prim_nid),
 		       libcfs_nidstr(&lp->lp_primary_nid));
 		return -ENODEV;
 	}
@@ -1956,11 +1901,11 @@ struct lnet_peer_net *
 	/* Multi-Rail flag must match. */
 	if ((lp->lp_state ^ flags) & LNET_PEER_MULTI_RAIL) {
 		CDEBUG(D_NET, "multi-rail state mismatch for peer %s\n",
-		       libcfs_nid2str(prim_nid));
+		       libcfs_nidstr(&prim_nid));
 		return -EPERM;
 	}
 
-	return lnet_peer_add_nid(lp, nid, flags);
+	return lnet_peer_add_nid(lp, &nid, flags);
 }
 
 /*
@@ -1975,24 +1920,26 @@ struct lnet_peer_net *
  * being modified/deleted by a different thread.
  */
 int
-lnet_del_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid)
+lnet_del_peer_ni(lnet_nid_t prim_nid4, lnet_nid_t nid)
 {
 	struct lnet_peer *lp;
 	struct lnet_peer_ni *lpni;
 	unsigned int flags;
+	struct lnet_nid prim_nid;
 
-	if (prim_nid == LNET_NID_ANY)
+	if (prim_nid4 == LNET_NID_ANY)
 		return -EINVAL;
+	lnet_nid4_to_nid(prim_nid4, &prim_nid);
 
-	lpni = lnet_find_peer_ni_locked(prim_nid);
+	lpni = lnet_peer_ni_find_locked(&prim_nid);
 	if (!lpni)
 		return -ENOENT;
 	lnet_peer_ni_decref_locked(lpni);
 	lp = lpni->lpni_peer_net->lpn_peer;
 
-	if (prim_nid != lnet_nid_to_nid4(&lp->lp_primary_nid)) {
+	if (!nid_same(&prim_nid, &lp->lp_primary_nid)) {
 		CDEBUG(D_NET, "prim_nid %s is not primary for peer %s\n",
-		       libcfs_nid2str(prim_nid),
+		       libcfs_nidstr(&prim_nid),
 		       libcfs_nidstr(&lp->lp_primary_nid));
 		return -ENODEV;
 	}
@@ -2001,7 +1948,7 @@ struct lnet_peer_net *
 	if (lp->lp_rtr_refcount > 0) {
 		lnet_net_unlock(LNET_LOCK_EX);
 		CERROR("%s is a router. Can not be deleted\n",
-		       libcfs_nid2str(prim_nid));
+		       libcfs_nidstr(&prim_nid));
 		return -EBUSY;
 	}
 	lnet_net_unlock(LNET_LOCK_EX);
@@ -2141,19 +2088,6 @@ struct lnet_peer_ni *
 	return lpni;
 }
 
-struct lnet_peer_ni *
-lnet_nid2peerni_locked(lnet_nid_t nid4, lnet_nid_t pref4, int cpt)
-{
-	struct lnet_nid nid, pref;
-
-	lnet_nid4_to_nid(nid4, &nid);
-	lnet_nid4_to_nid(pref4, &pref);
-	if (pref4 == LNET_NID_ANY)
-		return lnet_peerni_by_nid_locked(&nid, NULL, cpt);
-	else
-		return lnet_peerni_by_nid_locked(&nid, &pref, cpt);
-}
-
 bool
 lnet_peer_gw_discovery(struct lnet_peer *lp)
 {
@@ -2964,6 +2898,7 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 	lnet_nid_t *curnis = NULL;
 	struct lnet_ni_status *addnis = NULL;
 	lnet_nid_t *delnis = NULL;
+	struct lnet_nid nid;
 	unsigned int flags;
 	int ncurnis;
 	int naddnis;
@@ -3031,7 +2966,8 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 				 * peer with the latest information we
 				 * received
 				 */
-				lpni = lnet_find_peer_ni_locked(curnis[i]);
+				lnet_nid4_to_nid(curnis[i], &nid);
+				lpni = lnet_peer_ni_find_locked(&nid);
 				if (lpni) {
 					lpni->lpni_ns_status =
 						pbuf->pb_info.pi_ni[j].ns_status;
@@ -3053,7 +2989,8 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 		goto out;
 
 	for (i = 0; i < naddnis; i++) {
-		rc = lnet_peer_add_nid(lp, addnis[i].ns_nid, flags);
+		lnet_nid4_to_nid(addnis[i].ns_nid, &nid);
+		rc = lnet_peer_add_nid(lp, &nid, flags);
 		if (rc) {
 			CERROR("Error adding NID %s to peer %s: %d\n",
 			       libcfs_nid2str(addnis[i].ns_nid),
@@ -3061,7 +2998,7 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 			if (rc == -ENOMEM)
 				goto out;
 		}
-		lpni = lnet_find_peer_ni_locked(addnis[i].ns_nid);
+		lpni = lnet_peer_ni_find_locked(&nid);
 		if (lpni) {
 			lpni->lpni_ns_status = addnis[i].ns_status;
 			lnet_peer_ni_decref_locked(lpni);
@@ -3090,7 +3027,8 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 	 * peer's lp_peer_nets list, and the peer NI for the primary NID should
 	 * be the first entry in its peer net's lpn_peer_nis list.
 	 */
-	lpni = lnet_find_peer_ni_locked(pbuf->pb_info.pi_ni[1].ns_nid);
+	lnet_nid4_to_nid(pbuf->pb_info.pi_ni[1].ns_nid, &nid);
+	lpni = lnet_peer_ni_find_locked(&nid);
 	if (!lpni) {
 		CERROR("Internal error: Failed to lookup peer NI for primary NID: %s\n",
 		       libcfs_nid2str(pbuf->pb_info.pi_ni[1].ns_nid));
@@ -3286,7 +3224,7 @@ static int lnet_peer_data_present(struct lnet_peer *lp)
 {
 	struct lnet_ping_buffer *pbuf;
 	struct lnet_peer_ni *lpni;
-	lnet_nid_t nid = LNET_NID_ANY;
+	struct lnet_nid nid;
 	unsigned int flags;
 	int rc = 0;
 
@@ -3344,9 +3282,9 @@ static int lnet_peer_data_present(struct lnet_peer *lp)
 		lnet_ping_buffer_decref(pbuf);
 		goto out;
 	}
-	nid = pbuf->pb_info.pi_ni[1].ns_nid;
+	lnet_nid4_to_nid(pbuf->pb_info.pi_ni[1].ns_nid, &nid);
 	if (nid_is_lo0(&lp->lp_primary_nid)) {
-		rc = lnet_peer_set_primary_nid(lp, nid, flags);
+		rc = lnet_peer_set_primary_nid(lp, &nid, flags);
 		if (rc)
 			lnet_ping_buffer_decref(pbuf);
 		else
@@ -3358,19 +3296,19 @@ static int lnet_peer_data_present(struct lnet_peer *lp)
 	 * to update the status of the nids that we currently have
 	 * recorded in that peer.
 	 */
-	} else if (lnet_nid_to_nid4(&lp->lp_primary_nid) == nid ||
+	} else if (nid_same(&lp->lp_primary_nid, &nid) ||
 		   (lnet_is_nid_in_ping_info(lnet_nid_to_nid4(&lp->lp_primary_nid),
 					     &pbuf->pb_info) &&
 		    lnet_is_discovery_disabled(lp))) {
 		rc = lnet_peer_merge_data(lp, pbuf);
 	} else {
-		lpni = lnet_find_peer_ni_locked(nid);
+		lpni = lnet_peer_ni_find_locked(&nid);
 		if (!lpni || lp == lpni->lpni_peer_net->lpn_peer) {
-			rc = lnet_peer_set_primary_nid(lp, nid, flags);
+			rc = lnet_peer_set_primary_nid(lp, &nid, flags);
 			if (rc) {
 				CERROR("Primary NID error %s versus %s: %d\n",
 				       libcfs_nidstr(&lp->lp_primary_nid),
-				       libcfs_nid2str(nid), rc);
+				       libcfs_nidstr(&nid), rc);
 				lnet_ping_buffer_decref(pbuf);
 			} else {
 				rc = lnet_peer_merge_data(lp, pbuf);
@@ -3939,19 +3877,21 @@ void lnet_peer_discovery_stop(void)
 /* Debugging */
 
 void
-lnet_debug_peer(lnet_nid_t nid)
+lnet_debug_peer(lnet_nid_t nid4)
 {
 	char *aliveness = "NA";
 	struct lnet_peer_ni *lp;
+	struct lnet_nid nid;
 	int cpt;
 
-	cpt = lnet_cpt_of_nid(nid, NULL);
+	lnet_nid4_to_nid(nid4, &nid);
+	cpt = lnet_nid2cpt(&nid, NULL);
 	lnet_net_lock(cpt);
 
-	lp = lnet_nid2peerni_locked(nid, LNET_NID_ANY, cpt);
+	lp = lnet_peerni_by_nid_locked(&nid, NULL, cpt);
 	if (IS_ERR(lp)) {
 		lnet_net_unlock(cpt);
-		CDEBUG(D_WARNING, "No peer %s\n", libcfs_nid2str(nid));
+		CDEBUG(D_WARNING, "No peer %s\n", libcfs_nidstr(&nid));
 		return;
 	}
 
@@ -4046,18 +3986,19 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 	struct lnet_peer_ni_credit_info *lpni_info;
 	struct lnet_peer_ni *lpni;
 	struct lnet_peer *lp;
-	lnet_nid_t nid;
+	struct lnet_nid nid;
+	lnet_nid_t nid4;
 	u32 size;
 	int rc;
 
-	lp = lnet_find_peer4(cfg->prcfg_prim_nid);
-
+	lnet_nid4_to_nid(cfg->prcfg_prim_nid, &nid);
+	lp = lnet_find_peer(&nid);
 	if (!lp) {
 		rc = -ENOENT;
 		goto out;
 	}
 
-	size = sizeof(nid) + sizeof(*lpni_info) + sizeof(*lpni_stats) +
+	size = sizeof(nid4) + sizeof(*lpni_info) + sizeof(*lpni_stats) +
 	       sizeof(*lpni_msg_stats) + sizeof(*lpni_hstats);
 	size *= lp->lp_nnis;
 	if (size > cfg->prcfg_size) {
@@ -4094,10 +4035,10 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 	while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) {
 		if (!nid_is_nid4(&lpni->lpni_nid))
 			continue;
-		nid = lnet_nid_to_nid4(&lpni->lpni_nid);
-		if (copy_to_user(bulk, &nid, sizeof(nid)))
+		nid4 = lnet_nid_to_nid4(&lpni->lpni_nid);
+		if (copy_to_user(bulk, &nid4, sizeof(nid4)))
 			goto out_free_hstats;
-		bulk += sizeof(nid);
+		bulk += sizeof(nid4);
 
 		memset(lpni_info, 0, sizeof(*lpni_info));
 		snprintf(lpni_info->cr_aliveness, LNET_MAX_STR_LEN, "NA");
@@ -4218,12 +4159,13 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 
 /* Call with the ln_api_mutex held */
 void
-lnet_peer_ni_set_healthv(lnet_nid_t nid, int value, bool all)
+lnet_peer_ni_set_healthv(lnet_nid_t nid4, int value, bool all)
 {
 	struct lnet_peer_table *ptable;
 	struct lnet_peer *lp;
 	struct lnet_peer_net *lpn;
 	struct lnet_peer_ni *lpni;
+	struct lnet_nid nid;
 	int lncpt;
 	int cpt;
 	time64_t now;
@@ -4231,11 +4173,12 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 	if (the_lnet.ln_state != LNET_STATE_RUNNING)
 		return;
 
+	lnet_nid4_to_nid(nid4, &nid);
 	now = ktime_get_seconds();
 
 	if (!all) {
 		lnet_net_lock(LNET_LOCK_EX);
-		lpni = lnet_find_peer_ni_locked(nid);
+		lpni = lnet_peer_ni_find_locked(&nid);
 		if (!lpni) {
 			lnet_net_unlock(LNET_LOCK_EX);
 			return;
diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index b4f7aaa..bbef2b3 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -1199,8 +1199,7 @@ bool lnet_router_checker_active(void)
 		spin_unlock(&rtr->lp_lock);
 
 		/* find the peer_ni associated with the primary NID */
-		lpni = lnet_peer_get_ni_locked(rtr,
-					       lnet_nid_to_nid4(&rtr->lp_primary_nid));
+		lpni = lnet_peer_ni_get_locked(rtr, &rtr->lp_primary_nid);
 		if (!lpni) {
 			CDEBUG(D_NET,
 			       "Expected to find an lpni for %s, but non found\n",
@@ -1701,25 +1700,26 @@ bool lnet_router_checker_active(void)
  * when: notificaiton time.
  */
 int
-lnet_notify(struct lnet_ni *ni, lnet_nid_t nid, bool alive, bool reset,
+lnet_notify(struct lnet_ni *ni, lnet_nid_t nid4, bool alive, bool reset,
 	    time64_t when)
 {
 	struct lnet_peer_ni *lpni = NULL;
 	struct lnet_route *route;
 	struct lnet_peer *lp;
 	time64_t now = ktime_get_seconds();
+	struct lnet_nid nid;
 	int cpt;
 
 	LASSERT(!in_interrupt());
 
 	CDEBUG(D_NET, "%s notifying %s: %s\n",
 	       !ni ? "userspace" : libcfs_nidstr(&ni->ni_nid),
-	       libcfs_nid2str(nid), alive ? "up" : "down");
+	       libcfs_nidstr(&nid), alive ? "up" : "down");
 
 	if (ni &&
-	    LNET_NID_NET(&ni->ni_nid) != LNET_NIDNET(nid)) {
+	    LNET_NID_NET(&ni->ni_nid) != LNET_NID_NET(&nid)) {
 		CWARN("Ignoring notification of %s %s by %s (different net)\n",
-		      libcfs_nid2str(nid), alive ? "birth" : "death",
+		      libcfs_nidstr(&nid), alive ? "birth" : "death",
 		      libcfs_nidstr(&ni->ni_nid));
 		return -EINVAL;
 	}
@@ -1728,7 +1728,7 @@ bool lnet_router_checker_active(void)
 	if (when > now) {
 		CWARN("Ignoring prediction from %s of %s %s %lld seconds in the future\n",
 		      ni ? libcfs_nidstr(&ni->ni_nid) : "userspace",
-		      libcfs_nid2str(nid), alive ? "up" : "down", when - now);
+		      libcfs_nidstr(&nid), alive ? "up" : "down", when - now);
 		return -EINVAL;
 	}
 
@@ -1746,11 +1746,11 @@ bool lnet_router_checker_active(void)
 		return -ESHUTDOWN;
 	}
 
-	lpni = lnet_find_peer_ni_locked(nid);
+	lpni = lnet_peer_ni_find_locked(&nid);
 	if (!lpni) {
 		/* nid not found */
 		lnet_net_unlock(0);
-		CDEBUG(D_NET, "%s not found\n", libcfs_nid2str(nid));
+		CDEBUG(D_NET, "%s not found\n", libcfs_nidstr(&nid));
 		return 0;
 	}
 
diff --git a/net/lnet/lnet/udsp.c b/net/lnet/lnet/udsp.c
index 7fa4f88..2594df1 100644
--- a/net/lnet/lnet/udsp.c
+++ b/net/lnet/lnet/udsp.c
@@ -1052,17 +1052,19 @@ struct lnet_udsp *
 {
 	struct lnet_ni *ni;
 	struct lnet_peer_ni *lpni;
+	struct lnet_nid nid;
 
 	lnet_net_lock(0);
+	lnet_nid4_to_nid(info->cud_nid, &nid);
 	if (!info->cud_peer) {
-		ni = lnet_nid2ni_locked(info->cud_nid, 0);
+		ni = lnet_nid_to_ni_locked(&nid, 0);
 		if (ni)
 			lnet_udsp_get_ni_info(info, ni);
 	} else {
-		lpni = lnet_find_peer_ni_locked(info->cud_nid);
+		lpni = lnet_peer_ni_find_locked(&nid);
 		if (!lpni) {
 			CDEBUG(D_NET, "nid %s is not found\n",
-			       libcfs_nid2str(info->cud_nid));
+			       libcfs_nidstr(&nid));
 		} else {
 			lnet_udsp_get_peer_info(info, lpni);
 			lnet_peer_ni_decref_locked(lpni);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [lustre-devel] [PATCH 13/32] lnet: change lnet_*_peer_ni to take struct lnet_nid
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (11 preceding siblings ...)
  2022-08-04  1:37 ` [lustre-devel] [PATCH 12/32] lnet: discard some peer_ni lookup functions James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:37 ` [lustre-devel] [PATCH 14/32] lnet: Ensure round robin across nets James Simmons
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

lnet_add_peer_ni() and lnet_del_peer_ni() now take
struct lnet_nid rather than lnet_nid_t.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: d9af9b5a7ee706660 ("LU-10391 lnet: change lnet_*_peer_ni to take struct lnet_nid")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/44625
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |  5 +--
 net/lnet/lnet/api-ni.c        | 14 +++++---
 net/lnet/lnet/peer.c          | 74 +++++++++++++++++++++----------------------
 3 files changed, 49 insertions(+), 44 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 3bdb49e..5a83190 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -930,8 +930,9 @@ bool lnet_peer_is_pref_rtr_locked(struct lnet_peer_ni *lpni,
 int lnet_peer_add_pref_rtr(struct lnet_peer_ni *lpni, struct lnet_nid *nid);
 int lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni,
 				     struct lnet_nid *nid);
-int lnet_add_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid, bool mr, bool temp);
-int lnet_del_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid);
+int lnet_add_peer_ni(struct lnet_nid *key_nid, struct lnet_nid *nid, bool mr,
+		     bool temp);
+int lnet_del_peer_ni(struct lnet_nid *key_nid, struct lnet_nid *nid);
 int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk);
 int lnet_get_peer_ni_info(u32 peer_index, u64 *nid,
 			  char alivness[LNET_MAX_STR_LEN],
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 124ec86..7c94d16 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -4176,27 +4176,31 @@ u32 lnet_get_dlc_seq_locked(void)
 
 	case IOC_LIBCFS_ADD_PEER_NI: {
 		struct lnet_ioctl_peer_cfg *cfg = arg;
+		struct lnet_nid prim_nid;
 
 		if (cfg->prcfg_hdr.ioc_len < sizeof(*cfg))
 			return -EINVAL;
 
 		mutex_lock(&the_lnet.ln_api_mutex);
-		rc = lnet_add_peer_ni(cfg->prcfg_prim_nid,
-				      cfg->prcfg_cfg_nid,
-				      cfg->prcfg_mr, false);
+		lnet_nid4_to_nid(cfg->prcfg_prim_nid, &prim_nid);
+		lnet_nid4_to_nid(cfg->prcfg_cfg_nid, &nid);
+		rc = lnet_add_peer_ni(&prim_nid, &nid, cfg->prcfg_mr, false);
 		mutex_unlock(&the_lnet.ln_api_mutex);
 		return rc;
 	}
 
 	case IOC_LIBCFS_DEL_PEER_NI: {
 		struct lnet_ioctl_peer_cfg *cfg = arg;
+		struct lnet_nid prim_nid;
 
 		if (cfg->prcfg_hdr.ioc_len < sizeof(*cfg))
 			return -EINVAL;
 
 		mutex_lock(&the_lnet.ln_api_mutex);
-		rc = lnet_del_peer_ni(cfg->prcfg_prim_nid,
-				      cfg->prcfg_cfg_nid);
+		lnet_nid4_to_nid(cfg->prcfg_prim_nid, &prim_nid);
+		lnet_nid4_to_nid(cfg->prcfg_cfg_nid, &nid);
+		rc = lnet_del_peer_ni(&prim_nid,
+				      &nid);
 		mutex_unlock(&the_lnet.ln_api_mutex);
 		return rc;
 	}
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 7a96a2f..8d81a7d 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -519,15 +519,14 @@ static void lnet_peer_cancel_discovery(struct lnet_peer *lp)
  *  -EBUSY:  The lnet_peer_ni is the primary, and not the only peer_ni.
  */
 static int
-lnet_peer_del_nid(struct lnet_peer *lp, lnet_nid_t nid4, unsigned int flags)
+lnet_peer_del_nid(struct lnet_peer *lp, struct lnet_nid *nid,
+		  unsigned int flags)
 {
 	struct lnet_peer_ni *lpni;
 	struct lnet_nid primary_nid = lp->lp_primary_nid;
-	struct lnet_nid nid;
 	int rc = 0;
 	bool force = (flags & LNET_PEER_RTR_NI_FORCE_DEL) ? true : false;
 
-	lnet_nid4_to_nid(nid4, &nid);
 	if (!(flags & LNET_PEER_CONFIGURED)) {
 		if (lp->lp_state & LNET_PEER_CONFIGURED) {
 			rc = -EPERM;
@@ -535,7 +534,7 @@ static void lnet_peer_cancel_discovery(struct lnet_peer *lp)
 		}
 	}
 
-	lpni = lnet_peer_ni_find_locked(&nid);
+	lpni = lnet_peer_ni_find_locked(nid);
 	if (!lpni) {
 		rc = -ENOENT;
 		goto out;
@@ -550,14 +549,14 @@ static void lnet_peer_cancel_discovery(struct lnet_peer *lp)
 	 * This function only allows deletion of the primary NID if it
 	 * is the only NID.
 	 */
-	if (nid_same(&nid, &lp->lp_primary_nid) && lp->lp_nnis != 1 && !force) {
+	if (nid_same(nid, &lp->lp_primary_nid) && lp->lp_nnis != 1 && !force) {
 		rc = -EBUSY;
 		goto out;
 	}
 
 	lnet_net_lock(LNET_LOCK_EX);
 
-	if (nid_same(&nid, &lp->lp_primary_nid) && lp->lp_nnis != 1 && force) {
+	if (nid_same(nid, &lp->lp_primary_nid) && lp->lp_nnis != 1 && force) {
 		struct lnet_peer_ni *lpni2;
 		/* assign the next peer_ni to be the primary */
 		lpni2 = lnet_get_next_peer_ni_locked(lp, NULL, lpni);
@@ -570,7 +569,7 @@ static void lnet_peer_cancel_discovery(struct lnet_peer *lp)
 
 out:
 	CDEBUG(D_NET, "peer %s NID %s flags %#x: %d\n",
-	       libcfs_nidstr(&primary_nid), libcfs_nidstr(&nid),
+	       libcfs_nidstr(&primary_nid), libcfs_nidstr(nid),
 	       flags, rc);
 
 	return rc;
@@ -1333,7 +1332,7 @@ struct lnet_peer_ni *
 int
 LNetAddPeer(lnet_nid_t *nids, u32 num_nids)
 {
-	lnet_nid_t pnid = 0;
+	struct lnet_nid pnid = LNET_ANY_NID;
 	bool mr;
 	int i, rc;
 
@@ -1350,16 +1349,21 @@ struct lnet_peer_ni *
 
 	rc = 0;
 	for (i = 0; i < num_nids; i++) {
+		struct lnet_nid nid;
+
 		if (nids[i] == LNET_NID_LO_0)
 			continue;
 
-		if (!pnid) {
-			pnid = nids[i];
-			rc = lnet_add_peer_ni(pnid, LNET_NID_ANY, mr, true);
+		lnet_nid4_to_nid(nids[i], &nid);
+		if (LNET_NID_IS_ANY(&pnid)) {
+			lnet_nid4_to_nid(nids[i], &pnid);
+			rc = lnet_add_peer_ni(&pnid, &LNET_ANY_NID, mr, true);
 		} else if (lnet_peer_discovery_disabled) {
-			rc = lnet_add_peer_ni(nids[i], LNET_NID_ANY, mr, true);
+			lnet_nid4_to_nid(nids[i], &nid);
+			rc = lnet_add_peer_ni(&nid, &LNET_ANY_NID, mr, true);
 		} else {
-			rc = lnet_add_peer_ni(pnid, nids[i], mr, true);
+			lnet_nid4_to_nid(nids[i], &nid);
+			rc = lnet_add_peer_ni(&pnid, &nid, mr, true);
 		}
 
 		if (rc && rc != -EEXIST)
@@ -1849,20 +1853,17 @@ struct lnet_peer_net *
  * being created/modified/deleted by a different thread.
  */
 int
-lnet_add_peer_ni(lnet_nid_t prim_nid4, lnet_nid_t nid4, bool mr, bool temp)
+lnet_add_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, bool mr,
+		 bool temp)
 {
-	struct lnet_nid prim_nid, nid;
 	struct lnet_peer *lp = NULL;
 	struct lnet_peer_ni *lpni;
 	unsigned int flags = 0;
 
 	/* The prim_nid must always be specified */
-	if (prim_nid4 == LNET_NID_ANY)
+	if (LNET_NID_IS_ANY(prim_nid))
 		return -EINVAL;
 
-	lnet_nid4_to_nid(prim_nid4, &prim_nid);
-	lnet_nid4_to_nid(nid4, &nid);
-
 	if (!temp)
 		flags = LNET_PEER_CONFIGURED;
 
@@ -1873,11 +1874,11 @@ struct lnet_peer_net *
 	 * If nid isn't specified, we must create a new peer with
 	 * prim_nid as its primary nid.
 	 */
-	if (nid4 == LNET_NID_ANY)
-		return lnet_peer_add(&prim_nid, flags);
+	if (LNET_NID_IS_ANY(nid))
+		return lnet_peer_add(prim_nid, flags);
 
 	/* Look up the prim_nid, which must exist. */
-	lpni = lnet_peer_ni_find_locked(&prim_nid);
+	lpni = lnet_peer_ni_find_locked(prim_nid);
 	if (!lpni)
 		return -ENOENT;
 	lnet_peer_ni_decref_locked(lpni);
@@ -1886,14 +1887,14 @@ struct lnet_peer_net *
 	/* Peer must have been configured. */
 	if (!temp && !(lp->lp_state & LNET_PEER_CONFIGURED)) {
 		CDEBUG(D_NET, "peer %s was not configured\n",
-		       libcfs_nidstr(&prim_nid));
+		       libcfs_nidstr(prim_nid));
 		return -ENOENT;
 	}
 
 	/* Primary NID must match */
-	if (!nid_same(&lp->lp_primary_nid, &prim_nid)) {
+	if (!nid_same(&lp->lp_primary_nid, prim_nid)) {
 		CDEBUG(D_NET, "prim_nid %s is not primary for peer %s\n",
-		       libcfs_nidstr(&prim_nid),
+		       libcfs_nidstr(prim_nid),
 		       libcfs_nidstr(&lp->lp_primary_nid));
 		return -ENODEV;
 	}
@@ -1901,11 +1902,11 @@ struct lnet_peer_net *
 	/* Multi-Rail flag must match. */
 	if ((lp->lp_state ^ flags) & LNET_PEER_MULTI_RAIL) {
 		CDEBUG(D_NET, "multi-rail state mismatch for peer %s\n",
-		       libcfs_nidstr(&prim_nid));
+		       libcfs_nidstr(prim_nid));
 		return -EPERM;
 	}
 
-	return lnet_peer_add_nid(lp, &nid, flags);
+	return lnet_peer_add_nid(lp, nid, flags);
 }
 
 /*
@@ -1920,26 +1921,24 @@ struct lnet_peer_net *
  * being modified/deleted by a different thread.
  */
 int
-lnet_del_peer_ni(lnet_nid_t prim_nid4, lnet_nid_t nid)
+lnet_del_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid)
 {
 	struct lnet_peer *lp;
 	struct lnet_peer_ni *lpni;
 	unsigned int flags;
-	struct lnet_nid prim_nid;
 
-	if (prim_nid4 == LNET_NID_ANY)
+	if (!prim_nid || LNET_NID_IS_ANY(prim_nid))
 		return -EINVAL;
-	lnet_nid4_to_nid(prim_nid4, &prim_nid);
 
-	lpni = lnet_peer_ni_find_locked(&prim_nid);
+	lpni = lnet_peer_ni_find_locked(prim_nid);
 	if (!lpni)
 		return -ENOENT;
 	lnet_peer_ni_decref_locked(lpni);
 	lp = lpni->lpni_peer_net->lpn_peer;
 
-	if (!nid_same(&prim_nid, &lp->lp_primary_nid)) {
+	if (!nid_same(prim_nid, &lp->lp_primary_nid)) {
 		CDEBUG(D_NET, "prim_nid %s is not primary for peer %s\n",
-		       libcfs_nidstr(&prim_nid),
+		       libcfs_nidstr(prim_nid),
 		       libcfs_nidstr(&lp->lp_primary_nid));
 		return -ENODEV;
 	}
@@ -1948,12 +1947,12 @@ struct lnet_peer_net *
 	if (lp->lp_rtr_refcount > 0) {
 		lnet_net_unlock(LNET_LOCK_EX);
 		CERROR("%s is a router. Can not be deleted\n",
-		       libcfs_nidstr(&prim_nid));
+		       libcfs_nidstr(prim_nid));
 		return -EBUSY;
 	}
 	lnet_net_unlock(LNET_LOCK_EX);
 
-	if (nid == LNET_NID_ANY || nid == lnet_nid_to_nid4(&lp->lp_primary_nid))
+	if (LNET_NID_IS_ANY(nid) || nid_same(nid, &lp->lp_primary_nid))
 		return lnet_peer_del(lp);
 
 	flags = LNET_PEER_CONFIGURED;
@@ -3011,9 +3010,10 @@ static int lnet_peer_merge_data(struct lnet_peer *lp,
 		 * being told that the router changed its primary_nid
 		 * then it's okay to delete it.
 		 */
+		lnet_nid4_to_nid(delnis[i], &nid);
 		if (lp->lp_rtr_refcount > 0)
 			flags |= LNET_PEER_RTR_NI_FORCE_DEL;
-		rc = lnet_peer_del_nid(lp, delnis[i], flags);
+		rc = lnet_peer_del_nid(lp, &nid, flags);
 		if (rc) {
 			CERROR("Error deleting NID %s from peer %s: %d\n",
 			       libcfs_nid2str(delnis[i]),
-- 
1.8.3.1


* [lustre-devel] [PATCH 14/32] lnet: Ensure round robin across nets
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (12 preceding siblings ...)
  2022-08-04  1:37 ` [lustre-devel] [PATCH 13/32] lnet: change lnet_*_peer_ni to take struct lnet_nid James Simmons
@ 2022-08-04  1:37 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 15/32] lustre: llite: dont restart directIO with IOCB_NOWAIT James Simmons
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:37 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

Introduce a global net sequence number and a peer sequence number.
These sequence numbers are used to ensure round robin selection of
local NIs and peer NIs across nets.

Also consolidate the sequence number accounting under
lnet_handle_send(). Previously the sequence number increment for
the final destination peer net/peer NI on a routed send was done
in lnet_handle_find_routed_path().

Other cleanup included in this patch:
 - Redundant check of null src_nid is removed from
   lnet_handle_find_routed_path() (LNET_NID_IS_ANY handles null arg)
 - Avoid comparing best_lpn with itself in
   lnet_handle_find_routed_path() on the first loop iteration
 - In lnet_find_best_ni_on_local_net() check whether we have
   a specified lp_disc_net_id outside of the loop to avoid doing
   that work on each loop iteration.

Debug statements were also added to print the information used when
selecting the peer net and local net.

HPE-bug-id: LUS-10871
WC-bug-id: https://jira.whamcloud.com/browse/LU-15713
Lustre-commit: 05413b3d84f7d1feb ("LU-15713 lnet: Ensure round robin across nets")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/46976
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-types.h | 11 ++++-
 net/lnet/lnet/lib-move.c       | 96 +++++++++++++++++++++++++++---------------
 2 files changed, 72 insertions(+), 35 deletions(-)

diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 1827f4e..09b9d8e 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -765,6 +765,11 @@ struct lnet_peer {
 
 	/* cached peer aliveness */
 	bool			lp_alive;
+
+	/* sequence number used to round robin traffic to this peer's
+	 * nets/NIs
+	 */
+	u32			lp_send_seq;
 };
 
 /*
@@ -1205,10 +1210,12 @@ struct lnet {
 
 	/* LND instances */
 	struct list_head		ln_nets;
-	/* network zombie list */
-	struct list_head		ln_net_zombie;
+	/* Sequence number used to round robin sends across all nets */
+	u32				ln_net_seq;
 	/* the loopback NI */
 	struct lnet_ni		       *ln_loni;
+	/* network zombie list */
+	struct list_head		ln_net_zombie;
 	/* resend messages list */
 	struct list_head		ln_msg_resend;
 	/* spin lock to protect the msg resend list */
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index a514472..6ad0963 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1658,9 +1658,12 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	 * local ni and local net so that we pick the next ones
 	 * in Round Robin.
 	 */
-	best_lpni->lpni_peer_net->lpn_seq++;
+	best_lpni->lpni_peer_net->lpn_peer->lp_send_seq++;
+	best_lpni->lpni_peer_net->lpn_seq =
+		best_lpni->lpni_peer_net->lpn_peer->lp_send_seq;
 	best_lpni->lpni_seq = best_lpni->lpni_peer_net->lpn_seq;
-	best_ni->ni_net->net_seq++;
+	the_lnet.ln_net_seq++;
+	best_ni->ni_net->net_seq = the_lnet.ln_net_seq;
 	best_ni->ni_seq = best_ni->ni_net->net_seq;
 
 	CDEBUG(D_NET,
@@ -1743,6 +1746,11 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		 * lnet_select_pathway() function and is never changed.
 		 * It's safe to use it here.
 		 */
+		final_dst_lpni->lpni_peer_net->lpn_peer->lp_send_seq++;
+		final_dst_lpni->lpni_peer_net->lpn_seq =
+			final_dst_lpni->lpni_peer_net->lpn_peer->lp_send_seq;
+		final_dst_lpni->lpni_seq =
+			final_dst_lpni->lpni_peer_net->lpn_seq;
 		msg->msg_hdr.dest_nid = final_dst_lpni->lpni_nid;
 	} else {
 		/* if we're not routing set the dest_nid to the best peer
@@ -1968,8 +1976,10 @@ struct lnet_ni *
 	int best_lpn_healthv = 0;
 	u32 best_lpn_sel_prio = LNET_MAX_SELECTION_PRIORITY;
 
-	CDEBUG(D_NET, "using src nid %s for route restriction\n",
-	       src_nid ? libcfs_nidstr(src_nid) : "ANY");
+	CDEBUG(D_NET, "%s route (%s) from local NI %s to destination %s\n",
+	       LNET_NID_IS_ANY(&sd->sd_rtr_nid) ? "Lookup" : "Specified",
+	       libcfs_nidstr(&sd->sd_rtr_nid), libcfs_nidstr(src_nid),
+	       libcfs_nidstr(&sd->sd_dst_nid));
 
 	/* If a router nid was specified then we are replying to a GET or
 	 * sending an ACK. In this case we use the gateway associated with the
@@ -1989,8 +1999,7 @@ struct lnet_ni *
 	}
 
 	if (!route_found) {
-		if (sd->sd_msg->msg_routing ||
-		    (src_nid && !LNET_NID_IS_ANY(src_nid))) {
+		if (sd->sd_msg->msg_routing || !LNET_NID_IS_ANY(src_nid)) {
 			/* If I'm routing this message then I need to find the
 			 * next hop based on the destination NID
 			 *
@@ -2006,6 +2015,8 @@ struct lnet_ni *
 				       libcfs_nidstr(&sd->sd_dst_nid));
 				return -EHOSTUNREACH;
 			}
+			CDEBUG(D_NET, "best_rnet %s\n",
+			       libcfs_net2str(best_rnet->lrn_net));
 		} else {
 			/* we've already looked up the initial lpni using
 			 * dst_nid
@@ -2023,10 +2034,18 @@ struct lnet_ni *
 				if (!rnet)
 					continue;
 
-				if (!best_lpn) {
-					best_lpn = lpn;
-					best_rnet = rnet;
-				}
+				if (!best_lpn)
+					goto use_lpn;
+				else
+					CDEBUG(D_NET, "n[%s, %s] h[%d, %d], p[%u, %u], s[%d, %d]\n",
+					       libcfs_net2str(lpn->lpn_net_id),
+					       libcfs_net2str(best_lpn->lpn_net_id),
+					       lpn->lpn_healthv,
+					       best_lpn->lpn_healthv,
+					       lpn->lpn_sel_priority,
+					       best_lpn->lpn_sel_priority,
+					       lpn->lpn_seq,
+					       best_lpn->lpn_seq);
 
 				/* select the preferred peer net */
 				if (best_lpn_healthv > lpn->lpn_healthv)
@@ -2054,6 +2073,9 @@ struct lnet_ni *
 				return -EHOSTUNREACH;
 			}
 
+			CDEBUG(D_NET, "selected best_lpn %s\n",
+			       libcfs_net2str(best_lpn->lpn_net_id));
+
 			sd->sd_best_lpni = lnet_find_best_lpni(sd->sd_best_ni,
 							       lnet_nid_to_nid4(&sd->sd_dst_nid),
 							       lp,
@@ -2068,12 +2090,6 @@ struct lnet_ni *
 			 * NI's so update the final destination we selected
 			 */
 			sd->sd_final_dst_lpni = sd->sd_best_lpni;
-
-			/* Increment the sequence number of the remote lpni so
-			 * we can round robin over the different interfaces of
-			 * the remote lpni
-			 */
-			sd->sd_best_lpni->lpni_seq++;
 		}
 
 		/* find the best route. Restrict the selection on the net of the
@@ -2139,14 +2155,12 @@ struct lnet_ni *
 	*gw_lpni = gwni;
 	*gw_peer = gw;
 
-	/* increment the sequence numbers since now we're sure we're
-	 * going to use this path
+	/* increment the sequence number since now we're sure we're
+	 * going to use this route
 	 */
 	if (LNET_NID_IS_ANY(&sd->sd_rtr_nid)) {
 		LASSERT(best_route && last_route);
 		best_route->lr_seq = last_route->lr_seq + 1;
-		if (best_lpn)
-			best_lpn->lpn_seq++;
 	}
 
 	return 0;
@@ -2220,7 +2234,15 @@ struct lnet_ni *
 	u32 lpn_sel_prio;
 	u32 best_net_sel_prio = LNET_MAX_SELECTION_PRIORITY;
 	u32 net_sel_prio;
-	bool exit = false;
+
+	/* if this is a discovery message and lp_disc_net_id is
+	 * specified then use that net to send the discovery on.
+	 */
+	if (discovery && peer->lp_disc_net_id) {
+		best_lpn = lnet_peer_get_net_locked(peer, peer->lp_disc_net_id);
+		if (best_lpn && lnet_get_net_locked(best_lpn->lpn_net_id))
+			goto select_best_ni;
+	}
 
 	/* The peer can have multiple interfaces, some of them can be on
 	 * the local network and others on a routed network. We should
@@ -2241,17 +2263,25 @@ struct lnet_ni *
 		net_healthv = lnet_get_net_healthv_locked(net);
 		net_sel_prio = net->net_sel_priority;
 
-		/* if this is a discovery message and lp_disc_net_id is
-		 * specified then use that net to send the discovery on.
-		 */
-		if (peer->lp_disc_net_id == lpn->lpn_net_id &&
-		    discovery) {
-			exit = true;
-			goto select_lpn;
-		}
-
 		if (!best_lpn)
 			goto select_lpn;
+		else
+			CDEBUG(D_NET,
+			       "n[%s, %s] ph[%d, %d], pp[%u, %u], nh[%d, %d], np[%u, %u], ps[%u, %u], ns[%u, %u]\n",
+			       libcfs_net2str(lpn->lpn_net_id),
+			       libcfs_net2str(best_lpn->lpn_net_id),
+			       lpn->lpn_healthv,
+			       best_lpn_healthv,
+			       lpn_sel_prio,
+			       best_lpn_sel_prio,
+			       net_healthv,
+			       best_net_healthv,
+			       net_sel_prio,
+			       best_net_sel_prio,
+			       lpn->lpn_seq,
+			       best_lpn->lpn_seq,
+			       net->net_seq,
+			       best_net->net_seq);
 
 		/* always select the lpn with the best health */
 		if (best_lpn_healthv > lpn->lpn_healthv)
@@ -2291,15 +2321,15 @@ struct lnet_ni *
 		best_lpn_sel_prio = lpn_sel_prio;
 		best_lpn = lpn;
 		best_net = net;
-
-		if (exit)
-			break;
 	}
 
 	if (best_lpn) {
 		/* Select the best NI on the same net as best_lpn chosen
 		 * above
 		 */
+select_best_ni:
+		CDEBUG(D_NET, "selected best_lpn %s\n",
+		       libcfs_net2str(best_lpn->lpn_net_id));
 		best_ni = lnet_find_best_ni_on_spec_net(NULL, peer, best_lpn,
 							msg, md_cpt);
 	}
-- 
1.8.3.1


* [lustre-devel] [PATCH 15/32] lustre: llite: dont restart directIO with IOCB_NOWAIT
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (13 preceding siblings ...)
  2022-08-04  1:37 ` [lustre-devel] [PATCH 14/32] lnet: Ensure round robin across nets James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 16/32] lustre: sec: handle read-only flag James Simmons
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Qian Yingjin <qian@ddn.com>

FLR mirror retry and io_uring with the IOCB_NOWAIT flag must be
handled differently.

int cl_io_loop(const struct lu_env *env, struct cl_io *io)
{
    ...
    if (result == -EAGAIN && io->ci_ndelay) {
        io->ci_need_restart = 1;
        result = 0;
    }
    ...
}

ssize_t
generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
{
    ...
    if (iocb->ki_flags & IOCB_NOWAIT) {
        if (filemap_range_has_page(mapping, iocb->ki_pos,
                                   iocb->ki_pos +
                                   count - 1))
                return -EAGAIN;
    ...
    }

In the current code, the read path restarts the I/O engine whenever it
gets -EAGAIN. However, for io_uring direct I/O with IOCB_NOWAIT, if
cached pages are found in the current I/O range, -EAGAIN should be
returned to the upper layer immediately; otherwise the request gets
stuck in an endless loop.

This patch also adds a tool "io_uring_probe" to check whether the
kernel supports io_uring fully. The reason for adding this check is
that the RHEL 8.5 kernel has backported the io_uring symbols:
cat /proc/kallsyms |grep io_uring
ffffffffa8510e10 W __x64_sys_io_uring_enter
ffffffffa8510e10 W __x64_sys_io_uring_register
ffffffffa8510e10 W __x64_sys_io_uring_setup
but the io_uring syscalls return -ENOSYS.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15399
Lustre-commit: 8db455c77265063a1 ("LU-15399 llite: dont restart directIO with IOCB_NOWAIT")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/46147
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h | 6 +++++-
 fs/lustre/llite/file.c        | 6 ++++++
 fs/lustre/obdclass/cl_io.c    | 2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index c66e98c5..c717d03 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -1960,7 +1960,11 @@ struct cl_io {
 	/**
 	 * Bypass quota check
 	 */
-	unsigned int		ci_noquota:1;
+	unsigned int		ci_noquota:1,
+	/**
+	 * io_uring direct IO with flags IOCB_NOWAIT.
+	 */
+				ci_iocb_nowait:1;
 	/**
 	 * How many times the read has retried before this one.
 	 * Set by the top level and consumed by the LOV.
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 0e71b3a..3aace07 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -1587,6 +1587,12 @@ void ll_io_init(struct cl_io *io, const struct file *file, int write,
 					   IOCB_DSYNC));
 	}
 
+#ifdef IOCB_NOWAIT
+	io->ci_iocb_nowait = !!(args &&
+				(args->u.normal.via_iocb->ki_flags &
+				 IOCB_NOWAIT));
+#endif
+
 	io->ci_obj = ll_i2info(inode)->lli_clob;
 	io->ci_lockreq = CILR_MAYBE;
 	if (ll_file_nolock(file)) {
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index 4246e17..c388700 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -776,7 +776,7 @@ int cl_io_loop(const struct lu_env *env, struct cl_io *io)
 	if (rc && !result)
 		result = rc;
 
-	if (result == -EAGAIN && io->ci_ndelay) {
+	if (result == -EAGAIN && io->ci_ndelay && !io->ci_iocb_nowait) {
 		io->ci_need_restart = 1;
 		result = 0;
 	}
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [lustre-devel] [PATCH 16/32] lustre: sec: handle read-only flag
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (14 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 15/32] lustre: llite: dont restart directIO with IOCB_NOWAIT James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 17/32] lustre: llog: Add LLOG_SKIP_PLAIN to skip llog plain James Simmons
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

Add a new 'readonly_mount' property to nodemaps on the server
side. When this property is set, the server returns -EROFS if
the client is not mounting read-only, so the client must specify
the read-only mount option to be allowed to mount.
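The connect-time decision can be sketched as below; the function and flag names are hypothetical, only the -EROFS contract comes from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical connect-time check: a nodemap with readonly_mount set
 * refuses any client that did not ask for a read-only mount, returning
 * -EROFS so the client gives up reconnecting (see the import.c hunk). */
static int nodemap_check_mount(bool nodemap_readonly, bool client_rdonly)
{
	if (nodemap_readonly && !client_rdonly)
		return -EROFS;
	return 0;
}
```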

WC-bug-id: https://jira.whamcloud.com/browse/LU-15451
Lustre-commit: e7ce67de92dea6870 ("LU-15451 sec: read-only nodemap flag")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/46149
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/import.c | 4 ++--
 fs/lustre/ptlrpc/niobuf.c | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c
index a5fdb8a8..697b3c3 100644
--- a/fs/lustre/ptlrpc/import.c
+++ b/fs/lustre/ptlrpc/import.c
@@ -1306,10 +1306,10 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
 		time64_t next_connect;
 
 		import_set_state_nolock(imp, LUSTRE_IMP_DISCON);
-		if (rc == -EACCES) {
+		if (rc == -EACCES || rc == -EROFS) {
 			/*
 			 * Give up trying to reconnect
-			 * EACCES means client has no permission for connection
+			 * EROFS means client must mount read-only
 			 */
 			imp->imp_obd->obd_no_recov = 1;
 			ptlrpc_deactivate_import_nolock(imp);
diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c
index 94a0329..be1811a 100644
--- a/fs/lustre/ptlrpc/niobuf.c
+++ b/fs/lustre/ptlrpc/niobuf.c
@@ -472,7 +472,8 @@ int ptlrpc_send_error(struct ptlrpc_request *req, int may_be_difficult)
 
 	if (req->rq_status != -ENOSPC && req->rq_status != -EACCES &&
 	    req->rq_status != -EPERM && req->rq_status != -ENOENT &&
-	    req->rq_status != -EINPROGRESS && req->rq_status != -EDQUOT)
+	    req->rq_status != -EINPROGRESS && req->rq_status != -EDQUOT &&
+	    req->rq_status != -EROFS)
 		req->rq_type = PTL_RPC_MSG_ERR;
 
 	rc = ptlrpc_send_reply(req, may_be_difficult);
-- 
1.8.3.1


* [lustre-devel] [PATCH 17/32] lustre: llog: Add LLOG_SKIP_PLAIN to skip llog plain
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (15 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 16/32] lustre: sec: handle read-only flag James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 18/32] lustre: llite: add projid to debug logs James Simmons
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Etienne AUJAMES, Lustre Development List

From: Etienne AUJAMES <etienne.aujames@cea.fr>

Add the catalog callback return value LLOG_SKIP_PLAIN to
conditionally skip an entire plain llog.

This can speed up catalog processing for use cases where a record
needs to be accessed in the "middle" of the catalog. This is
useful for changelogs with several users, or for HSM.

This patch modifies chlg_read_cat_process_cb() to use
LLOG_SKIP_PLAIN. The main idea comes from: d813c75d ("LU-14688 mdt:
changelog purge deletes plain llog")
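The skip test boils down to comparing the records left in the current plain llog against the distance to the requested start index; a userspace sketch, with illustrative parameter names in place of the llog structures:

```c
#include <assert.h>

/* A plain llog holding up to bitmap_size records can be skipped whole
 * when even consuming every record after rec_index still leaves us
 * short of the requested start offset (indexes are consecutive). */
static int plain_skipable(unsigned int bitmap_size, unsigned int rec_index,
			  unsigned long long curr, unsigned long long start)
{
	if (start == 0 || curr >= start)
		return 0;	/* no offset requested, or already reached */

	return (bitmap_size - rec_index) < (start - curr);
}
```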

**Performance test:**

* Environment:
2474195 changelog records stored on mds0 (40 plain llogs):
mds# lctl get_param -n mdd.lustrefs-MDT0000.changelog_users
current index: 2474195
ID    index (idle seconds)
cl1   0 (3509)

* Test
Access to records at the end of the catalog (offset: 2474194):
client# time lfs changelog lustrefs-MDT0000 2474194 >/dev/null

* Results
- with the patch:       real    0m0.592s
- without the patch:    real    0m17.835s (x30)

WC-bug-id: https://jira.whamcloud.com/browse/LU-15481
Lustre-commit: aa22a6826ee521ab1 ("LU-15481 llog: Add LLOG_SKIP_PLAIN to skip llog plain")
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Reviewed-on: https://review.whamcloud.com/46310
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_log.h | 18 +++++++++++++++++-
 fs/lustre/mdc/mdc_changelog.c  |  5 +++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/include/lustre_log.h b/fs/lustre/include/lustre_log.h
index 2e43d56..dbf3fd6 100644
--- a/fs/lustre/include/lustre_log.h
+++ b/fs/lustre/include/lustre_log.h
@@ -264,7 +264,7 @@ struct llog_ctxt {
 };
 
 #define LLOG_PROC_BREAK 0x0001
-#define LLOG_DEL_RECORD 0x0002
+#define LLOG_SKIP_PLAIN 0x0004
 
 static inline int llog_handle2ops(struct llog_handle *loghandle,
 				  const struct llog_operations **lop)
@@ -375,6 +375,22 @@ static inline int llog_next_block(const struct lu_env *env,
 	return rc;
 }
 
+/* Determine if a plain llog of a catalog can be skipped based on record
+ * custom indexes.
+ * This assumes that indexes follow each other. The number of records to skip
+ * can be computed based on a starting offset and the index of the current
+ * record (in the llog catalog callback).
+ */
+static inline int llog_is_plain_skipable(struct llog_log_hdr *lh,
+					 struct llog_rec_hdr *rec,
+					 u64 curr, u64 start)
+{
+	if (start == 0 || curr >= start)
+		return 0;
+
+	return (LLOG_HDR_BITMAP_SIZE(lh) - rec->lrh_index) < (start - curr);
+}
+
 /* llog.c */
 int lustre_process_log(struct super_block *sb, char *logname,
 		       struct config_llog_instance *cfg);
diff --git a/fs/lustre/mdc/mdc_changelog.c b/fs/lustre/mdc/mdc_changelog.c
index 36d7fdd..cd2a610 100644
--- a/fs/lustre/mdc/mdc_changelog.c
+++ b/fs/lustre/mdc/mdc_changelog.c
@@ -225,6 +225,11 @@ static int chlg_read_cat_process_cb(const struct lu_env *env,
 		return rc;
 	}
 
+	/* Check if we can skip the entire llog plain */
+	if (llog_is_plain_skipable(llh->lgh_hdr, hdr, rec->cr.cr_index,
+				   crs->crs_start_offset))
+		return LLOG_SKIP_PLAIN;
+
 	/* Skip undesired records */
 	if (rec->cr.cr_index < crs->crs_start_offset)
 		return 0;
-- 
1.8.3.1


* [lustre-devel] [PATCH 18/32] lustre: llite: add projid to debug logs
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (16 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 17/32] lustre: llog: Add LLOG_SKIP_PLAIN to skip llog plain James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 19/32] lnet: asym route inconsistency warning James Simmons
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Add some minimal debugging on the client to log
the projid when it is changed, along with the affected FID.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13335
Lustre-commit: 6bceb0030d15b7009 ("LU-13335 ldiskfs: add projid to debug logs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46369
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 3aace07..ac20d05 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -3417,6 +3417,8 @@ static int ll_set_project(struct inode *inode, u32 xflags, u32 projid)
 	unsigned int inode_flags;
 	int rc = 0;
 
+	CDEBUG(D_QUOTA, DFID" xflags=%x projid=%u\n",
+	       PFID(ll_inode2fid(inode)), xflags, projid);
 	rc = ll_ioctl_check_project(inode, xflags, projid);
 	if (rc)
 		return rc;
-- 
1.8.3.1


* [lustre-devel] [PATCH 19/32] lnet: asym route inconsistency warning
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (17 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 18/32] lustre: llite: add projid to debug logs James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 20/32] lnet: libcfs: debugfs file_operation should have an owner James Simmons
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Gian-Carlo DeFazio <defazio1@llnl.gov>

lnet_check_route_inconsistency() checks for inconsistency between
the lr_hops and lr_single_hop values of a route.

A warning is currently emitted if the route is not single hop
and the hop count is either 1 or LNET_UNDEFINED_HOPS.

Additionally require that avoid_asym_router_failure is enabled
before emitting the warning.
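The combined condition after this patch can be expressed as a small predicate; LNET_UNDEFINED_HOPS is assumed here to be the all-ones sentinel LNet uses for an unset hop count:

```c
#include <assert.h>

#define LNET_UNDEFINED_HOPS ((unsigned int)-1)

/* Warn only when asym-route checking is enabled and a route detected
 * as multi-hop still carries a hop count of 1 (or no hop count). */
static int route_is_inconsistent(int single_hop, unsigned int hops,
				 int avoid_asym_router_failure)
{
	return !single_hop &&
	       (hops == 1 || hops == LNET_UNDEFINED_HOPS) &&
	       avoid_asym_router_failure;
}
```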

WC-bug-id: https://jira.whamcloud.com/browse/LU-14555
Lustre-commit: 6ab060e58e6b3f38b ("LU-14555 lnet: asym route inconsistency warning")
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/46918
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index bbef2b3..b684243 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -369,7 +369,8 @@ bool lnet_is_route_alive(struct lnet_route *route)
 lnet_check_route_inconsistency(struct lnet_route *route)
 {
 	if (!route->lr_single_hop &&
-	    (route->lr_hops == 1 || route->lr_hops == LNET_UNDEFINED_HOPS)) {
+	    (route->lr_hops == 1 || route->lr_hops == LNET_UNDEFINED_HOPS) &&
+	    avoid_asym_router_failure) {
 		CWARN("route %s->%s is detected to be multi-hop but hop count is set to %d\n",
 		      libcfs_net2str(route->lr_net),
 		      libcfs_nidstr(&route->lr_gateway->lp_primary_nid),
-- 
1.8.3.1


* [lustre-devel] [PATCH 20/32] lnet: libcfs: debugfs file_operation should have an owner
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (18 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 19/32] lnet: asym route inconsistency warning James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 21/32] lustre: client: able to cleanup devices manually James Simmons
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

If a debugfs file is open when the libcfs/lnet module is unloaded,
the kernel Oopses (the debugfs file_operations callbacks no longer
exist).

Crash generated with routerstat (/sys/kernel/debug/lnet/stats):
[ 1449.750396] IP: [<ffffffffab24e093>] SyS_lseek+0x83/0x100
[ 1449.750412] PGD 9fa14067 PUD 9fa16067 PMD d4e5d067 PTE 0
[ 1449.750428] Oops: 0000 [#1] SMP
[ 1449.750883]  [<ffffffffab7aaf92>] system_call_fastpath+0x25/0x2a
[ 1449.750897]  [<ffffffffab7aaed5>] ?
system_call_after_swapgs+0xa2/0x13a

This patch adds an owner to debugfs file_operation for libcfs and
lnet_router entries (/sys/kernel/debug/lnet/*).

The following behavior is expected:
$ modprobe lustre
$ routerstat 10 > /dev/null &
$ lustre_rmmod
rmmod: ERROR: Module lnet is in use
Can't read statfile (ENODEV)
[1]+  Exit 1                  routerstat 10 > /dev/null
$ lustre_rmmod

Note that the allocated 'struct file_operations' cannot be freed until
the module_exit() function is called, as files could still be open
until then.
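The pattern the patch uses — copy the three template file_operations once, stamp the module owner on each copy, then select by mode — can be mocked in userspace like this (the struct definitions are minimal stand-ins, not the kernel types):

```c
#include <assert.h>
#include <stddef.h>

struct module { int placeholder; };
struct file_operations { struct module *owner; unsigned int mode; };

static const struct file_operations fops_ro = { NULL, 0444 };
static const struct file_operations fops_wo = { NULL, 0222 };
static const struct file_operations fops_rw = { NULL, 0666 };

/* Build the three owned copies once per module, as lnet_insert_debugfs()
 * now does; the copies must live until module_exit() frees them. */
static void init_owned_fops(struct file_operations state[3],
			    struct module *mod)
{
	state[0] = fops_ro;
	state[1] = fops_wo;
	state[2] = fops_rw;
	for (int i = 0; i < 3; i++)
		state[i].owner = mod;
}

/* Mirror of lnet_debugfs_fops_select(): pick fops by permission bits. */
static const struct file_operations *
select_fops(unsigned int mode, const struct file_operations state[3])
{
	if (!(mode & 0222))
		return &state[0];	/* not writable: read-only fops */
	if (!(mode & 0444))
		return &state[1];	/* not readable: write-only fops */
	return &state[2];
}
```

Because each installed fops carries `.owner`, the VFS pins the module while a file is open, turning the Oops into the "Module lnet is in use" error shown above.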

WC-bug-id: https://jira.whamcloud.com/browse/LU-15759
Lustre-commit: b2dfb4457f0f1e56f ("LU-15759 libcfs: debugfs file_operation should have an owner")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/47335
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/libcfs/libcfs.h |  4 +++-
 include/linux/lnet/lib-lnet.h |  1 +
 net/lnet/libcfs/module.c      | 43 ++++++++++++++++++++++++++++++++++++-------
 net/lnet/lnet/module.c        |  1 +
 net/lnet/lnet/router_proc.c   | 10 +++++++++-
 5 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/include/linux/libcfs/libcfs.h b/include/linux/libcfs/libcfs.h
index b59ef9b..e29b007 100644
--- a/include/linux/libcfs/libcfs.h
+++ b/include/linux/libcfs/libcfs.h
@@ -57,8 +57,10 @@ static inline int notifier_from_ioctl_errno(int err)
 
 extern struct workqueue_struct *cfs_rehash_wq;
 
-void lnet_insert_debugfs(struct ctl_table *table);
+void lnet_insert_debugfs(struct ctl_table *table, struct module *mod,
+			 void **statep);
 void lnet_remove_debugfs(struct ctl_table *table);
+void lnet_debugfs_fini(void **statep);
 
 int debugfs_doint(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos);
diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 5a83190..57c8dc2 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -569,6 +569,7 @@ unsigned int lnet_nid_cpt_hash(struct lnet_nid *nid,
 
 int lnet_lib_init(void);
 void lnet_lib_exit(void);
+void lnet_router_exit(void);
 
 void lnet_mt_event_handler(struct lnet_event *event);
 
diff --git a/net/lnet/libcfs/module.c b/net/lnet/libcfs/module.c
index a249bdd..a126683 100644
--- a/net/lnet/libcfs/module.c
+++ b/net/lnet/libcfs/module.c
@@ -746,19 +746,23 @@ static ssize_t lnet_debugfs_write(struct file *filp, const char __user *buf,
 	.llseek		= default_llseek,
 };
 
-static const struct file_operations *lnet_debugfs_fops_select(umode_t mode)
+static const struct file_operations *
+lnet_debugfs_fops_select(umode_t mode, const struct file_operations state[3])
 {
 	if (!(mode & 0222))
-		return &lnet_debugfs_file_operations_ro;
+		return &state[0];
 
 	if (!(mode & 0444))
-		return &lnet_debugfs_file_operations_wo;
+		return &state[1];
 
-	return &lnet_debugfs_file_operations_rw;
+	return &state[2];
 }
 
-void lnet_insert_debugfs(struct ctl_table *table)
+void lnet_insert_debugfs(struct ctl_table *table, struct module *mod,
+			 void **statep)
 {
+	struct file_operations *state = *statep;
+
 	if (!lnet_debugfs_root)
 		lnet_debugfs_root = debugfs_create_dir("lnet", NULL);
 
@@ -766,6 +770,19 @@ void lnet_insert_debugfs(struct ctl_table *table)
 	if (IS_ERR_OR_NULL(lnet_debugfs_root))
 		return;
 
+	if (!state) {
+		state = kmalloc(3 * sizeof(*state), GFP_KERNEL);
+		if (!state)
+			return;
+		state[0] = lnet_debugfs_file_operations_ro;
+		state[0].owner = mod;
+		state[1] = lnet_debugfs_file_operations_wo;
+		state[1].owner = mod;
+		state[2] = lnet_debugfs_file_operations_rw;
+		state[2].owner = mod;
+		*statep = state;
+	}
+
 	/*
 	 * We don't save the dentry returned because we don't call
 	 * debugfs_remove() but rather remove_recursive()
@@ -773,10 +790,18 @@ void lnet_insert_debugfs(struct ctl_table *table)
 	for (; table && table->procname; table++)
 		debugfs_create_file(table->procname, table->mode,
 				    lnet_debugfs_root, table,
-				    lnet_debugfs_fops_select(table->mode));
+				    lnet_debugfs_fops_select(table->mode,
+							     (const struct file_operations *)state));
 }
 EXPORT_SYMBOL_GPL(lnet_insert_debugfs);
 
+void lnet_debugfs_fini(void **state)
+{
+	kfree(*state);
+	*state = NULL;
+}
+EXPORT_SYMBOL_GPL(lnet_debugfs_fini);
+
 static void lnet_insert_debugfs_links(
 	const struct lnet_debugfs_symlink_def *symlinks)
 {
@@ -801,6 +826,8 @@ void lnet_remove_debugfs(struct ctl_table *table)
 static DEFINE_MUTEX(libcfs_startup);
 static int libcfs_active;
 
+static void *debugfs_state;
+
 int libcfs_setup(void)
 {
 	int rc = -EINVAL;
@@ -855,7 +882,7 @@ static int libcfs_init(void)
 {
 	int rc;
 
-	lnet_insert_debugfs(lnet_table);
+	lnet_insert_debugfs(lnet_table, THIS_MODULE, &debugfs_state);
 	if (!IS_ERR_OR_NULL(lnet_debugfs_root))
 		lnet_insert_debugfs_links(lnet_debugfs_symlinks);
 
@@ -875,6 +902,8 @@ static void libcfs_exit(void)
 	debugfs_remove_recursive(lnet_debugfs_root);
 	lnet_debugfs_root = NULL;
 
+	lnet_debugfs_fini(&debugfs_state);
+
 	if (cfs_rehash_wq)
 		destroy_workqueue(cfs_rehash_wq);
 
diff --git a/net/lnet/lnet/module.c b/net/lnet/lnet/module.c
index aba9589..9d7b39a 100644
--- a/net/lnet/lnet/module.c
+++ b/net/lnet/lnet/module.c
@@ -271,6 +271,7 @@ static void __exit lnet_exit(void)
 						&lnet_ioctl_handler);
 	LASSERT(!rc);
 
+	lnet_router_exit();
 	lnet_lib_exit();
 }
 
diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c
index f231da1..689670a 100644
--- a/net/lnet/lnet/router_proc.c
+++ b/net/lnet/lnet/router_proc.c
@@ -888,12 +888,20 @@ static int proc_lnet_portal_rotor(struct ctl_table *table, int write,
 	}
 };
 
+static void *debugfs_state;
+
 void lnet_router_debugfs_init(void)
 {
-	lnet_insert_debugfs(lnet_table);
+	lnet_insert_debugfs(lnet_table, THIS_MODULE,
+			    &debugfs_state);
 }
 
 void lnet_router_debugfs_fini(void)
 {
 	lnet_remove_debugfs(lnet_table);
 }
+
+void lnet_router_exit(void)
+{
+	lnet_debugfs_fini(&debugfs_state);
+}
-- 
1.8.3.1


* [lustre-devel] [PATCH 21/32] lustre: client: able to cleanup devices manually
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (19 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 20/32] lnet: libcfs: debugfs file_operation should have an owner James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 22/32] lustre: lmv: support striped LMVs James Simmons
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Mikhail Pershin, Lustre Development List

From: Mikhail Pershin <mpershin@whamcloud.com>

Using 'lctl cleanup/detach' can be needed in situations with an
unclean umount. Currently that does not work for LMV devices and
can even cause a panic.
Patch restores the ability to cleanup/detach client devices
manually:
- debugfs and lprocfs cleanup in lmv_precleanup() is moved to
  lmv_cleanup() so it is not cleared too early. This prevents a
  hang on 'lctl cleanup' for the LMV device
- test 172 is added in sanity. It skips device cleanup during
  normal umount, keeping devices alive without a client mount,
  then manually cleans up/detaches them
- prevent a negative lov_connects count in lov_disconnect() and
  handle it gracefully
- remove obd_cleanup_client_import() from mdc_precleanup(); it is
  already called inside osc_precleanup_common()

WC-bug-id: https://jira.whamcloud.com/browse/LU-15653
Lustre-commit: 210803a2475862464 ("LU-15653 client: able to cleanup devices manually")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46859
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_support.h |  1 +
 fs/lustre/llite/llite_lib.c     |  5 +++++
 fs/lustre/lmv/lmv_obd.c         | 14 ++++++--------
 fs/lustre/lov/lov_obd.c         |  8 +++++++-
 fs/lustre/mdc/mdc_request.c     |  7 +++----
 5 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index b6c8a72..e25d4ed 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -381,6 +381,7 @@
 #define OBD_FAIL_OBDCLASS_MODULE_LOAD			0x60a
 #define OBD_FAIL_OBD_ZERO_NLINK_RACE			0x60b
 #define OBD_FAIL_OBD_SETUP				0x60d
+#define OBD_FAIL_OBD_CLEANUP				0x60e
 
 #define OBD_FAIL_TGT_REPLY_NET				0x700
 #define OBD_FAIL_TGT_CONN_RACE				0x701
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 5b80722..d947ede 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -1377,10 +1377,15 @@ void ll_put_super(struct super_block *sb)
 		client_common_put_super(sb);
 	}
 
+	/* imitate failed cleanup */
+	if (OBD_FAIL_CHECK(OBD_FAIL_OBD_CLEANUP))
+		goto skip_cleanup;
+
 	next = 0;
 	while ((obd = class_devices_in_group(&sbi->ll_sb_uuid, &next)))
 		class_manual_cleanup(obd);
 
+skip_cleanup:
 	if (test_bit(LL_SBI_VERBOSE, sbi->ll_flags))
 		LCONSOLE_WARN("Unmounted %s\n", profilenm ? profilenm : "");
 
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 3af7a53..8656d6b 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -520,7 +520,6 @@ static int lmv_disconnect(struct obd_export *exp)
 	struct obd_device *obd = class_exp2obd(exp);
 	struct lmv_obd *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc *tgt;
-	int rc;
 
 	lmv_foreach_connected_tgt(lmv, tgt)
 		lmv_disconnect_mdc(obd, tgt);
@@ -528,11 +527,8 @@ static int lmv_disconnect(struct obd_export *exp)
 	if (lmv->lmv_tgts_kobj)
 		kobject_put(lmv->lmv_tgts_kobj);
 
-	if (!lmv->connected)
-		class_export_put(exp);
-	rc = class_disconnect(exp);
 	lmv->connected = 0;
-	return rc;
+	return class_disconnect(exp);
 }
 
 static int lmv_fid2path(struct obd_export *exp, int len, void *karg,
@@ -1147,6 +1143,11 @@ static int lmv_cleanup(struct obd_device *obd)
 	struct lu_tgt_desc *tmp;
 
 	fld_client_fini(&lmv->lmv_fld);
+	fld_client_debugfs_fini(&lmv->lmv_fld);
+
+	lprocfs_obd_cleanup(obd);
+	ldebugfs_free_md_stats(obd);
+
 	lmv_foreach_tgt_safe(lmv, tgt, tmp)
 		lmv_del_target(lmv, tgt);
 	lu_tgt_descs_fini(&lmv->lmv_mdt_descs);
@@ -3063,9 +3064,6 @@ static int lmv_unlink(struct obd_export *exp, struct md_op_data *op_data,
 static int lmv_precleanup(struct obd_device *obd)
 {
 	libcfs_kkuc_group_rem(&obd->obd_uuid, 0, KUC_GRP_HSM);
-	fld_client_debugfs_fini(&obd->u.lmv.lmv_fld);
-	lprocfs_obd_cleanup(obd);
-	ldebugfs_free_md_stats(obd);
 	return 0;
 }
 
diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c
index 61159fd..161226f 100644
--- a/fs/lustre/lov/lov_obd.c
+++ b/fs/lustre/lov/lov_obd.c
@@ -313,8 +313,14 @@ static int lov_disconnect(struct obd_export *exp)
 		goto out;
 
 	/* Only disconnect the underlying layers on the final disconnect. */
+	if (lov->lov_connects == 0) {
+		CWARN("%s: was disconnected already #%d\n",
+		      obd->obd_name, lov->lov_connects);
+		return 0;
+	}
+
 	lov->lov_connects--;
-	if (lov->lov_connects != 0) {
+	if (lov->lov_connects > 0) {
 		/* why should there be more than 1 connect? */
 		CWARN("%s: unexpected disconnect #%d\n",
 		      obd->obd_name, lov->lov_connects);
diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index bb51878..c073da2 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -2959,11 +2959,10 @@ static int mdc_precleanup(struct obd_device *obd)
 	osc_precleanup_common(obd);
 
 	mdc_changelog_cdev_finish(obd);
-
-	obd_cleanup_client_import(obd);
-	ptlrpc_lprocfs_unregister_obd(obd);
-	ldebugfs_free_md_stats(obd);
 	mdc_llog_finish(obd);
+	ldebugfs_free_md_stats(obd);
+	ptlrpc_lprocfs_unregister_obd(obd);
+
 	return 0;
 }
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 22/32] lustre: lmv: support striped LMVs
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (20 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 21/32] lustre: client: able to cleanup devices manually James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 23/32] lnet: o2iblnd: add debug messages for IB James Simmons
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

lmv_name_to_stripe_index() should support striped LMV, which is
used by LFSCK to verify the name hash.
WC-bug-id: https://jira.whamcloud.com/browse/LU-15868
Lustre-commit: 54a2d4662b58e2ba4 ("LU-15868 lfsck: don't crash upon dir migration failure")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47381
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_lmv.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h
index b1d8ed9..cd7cf9e 100644
--- a/fs/lustre/include/lustre_lmv.h
+++ b/fs/lustre/include/lustre_lmv.h
@@ -366,14 +366,16 @@ static inline u32 crush_hash(u32 a, u32 b)
 static inline int lmv_name_to_stripe_index(struct lmv_mds_md_v1 *lmv,
 					   const char *name, int namelen)
 {
-	if (lmv->lmv_magic == LMV_MAGIC_V1)
+	if (lmv->lmv_magic == LMV_MAGIC_V1 ||
+	    lmv->lmv_magic == LMV_MAGIC_STRIPE)
 		return __lmv_name_to_stripe_index(lmv->lmv_hash_type,
 						  lmv->lmv_stripe_count,
 						  lmv->lmv_migrate_hash,
 						  lmv->lmv_migrate_offset,
 						  name, namelen, true);
 
-	if (lmv->lmv_magic == cpu_to_le32(LMV_MAGIC_V1))
+	if (lmv->lmv_magic == cpu_to_le32(LMV_MAGIC_V1) ||
+	    lmv->lmv_magic == cpu_to_le32(LMV_MAGIC_STRIPE))
 		return __lmv_name_to_stripe_index(
 					le32_to_cpu(lmv->lmv_hash_type),
 					le32_to_cpu(lmv->lmv_stripe_count),
-- 
1.8.3.1


* [lustre-devel] [PATCH 23/32] lnet: o2iblnd: add debug messages for IB
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (21 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 22/32] lustre: lmv: support striped LMVs James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 24/32] lnet: o2iblnd: debug message is missing a newline James Simmons
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Cyril Bordage, Lustre Development List

From: Cyril Bordage <cbordage@whamcloud.com>

If net debug is enabled, collect information about the connection
when the tx status is -ECONNABORTED (IB only).

WC-bug-id: https://jira.whamcloud.com/browse/LU-15925
Lustre-commit: 9153049bdc7ec8217 ("LU-15925 lnet: add debug messages for IB")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47583
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 01fa499..d4d8954 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -276,6 +276,13 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type,
 
 	if (!tx->tx_status) {		/* success so far */
 		if (status < 0) {	/* failed? */
+			if (status == -ECONNABORTED) {
+				CDEBUG(D_NET,
+				       "bad status for connection to %s with completion type %x\n",
+				       libcfs_nid2str(conn->ibc_peer->ibp_nid),
+				       txtype);
+			}
+
 			tx->tx_status = status;
 			tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_ERROR;
 		} else if (txtype == IBLND_MSG_GET_REQ) {
@@ -812,6 +819,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 	/* I'm still holding ibc_lock! */
 	if (conn->ibc_state != IBLND_CONN_ESTABLISHED) {
+		CDEBUG(D_NET, "connection to %s is not established\n",
+		       conn->ibc_peer ? libcfs_nid2str(conn->ibc_peer->ibp_nid) : "NULL");
 		rc = -ECONNABORTED;
 	} else if (tx->tx_pool->tpo_pool.po_failed ||
 		 conn->ibc_hdev != tx->tx_pool->tpo_hdev) {
@@ -1153,6 +1162,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	LASSERT(conn->ibc_state >= IBLND_CONN_ESTABLISHED);
 
 	if (conn->ibc_state >= IBLND_CONN_DISCONNECTED) {
+		CDEBUG(D_NET, "connection with %s is disconnected\n",
+		       conn->ibc_peer ? libcfs_nid2str(conn->ibc_peer->ibp_nid) : "NULL");
+
 		tx->tx_status = -ECONNABORTED;
 		tx->tx_waiting = 0;
 		if (tx->tx_conn) {
@@ -2141,10 +2153,12 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 	kiblnd_set_conn_state(conn, IBLND_CONN_DISCONNECTED);
 
-	/*
-	 * Complete all tx descs not waiting for sends to complete.
+	/* Complete all tx descs not waiting for sends to complete.
 	 * NB we should be safe from RDMA now that the QP has changed state
 	 */
+	CDEBUG(D_NET, "abort connection with %s\n",
+	       libcfs_nid2str(conn->ibc_peer->ibp_nid));
+
 	kiblnd_abort_txs(conn, &conn->ibc_tx_noops);
 	kiblnd_abort_txs(conn, &conn->ibc_tx_queue);
 	kiblnd_abort_txs(conn, &conn->ibc_tx_queue_rsrvd);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [lustre-devel] [PATCH 24/32] lnet: o2iblnd: debug message is missing a newline
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (22 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 23/32] lnet: o2iblnd: add debug messages for IB James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 25/32] lustre: quota: skip non-exist or inact tgt for lfs_quota James Simmons
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

Add missing newline to one of the debug messages in
kiblnd_pool_alloc_node.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15984
Lustre-commit: dd670d968a44f0a70 ("LU-15984 o2iblnd: debug message is missing a newline")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47933
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 65bc89b..ea28c65 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1887,7 +1887,7 @@ struct list_head *kiblnd_pool_alloc_node(struct kib_poolset *ps)
 	CDEBUG(D_NET, "%s pool exhausted, allocate new pool\n", ps->ps_name);
 	time_before = ktime_get();
 	rc = ps->ps_pool_create(ps, ps->ps_pool_size, &pool);
-	CDEBUG(D_NET, "ps_pool_create took %lld ms to complete",
+	CDEBUG(D_NET, "ps_pool_create took %lld ms to complete\n",
 	       ktime_ms_delta(ktime_get(), time_before));
 
 	spin_lock(&ps->ps_lock);
-- 
1.8.3.1


* [lustre-devel] [PATCH 25/32] lustre: quota: skip non-exist or inact tgt for lfs_quota
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (23 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 24/32] lnet: o2iblnd: debug message is missing a newline James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 26/32] lustre: mdc: pack default LMV in open reply James Simmons
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Hongchao Zhang, Lustre Development List

From: Hongchao Zhang <hongchao@whamcloud.com>

Nonexistent or inactive targets (MDC or OSC) should be skipped
when reporting quota.
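
The check order this patch introduces in both lmv_iocontrol() and
lov_iocontrol() can be sketched as follows. The struct and state names
below are simplified stand-ins, not the real Lustre types; only the
decision sequence (missing target, missing export, inactive-but-not-idle)
mirrors the patch:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the Lustre target/import state. */
enum imp_state { IMP_IDLE, IMP_DISCON, IMP_FULL };

struct tgt {
	int exists;		/* target configured at all */
	int has_exp;		/* export (ltd_exp) set up */
	int active;		/* ltd_active */
	enum imp_state state;	/* import state */
};

/* Decision order from the patch: a missing target is -ENODEV, a target
 * without an export is -EINVAL, and an inactive target whose import is
 * not merely idle is -ENODATA so lfs quota can report it as skipped.
 */
static int quota_tgt_check(const struct tgt *t)
{
	if (!t->exists)
		return -ENODEV;
	if (!t->has_exp)
		return -EINVAL;
	if (!t->active && t->state != IMP_IDLE)
		return -ENODATA;
	return 0;
}
```

Note that an idle import is deliberately not treated as inactive, so a
connection that was only idled does not make the target look unreachable.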

WC-bug-id: https://jira.whamcloud.com/browse/LU-14472
Lustre-commit: b54b7ce43929ce7ff ("LU-14472 quota: skip non-exist or inact tgt for lfs_quota")
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41771
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Feng, Lei <flei@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c   |  8 +++-----
 fs/lustre/lmv/lmv_obd.c | 15 +++++++++++++--
 fs/lustre/lov/lov_obd.c | 13 ++++++++++++-
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 2b63c48..26c9ec3 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -1215,10 +1215,9 @@ int quotactl_ioctl(struct super_block *sb, struct if_quotactl *qctl)
 			break;
 		}
 
+		qctl->qc_cmd = cmd;
 		if (rc)
 			return rc;
-
-		qctl->qc_cmd = cmd;
 	} else {
 		struct obd_quotactl *oqctl;
 		int oqctl_len = sizeof(*oqctl);
@@ -2009,10 +2008,9 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		}
 
 		rc = quotactl_ioctl(inode->i_sb, qctl);
-		if (rc == 0 && copy_to_user((void __user *)arg, qctl,
-					    sizeof(*qctl)))
+		if ((rc == 0 || rc == -ENODATA) &&
+		    copy_to_user((void __user *)arg, qctl, sizeof(*qctl)))
 			rc = -EFAULT;
-
 out_quotactl:
 		kfree(qctl);
 		return rc;
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 8656d6b..6c0eb03 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -845,6 +845,7 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp,
 	case OBD_IOC_QUOTACTL: {
 		struct if_quotactl *qctl = karg;
 		struct obd_quotactl *oqctl;
+		struct obd_import *imp;
 
 		if (qctl->qc_valid == QC_MDTIDX) {
 			tgt = lmv_tgt(lmv, qctl->qc_idx);
@@ -863,9 +864,19 @@ static int lmv_iocontrol(unsigned int cmd, struct obd_export *exp,
 			return -EINVAL;
 		}
 
-		if (!tgt || !tgt->ltd_exp)
+		if (!tgt)
+			return -ENODEV;
+
+		if (!tgt->ltd_exp)
 			return -EINVAL;
 
+		imp = class_exp2cliimp(tgt->ltd_exp);
+		if (!tgt->ltd_active && imp->imp_state != LUSTRE_IMP_IDLE) {
+			qctl->qc_valid = QC_MDTIDX;
+			qctl->obd_uuid = tgt->ltd_uuid;
+			return -ENODATA;
+		}
+
 		oqctl = kzalloc(sizeof(*oqctl), GFP_KERNEL);
 		if (!oqctl)
 			return -ENOMEM;
@@ -3122,7 +3133,7 @@ static int lmv_get_info(const struct lu_env *env, struct obd_export *exp,
 			exp->exp_connect_data = *(struct obd_connect_data *)val;
 		return rc;
 	} else if (KEY_IS(KEY_TGT_COUNT)) {
-		*((int *)val) = lmv->lmv_mdt_descs.ltd_lmv_desc.ld_tgt_count;
+		*((int *)val) = lmv->lmv_mdt_descs.ltd_tgts_size;
 		return 0;
 	}
 
diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c
index 161226f..d2fe8c3 100644
--- a/fs/lustre/lov/lov_obd.c
+++ b/fs/lustre/lov/lov_obd.c
@@ -1021,13 +1021,17 @@ static int lov_iocontrol(unsigned int cmd, struct obd_export *exp, int len,
 		struct if_quotactl *qctl = karg;
 		struct lov_tgt_desc *tgt = NULL;
 		struct obd_quotactl *oqctl;
+		struct obd_import *imp;
 
 		if (qctl->qc_valid == QC_OSTIDX) {
 			if (count <= qctl->qc_idx)
 				return -EINVAL;
 
 			tgt = lov->lov_tgts[qctl->qc_idx];
-			if (!tgt || !tgt->ltd_exp)
+			if (!tgt)
+				return -ENODEV;
+
+			if (!tgt->ltd_exp)
 				return -EINVAL;
 		} else if (qctl->qc_valid == QC_UUID) {
 			for (i = 0; i < count; i++) {
@@ -1050,6 +1054,13 @@ static int lov_iocontrol(unsigned int cmd, struct obd_export *exp, int len,
 			return -EAGAIN;
 
 		LASSERT(tgt && tgt->ltd_exp);
+		imp = class_exp2cliimp(tgt->ltd_exp);
+		if (!tgt->ltd_active && imp->imp_state != LUSTRE_IMP_IDLE) {
+			qctl->qc_valid = QC_OSTIDX;
+			qctl->obd_uuid = tgt->ltd_uuid;
+			return -ENODATA;
+		}
+
 		oqctl = kzalloc(sizeof(*oqctl), GFP_NOFS);
 		if (!oqctl)
 			return -ENOMEM;
-- 
1.8.3.1


* [lustre-devel] [PATCH 26/32] lustre: mdc: pack default LMV in open reply
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (24 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 25/32] lustre: quota: skip non-exist or inact tgt for lfs_quota James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 27/32] lnet: Define KFILND network type James Simmons
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

Add the flag MDS_OPEN_DEFAULT_LMV to indicate that the default LMV
should be packed in the open reply. Otherwise, if open fetches a
LOOKUP lock, the client won't know the directory has a default LMV,
and the default LMV won't take effect on subdirectory creation.
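
For reference, the new flag occupies the next free bit above
MDS_OP_WITH_FID. The octal constants below are copied from the hunks in
this patch (and the wiretest values); the bit arithmetic is just a quick
sanity sketch, not part of the change:

```c
#include <assert.h>

/* Octal open-flag constants as defined by lustre_user.h with this patch
 * applied; MDS_OPEN_PCC taken from the wiretest constant above.
 */
#define MDS_OPEN_PCC		010000000000000ULL /* PCC-attached open */
#define MDS_OP_WITH_FID		020000000000000ULL /* op carried out by FID */
#define MDS_OPEN_DEFAULT_LMV	040000000000000ULL /* open fetches default LMV */

/* These are consecutive single bits (2^39, 2^40, 2^41), so the new flag
 * cannot collide with the existing internal open flags it joins in
 * MDS_OPEN_FL_INTERNAL.
 */
```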

WC-bug-id: https://jira.whamcloud.com/browse/LU-15850
Lustre-commit: f6e4272fb0be5b798 ("LU-15850 mdt: pack default LMV in open reply")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47576
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/mdc_lib.c                 | 1 +
 fs/lustre/mdc/mdc_locks.c               | 2 ++
 fs/lustre/ptlrpc/layout.c               | 1 +
 fs/lustre/ptlrpc/wiretest.c             | 2 ++
 include/uapi/linux/lustre/lustre_user.h | 5 ++++-
 5 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/mdc/mdc_lib.c b/fs/lustre/mdc/mdc_lib.c
index 51080a1..077639d 100644
--- a/fs/lustre/mdc/mdc_lib.c
+++ b/fs/lustre/mdc/mdc_lib.c
@@ -329,6 +329,7 @@ void mdc_open_pack(struct req_capsule *pill, struct md_op_data *op_data,
 			rec->cr_archive_id = op_data->op_archive_id;
 		}
 	}
+	cr_flags |= MDS_OPEN_DEFAULT_LMV;
 	set_mrc_cr_flags(rec, cr_flags);
 }
 
diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c
index b86d1b9..2a9b9a8 100644
--- a/fs/lustre/mdc/mdc_locks.c
+++ b/fs/lustre/mdc/mdc_locks.c
@@ -393,6 +393,8 @@ static int mdc_save_lovea(struct ptlrpc_request *req, void *data, u32 size)
 	 */
 	req_capsule_set_size(&req->rq_pill, &RMF_NIOBUF_INLINE, RCL_SERVER,
 			     sizeof(struct niobuf_remote));
+	req_capsule_set_size(&req->rq_pill, &RMF_DEFAULT_MDT_MD, RCL_SERVER,
+			     sizeof(struct lmv_user_md));
 	ptlrpc_request_set_replen(req);
 
 	/* Get real repbuf allocated size as rounded up power of 2 */
diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c
index 8725edd..82ec899 100644
--- a/fs/lustre/ptlrpc/layout.c
+++ b/fs/lustre/ptlrpc/layout.c
@@ -447,6 +447,7 @@
 	&RMF_NIOBUF_INLINE,
 	&RMF_FILE_SECCTX,
 	&RMF_FILE_ENCCTX,
+	&RMF_DEFAULT_MDT_MD,
 };
 
 static const struct req_msg_field *ldlm_intent_getattr_client[] = {
diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index 81e0485..60a7fd0 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -2326,6 +2326,8 @@ void lustre_assert_wire_constants(void)
 		 (long long)MDS_OPEN_RESYNC);
 	LASSERTF(MDS_OPEN_PCC == 00000000010000000000000ULL, "found 0%.22lloULL\n",
 		 (long long)MDS_OPEN_PCC);
+	LASSERTF(MDS_OPEN_DEFAULT_LMV == 00000000040000000000000ULL, "found 0%.22lloULL\n",
+		 (long long)MDS_OPEN_DEFAULT_LMV);
 	LASSERTF(LUSTRE_SYNC_FL == 0x00000008, "found 0x%.8x\n",
 		 LUSTRE_SYNC_FL);
 	LASSERTF(LUSTRE_IMMUTABLE_FL == 0x00000010, "found 0x%.8x\n",
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index c57929b..7b79604 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -1246,12 +1246,15 @@ enum la_valid {
 					      * for newly created file
 					      */
 #define MDS_OP_WITH_FID	  020000000000000ULL /* operation carried out by FID */
+#define MDS_OPEN_DEFAULT_LMV  040000000000000ULL /* open fetches default LMV */
 
+/* lustre internal open flags, which should not be set from user space */
 #define MDS_OPEN_FL_INTERNAL (MDS_OPEN_HAS_EA | MDS_OPEN_HAS_OBJS |	\
 			      MDS_OPEN_OWNEROVERRIDE | MDS_OPEN_LOCK |	\
 			      MDS_OPEN_BY_FID | MDS_OPEN_LEASE |	\
 			      MDS_OPEN_RELEASE | MDS_OPEN_RESYNC |	\
-			      MDS_OPEN_PCC | MDS_OP_WITH_FID)
+			      MDS_OPEN_PCC | MDS_OP_WITH_FID |		\
+			      MDS_OPEN_DEFAULT_LMV)
 
 /********* Changelogs **********/
 /** Changelog record types */
-- 
1.8.3.1


* [lustre-devel] [PATCH 27/32] lnet: Define KFILND network type
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (25 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 26/32] lustre: mdc: pack default LMV in open reply James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 28/32] lnet: Adjust niov checks for large MD James Simmons
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

Define the KFILND network type. This reserves the network type number
for a future implementation and allows kfi peers to be created and
routes to them to be added.

HPE-bug-id: LUS-11060

WC-bug-id: https://jira.whamcloud.com/browse/LU-15983
Lustre-commit: 5fea36c952373c9a2 ("LU-15983 lnet: Define KFILND network type")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/47830
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lnet/nidstr.h |  1 +
 net/lnet/lnet/nidstrings.c       | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/uapi/linux/lnet/nidstr.h b/include/uapi/linux/lnet/nidstr.h
index 80be2eb..d5829fe 100644
--- a/include/uapi/linux/lnet/nidstr.h
+++ b/include/uapi/linux/lnet/nidstr.h
@@ -55,6 +55,7 @@ enum {
 	GNILND		= 13,
 	GNIIPLND	= 14,
 	PTL4LND		= 15,
+	KFILND		= 16,
 
 	NUM_LNDS
 };
diff --git a/net/lnet/lnet/nidstrings.c b/net/lnet/lnet/nidstrings.c
index 3523b78..ac2aa97 100644
--- a/net/lnet/lnet/nidstrings.c
+++ b/net/lnet/lnet/nidstrings.c
@@ -758,6 +758,16 @@ int cfs_print_nidlist(char *buffer, int count, struct list_head *nidlist)
 	  .nf_print_addrlist	= libcfs_num_addr_range_print,
 	  .nf_match_addr	= libcfs_num_match
 	},
+	{
+	  .nf_type		= KFILND,
+	  .nf_name		= "kfi",
+	  .nf_modname		= "kkfilnd",
+	  .nf_addr2str		= libcfs_decnum_addr2str,
+	  .nf_str2addr		= libcfs_num_str2addr,
+	  .nf_parse_addrlist	= libcfs_num_parse,
+	  .nf_print_addrlist	= libcfs_num_addr_range_print,
+	  .nf_match_addr	= libcfs_num_match
+	},
 };
 
 static const size_t libcfs_nnetstrfns = ARRAY_SIZE(libcfs_netstrfns);
-- 
1.8.3.1


* [lustre-devel] [PATCH 28/32] lnet: Adjust niov checks for large MD
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (26 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 27/32] lnet: Define KFILND network type James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 29/32] lustre: ec: code to add support for M to N parity James Simmons
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

An LNet user can allocate a large contiguous MD. That MD can have more
than LNET_MAX_IOV pages, which causes some LNDs to assert on either
the niov argument passed to lnd_recv() or the value stored in
lnet_msg::msg_niov. This is true even in cases where the actual
transfer size is <= LNET_MTU and will not exceed limits in the LNDs.

Adjust ksocklnd_send()/ksocklnd_recv() to assert on the return value
of lnet_extract_kiov().

Remove the assert on msg_niov (payload_niov) from kiblnd_send().
kiblnd_setup_rd_kiov() will already fail if we exceed ko2iblnd's
available scatter gather entries.
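
The arithmetic behind the relaxed checks: the MD may span far more pages
than any one transfer touches, but after lnet_extract_kiov() clips to
the actual offset and length, the fragment count is bounded by the
transfer size rather than the MD size. A minimal sketch, assuming 4 KiB
pages and the conventional LNET_MTU of 1 MiB (so LNET_MAX_IOV is 256);
the names with trailing underscores are illustrative, not the kernel's:

```c
#include <assert.h>

#define PAGE_SIZE_	4096u
#define LNET_MTU_	(1u << 20)
#define LNET_MAX_IOV_	(LNET_MTU_ / PAGE_SIZE_)	/* 256 */

/* Number of pages a transfer of 'nob' bytes starting at byte 'offset'
 * of a page-aligned buffer can touch, i.e. the fragment count left
 * after extraction clips the (possibly huge) MD down to the transfer.
 */
static unsigned int niov_after_extract(unsigned int offset,
				       unsigned int nob)
{
	unsigned int first = offset / PAGE_SIZE_;
	unsigned int last = (offset + nob - 1) / PAGE_SIZE_;

	return last - first + 1;
}
```

A page-aligned full-MTU transfer out of an arbitrarily large MD thus
extracts exactly LNET_MAX_IOV fragments, which is why asserting after
extraction (or relying on the LND's own scatter-gather limits) is safe
where asserting on the MD's total niov was not.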

HPE-bug-id: LUS-10878
Fixes: 05cd1717bb ("lnet: always put a page list into struct lnet_libmd")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15851
Lustre-commit: 105193b4a147257a0 ("LU-15851 lnet: Adjust niov checks for large MD")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/47319
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 1 -
 net/lnet/klnds/socklnd/socklnd_cb.c | 5 +++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index d4d8954..30e77c0 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1564,7 +1564,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	       payload_nob, payload_niov, libcfs_idstr(target));
 
 	LASSERT(!payload_nob || payload_niov > 0);
-	LASSERT(payload_niov <= LNET_MAX_IOV);
 
 	/* Thread context */
 	LASSERT(!in_interrupt());
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index 94600f3..308d8b0 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -936,7 +936,6 @@ struct ksock_conn_cb *
 	       payload_nob, payload_niov, libcfs_idstr(target));
 
 	LASSERT(!payload_nob || payload_niov > 0);
-	LASSERT(payload_niov <= LNET_MAX_IOV);
 	LASSERT(!in_interrupt());
 
 	desc_size = offsetof(struct ksock_tx,
@@ -962,6 +961,8 @@ struct ksock_conn_cb *
 					 payload_niov, payload_kiov,
 					 payload_offset, payload_nob);
 
+	LASSERT(tx->tx_nkiov <= LNET_MAX_IOV);
+
 	if (payload_nob >= *ksocknal_tunables.ksnd_zc_min_payload)
 		tx->tx_zc_capable = 1;
 
@@ -1278,13 +1279,13 @@ struct ksock_conn_cb *
 	struct ksock_sched *sched = conn->ksnc_scheduler;
 
 	LASSERT(iov_iter_count(to) <= rlen);
-	LASSERT(to->nr_segs <= LNET_MAX_IOV);
 
 	conn->ksnc_lnet_msg = msg;
 	conn->ksnc_rx_nob_left = rlen;
 
 	conn->ksnc_rx_to = *to;
 
+	LASSERT(conn->ksnc_rx_to.nr_segs <= LNET_MAX_IOV);
 	LASSERT(conn->ksnc_rx_scheduled);
 
 	spin_lock_bh(&sched->kss_lock);
-- 
1.8.3.1


* [lustre-devel] [PATCH 29/32] lustre: ec: code to add support for M to N parity
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (27 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 28/32] lnet: Adjust niov checks for large MD James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 30/32] lustre: llite: use max default EA size to get default LMV James Simmons
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Adam Disney, Lustre Development List

This code adds basic functionality for calculating N parities for
M data units, which allows much more than just RAID-6 calculations.
The code is derived from the Intel isa-l userland library. Keep the
code in a separate module for easy merging upstream at a later time.
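
The gff_base[]/gflog_base[] arrays in the patch are the standard
exponential and logarithm tables for GF(2^8) under the polynomial 0x11d
with generator 2, and gf_mul(a, b) is exp[log a + log b]. A short sketch
that regenerates the exponential table and cross-checks a few entries
against the values in the patch (names here are illustrative):

```c
#include <assert.h>

/* Multiply by x (i.e. by 2) in GF(2^8) modulo x^8+x^4+x^3+x^2+1 (0x11d):
 * shift left and, if the top bit fell out, xor in the low byte (0x1d)
 * of the reduction polynomial.  This is the same step gf_vect_mul_init()
 * uses to derive c2, c4 and c8.
 */
static unsigned char gf2_double(unsigned char a)
{
	return (unsigned char)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
}

/* Schoolbook (Russian-peasant) GF(2^8) multiply, for cross-checking the
 * log/antilog approach used by gf_mul() in the patch.
 */
static unsigned char gf_mul_slow(unsigned char a, unsigned char b)
{
	unsigned char r = 0;

	while (b) {
		if (b & 1)
			r ^= a;
		a = gf2_double(a);
		b >>= 1;
	}
	return r;
}

/* i-th entry of the exponential table, 2^i in GF(2^8); this regenerates
 * gff_base[] from the patch one entry at a time.
 */
static unsigned char gff_at(int i)
{
	unsigned char v = 1;

	while (i-- > 0)
		v = gf2_double(v);
	return v;
}
```

Because the multiplicative group has order 255, the table wraps
(2^255 = 2^0 = 1), which is why gff_base[] ends in 0x01 and why gf_mul()
folds the summed logs with "i > 254 ? i - 255 : i".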

WC-bug-id: https://jira.whamcloud.com/browse/LU-12189
Lustre-commit: 047347170b8aece43 ("LU-12189 ec: code to add support for M to N parity")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Adam Disney <disneyaw@ornl.gov>
Reviewed-on: https://review.whamcloud.com/47628
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 fs/lustre/Makefile               |   2 +-
 fs/lustre/ec/Makefile            |   4 +
 fs/lustre/ec/ec_base.c           | 352 +++++++++++++++++++++++++++++++++++++++
 fs/lustre/include/erasure_code.h | 142 ++++++++++++++++
 4 files changed, 499 insertions(+), 1 deletion(-)
 create mode 100644 fs/lustre/ec/Makefile
 create mode 100644 fs/lustre/ec/ec_base.c
 create mode 100644 fs/lustre/include/erasure_code.h

diff --git a/fs/lustre/Makefile b/fs/lustre/Makefile
index 4a02463..4af6ff6 100644
--- a/fs/lustre/Makefile
+++ b/fs/lustre/Makefile
@@ -1,3 +1,3 @@
 
-obj-$(CONFIG_LUSTRE_FS) += obdclass/ ptlrpc/ fld/ osc/ mgc/ \
+obj-$(CONFIG_LUSTRE_FS) += obdclass/ ptlrpc/ fld/ osc/ mgc/ ec/ \
 			   fid/ lov/ mdc/ lmv/ llite/ obdecho/
diff --git a/fs/lustre/ec/Makefile b/fs/lustre/ec/Makefile
new file mode 100644
index 0000000..aba8ea3
--- /dev/null
+++ b/fs/lustre/ec/Makefile
@@ -0,0 +1,4 @@
+ccflags-y += -I$(srctree)/$(src)/../include
+
+obj-$(CONFIG_LUSTRE_FS) += ec.o
+ec-y := ec_base.o
diff --git a/fs/lustre/ec/ec_base.c b/fs/lustre/ec/ec_base.c
new file mode 100644
index 0000000..e520466
--- /dev/null
+++ b/fs/lustre/ec/ec_base.c
@@ -0,0 +1,352 @@
+// SPDX-License-Identifier: BSD-2-Clause
+/**********************************************************************
+ * Copyright(c) 2011-2015 Intel Corporation All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *    Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *    Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *    Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/limits.h>
+#include <linux/module.h>
+#include <linux/string.h>	/* for memset */
+
+#include "erasure_code.h"
+
+/* Global GF(256) tables */
+static const unsigned char gff_base[] = {
+	0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1d, 0x3a,
+	0x74, 0xe8, 0xcd, 0x87, 0x13, 0x26, 0x4c, 0x98, 0x2d, 0x5a,
+	0xb4, 0x75, 0xea, 0xc9, 0x8f, 0x03, 0x06, 0x0c, 0x18, 0x30,
+	0x60, 0xc0, 0x9d, 0x27, 0x4e, 0x9c, 0x25, 0x4a, 0x94, 0x35,
+	0x6a, 0xd4, 0xb5, 0x77, 0xee, 0xc1, 0x9f, 0x23, 0x46, 0x8c,
+	0x05, 0x0a, 0x14, 0x28, 0x50, 0xa0, 0x5d, 0xba, 0x69, 0xd2,
+	0xb9, 0x6f, 0xde, 0xa1, 0x5f, 0xbe, 0x61, 0xc2, 0x99, 0x2f,
+	0x5e, 0xbc, 0x65, 0xca, 0x89, 0x0f, 0x1e, 0x3c, 0x78, 0xf0,
+	0xfd, 0xe7, 0xd3, 0xbb, 0x6b, 0xd6, 0xb1, 0x7f, 0xfe, 0xe1,
+	0xdf, 0xa3, 0x5b, 0xb6, 0x71, 0xe2, 0xd9, 0xaf, 0x43, 0x86,
+	0x11, 0x22, 0x44, 0x88, 0x0d, 0x1a, 0x34, 0x68, 0xd0, 0xbd,
+	0x67, 0xce, 0x81, 0x1f, 0x3e, 0x7c, 0xf8, 0xed, 0xc7, 0x93,
+	0x3b, 0x76, 0xec, 0xc5, 0x97, 0x33, 0x66, 0xcc, 0x85, 0x17,
+	0x2e, 0x5c, 0xb8, 0x6d, 0xda, 0xa9, 0x4f, 0x9e, 0x21, 0x42,
+	0x84, 0x15, 0x2a, 0x54, 0xa8, 0x4d, 0x9a, 0x29, 0x52, 0xa4,
+	0x55, 0xaa, 0x49, 0x92, 0x39, 0x72, 0xe4, 0xd5, 0xb7, 0x73,
+	0xe6, 0xd1, 0xbf, 0x63, 0xc6, 0x91, 0x3f, 0x7e, 0xfc, 0xe5,
+	0xd7, 0xb3, 0x7b, 0xf6, 0xf1, 0xff, 0xe3, 0xdb, 0xab, 0x4b,
+	0x96, 0x31, 0x62, 0xc4, 0x95, 0x37, 0x6e, 0xdc, 0xa5, 0x57,
+	0xae, 0x41, 0x82, 0x19, 0x32, 0x64, 0xc8, 0x8d, 0x07, 0x0e,
+	0x1c, 0x38, 0x70, 0xe0, 0xdd, 0xa7, 0x53, 0xa6, 0x51, 0xa2,
+	0x59, 0xb2, 0x79, 0xf2, 0xf9, 0xef, 0xc3, 0x9b, 0x2b, 0x56,
+	0xac, 0x45, 0x8a, 0x09, 0x12, 0x24, 0x48, 0x90, 0x3d, 0x7a,
+	0xf4, 0xf5, 0xf7, 0xf3, 0xfb, 0xeb, 0xcb, 0x8b, 0x0b, 0x16,
+	0x2c, 0x58, 0xb0, 0x7d, 0xfa, 0xe9, 0xcf, 0x83, 0x1b, 0x36,
+	0x6c, 0xd8, 0xad, 0x47, 0x8e, 0x01
+};
+
+static const unsigned char gflog_base[] = {
+	0x00, 0xff, 0x01, 0x19, 0x02, 0x32, 0x1a, 0xc6, 0x03, 0xdf,
+	0x33, 0xee, 0x1b, 0x68, 0xc7, 0x4b, 0x04, 0x64, 0xe0, 0x0e,
+	0x34, 0x8d, 0xef, 0x81, 0x1c, 0xc1, 0x69, 0xf8, 0xc8, 0x08,
+	0x4c, 0x71, 0x05, 0x8a, 0x65, 0x2f, 0xe1, 0x24, 0x0f, 0x21,
+	0x35, 0x93, 0x8e, 0xda, 0xf0, 0x12, 0x82, 0x45, 0x1d, 0xb5,
+	0xc2, 0x7d, 0x6a, 0x27, 0xf9, 0xb9, 0xc9, 0x9a, 0x09, 0x78,
+	0x4d, 0xe4, 0x72, 0xa6, 0x06, 0xbf, 0x8b, 0x62, 0x66, 0xdd,
+	0x30, 0xfd, 0xe2, 0x98, 0x25, 0xb3, 0x10, 0x91, 0x22, 0x88,
+	0x36, 0xd0, 0x94, 0xce, 0x8f, 0x96, 0xdb, 0xbd, 0xf1, 0xd2,
+	0x13, 0x5c, 0x83, 0x38, 0x46, 0x40, 0x1e, 0x42, 0xb6, 0xa3,
+	0xc3, 0x48, 0x7e, 0x6e, 0x6b, 0x3a, 0x28, 0x54, 0xfa, 0x85,
+	0xba, 0x3d, 0xca, 0x5e, 0x9b, 0x9f, 0x0a, 0x15, 0x79, 0x2b,
+	0x4e, 0xd4, 0xe5, 0xac, 0x73, 0xf3, 0xa7, 0x57, 0x07, 0x70,
+	0xc0, 0xf7, 0x8c, 0x80, 0x63, 0x0d, 0x67, 0x4a, 0xde, 0xed,
+	0x31, 0xc5, 0xfe, 0x18, 0xe3, 0xa5, 0x99, 0x77, 0x26, 0xb8,
+	0xb4, 0x7c, 0x11, 0x44, 0x92, 0xd9, 0x23, 0x20, 0x89, 0x2e,
+	0x37, 0x3f, 0xd1, 0x5b, 0x95, 0xbc, 0xcf, 0xcd, 0x90, 0x87,
+	0x97, 0xb2, 0xdc, 0xfc, 0xbe, 0x61, 0xf2, 0x56, 0xd3, 0xab,
+	0x14, 0x2a, 0x5d, 0x9e, 0x84, 0x3c, 0x39, 0x53, 0x47, 0x6d,
+	0x41, 0xa2, 0x1f, 0x2d, 0x43, 0xd8, 0xb7, 0x7b, 0xa4, 0x76,
+	0xc4, 0x17, 0x49, 0xec, 0x7f, 0x0c, 0x6f, 0xf6, 0x6c, 0xa1,
+	0x3b, 0x52, 0x29, 0x9d, 0x55, 0xaa, 0xfb, 0x60, 0x86, 0xb1,
+	0xbb, 0xcc, 0x3e, 0x5a, 0xcb, 0x59, 0x5f, 0xb0, 0x9c, 0xa9,
+	0xa0, 0x51, 0x0b, 0xf5, 0x16, 0xeb, 0x7a, 0x75, 0x2c, 0xd7,
+	0x4f, 0xae, 0xd5, 0xe9, 0xe6, 0xe7, 0xad, 0xe8, 0x74, 0xd6,
+	0xf4, 0xea, 0xa8, 0x50, 0x58, 0xaf
+};
+
+void ec_init_tables(int k, int rows, unsigned char *a, unsigned char *g_tbls)
+{
+	int i, j;
+
+	for (i = 0; i < rows; i++) {
+		for (j = 0; j < k; j++) {
+			gf_vect_mul_init(*a++, g_tbls);
+			g_tbls += 32;
+		}
+	}
+}
+EXPORT_SYMBOL(ec_init_tables);
+
+unsigned char gf_mul(unsigned char a, unsigned char b)
+{
+	int i;
+
+	if ((a == 0) || (b == 0))
+		return 0;
+
+	i = gflog_base[a] + gflog_base[b];
+	return gff_base[i > 254 ? i - 255 : i];
+}
+
+unsigned char gf_inv(unsigned char a)
+{
+	if (a == 0)
+		return 0;
+
+	return gff_base[255 - gflog_base[a]];
+}
+
+void gf_gen_cauchy1_matrix(unsigned char *a, int m, int k)
+{
+	int i, j;
+	unsigned char *p;
+
+	/* Identity matrix in high position */
+	memset(a, 0, k * m);
+	for (i = 0; i < k; i++)
+		a[k * i + i] = 1;
+
+	/* For the rest choose 1/(i + j) | i != j */
+	p = &a[k * k];
+	for (i = k; i < m; i++)
+		for (j = 0; j < k; j++)
+			*p++ = gf_inv(i ^ j);
+
+}
+EXPORT_SYMBOL(gf_gen_cauchy1_matrix);
+
+int gf_invert_matrix(unsigned char *in_mat, unsigned char *out_mat, const int n)
+{
+	int i, j, k;
+	unsigned char temp;
+
+	/* Set out_mat[] to the identity matrix */
+	for (i = 0; i < n * n; i++)	/* memset(out_mat, 0, n*n) */
+		out_mat[i] = 0;
+
+	for (i = 0; i < n; i++)
+		out_mat[i * n + i] = 1;
+
+	/* Inverse */
+	for (i = 0; i < n; i++) {
+		/* Check for 0 in pivot element */
+		if (in_mat[i * n + i] == 0) {
+			/* Find a row with non-zero in current column and swap */
+			for (j = i + 1; j < n; j++)
+				if (in_mat[j * n + i])
+					break;
+
+			if (j == n)	/* Couldn't find means it's singular */
+				return -1;
+
+			for (k = 0; k < n; k++) {	/* Swap rows i,j */
+				temp = in_mat[i * n + k];
+				in_mat[i * n + k] = in_mat[j * n + k];
+				in_mat[j * n + k] = temp;
+
+				temp = out_mat[i * n + k];
+				out_mat[i * n + k] = out_mat[j * n + k];
+				out_mat[j * n + k] = temp;
+			}
+		}
+
+		temp = gf_inv(in_mat[i * n + i]);	/* 1/pivot */
+		for (j = 0; j < n; j++) {	/* Scale row i by 1/pivot */
+			in_mat[i * n + j] = gf_mul(in_mat[i * n + j], temp);
+			out_mat[i * n + j] = gf_mul(out_mat[i * n + j], temp);
+		}
+
+		for (j = 0; j < n; j++) {
+			if (j == i)
+				continue;
+
+			temp = in_mat[j * n + i];
+			for (k = 0; k < n; k++) {
+				out_mat[j * n + k] ^= gf_mul(temp, out_mat[i * n + k]);
+				in_mat[j * n + k] ^= gf_mul(temp, in_mat[i * n + k]);
+			}
+		}
+	}
+	return 0;
+}
+EXPORT_SYMBOL(gf_invert_matrix);
+
+/* Calculates const table gftbl in GF(2^8) from single input A
+ * gftbl(A) = {A{00}, A{01}, A{02}, ... , A{0f} }, {A{00}, A{10}, A{20}, ... ,
+ * A{f0} }
+ */
+void gf_vect_mul_init(unsigned char c, unsigned char *tbl)
+{
+	unsigned char c2 = (c << 1) ^ ((c & 0x80) ? 0x1d : 0);	/* Mult by
+								 * GF{2}
+								 */
+	unsigned char c4 = (c2 << 1) ^ ((c2 & 0x80) ? 0x1d : 0); /* Mult by
+								  * GF{2}
+								  */
+	unsigned char c8 = (c4 << 1) ^ ((c4 & 0x80) ? 0x1d : 0); /* Mult by
+								  * GF{2}
+								  */
+#if BITS_PER_LONG == 64
+	unsigned long long v1, v2, v4, v8, *t;
+	unsigned long long v10, v20, v40, v80;
+	unsigned char c17, c18, c20, c24;
+
+	t = (unsigned long long *)tbl;
+
+	v1 = c * 0x0100010001000100ull;
+	v2 = c2 * 0x0101000001010000ull;
+	v4 = c4 * 0x0101010100000000ull;
+	v8 = c8 * 0x0101010101010101ull;
+
+	v4 = v1 ^ v2 ^ v4;
+	t[0] = v4;
+	t[1] = v8 ^ v4;
+
+	c17 = (c8 << 1) ^ ((c8 & 0x80) ? 0x1d : 0);	//Mult by GF{2}
+	c18 = (c17 << 1) ^ ((c17 & 0x80) ? 0x1d : 0);	//Mult by GF{2}
+	c20 = (c18 << 1) ^ ((c18 & 0x80) ? 0x1d : 0);	//Mult by GF{2}
+	c24 = (c20 << 1) ^ ((c20 & 0x80) ? 0x1d : 0);	//Mult by GF{2}
+
+	v10 = c17 * 0x0100010001000100ull;
+	v20 = c18 * 0x0101000001010000ull;
+	v40 = c20 * 0x0101010100000000ull;
+	v80 = c24 * 0x0101010101010101ull;
+
+	v40 = v10 ^ v20 ^ v40;
+	t[2] = v40;
+	t[3] = v80 ^ v40;
+
+#else /* 32-bit or other */
+	unsigned char c3, c5, c6, c7, c9, c10, c11, c12, c13, c14, c15;
+	unsigned char c17, c18, c19, c20, c21, c22, c23, c24, c25, c26;
+	unsigned char c27, c28, c29, c30, c31;
+
+	c3 = c2 ^ c;
+	c5 = c4 ^ c;
+	c6 = c4 ^ c2;
+	c7 = c4 ^ c3;
+
+	c9 = c8 ^ c;
+	c10 = c8 ^ c2;
+	c11 = c8 ^ c3;
+	c12 = c8 ^ c4;
+	c13 = c8 ^ c5;
+	c14 = c8 ^ c6;
+	c15 = c8 ^ c7;
+
+	tbl[0] = 0;
+	tbl[1] = c;
+	tbl[2] = c2;
+	tbl[3] = c3;
+	tbl[4] = c4;
+	tbl[5] = c5;
+	tbl[6] = c6;
+	tbl[7] = c7;
+	tbl[8] = c8;
+	tbl[9] = c9;
+	tbl[10] = c10;
+	tbl[11] = c11;
+	tbl[12] = c12;
+	tbl[13] = c13;
+	tbl[14] = c14;
+	tbl[15] = c15;
+
+	c17 = (c8 << 1) ^ ((c8 & 0x80) ? 0x1d : 0);	/* Mult by GF{2} */
+	c18 = (c17 << 1) ^ ((c17 & 0x80) ? 0x1d : 0);	/* Mult by GF{2} */
+	c19 = c18 ^ c17;
+	c20 = (c18 << 1) ^ ((c18 & 0x80) ? 0x1d : 0);	/* Mult by GF{2} */
+	c21 = c20 ^ c17;
+	c22 = c20 ^ c18;
+	c23 = c20 ^ c19;
+	c24 = (c20 << 1) ^ ((c20 & 0x80) ? 0x1d : 0);	/* Mult by GF{2} */
+	c25 = c24 ^ c17;
+	c26 = c24 ^ c18;
+	c27 = c24 ^ c19;
+	c28 = c24 ^ c20;
+	c29 = c24 ^ c21;
+	c30 = c24 ^ c22;
+	c31 = c24 ^ c23;
+
+	tbl[16] = 0;
+	tbl[17] = c17;
+	tbl[18] = c18;
+	tbl[19] = c19;
+	tbl[20] = c20;
+	tbl[21] = c21;
+	tbl[22] = c22;
+	tbl[23] = c23;
+	tbl[24] = c24;
+	tbl[25] = c25;
+	tbl[26] = c26;
+	tbl[27] = c27;
+	tbl[28] = c28;
+	tbl[29] = c29;
+	tbl[30] = c30;
+	tbl[31] = c31;
+
+#endif /* BITS_PER_LONG == 64 */
+}
+
+void ec_encode_data(int len, int srcs, int dests, unsigned char *v,
+		    unsigned char **src, unsigned char **dest)
+{
+	int i, j, l;
+	unsigned char s;
+
+	for (l = 0; l < dests; l++) {
+		for (i = 0; i < len; i++) {
+			s = 0;
+			for (j = 0; j < srcs; j++)
+				s ^= gf_mul(src[j][i], v[j * 32 + l * srcs * 32 + 1]);
+
+			dest[l][i] = s;
+		}
+	}
+}
+EXPORT_SYMBOL(ec_encode_data);
+
+static int __init ec_init(void)
+{
+	return 0;
+}
+
+static void __exit ec_exit(void)
+{
+}
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_DESCRIPTION("M to N erasure code handling");
+MODULE_VERSION("1.0.0");
+MODULE_LICENSE("Dual BSD/GPL");
+
+module_init(ec_init);
+module_exit(ec_exit);
diff --git a/fs/lustre/include/erasure_code.h b/fs/lustre/include/erasure_code.h
new file mode 100644
index 0000000..9e62c2b
--- /dev/null
+++ b/fs/lustre/include/erasure_code.h
@@ -0,0 +1,142 @@
+/* SPDX-License-Identifier: BSD-2-Clause */
+/**********************************************************************
+ * Copyright(c) 2011-2015 Intel Corporation All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *    Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ERASURE_CODE_H_
+#define _ERASURE_CODE_H_
+
+/**
+ *  @file erasure_code.h
+ *  @brief Interface to functions supporting erasure code encode and decode.
+ *
+ *  This file defines the interface to optimized functions used in erasure
+ *  codes.  Encode and decode of erasures in GF(2^8) are made by calculating the
+ *  dot product of the symbols (bytes in GF(2^8)) across a set of buffers and a
+ *  set of coefficients.  Values for the coefficients are determined by the type
+ *  of erasure code.  Using a general dot product means that any sequence of
+ *  coefficients may be used including erasure codes based on random
+ *  coefficients.
+ *  Multiple versions of dot product are supplied to calculate 1-6 output
+ *  vectors in one pass.
+ *  Base GF multiply and divide functions can be sped up by defining
+ *  GF_LARGE_TABLES at the expense of memory size.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @brief Initialize 32-byte constant array for GF(2^8) vector multiply
+ *
+ * Calculates array {C{00}, C{01}, C{02}, ... , C{0f} }, {C{00}, C{10},
+ * C{20}, ... , C{f0} } as required by other fast vector multiply
+ * functions.
+ * @param c     Constant input.
+ * @param gftbl Table output.
+ */
+void gf_vect_mul_init(unsigned char c, unsigned char *gftbl);
+
+/**
+ * @brief Initialize tables for fast Erasure Code encode and decode.
+ *
+ * Generates the expanded tables needed for fast encode or decode for erasure
+ * codes on blocks of data.  32 bytes are generated for each input coefficient.
+ *
+ * @param k      The number of vector sources or rows in the generator matrix
+ *               for coding.
+ * @param rows   The number of output vectors to concurrently encode/decode.
+ * @param a      Pointer to sets of arrays of input coefficients used to encode
+ *               or decode data.
+ * @param gftbls Pointer to start of space for concatenated output tables
+ *               generated from input coefficients.  Must be of size 32*k*rows.
+ * @returns none
+ */
+void ec_init_tables(int k, int rows, unsigned char *a, unsigned char *gftbls);
+
+/**
+ * @brief Generate or decode erasure codes on blocks of data, runs appropriate
+ * version.
+ *
+ * Given a list of source data blocks, generate one or multiple blocks of
+ * encoded data as specified by a matrix of GF(2^8) coefficients. When given a
+ * suitable set of coefficients, this function will perform the fast generation
+ * or decoding of Reed-Solomon type erasure codes.
+ *
+ * This function determines what instruction sets are enabled and
+ * selects the appropriate version at runtime.
+ *
+ * @param len    Length of each block of data (vector) of source or dest data.
+ * @param k      The number of vector sources or rows in the generator matrix
+ *		 for coding.
+ * @param rows   The number of output vectors to concurrently encode/decode.
+ * @param gftbls Pointer to array of input tables generated from coding
+ *		  coefficients in ec_init_tables(). Must be of size 32*k*rows
+ * @param data   Array of pointers to source input buffers.
+ * @param coding Array of pointers to coded output buffers.
+ * @returns none
+ */
+void ec_encode_data(int len, int k, int rows, unsigned char *gftbls,
+		    unsigned char **data, unsigned char **coding);
+
+/**
+ * @brief Generate a Cauchy matrix of coefficients to be used for encoding.
+ *
+ * Cauchy matrix example of encoding coefficients where high portion of matrix
+ * is identity matrix I and lower portion is constructed as 1/(i + j) | i != j,
+ * i:{0,k-1} j:{k,m-1}.  Any sub-matrix of a Cauchy matrix should be invertible.
+ *
+ * @param a  [m x k] array to hold coefficients
+ * @param m  number of rows in matrix corresponding to srcs + parity.
+ * @param k  number of columns in matrix corresponding to srcs.
+ * @returns  none
+ */
+void gf_gen_cauchy1_matrix(unsigned char *a, int m, int k);
+
+/**
+ * @brief Invert a matrix in GF(2^8)
+ *
+ * Attempts to construct an n x n inverse of the input matrix. Returns non-zero
+ * if the matrix is singular. The input matrix is always destroyed in the
+ * process.
+ *
+ * @param in  input matrix, destroyed by invert process
+ * @param out output matrix such that [in] x [out] = [I] - identity matrix
+ * @param n   size of matrix [nxn]
+ * @returns 0 successful, other fail on singular input matrix
+ */
+int gf_invert_matrix(unsigned char *in, unsigned char *out, const int n);
+
+/*************************************************************/
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _ERASURE_CODE_H_ */
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [lustre-devel] [PATCH 30/32] lustre: llite: use max default EA size to get default LMV
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (28 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 29/32] lustre: ec: code to add support for M to N parity James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 31/32] lustre: llite: pass dmv inherit depth instead of dir depth James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 32/32] lustre: ldlm: Prioritize blocking callbacks James Simmons
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

A subdir mount fetches and sets the ROOT default LMV, but the default
EA size cl_default_mds_easize may not be set for MDT0 yet: it is only
updated upon getattr/enqueue, so if the subdir mount is not on MDT0 it
may still be uninitialized. Use the max EA size to fetch the default
layout in ll_dir_get_default_layout().

Fixes: 4cee9af853 ("lustre: llite: enforce ROOT default on subdir mount")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15910
Lustre-commit: bb588480d4cdd6847 ("LU-15910 llite: use max default EA size to get default LMV")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47937
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 26c9ec3..3384d81 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -663,17 +663,13 @@ int ll_dir_get_default_layout(struct inode *inode, void **plmm, int *plmm_size,
 	struct mdt_body *body;
 	struct lov_mds_md *lmm = NULL;
 	struct ptlrpc_request *req = NULL;
-	int rc, lmmsize;
+	int lmmsize = OBD_MAX_DEFAULT_EA_SIZE;
 	struct md_op_data *op_data;
 	struct lu_fid fid;
+	int rc;
 
-	rc = ll_get_max_mdsize(sbi, &lmmsize);
-	if (rc)
-		return rc;
-
-	op_data = ll_prep_md_op_data(NULL, inode, NULL, NULL,
-				     0, lmmsize, LUSTRE_OPC_ANY,
-				     NULL);
+	op_data = ll_prep_md_op_data(NULL, inode, NULL, NULL, 0, lmmsize,
+				     LUSTRE_OPC_ANY, NULL);
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 31/32] lustre: llite: pass dmv inherit depth instead of dir depth
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (29 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 30/32] lustre: llite: use max default EA size to get default LMV James Simmons
@ 2022-08-04  1:38 ` James Simmons
  2022-08-04  1:38 ` [lustre-devel] [PATCH 32/32] lustre: ldlm: Prioritize blocking callbacks James Simmons
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

In directory creation, if an ancestor has a default LMV, pass the
inherit depth to that ancestor; otherwise pass the directory depth to
ROOT.

This depth will be used in QoS allocation.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15850
Lustre-commit: c23c68a52a0436910 ("LU-15850 llite: pass dmv inherit depth instead of dir depth")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47577
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_lmv.h   | 23 +++++++++++++++++--
 fs/lustre/llite/dir.c            |  3 ++-
 fs/lustre/llite/llite_internal.h |  4 ++++
 fs/lustre/llite/llite_lib.c      | 48 +++++++++++++++++++++++++++++++++++++---
 fs/lustre/llite/namei.c          |  4 ++--
 5 files changed, 74 insertions(+), 8 deletions(-)

diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h
index cd7cf9e..3720a97 100644
--- a/fs/lustre/include/lustre_lmv.h
+++ b/fs/lustre/include/lustre_lmv.h
@@ -51,8 +51,6 @@ struct lmv_stripe_md {
 	u32	lsm_md_layout_version;
 	u32	lsm_md_migrate_offset;
 	u32	lsm_md_migrate_hash;
-	u32	lsm_md_default_count;
-	u32	lsm_md_default_index;
 	char	lsm_md_pool_name[LOV_MAXPOOLNAME + 1];
 	struct lmv_oinfo lsm_md_oinfo[0];
 };
@@ -513,4 +511,25 @@ static inline bool lmv_is_layout_changing(const struct lmv_mds_md_v1 *lmv)
 	       lmv_hash_is_migrating(cpu_to_le32(lmv->lmv_hash_type));
 }
 
+static inline u8 lmv_inherit_next(u8 inherit)
+{
+	if (inherit == LMV_INHERIT_END || inherit == LMV_INHERIT_NONE)
+		return LMV_INHERIT_NONE;
+
+	if (inherit == LMV_INHERIT_UNLIMITED || inherit > LMV_INHERIT_MAX)
+		return inherit;
+
+	return inherit - 1;
+}
+
+static inline u8 lmv_inherit_rr_next(u8 inherit_rr)
+{
+	if (inherit_rr == LMV_INHERIT_RR_NONE ||
+	    inherit_rr == LMV_INHERIT_RR_UNLIMITED ||
+	    inherit_rr > LMV_INHERIT_RR_MAX)
+		return inherit_rr;
+
+	return inherit_rr - 1;
+}
+
 #endif
diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 3384d81..aea15f5 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -491,7 +491,8 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump,
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
-	op_data->op_dir_depth = ll_i2info(parent)->lli_dir_depth;
+	op_data->op_dir_depth = ll_i2info(parent)->lli_inherit_depth ?:
+				ll_i2info(parent)->lli_dir_depth;
 
 	if (ll_sbi_has_encrypt(sbi) &&
 	    (IS_ENCRYPTED(parent) ||
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index c350440..2139f88 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -183,6 +183,10 @@ struct ll_inode_info {
 			pid_t				lli_opendir_pid;
 			/* directory depth to ROOT */
 			unsigned short			lli_dir_depth;
+			/* directory depth to ancestor whose default LMV is
+			 * inherited.
+			 */
+			unsigned short			lli_inherit_depth;
 			/* stat will try to access statahead entries or start
 			 * statahead if this flag is set, and this flag will be
 			 * set upon dir open, and cleared when dir is closed,
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index d947ede..dee2e51 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -1561,6 +1561,7 @@ static void ll_update_default_lsm_md(struct inode *inode, struct lustre_md *md)
 				lmv_free_memmd(lli->lli_default_lsm_md);
 				lli->lli_default_lsm_md = NULL;
 			}
+			lli->lli_inherit_depth = 0;
 			up_write(&lli->lli_lsm_sem);
 		}
 		return;
@@ -2648,9 +2649,34 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 	return 0;
 }
 
+/* child default LMV is inherited from parent */
+static inline bool ll_default_lmv_inherited(struct lmv_stripe_md *pdmv,
+					    struct lmv_stripe_md *cdmv)
+{
+	if (!pdmv || !cdmv)
+		return false;
+
+	if (pdmv->lsm_md_magic != cdmv->lsm_md_magic ||
+	    pdmv->lsm_md_stripe_count != cdmv->lsm_md_stripe_count ||
+	    pdmv->lsm_md_master_mdt_index != cdmv->lsm_md_master_mdt_index ||
+	    pdmv->lsm_md_hash_type != cdmv->lsm_md_hash_type)
+		return false;
+
+	if (cdmv->lsm_md_max_inherit !=
+	    lmv_inherit_next(pdmv->lsm_md_max_inherit))
+		return false;
+
+	if (cdmv->lsm_md_max_inherit_rr !=
+	    lmv_inherit_rr_next(pdmv->lsm_md_max_inherit_rr))
+		return false;
+
+	return true;
+}
+
 /* update directory depth to ROOT, called after LOOKUP lock is fetched. */
 void ll_update_dir_depth(struct inode *dir, struct inode *inode)
 {
+	struct ll_inode_info *plli;
 	struct ll_inode_info *lli;
 
 	if (!S_ISDIR(inode->i_mode))
@@ -2659,10 +2685,26 @@ void ll_update_dir_depth(struct inode *dir, struct inode *inode)
 	if (inode == dir)
 		return;
 
+	plli = ll_i2info(dir);
 	lli = ll_i2info(inode);
-	lli->lli_dir_depth = ll_i2info(dir)->lli_dir_depth + 1;
-	CDEBUG(D_INODE, DFID" depth %hu\n",
-	       PFID(&lli->lli_fid), lli->lli_dir_depth);
+	lli->lli_dir_depth = plli->lli_dir_depth + 1;
+	if (plli->lli_default_lsm_md && lli->lli_default_lsm_md) {
+		down_read(&plli->lli_lsm_sem);
+		down_read(&lli->lli_lsm_sem);
+		if (ll_default_lmv_inherited(plli->lli_default_lsm_md,
+					     lli->lli_default_lsm_md))
+			lli->lli_inherit_depth =
+				plli->lli_inherit_depth + 1;
+		else
+			lli->lli_inherit_depth = 0;
+		up_read(&lli->lli_lsm_sem);
+		up_read(&plli->lli_lsm_sem);
+	} else {
+		lli->lli_inherit_depth = 0;
+	}
+
+	CDEBUG(D_INODE, DFID" depth %hu default LMV depth %hu\n",
+	       PFID(&lli->lli_fid), lli->lli_dir_depth, lli->lli_inherit_depth);
 }
 
 void ll_truncate_inode_pages_final(struct inode *inode)
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index cc7b243..2215dd8 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -1496,7 +1496,7 @@ static void ll_qos_mkdir_prep(struct md_op_data *op_data, struct inode *dir)
 	struct ll_inode_info *lli = ll_i2info(dir);
 	struct lmv_stripe_md *lsm;
 
-	op_data->op_dir_depth = lli->lli_dir_depth;
+	op_data->op_dir_depth = lli->lli_inherit_depth ?: lli->lli_dir_depth;
 
 	/* parent directory is striped */
 	if (unlikely(lli->lli_lsm_md))
@@ -1635,7 +1635,7 @@ static int ll_new_node(struct inode *dir, struct dentry *dchild,
 			from_kuid(&init_user_ns, current_fsuid()),
 			from_kgid(&init_user_ns, current_fsgid()),
 			current_cap(), rdev, &request);
-#if OBD_OCD_VERSION(2, 14, 58, 0) > LUSTRE_VERSION_CODE
+#if OBD_OCD_VERSION(2, 14, 58, 0) < LUSTRE_VERSION_CODE
 	/*
 	 * server < 2.12.58 doesn't pack default LMV in intent_getattr reply,
 	 * fetch default LMV here.
-- 
1.8.3.1


* [lustre-devel] [PATCH 32/32] lustre: ldlm: Prioritize blocking callbacks
  2022-08-04  1:37 [lustre-devel] [PATCH 00/32] lustre: Update to OpenSFS as of Aug 3 2022 James Simmons
                   ` (30 preceding siblings ...)
  2022-08-04  1:38 ` [lustre-devel] [PATCH 31/32] lustre: llite: pass dmv inherit depth instead of dir depth James Simmons
@ 2022-08-04  1:38 ` James Simmons
  31 siblings, 0 replies; 33+ messages in thread
From: James Simmons @ 2022-08-04  1:38 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

The current code places bl_ast lock callbacks at the end of
the global BL callback queue.  This is bad because it forces
urgent requests from the server to wait behind the non-urgent
cleanup work that keeps lru_size at the right level.

This can lead to evictions if the global queue is long enough
that the callback is not serviced in a timely manner.

Put bl_ast callbacks on the priority queue so they do not
wait behind the background traffic.

Add some additional debug in this area.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15821
Lustre-commit: 2d59294d52b696125 ("LU-15821 ldlm: Prioritize blocking callbacks")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47215
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_lockd.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c
index 04fe92e..9f89766 100644
--- a/fs/lustre/ldlm/ldlm_lockd.c
+++ b/fs/lustre/ldlm/ldlm_lockd.c
@@ -94,6 +94,8 @@ struct ldlm_bl_pool {
 	atomic_t		blp_busy_threads;
 	int			blp_min_threads;
 	int			blp_max_threads;
+	int			blp_total_locks;
+	int			blp_total_blwis;
 };
 
 struct ldlm_bl_work_item {
@@ -399,19 +401,39 @@ static int __ldlm_bl_to_thread(struct ldlm_bl_work_item *blwi,
 			       enum ldlm_cancel_flags cancel_flags)
 {
 	struct ldlm_bl_pool *blp = ldlm_state->ldlm_bl_pool;
+	char *prio = "regular";
+	int count;
 
 	spin_lock(&blp->blp_lock);
-	if (blwi->blwi_lock && ldlm_is_discard_data(blwi->blwi_lock)) {
-		/* add LDLM_FL_DISCARD_DATA requests to the priority list */
+	/* cannot access blwi after added to list and lock is dropped */
+	count = blwi->blwi_lock ? 1 : blwi->blwi_count;
+
+	/* if the server is waiting on a lock to be cancelled (bl_ast), this is
+	 * an urgent request and should go in the priority queue so it doesn't
+	 * get stuck behind non-priority work (eg, lru size management)
+	 *
+	 * We also prioritize discard_data, which is for eviction handling
+	 */
+	if (blwi->blwi_lock &&
+	    (ldlm_is_discard_data(blwi->blwi_lock) ||
+	     ldlm_is_bl_ast(blwi->blwi_lock))) {
 		list_add_tail(&blwi->blwi_entry, &blp->blp_prio_list);
+		prio = "priority";
 	} else {
 		/* other blocking callbacks are added to the regular list */
 		list_add_tail(&blwi->blwi_entry, &blp->blp_list);
 	}
+	blp->blp_total_locks += count;
+	blp->blp_total_blwis++;
 	spin_unlock(&blp->blp_lock);
 
 	wake_up(&blp->blp_waitq);
 
+	/* unlocked read of blp values is intentional - OK for debug */
+	CDEBUG(D_DLMTRACE,
+	       "added %d/%d locks to %s blp list, %d blwis in pool\n",
+	       count, blp->blp_total_locks, prio, blp->blp_total_blwis);
+
 	/*
 	 * Can not check blwi->blwi_flags as blwi could be already freed in
 	 * LCF_ASYNC mode
@@ -772,6 +794,17 @@ static int ldlm_bl_get_work(struct ldlm_bl_pool *blp,
 	spin_unlock(&blp->blp_lock);
 	*p_blwi = blwi;
 
+	/* intentional unlocked read of blp values - OK for debug */
+	if (blwi) {
+		CDEBUG(D_DLMTRACE,
+		       "Got %d locks of %d total in blp.  (%d blwis in pool)\n",
+		       blwi->blwi_lock ? 1 : blwi->blwi_count,
+		       blp->blp_total_locks, blp->blp_total_blwis);
+	} else {
+		CDEBUG(D_DLMTRACE,
+		       "No blwi found in queue (no bl locks in queue)\n");
+	}
+
 	return (*p_blwi || *p_exp) ? 1 : 0;
 }
 
@@ -1126,6 +1159,8 @@ static int ldlm_setup(void)
 	init_waitqueue_head(&blp->blp_waitq);
 	atomic_set(&blp->blp_num_threads, 0);
 	atomic_set(&blp->blp_busy_threads, 0);
+	blp->blp_total_locks = 0;
+	blp->blp_total_blwis = 0;
 
 	if (ldlm_num_threads == 0) {
 		blp->blp_min_threads = LDLM_NTHRS_INIT;
-- 
1.8.3.1

