lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021
@ 2021-05-15 13:05 James Simmons
  2021-05-15 13:05 ` [lustre-devel] [PATCH 01/13] lnet: Allow delayed sends James Simmons
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:05 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Next batch of patches from the OpenSFS tree ported to the native Linux
client.

Andreas Dilger (1):
  lustre: lmv: qos stay on current MDT if less full

Bobi Jam (1):
  lustre: lov: correctly handling sub-lock init failure

Chris Horn (3):
  lnet: Allow delayed sends
  lnet: Local NI must be on same net as next-hop
  lnet: Correct the router ping interval calculation

Lai Siyao (2):
  lustre: obdclass: server qos penalty miscalculated
  lustre: lmv: add default LMV inherit depth

Nikitas Angelinas (1):
  lustre: ptlrpc: remove might_sleep() in sptlrpc_gc_del_sec()

Oleg Drokin (1):
  lustre: llite: Introduce inode open heat counter

Sebastien Buisson (1):
  lustre: sec: rework includes for client encryption

Serguei Smirnov (1):
  lnet: socklnd: add conns_per_peer parameter

Wang Shilong (2):
  lustre: readahead: export pages directly without RA
  lustre: readahead: fix reserving for unaligned read

 fs/lustre/include/lu_object.h              |  30 +++++-
 fs/lustre/include/lustre_crypto.h          | 158 ++++++++++++++++-------------
 fs/lustre/include/lustre_lmv.h             |   8 +-
 fs/lustre/include/obd.h                    |  10 +-
 fs/lustre/llite/crypto.c                   |   6 +-
 fs/lustre/llite/dir.c                      |  20 ++--
 fs/lustre/llite/file.c                     |  92 +++++++++++++----
 fs/lustre/llite/llite_internal.h           |  50 +++++----
 fs/lustre/llite/llite_lib.c                |  11 +-
 fs/lustre/llite/lproc_llite.c              | 112 +++++++++++++++++++-
 fs/lustre/llite/namei.c                    |  37 ++++---
 fs/lustre/llite/rw.c                       |  13 ++-
 fs/lustre/llite/super25.c                  |   4 +-
 fs/lustre/llite/vvp_io.c                   |  18 ++--
 fs/lustre/lmv/lmv_obd.c                    |  84 ++++++++++++---
 fs/lustre/lov/lov_lock.c                   |   2 +
 fs/lustre/obdclass/lu_tgt_descs.c          |  36 +++----
 fs/lustre/osc/osc_request.c                |  12 +--
 fs/lustre/ptlrpc/pack_generic.c            |   5 +-
 fs/lustre/ptlrpc/sec_gc.c                  |   2 -
 include/linux/lnet/lib-lnet.h              |   1 +
 include/linux/lnet/lib-types.h             |   4 +-
 include/uapi/linux/lustre/lustre_user.h    |  37 ++++++-
 net/lnet/klnds/socklnd/socklnd.c           |  92 +++++++++++++++--
 net/lnet/klnds/socklnd/socklnd.h           |  23 ++++-
 net/lnet/klnds/socklnd/socklnd_cb.c        |   3 +-
 net/lnet/klnds/socklnd/socklnd_modparams.c |   9 ++
 net/lnet/lnet/lib-move.c                   |  34 +++----
 net/lnet/lnet/net_fault.c                  |  24 ++++-
 net/lnet/lnet/router.c                     |  57 +++++++----
 30 files changed, 726 insertions(+), 268 deletions(-)

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [lustre-devel] [PATCH 01/13] lnet: Allow delayed sends
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
@ 2021-05-15 13:05 ` James Simmons
  2021-05-15 13:05 ` [lustre-devel] [PATCH 02/13] lustre: lov: correctly handling sub-lock init failure James Simmons
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:05 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

net_delay_add() has some code related to delaying sends, but the
feature isn't fully implemented. Modify lnet_post_send_locked() to
check whether the message being sent matches a delay rule and should
be delayed.

Fix some bugs in how the delay timers were set and checked.

HPE-bug-id: LUS-7651
WC-bug-id: https://jira.whamcloud.com/browse/LU-14627
Lustre-commit: ab14f3bc852e7081 ("LU-14627 lnet: Allow delayed sends")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/43416
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |  1 +
 net/lnet/lnet/lib-move.c      |  8 +++++++-
 net/lnet/lnet/net_fault.c     | 24 +++++++++++++++++++-----
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 674f9d1..6b9e926 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -630,6 +630,7 @@ void lnet_recv(struct lnet_ni *ni, void *private, struct lnet_msg *msg,
 void lnet_ni_recv(struct lnet_ni *ni, void *private, struct lnet_msg *msg,
 		  int delayed, unsigned int offset,
 		  unsigned int mlen, unsigned int rlen);
+void lnet_ni_send(struct lnet_ni *ni, struct lnet_msg *msg);
 
 struct lnet_msg *lnet_create_reply_msg(struct lnet_ni *ni,
 				       struct lnet_msg *get_msg);
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index cb0943e..6d0637c 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -530,7 +530,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	msg->msg_hdr.payload_length = cpu_to_le32(len);
 }
 
-static void
+void
 lnet_ni_send(struct lnet_ni *ni, struct lnet_msg *msg)
 {
 	void *priv = msg->msg_private;
@@ -733,6 +733,12 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		}
 	}
 
+	if (unlikely(!list_empty(&the_lnet.ln_delay_rules)) &&
+	    lnet_delay_rule_match_locked(&msg->msg_hdr, msg)) {
+		msg->msg_tx_delayed = 1;
+		return LNET_CREDIT_WAIT;
+	}
+
 	/* unset the tx_delay flag as we're going to send it now */
 	msg->msg_tx_delayed = 0;
 
diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c
index 515aa05..0d19da4 100644
--- a/net/lnet/lnet/net_fault.c
+++ b/net/lnet/lnet/net_fault.c
@@ -536,6 +536,7 @@ struct delay_daemon_data {
 {
 	struct lnet_fault_attr *attr = &rule->dl_attr;
 	bool delay;
+	time64_t now = ktime_get_seconds();
 
 	if (!lnet_fault_attr_match(attr, src, LNET_NID_ANY,
 				   dst, type, portal))
@@ -544,8 +545,6 @@ struct delay_daemon_data {
 	/* match this rule, check delay rate now */
 	spin_lock(&rule->dl_lock);
 	if (rule->dl_delay_time) { /* time based delay */
-		time64_t now = ktime_get_seconds();
-
 		rule->dl_stat.fs_count++;
 		delay = now >= rule->dl_delay_time;
 		if (delay) {
@@ -587,10 +586,11 @@ struct delay_daemon_data {
 	rule->dl_stat.u.delay.ls_delayed++;
 
 	list_add_tail(&msg->msg_list, &rule->dl_msg_list);
-	msg->msg_delay_send = ktime_get_seconds() + attr->u.delay.la_latency;
+	msg->msg_delay_send = now + attr->u.delay.la_latency;
 	if (rule->dl_msg_send == -1) {
 		rule->dl_msg_send = msg->msg_delay_send;
-		mod_timer(&rule->dl_timer, jiffies + rule->dl_msg_send * HZ);
+		mod_timer(&rule->dl_timer,
+			  jiffies + attr->u.delay.la_latency * HZ);
 	}
 
 	spin_unlock(&rule->dl_lock);
@@ -662,7 +662,8 @@ struct delay_daemon_data {
 		msg = list_first_entry(&rule->dl_msg_list,
 				       struct lnet_msg, msg_list);
 		rule->dl_msg_send = msg->msg_delay_send;
-		mod_timer(&rule->dl_timer, jiffies + rule->dl_msg_send * HZ);
+		mod_timer(&rule->dl_timer,
+			  jiffies + (msg->msg_delay_send - now) * HZ);
 	}
 	spin_unlock(&rule->dl_lock);
 }
@@ -678,6 +679,19 @@ struct delay_daemon_data {
 		int cpt;
 		int rc;
 
+		if (msg->msg_sending) {
+			/* Delayed send */
+			list_del_init(&msg->msg_list);
+			ni = msg->msg_txni;
+			CDEBUG(D_NET, "TRACE: msg %p %s -> %s : %s\n", msg,
+			       libcfs_nid2str(ni->ni_nid),
+			       libcfs_nid2str(msg->msg_txpeer->lpni_nid),
+			       lnet_msgtyp2str(msg->msg_type));
+			lnet_ni_send(ni, msg);
+			continue;
+		}
+
+		/* Delayed receive */
 		LASSERT(msg->msg_rxpeer);
 		LASSERT(msg->msg_rxni);
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 02/13] lustre: lov: correctly handling sub-lock init failure
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
  2021-05-15 13:05 ` [lustre-devel] [PATCH 01/13] lnet: Allow delayed sends James Simmons
@ 2021-05-15 13:05 ` James Simmons
  2021-05-15 13:06 ` [lustre-devel] [PATCH 03/13] lnet: Local NI must be on same net as next-hop James Simmons
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:05 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Bobi Jam <bobijam@whamcloud.com>

In lov_lock_sub_init(), if a sublock initialization fails, it needs to
bail out of the outer loop as well as the inner one.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14618
Lustre-commit: 1a5169f9962e254 ("LU-14618 lov: correctly handling sub-lock init failure")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43345
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_lock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/lustre/lov/lov_lock.c b/fs/lustre/lov/lov_lock.c
index efaca37..d137614 100644
--- a/fs/lustre/lov/lov_lock.c
+++ b/fs/lustre/lov/lov_lock.c
@@ -198,6 +198,8 @@ static struct lov_lock *lov_lock_sub_init(const struct lu_env *env,
 			lls->sub_initialized = 1;
 			nr++;
 		}
+		if (result < 0)
+			break;
 	}
 	LASSERT(ergo(result == 0, nr == lovlck->lls_nr));
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 03/13] lnet: Local NI must be on same net as next-hop
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
  2021-05-15 13:05 ` [lustre-devel] [PATCH 01/13] lnet: Allow delayed sends James Simmons
  2021-05-15 13:05 ` [lustre-devel] [PATCH 02/13] lustre: lov: correctly handling sub-lock init failure James Simmons
@ 2021-05-15 13:06 ` James Simmons
  2021-05-15 13:06 ` [lustre-devel] [PATCH 04/13] lnet: socklnd: add conns_per_peer parameter James Simmons
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:06 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

When sending to a remote peer we need to restrict our selection of a
local NI to those on the same peer net as the next-hop.

The code currently selects a local NI on the peer net specified by the
lr_lnet field of the lnet_route returned by lnet_find_route_locked().
However, lnet_find_route_locked() may select a next-hop peer NI on any
local peer net - not just lr_lnet.

A redundant assignment to sd->sd_msg->msg_src_nid_param is also
removed. That variable is always set appropriately in
lnet_select_pathway().

HPE-bug-id: LUS-9095
WC-bug-id: https://jira.whamcloud.com/browse/LU-13781
Lustre-commit: 031c087f3847777c ("LU-13781 lnet: Local NI must be on same net as next-hop")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39352
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 6d0637c..3ae0209 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1907,7 +1907,6 @@ struct lnet_ni *
 			     struct lnet_peer **gw_peer)
 {
 	int rc;
-	u32 local_lnet;
 	struct lnet_peer *gw;
 	struct lnet_peer *lp;
 	struct lnet_peer_net *lpn;
@@ -1936,10 +1935,8 @@ struct lnet_ni *
 		if (gwni) {
 			gw = gwni->lpni_peer_net->lpn_peer;
 			lnet_peer_ni_decref_locked(gwni);
-			if (gw->lp_rtr_refcount) {
-				local_lnet = LNET_NIDNET(sd->sd_rtr_nid);
+			if (gw->lp_rtr_refcount)
 				route_found = true;
-			}
 		} else {
 			CWARN("No peer NI for gateway %s. Attempting to find an alternative route.\n",
 			       libcfs_nid2str(sd->sd_rtr_nid));
@@ -2054,31 +2051,26 @@ struct lnet_ni *
 
 		gw = best_route->lr_gateway;
 		LASSERT(gw == gwni->lpni_peer_net->lpn_peer);
-		local_lnet = best_route->lr_lnet;
 	}
 
 	/* Discover this gateway if it hasn't already been discovered.
 	 * This means we might delay the message until discovery has
 	 * completed
 	 */
-	sd->sd_msg->msg_src_nid_param = sd->sd_src_nid;
 	rc = lnet_initiate_peer_discovery(gwni, sd->sd_msg, sd->sd_cpt);
 	if (rc)
 		return rc;
 
 	if (!sd->sd_best_ni) {
-		struct lnet_peer_net *lpeer;
-
-		lpeer = lnet_peer_get_net_locked(gw, local_lnet);
-		sd->sd_best_ni = lnet_find_best_ni_on_spec_net(NULL, gw, lpeer,
+		lpn = gwni->lpni_peer_net;
+		sd->sd_best_ni = lnet_find_best_ni_on_spec_net(NULL, gw, lpn,
 							       sd->sd_md_cpt);
-	}
-
-	if (!sd->sd_best_ni) {
-		CERROR("Internal Error. Expected local ni on %s but non found :%s\n",
-		       libcfs_net2str(local_lnet),
-		       libcfs_nid2str(sd->sd_src_nid));
-		return -EFAULT;
+		if (!sd->sd_best_ni) {
+			CERROR("Internal Error. Expected local ni on %s but non found :%s\n",
+			       libcfs_net2str(lpn->lpn_net_id),
+			       libcfs_nid2str(sd->sd_src_nid));
+			return -EFAULT;
+		}
 	}
 
 	*gw_lpni = gwni;
-- 
1.8.3.1

* [lustre-devel] [PATCH 04/13] lnet: socklnd: add conns_per_peer parameter
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
                   ` (2 preceding siblings ...)
  2021-05-15 13:06 ` [lustre-devel] [PATCH 03/13] lnet: Local NI must be on same net as next-hop James Simmons
@ 2021-05-15 13:06 ` James Simmons
  2021-05-15 13:06 ` [lustre-devel] [PATCH 05/13] lustre: readahead: export pages directly without RA James Simmons
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:06 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

Introduce the conns_per_peer ksocklnd module parameter.
In typed mode, this parameter controls the number of
BULK_IN and BULK_OUT tcp connections, while the number
of CONTROL connections stays at 1. In untyped mode, it
controls the number of untyped connections.
The default conns_per_peer is 1. Max is 127.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12815
Lustre-commit: 71b2476e4ddb95aa ("LU-12815 socklnd: add conns_per_peer parameter")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41056
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/socklnd/socklnd.c           | 92 ++++++++++++++++++++++++++++--
 net/lnet/klnds/socklnd/socklnd.h           | 23 ++++++--
 net/lnet/klnds/socklnd/socklnd_cb.c        |  3 +-
 net/lnet/klnds/socklnd/socklnd_modparams.c |  9 +++
 4 files changed, 115 insertions(+), 12 deletions(-)

diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index 4c79d1a..3a667e5 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -132,6 +132,9 @@ static int ksocknal_ip2index(struct sockaddr *addr, struct lnet_ni *ni)
 	conn_cb->ksnr_connected = 0;
 	conn_cb->ksnr_deleted = 0;
 	conn_cb->ksnr_conn_count = 0;
+	conn_cb->ksnr_ctrl_conn_count = 0;
+	conn_cb->ksnr_blki_conn_count = 0;
+	conn_cb->ksnr_blko_conn_count = 0;
 
 	return conn_cb;
 }
@@ -364,6 +367,73 @@ struct ksock_peer_ni *
 	return rc;
 }
 
+static unsigned int
+ksocknal_get_conn_count_by_type(struct ksock_conn_cb *conn_cb,
+				int type)
+{
+	unsigned int count = 0;
+
+	switch (type) {
+	case SOCKLND_CONN_CONTROL:
+		count = conn_cb->ksnr_ctrl_conn_count;
+		break;
+	case SOCKLND_CONN_BULK_IN:
+		count = conn_cb->ksnr_blki_conn_count;
+		break;
+	case SOCKLND_CONN_BULK_OUT:
+		count = conn_cb->ksnr_blko_conn_count;
+		break;
+	case SOCKLND_CONN_ANY:
+		count = conn_cb->ksnr_conn_count;
+		break;
+	default:
+		LBUG();
+		break;
+	}
+
+	return count;
+}
+
+static void
+ksocknal_incr_conn_count(struct ksock_conn_cb *conn_cb,
+			 int type)
+{
+	conn_cb->ksnr_conn_count++;
+
+	/* check if all connections of the given type got created */
+	switch (type) {
+	case SOCKLND_CONN_CONTROL:
+		conn_cb->ksnr_ctrl_conn_count++;
+		/* there's a single control connection per peer */
+		conn_cb->ksnr_connected |= BIT(type);
+		break;
+	case SOCKLND_CONN_BULK_IN:
+		conn_cb->ksnr_blki_conn_count++;
+		if (conn_cb->ksnr_blki_conn_count >=
+		    *ksocknal_tunables.ksnd_conns_per_peer)
+			conn_cb->ksnr_connected |= BIT(type);
+		break;
+	case SOCKLND_CONN_BULK_OUT:
+		conn_cb->ksnr_blko_conn_count++;
+		if (conn_cb->ksnr_blko_conn_count >=
+		    *ksocknal_tunables.ksnd_conns_per_peer)
+			conn_cb->ksnr_connected |= BIT(type);
+		break;
+	case SOCKLND_CONN_ANY:
+		if (conn_cb->ksnr_conn_count >=
+		    *ksocknal_tunables.ksnd_conns_per_peer)
+			conn_cb->ksnr_connected |= BIT(type);
+		break;
+	default:
+		LBUG();
+		break;
+	}
+
+	CDEBUG(D_NET, "Add conn type %d, ksnr_connected %x conns_per_peer %d\n",
+	       type, conn_cb->ksnr_connected,
+	       *ksocknal_tunables.ksnd_conns_per_peer);
+}
+
 static void
 ksocknal_associate_cb_conn_locked(struct ksock_conn_cb *conn_cb,
 				  struct ksock_conn *conn)
@@ -407,8 +477,7 @@ struct ksock_peer_ni *
 			iface->ksni_nroutes++;
 	}
 
-	conn_cb->ksnr_connected |= (1 << type);
-	conn_cb->ksnr_conn_count++;
+	ksocknal_incr_conn_count(conn_cb, type);
 
 	/* Successful connection => further attempts can
 	 * proceed immediately
@@ -728,6 +797,7 @@ struct ksock_peer_ni *
 	int rc2;
 	int rc;
 	int active;
+	int num_dup = 0;
 	char *warn = NULL;
 
 	active = !!conn_cb;
@@ -928,11 +998,14 @@ struct ksock_peer_ni *
 			    conn2->ksnc_type != conn->ksnc_type)
 				continue;
 
-			/*
-			 * Reply on a passive connection attempt so the peer
+			num_dup++;
+			if (num_dup < *ksocknal_tunables.ksnd_conns_per_peer)
+				continue;
+
+			/* Reply on a passive connection attempt so the peer_ni
 			 * realises we're connected.
 			 */
-			LASSERT(!rc);
+			LASSERT(rc == 0);
 			if (!active)
 				rc = EALREADY;
 
@@ -1148,7 +1221,14 @@ struct ksock_peer_ni *
 	if (conn_cb) {
 		/* dissociate conn from cb... */
 		LASSERT(!conn_cb->ksnr_deleted);
-		LASSERT(conn_cb->ksnr_connected & BIT(conn->ksnc_type));
+
+		/* connected bit is set only if all connections
+		 * of the given type got created
+		 */
+		if (ksocknal_get_conn_count_by_type(conn_cb, conn->ksnc_type) ==
+		    *ksocknal_tunables.ksnd_conns_per_peer)
+			LASSERT((conn_cb->ksnr_connected &
+				BIT(conn->ksnc_type)) != 0);
 
 		list_for_each_entry(conn2, &peer_ni->ksnp_conns, ksnc_list) {
 			if (conn2->ksnc_conn_cb == conn_cb &&
diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h
index 9f8fe8a..dac8559 100644
--- a/net/lnet/klnds/socklnd/socklnd.h
+++ b/net/lnet/klnds/socklnd/socklnd.h
@@ -163,6 +163,11 @@ struct ksock_tunables {
 	int		*ksnd_zc_recv_min_nfrags; /* minimum # of fragments to
 						   * enable ZC receive
 						   */
+	int		*ksnd_conns_per_peer;	/* for typed mode, yields:
+						 * 1 + 2*conns_per_peer total
+						 * for untyped:
+						 * conns_per_peer total
+						 */
 };
 
 struct ksock_net {
@@ -371,6 +376,8 @@ struct ksock_conn {
 							 */
 };
 
+#define SOCKNAL_CONN_COUNT_MAX_BITS	8	/* max conn count bits */
+
 struct ksock_conn_cb {
 	struct list_head	ksnr_connd_list;	/* chain on ksnr_connd_routes */
 	struct ksock_peer_ni   *ksnr_peer;		/* owning peer_ni */
@@ -389,8 +396,11 @@ struct ksock_conn_cb {
 							 * type
 							 */
 	unsigned int		ksnr_deleted:1;		/* been removed from peer_ni? */
-	int			ksnr_conn_count;	/* # conns established by this
-							 * route
+	unsigned int            ksnr_ctrl_conn_count:1; /* # conns by type */
+	unsigned int		ksnr_blki_conn_count:8;
+	unsigned int		ksnr_blko_conn_count:8;
+	int			ksnr_conn_count;	/* total # conns for
+							 * this cb
 							 */
 };
 
@@ -595,9 +605,12 @@ struct ksock_proto {
 
 static inline int ksocknal_timeout(void)
 {
-	return *ksocknal_tunables.ksnd_timeout ?
-		*ksocknal_tunables.ksnd_timeout :
-		lnet_get_lnd_timeout();
+	return *ksocknal_tunables.ksnd_timeout ?: lnet_get_lnd_timeout();
+}
+
+static inline int ksocknal_conns_per_peer(void)
+{
+	return *ksocknal_tunables.ksnd_conns_per_peer ?: 1;
 }
 
 int ksocknal_startup(struct lnet_ni *ni);
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index 43658b2..bfb98f5 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -1818,7 +1818,8 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 			type = SOCKLND_CONN_ANY;
 		} else if (wanted & BIT(SOCKLND_CONN_CONTROL)) {
 			type = SOCKLND_CONN_CONTROL;
-		} else if (wanted & BIT(SOCKLND_CONN_BULK_IN)) {
+		} else if (wanted & BIT(SOCKLND_CONN_BULK_IN) &&
+			   conn_cb->ksnr_blki_conn_count <= conn_cb->ksnr_blko_conn_count) {
 			type = SOCKLND_CONN_BULK_IN;
 		} else {
 			LASSERT(wanted & BIT(SOCKLND_CONN_BULK_OUT));
diff --git a/net/lnet/klnds/socklnd/socklnd_modparams.c b/net/lnet/klnds/socklnd/socklnd_modparams.c
index 017627f..bc772e4 100644
--- a/net/lnet/klnds/socklnd/socklnd_modparams.c
+++ b/net/lnet/klnds/socklnd/socklnd_modparams.c
@@ -139,6 +139,10 @@
 module_param(zc_recv_min_nfrags, int, 0644);
 MODULE_PARM_DESC(zc_recv_min_nfrags, "minimum # of fragments to enable ZC recv");
 
+static unsigned int conns_per_peer = 1;
+module_param(conns_per_peer, uint, 0444);
+MODULE_PARM_DESC(conns_per_peer, "number of connections per peer");
+
 #if SOCKNAL_VERSION_DEBUG
 static int protocol = 3;
 module_param(protocol, int, 0644);
@@ -177,6 +181,11 @@ int ksocknal_tunables_init(void)
 	ksocknal_tunables.ksnd_zc_min_payload = &zc_min_payload;
 	ksocknal_tunables.ksnd_zc_recv = &zc_recv;
 	ksocknal_tunables.ksnd_zc_recv_min_nfrags = &zc_recv_min_nfrags;
+	if (conns_per_peer > ((1 << SOCKNAL_CONN_COUNT_MAX_BITS) - 1)) {
+		CWARN("socklnd conns_per_peer is capped at %u.\n",
+		      (1 << SOCKNAL_CONN_COUNT_MAX_BITS) - 1);
+	}
+	ksocknal_tunables.ksnd_conns_per_peer     = &conns_per_peer;
 
 #if SOCKNAL_VERSION_DEBUG
 	ksocknal_tunables.ksnd_protocol = &protocol;
-- 
1.8.3.1

* [lustre-devel] [PATCH 05/13] lustre: readahead: export pages directly without RA
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
                   ` (3 preceding siblings ...)
  2021-05-15 13:06 ` [lustre-devel] [PATCH 04/13] lnet: socklnd: add conns_per_peer parameter James Simmons
@ 2021-05-15 13:06 ` James Simmons
  2021-05-15 13:06 ` [lustre-devel] [PATCH 06/13] lustre: readahead: fix reserving for unaligned read James Simmons
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:06 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Wang Shilong, Lustre Development List

From: Wang Shilong <wshilong@ddn.com>

With readahead disabled, @vpg_defer_uptodate should not be set, as
we don't reserve credits for such a read. Otherwise
vvp_page_completion_read() will call ll_ra_count_put(), which makes
@ra_cur_pages go negative.

Fixes: 3b1dfe4b4b ("lustre: llite: fix to submit complete read block with ra disabled")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14616
Lustre-commit: 9f1c0bfd10d619a3 ("LU-14616 readahead: export pages directly without RA")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/43338
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/rw.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 08ab25d..8dcbef3 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -237,8 +237,10 @@ static int ll_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 	cl_page_assume(env, io, page);
 	vpg = cl2vvp_page(cl_object_page_slice(clob, page));
 	if (!vpg->vpg_defer_uptodate && !PageUptodate(vmpage)) {
-		vpg->vpg_defer_uptodate = 1;
-		vpg->vpg_ra_used = 0;
+		if (hint == MAYNEED) {
+			vpg->vpg_defer_uptodate = 1;
+			vpg->vpg_ra_used = 0;
+		}
 		cl_page_list_add(queue, page);
 	} else {
 		/* skip completed pages */
-- 
1.8.3.1

* [lustre-devel] [PATCH 06/13] lustre: readahead: fix reserving for unaligned read
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
                   ` (4 preceding siblings ...)
  2021-05-15 13:06 ` [lustre-devel] [PATCH 05/13] lustre: readahead: export pages directly without RA James Simmons
@ 2021-05-15 13:06 ` James Simmons
  2021-05-15 13:06 ` [lustre-devel] [PATCH 07/13] lustre: sec: rework includes for client encryption James Simmons
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:06 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Wang Shilong, Lustre Development List

From: Wang Shilong <wshilong@ddn.com>

If a read covers [2K, 3K] on an x86 platform, only one page needs to
be read, but it was calculated as 2 pages.

This is a problem because we reserve more page credits than needed:
vvp_page_completion_read() only frees the pages actually read, which
causes @ra_cur_pages to leak.

Fixes: cc603a90cca ("lustre: llite: Fix page count for unaligned reads")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14616
Lustre-commit: 5e7e9240d27a4b74 ("LU-14616 readahead: fix reserving for unaliged read")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/43377
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/rw.c     |  7 +++++++
 fs/lustre/llite/vvp_io.c | 18 ++++++++++++------
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 8dcbef3..184e5e8 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -90,6 +90,13 @@ static unsigned long ll_ra_count_get(struct ll_sb_info *sbi,
 	 * LRU pages, otherwise, it could cause deadlock.
 	 */
 	pages = min(sbi->ll_cache->ccc_lru_max >> 2, pages);
+	/**
+	 * if this happen, we reserve more pages than needed,
+	 * this will make us leak @ra_cur_pages, because
+	 * ll_ra_count_put() acutally freed @pages.
+	 */
+	if (WARN_ON_ONCE(pages_min > pages))
+		pages_min = pages;
 
 	/*
 	 * If read-ahead pages left are less than 1M, do not do read-ahead,
diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index e98792b..12a28d9 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -798,6 +798,7 @@ static int vvp_io_read_start(const struct lu_env *env,
 	int exceed = 0;
 	int result;
 	struct iov_iter iter;
+	pgoff_t page_offset;
 
 	CLOBINVRNT(env, obj, vvp_object_invariant(obj));
 
@@ -839,15 +840,20 @@ static int vvp_io_read_start(const struct lu_env *env,
 	if (!vio->vui_ra_valid) {
 		vio->vui_ra_valid = true;
 		vio->vui_ra_start_idx = cl_index(obj, pos);
-		vio->vui_ra_pages = cl_index(obj, tot + PAGE_SIZE - 1);
-		/* If both start and end are unaligned, we read one more page
-		 * than the index math suggests.
-		 */
-		if ((pos & ~PAGE_MASK) != 0 && ((pos + tot) & ~PAGE_MASK) != 0)
+		vio->vui_ra_pages = 0;
+		page_offset = pos & ~PAGE_MASK;
+		if (page_offset) {
 			vio->vui_ra_pages++;
+			if (tot > PAGE_SIZE - page_offset)
+				tot -= (PAGE_SIZE - page_offset);
+			else
+				tot = 0;
+		}
+		vio->vui_ra_pages += (tot + PAGE_SIZE - 1) >> PAGE_SHIFT;
 
 		CDEBUG(D_READA, "tot %zu, ra_start %lu, ra_count %lu\n",
-		       tot, vio->vui_ra_start_idx, vio->vui_ra_pages);
+		       vio->vui_tot_count, vio->vui_ra_start_idx,
+		       vio->vui_ra_pages);
 	}
 
 	/* BUG: 5972 */
-- 
1.8.3.1

* [lustre-devel] [PATCH 07/13] lustre: sec: rework includes for client encryption
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
                   ` (5 preceding siblings ...)
  2021-05-15 13:06 ` [lustre-devel] [PATCH 06/13] lustre: readahead: fix reserving for unaligned read James Simmons
@ 2021-05-15 13:06 ` James Simmons
  2021-05-15 13:06 ` [lustre-devel] [PATCH 08/13] lustre: ptlrpc: remove might_sleep() in sptlrpc_gc_del_sec() James Simmons
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:06 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

Simplify includes for crypto by not repeating stubs when
CONFIG_FS_ENCRYPTION is not defined.
Expose encoding routines that will be used in the Lustre code
(both client and server sides) for filename encryption.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13717
Lustre-commit: 028281ae195927e9751 ("LU-13717 sec: rework includes for client encryption")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/43386
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_crypto.h | 158 +++++++++++++++++++++-----------------
 fs/lustre/llite/crypto.c          |   6 +-
 fs/lustre/llite/dir.c             |  20 ++---
 fs/lustre/llite/file.c            |  22 +++---
 fs/lustre/llite/llite_internal.h  |  18 +----
 fs/lustre/llite/llite_lib.c       |   6 +-
 fs/lustre/llite/namei.c           |  26 +++----
 fs/lustre/llite/super25.c         |   4 +-
 fs/lustre/osc/osc_request.c       |  12 +--
 9 files changed, 135 insertions(+), 137 deletions(-)

diff --git a/fs/lustre/include/lustre_crypto.h b/fs/lustre/include/lustre_crypto.h
index 01b5e85..b19bb420 100644
--- a/fs/lustre/include/lustre_crypto.h
+++ b/fs/lustre/include/lustre_crypto.h
@@ -30,87 +30,101 @@
 #ifndef _LUSTRE_CRYPTO_H_
 #define _LUSTRE_CRYPTO_H_
 
+#include <linux/fscrypt.h>
+
 struct ll_sb_info;
+#ifdef CONFIG_FS_ENCRYPTION
 int ll_set_encflags(struct inode *inode, void *encctx, u32 encctxlen,
 		    bool preload);
 bool ll_sbi_has_test_dummy_encryption(struct ll_sb_info *sbi);
 bool ll_sbi_has_encrypt(struct ll_sb_info *sbi);
 void ll_sbi_set_encrypt(struct ll_sb_info *sbi, bool set);
+#else
+static inline int ll_set_encflags(struct inode *inode, void *encctx,
+				  u32 encctxlen, bool preload)
+{
+	return 0;
+}
 
-#ifdef CONFIG_FS_ENCRYPTION
-#define __FS_HAS_ENCRYPTION 1
-#include <linux/fscrypt.h>
+static inline bool ll_sbi_has_test_dummy_encryption(struct ll_sb_info *sbi)
+{
+	return false;
+}
 
-#define llcrypt_operations	fscrypt_operations
-#define llcrypt_symlink_data	fscrypt_symlink_data
-#define llcrypt_dummy_context_enabled(inode) \
-	fscrypt_dummy_context_enabled(inode)
-#define llcrypt_has_encryption_key(inode) fscrypt_has_encryption_key(inode)
-#define llcrypt_encrypt_pagecache_blocks(page, len, offs, gfp_flags)	\
-	fscrypt_encrypt_pagecache_blocks(page, len, offs, gfp_flags)
-#define llcrypt_encrypt_block_inplace(inode, page, len, offs, lblk, gfp_flags) \
-	fscrypt_encrypt_block_inplace(inode, page, len, offs, lblk, gfp_flags)
-#define llcrypt_decrypt_pagecache_blocks(page, len, offs)	\
-	fscrypt_decrypt_pagecache_blocks(page, len, offs)
-#define llcrypt_decrypt_block_inplace(inode, page, len, offs, lblk_num)        \
-	fscrypt_decrypt_block_inplace(inode, page, len, offs, lblk_num)
-#define llcrypt_inherit_context(parent, child, fs_data, preload)	\
-	fscrypt_inherit_context(parent, child, fs_data, preload)
-#define llcrypt_get_encryption_info(inode) fscrypt_get_encryption_info(inode)
-#define llcrypt_put_encryption_info(inode) fscrypt_put_encryption_info(inode)
-#define llcrypt_free_inode(inode)	   fscrypt_free_inode(inode)
-#define llcrypt_finalize_bounce_page(pagep)  fscrypt_finalize_bounce_page(pagep)
-#define llcrypt_file_open(inode, filp)	fscrypt_file_open(inode, filp)
-#define llcrypt_ioctl_set_policy(filp, arg)  fscrypt_ioctl_set_policy(filp, arg)
-#define llcrypt_ioctl_get_policy_ex(filp, arg)	\
-	fscrypt_ioctl_get_policy_ex(filp, arg)
-#define llcrypt_ioctl_add_key(filp, arg)	fscrypt_ioctl_add_key(filp, arg)
-#define llcrypt_ioctl_remove_key(filp, arg)  fscrypt_ioctl_remove_key(filp, arg)
-#define llcrypt_ioctl_remove_key_all_users(filp, arg)	\
-	fscrypt_ioctl_remove_key_all_users(filp, arg)
-#define llcrypt_ioctl_get_key_status(filp, arg)	\
-	fscrypt_ioctl_get_key_status(filp, arg)
-#define llcrypt_drop_inode(inode)	fscrypt_drop_inode(inode)
-#define llcrypt_prepare_rename(olddir, olddentry, newdir, newdentry, flags) \
-	fscrypt_prepare_rename(olddir, olddentry, newdir, newdentry, flags)
-#define llcrypt_prepare_link(old_dentry, dir, dentry)	\
-	fscrypt_prepare_link(old_dentry, dir, dentry)
-#define llcrypt_prepare_setattr(dentry, attr)		\
-	fscrypt_prepare_setattr(dentry, attr)
-#define llcrypt_set_ops(sb, cop)	fscrypt_set_ops(sb, cop)
-#else /* !CONFIG_FS_ENCRYPTION */
-#undef IS_ENCRYPTED
-#define IS_ENCRYPTED(x)	0
-#define llcrypt_dummy_context_enabled(inode)	NULL
-/* copied from include/linux/fscrypt.h */
-#define llcrypt_has_encryption_key(inode) false
-#define llcrypt_encrypt_pagecache_blocks(page, len, offs, gfp_flags)	\
-	ERR_PTR(-EOPNOTSUPP)
-#define llcrypt_encrypt_block_inplace(inode, page, len, offs, lblk, gfp_flags) \
-	-EOPNOTSUPP
-#define llcrypt_decrypt_pagecache_blocks(page, len, offs)	-EOPNOTSUPP
-#define llcrypt_decrypt_block_inplace(inode, page, len, offs, lblk_num)	       \
-	-EOPNOTSUPP
-#define llcrypt_inherit_context(parent, child, fs_data, preload)     -EOPNOTSUPP
-#define llcrypt_get_encryption_info(inode)			-EOPNOTSUPP
-#define llcrypt_put_encryption_info(inode)			do {} while (0)
-#define llcrypt_free_inode(inode)				do {} while (0)
-#define llcrypt_finalize_bounce_page(pagep)			do {} while (0)
-static inline int llcrypt_file_open(struct inode *inode, struct file *filp)
+static inline bool ll_sbi_has_encrypt(struct ll_sb_info *sbi)
 {
-	return IS_ENCRYPTED(inode) ? -EOPNOTSUPP : 0;
+	return false;
 }
-#define llcrypt_ioctl_set_policy(filp, arg)			-EOPNOTSUPP
-#define llcrypt_ioctl_get_policy_ex(filp, arg)			-EOPNOTSUPP
-#define llcrypt_ioctl_add_key(filp, arg)			-EOPNOTSUPP
-#define llcrypt_ioctl_remove_key(filp, arg)			-EOPNOTSUPP
-#define llcrypt_ioctl_remove_key_all_users(filp, arg)		-EOPNOTSUPP
-#define llcrypt_ioctl_get_key_status(filp, arg)			-EOPNOTSUPP
-#define llcrypt_drop_inode(inode)				 0
-#define llcrypt_prepare_rename(olddir, olddentry, newdir, newdentry, flags)    0
-#define llcrypt_prepare_link(old_dentry, dir, dentry)		 0
-#define llcrypt_prepare_setattr(dentry, attr)			 0
-#define llcrypt_set_ops(sb, cop)				do {} while (0)
-#endif /* CONFIG_FS_ENCRYPTION */
+
+static inline void ll_sbi_set_encrypt(struct ll_sb_info *sbi, bool set) { }
+#endif
+
+/* Encoding/decoding routines inspired by yEnc principles.
+ * We just take care of a few critical characters:
+ * NULL, LF, CR, /, DEL and =.
+ * If such a char is found, it is replaced with '=' followed by
+ * the char value + 64.
+ * All other chars are left untouched.
+ * Efficiency of this encoding depends on the occurrences of the
+ * critical chars, but statistically on binary data it can be much higher
+ * than base64 for instance.
+ */
+static inline int critical_encode(const u8 *src, int len, char *dst)
+{
+	u8 *p = (u8 *)src, *q = dst;
+
+	while (p - src < len) {
+		/* escape NULL, LF, CR, /, DEL and = */
+		if (unlikely(*p == 0x0 || *p == 0xA || *p == 0xD ||
+			     *p == '/' || *p == 0x7F || *p == '=')) {
+			*(q++) = '=';
+			*(q++) = *(p++) + 64;
+		} else {
+			*(q++) = *(p++);
+		}
+	}
+
+	return (char *)q - dst;
+}
+
+/* returns the number of chars encoding would produce */
+static inline int critical_chars(const u8 *src, int len)
+{
+	u8 *p = (u8 *)src;
+	int newlen = len;
+
+	while (p - src < len) {
+		/* NULL, LF, CR, /, DEL and = cost an additional '=' */
+		if (unlikely(*p == 0x0 || *p == 0xA || *p == 0xD ||
+			     *p == '/' || *p == 0x7F || *p == '='))
+			newlen++;
+		p++;
+	}
+
+	return newlen;
+}
+
+/* decoding routine - returns the number of chars in output */
+static inline int critical_decode(const u8 *src, int len, char *dst)
+{
+	u8 *p = (u8 *)src, *q = dst;
+
+	while (p - src < len) {
+		if (unlikely(*p == '=')) {
+			*(q++) = *(++p) - 64;
+			p++;
+		} else {
+			*(q++) = *(p++);
+		}
+	}
+
+	return (char *)q - dst;
+}
+
+/* Extracts the second-to-last ciphertext block */
+#define LLCRYPT_FNAME_DIGEST(name, len)					\
+	((name) + round_down((len) - FS_CRYPTO_BLOCK_SIZE - 1,		\
+			     FS_CRYPTO_BLOCK_SIZE))
+#define LLCRYPT_FNAME_DIGEST_SIZE	FS_CRYPTO_BLOCK_SIZE
 
 #endif /* _LUSTRE_CRYPTO_H_ */
diff --git a/fs/lustre/llite/crypto.c b/fs/lustre/llite/crypto.c
index 8bbb766..34d0ad1 100644
--- a/fs/lustre/llite/crypto.c
+++ b/fs/lustre/llite/crypto.c
@@ -64,7 +64,7 @@ int ll_set_encflags(struct inode *inode, void *encctx, u32 encctxlen,
 	if (rc)
 		return rc;
 
-	return preload ? llcrypt_get_encryption_info(inode) : 0;
+	return preload ? fscrypt_get_encryption_info(inode) : 0;
 }
 
 /* ll_set_context has 2 distinct behaviors, depending on the value of inode
@@ -143,7 +143,7 @@ void ll_sbi_set_encrypt(struct ll_sb_info *sbi, bool set)
 
 static bool ll_empty_dir(struct inode *inode)
 {
-	/* used by llcrypt_ioctl_set_policy(), because a policy can only be set
+	/* used by fscrypt_ioctl_set_policy(), because a policy can only be set
 	 * on an empty dir.
 	 */
 	/* Here we choose to return true, meaning we always call .set_context.
@@ -153,7 +153,7 @@ static bool ll_empty_dir(struct inode *inode)
 	return true;
 }
 
-const struct llcrypt_operations lustre_cryptops = {
+const struct fscrypt_operations lustre_cryptops = {
 	.key_prefix		= "lustre:",
 	.get_context		= ll_get_context,
 	.set_context		= ll_set_context,
diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 06ca329..13676c1 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -450,11 +450,11 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump,
 
 	if (ll_sbi_has_encrypt(sbi) &&
 	    (IS_ENCRYPTED(parent) ||
-	     unlikely(llcrypt_dummy_context_enabled(parent)))) {
-		err = llcrypt_get_encryption_info(parent);
+	     unlikely(fscrypt_dummy_context_enabled(parent)))) {
+		err = fscrypt_get_encryption_info(parent);
 		if (err)
 			goto out_op_data;
-		if (!llcrypt_has_encryption_key(parent)) {
+		if (!fscrypt_has_encryption_key(parent)) {
 			err = -ENOKEY;
 			goto out_op_data;
 		}
@@ -476,7 +476,7 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump,
 	}
 
 	if (encrypt) {
-		err = llcrypt_inherit_context(parent, NULL, op_data, false);
+		err = fscrypt_inherit_context(parent, NULL, op_data, false);
 		if (err)
 			goto out_op_data;
 	}
@@ -2149,28 +2149,28 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	case FS_IOC_SET_ENCRYPTION_POLICY:
 		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
 			return -EOPNOTSUPP;
-		return llcrypt_ioctl_set_policy(file, (const void __user *)arg);
+		return fscrypt_ioctl_set_policy(file, (const void __user *)arg);
 	case FS_IOC_GET_ENCRYPTION_POLICY_EX:
 		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
 			return -EOPNOTSUPP;
-		return llcrypt_ioctl_get_policy_ex(file, (void __user *)arg);
+		return fscrypt_ioctl_get_policy_ex(file, (void __user *)arg);
 	case FS_IOC_ADD_ENCRYPTION_KEY:
 		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
 			return -EOPNOTSUPP;
-		return llcrypt_ioctl_add_key(file, (void __user *)arg);
+		return fscrypt_ioctl_add_key(file, (void __user *)arg);
 	case FS_IOC_REMOVE_ENCRYPTION_KEY:
 		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
 			return -EOPNOTSUPP;
-		return llcrypt_ioctl_remove_key(file, (void __user *)arg);
+		return fscrypt_ioctl_remove_key(file, (void __user *)arg);
 	case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS:
 		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
 			return -EOPNOTSUPP;
-		return llcrypt_ioctl_remove_key_all_users(file,
+		return fscrypt_ioctl_remove_key_all_users(file,
 							  (void __user *)arg);
 	case FS_IOC_GET_ENCRYPTION_KEY_STATUS:
 		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
 			return -EOPNOTSUPP;
-		return llcrypt_ioctl_get_key_status(file, (void __user *)arg);
+		return fscrypt_ioctl_get_key_status(file, (void __user *)arg);
 #endif
 	default:
 		return obd_iocontrol(cmd, sbi->ll_dt_exp, 0, NULL,
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 78f3469..ffddec6 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -443,7 +443,7 @@ static inline int ll_dom_readpage(void *data, struct page *page)
 	kunmap_atomic(kaddr);
 
 	if (inode && IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode)) {
-		if (!llcrypt_has_encryption_key(inode)) {
+		if (!fscrypt_has_encryption_key(inode)) {
 			CDEBUG(D_SEC, "no enc key for " DFID "\n",
 			       PFID(ll_inode2fid(inode)));
 		} else {
@@ -456,7 +456,7 @@ static inline int ll_dom_readpage(void *data, struct page *page)
 					   LUSTRE_ENCRYPTION_UNIT_SIZE) == 0)
 					break;
 
-				rc = llcrypt_decrypt_pagecache_blocks(page,
+				rc = fscrypt_decrypt_pagecache_blocks(page,
 								      LUSTRE_ENCRYPTION_UNIT_SIZE,
 								      0);
 				if (rc)
@@ -776,7 +776,7 @@ int ll_file_open(struct inode *inode, struct file *file)
 	file->private_data = NULL; /* prevent ll_local_open assertion */
 
 	if (S_ISREG(inode->i_mode)) {
-		rc = llcrypt_file_open(inode, file);
+		rc = fscrypt_file_open(inode, file);
 		if (rc)
 			goto out_nofiledata;
 	}
@@ -4063,28 +4063,28 @@ static int ll_heat_set(struct inode *inode, enum lu_heat_flag flags)
 	case FS_IOC_SET_ENCRYPTION_POLICY:
 		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
 			return -EOPNOTSUPP;
-		return llcrypt_ioctl_set_policy(file, (const void __user *)arg);
+		return fscrypt_ioctl_set_policy(file, (const void __user *)arg);
 	case FS_IOC_GET_ENCRYPTION_POLICY_EX:
 		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
 			return -EOPNOTSUPP;
-		return llcrypt_ioctl_get_policy_ex(file, (void __user *)arg);
+		return fscrypt_ioctl_get_policy_ex(file, (void __user *)arg);
 	case FS_IOC_ADD_ENCRYPTION_KEY:
 		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
 			return -EOPNOTSUPP;
-		return llcrypt_ioctl_add_key(file, (void __user *)arg);
+		return fscrypt_ioctl_add_key(file, (void __user *)arg);
 	case FS_IOC_REMOVE_ENCRYPTION_KEY:
 		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
 			return -EOPNOTSUPP;
-		return llcrypt_ioctl_remove_key(file, (void __user *)arg);
+		return fscrypt_ioctl_remove_key(file, (void __user *)arg);
 	case FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS:
 		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
 			return -EOPNOTSUPP;
-		return llcrypt_ioctl_remove_key_all_users(file,
+		return fscrypt_ioctl_remove_key_all_users(file,
 							  (void __user *)arg);
 	case FS_IOC_GET_ENCRYPTION_KEY_STATUS:
 		if (!ll_sbi_has_encrypt(ll_i2sbi(inode)))
 			return -EOPNOTSUPP;
-		return llcrypt_ioctl_get_key_status(file, (void __user *)arg);
+		return fscrypt_ioctl_get_key_status(file, (void __user *)arg);
 #endif
 
 	case LL_IOC_UNLOCK_FOREIGN: {
@@ -4551,10 +4551,10 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum,
 	}
 
 	if (IS_ENCRYPTED(child_inode)) {
-		rc = llcrypt_get_encryption_info(child_inode);
+		rc = fscrypt_get_encryption_info(child_inode);
 		if (rc)
 			goto out_iput;
-		if (!llcrypt_has_encryption_key(child_inode)) {
+		if (!fscrypt_has_encryption_key(child_inode)) {
 			CDEBUG(D_SEC, "no enc key for "DFID"\n",
 			       PFID(ll_inode2fid(child_inode)));
 			rc = -ENOKEY;
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index b3e8a96..03d2796 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1681,25 +1681,9 @@ static inline struct pcc_super *ll_info2pccs(struct ll_inode_info *lli)
 	return ll_i2pccs(ll_info2i(lli));
 }
 
-#ifdef CONFIG_FS_ENCRYPTION
 /* crypto.c */
-extern const struct llcrypt_operations lustre_cryptops;
-
-#else /* !CONFIG_FS_ENCRYPTION */
-inline bool ll_sbi_has_test_dummy_encryption(struct ll_sb_info *sbi)
-{
-	return false;
-}
+extern const struct fscrypt_operations lustre_cryptops;
 
-inline bool ll_sbi_has_encrypt(struct ll_sb_info *sbi)
-{
-	return false;
-}
-
-inline void ll_sbi_set_encrypt(struct ll_sb_info *sbi, bool set)
-{
-}
-#endif /* !CONFIG_FS_ENCRYPTION */
 /* llite/llite_foreign.c */
 int ll_manage_foreign(struct inode *inode, struct lustre_md *lmd);
 bool ll_foreign_is_openable(struct dentry *dentry, unsigned int flags);
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 1b3eef0..ada2b625c 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -616,8 +616,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 #if THREAD_SIZE >= 8192 /*b=17630*/
 	sb->s_export_op = &lustre_export_operations;
 #endif
-	llcrypt_set_ops(sb, &lustre_cryptops);
 
+	fscrypt_set_ops(sb, &lustre_cryptops);
 	/* make root inode
 	 * XXX: move this to after cbd setup?
 	 */
@@ -1682,7 +1682,7 @@ void ll_clear_inode(struct inode *inode)
 	 */
 	cl_inode_fini(inode);
 
-	llcrypt_put_encryption_info(inode);
+	fscrypt_put_encryption_info(inode);
 }
 
 static int ll_md_setattr(struct dentry *dentry, struct md_op_data *op_data)
@@ -2140,7 +2140,7 @@ int ll_setattr(struct dentry *de, struct iattr *attr)
 	enum op_xvalid xvalid = 0;
 	int rc;
 
-	rc = llcrypt_prepare_setattr(de, attr);
+	rc = fscrypt_prepare_setattr(de, attr);
 	if (rc)
 		return rc;
 
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 2da33d0..658da49 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -674,7 +674,7 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 					      ll_i2sbi(inode)->ll_fsname,
 					      PFID(ll_inode2fid(inode)), rc);
 				} else if (encrypt) {
-					rc = llcrypt_get_encryption_info(inode);
+					rc = fscrypt_get_encryption_info(inode);
 					if (rc)
 						CDEBUG(D_SEC,
 						       "cannot get enc info for " DFID ": rc = %d\n",
@@ -744,10 +744,10 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 			d_lustre_revalidate(*de);
 
 		if (encrypt) {
-			rc = llcrypt_get_encryption_info(inode);
+			rc = fscrypt_get_encryption_info(inode);
 			if (rc)
 				goto out;
-			if (!llcrypt_has_encryption_key(inode)) {
+			if (!fscrypt_has_encryption_key(inode)) {
 				rc = -ENOKEY;
 				goto out;
 			}
@@ -878,7 +878,7 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry,
 			*secctxlen = 0;
 	}
 	if (it->it_op & IT_CREAT && encrypt) {
-		rc = llcrypt_inherit_context(parent, NULL, op_data, false);
+		rc = fscrypt_inherit_context(parent, NULL, op_data, false);
 		if (rc) {
 			retval = ERR_PTR(rc);
 			goto out;
@@ -1134,11 +1134,11 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 		/* in case of create, this is going to be a regular file because
 		 * we set S_IFREG bit on it->it_create_mode above
 		 */
-		rc = llcrypt_get_encryption_info(dir);
+		rc = fscrypt_get_encryption_info(dir);
 		if (rc)
 			goto out_release;
 		if (open_flags & O_CREAT) {
-			if (!llcrypt_has_encryption_key(dir)) {
+			if (!fscrypt_has_encryption_key(dir)) {
 				rc = -ENOKEY;
 				goto out_release;
 			}
@@ -1390,11 +1390,11 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry,
 	if (ll_sbi_has_encrypt(sbi) &&
 	    ((IS_ENCRYPTED(dir) &&
 	    (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode))) ||
-	    (unlikely(llcrypt_dummy_context_enabled(dir)) && S_ISDIR(mode)))) {
-		err = llcrypt_get_encryption_info(dir);
+	    (unlikely(fscrypt_dummy_context_enabled(dir)) && S_ISDIR(mode)))) {
+		err = fscrypt_get_encryption_info(dir);
 		if (err)
 			goto err_exit;
-		if (!llcrypt_has_encryption_key(dir)) {
+		if (!fscrypt_has_encryption_key(dir)) {
 			err = -ENOKEY;
 			goto err_exit;
 		}
@@ -1402,7 +1402,7 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry,
 	}
 
 	if (encrypt) {
-		err = llcrypt_inherit_context(dir, NULL, op_data, false);
+		err = fscrypt_inherit_context(dir, NULL, op_data, false);
 		if (err)
 			goto err_exit;
 	}
@@ -1504,7 +1504,7 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry,
 	d_instantiate(dentry, inode);
 
 	if (encrypt) {
-		err = llcrypt_inherit_context(dir, inode, NULL, true);
+		err = fscrypt_inherit_context(dir, inode, NULL, true);
 		if (err)
 			goto err_exit;
 	}
@@ -1740,7 +1740,7 @@ static int ll_link(struct dentry *old_dentry, struct inode *dir,
 	       PFID(ll_inode2fid(src)), src, PFID(ll_inode2fid(dir)), dir,
 	       new_dentry);
 
-	err = llcrypt_prepare_link(old_dentry, dir, new_dentry);
+	err = fscrypt_prepare_link(old_dentry, dir, new_dentry);
 	if (err)
 		return err;
 
@@ -1785,7 +1785,7 @@ static int ll_rename(struct inode *src, struct dentry *src_dchild,
 	if (unlikely(d_mountpoint(src_dchild) || d_mountpoint(tgt_dchild)))
 		return -EBUSY;
 
-	err = llcrypt_prepare_rename(src, src_dchild, tgt, tgt_dchild, flags);
+	err = fscrypt_prepare_rename(src, src_dchild, tgt, tgt_dchild, flags);
 	if (err)
 		return err;
 
diff --git a/fs/lustre/llite/super25.c b/fs/lustre/llite/super25.c
index decfa2f..f50c23a 100644
--- a/fs/lustre/llite/super25.c
+++ b/fs/lustre/llite/super25.c
@@ -63,7 +63,7 @@ static void ll_inode_destroy_callback(struct rcu_head *head)
 	struct inode *inode = container_of(head, struct inode, i_rcu);
 	struct ll_inode_info *ptr = ll_i2info(inode);
 
-	llcrypt_free_inode(inode);
+	fscrypt_free_inode(inode);
 	kmem_cache_free(ll_inode_cachep, ptr);
 }
 
@@ -77,7 +77,7 @@ static int ll_drop_inode(struct inode *inode)
 	int drop = generic_drop_inode(inode);
 
 	if (!drop)
-		drop = llcrypt_drop_inode(inode);
+		drop = fscrypt_drop_inode(inode);
 
 	return drop;
 }
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index e49d73f..0d590ed 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1371,11 +1371,11 @@ static inline void osc_release_bounce_pages(struct brw_page **pga,
 
 	for (i = 0; i < page_count; i++) {
 		/* Bounce pages allocated by a call to
-		 * llcrypt_encrypt_pagecache_blocks() in osc_brw_prep_request()
+		 * fscrypt_encrypt_pagecache_blocks() in osc_brw_prep_request()
 		 * are identified thanks to the PageChecked flag.
 		 */
 		if (PageChecked(pga[i]->pg))
-			llcrypt_finalize_bounce_page(&pga[i]->pg);
+			fscrypt_finalize_bounce_page(&pga[i]->pg);
 		pga[i]->count -= pga[i]->bp_count_diff;
 		pga[i]->off += pga[i]->bp_off_diff;
 	}
@@ -1463,7 +1463,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 				pg->pg->index = pg->off >> PAGE_SHIFT;
 			}
 			data_page =
-				llcrypt_encrypt_pagecache_blocks(pg->pg,
+				fscrypt_encrypt_pagecache_blocks(pg->pg,
 								 nunits, 0,
 								 GFP_NOFS);
 			if (directio) {
@@ -2145,7 +2145,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 	if (inode && IS_ENCRYPTED(inode)) {
 		int idx;
 
-		if (!llcrypt_has_encryption_key(inode)) {
+		if (!fscrypt_has_encryption_key(inode)) {
 			CDEBUG(D_SEC, "no enc key for ino %lu\n", inode->i_ino);
 			goto out;
 		}
@@ -2181,7 +2181,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 					for (i = offs;
 					     i < offs + LUSTRE_ENCRYPTION_UNIT_SIZE;
 					     i += blocksize, lblk_num++) {
-						rc = llcrypt_decrypt_block_inplace(inode,
+						rc = fscrypt_decrypt_block_inplace(inode,
 										   pg->pg,
 										   blocksize, i,
 										   lblk_num);
@@ -2189,7 +2189,7 @@ static int osc_brw_fini_request(struct ptlrpc_request *req, int rc)
 							break;
 					}
 				} else {
-					rc = llcrypt_decrypt_pagecache_blocks(pg->pg,
+					rc = fscrypt_decrypt_pagecache_blocks(pg->pg,
 									      LUSTRE_ENCRYPTION_UNIT_SIZE,
 									      offs);
 				}
-- 
1.8.3.1


* [lustre-devel] [PATCH 08/13] lustre: ptlrpc: remove might_sleep() in sptlrpc_gc_del_sec()
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
                   ` (6 preceding siblings ...)
  2021-05-15 13:06 ` [lustre-devel] [PATCH 07/13] lustre: sec: rework includes for client encryption James Simmons
@ 2021-05-15 13:06 ` James Simmons
  2021-05-15 13:06 ` [lustre-devel] [PATCH 09/13] lustre: obdclass: server qos penalty miscalculated James Simmons
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:06 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Nikitas Angelinas, Lustre Development List

From: Nikitas Angelinas <nikitas.angelinas@hpe.com>

sptlrpc_gc_del_sec() calls mutex_lock(), which itself calls
might_sleep(), so the explicit might_sleep() call is redundant and can
be removed.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14628
Lustre-commit: c31fb42f9aa561ae ("LU-14628 ptlrpc: remove might_sleep() in sptlrpc_gc_del_sec()")
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-on: https://review.whamcloud.com/43397
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/sec_gc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/lustre/ptlrpc/sec_gc.c b/fs/lustre/ptlrpc/sec_gc.c
index bc76323..fedcf2c 100644
--- a/fs/lustre/ptlrpc/sec_gc.c
+++ b/fs/lustre/ptlrpc/sec_gc.c
@@ -76,8 +76,6 @@ void sptlrpc_gc_del_sec(struct ptlrpc_sec *sec)
 	if (list_empty(&sec->ps_gc_list))
 		return;
 
-	might_sleep();
-
 	/* signal before list_del to make iteration in gc thread safe */
 	atomic_inc(&sec_gc_wait_del);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 09/13] lustre: obdclass: server qos penalty miscalculated
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
                   ` (7 preceding siblings ...)
  2021-05-15 13:06 ` [lustre-devel] [PATCH 08/13] lustre: ptlrpc: remove might_sleep() in sptlrpc_gc_del_sec() James Simmons
@ 2021-05-15 13:06 ` James Simmons
  2021-05-15 13:06 ` [lustre-devel] [PATCH 10/13] lustre: lmv: add default LMV inherit depth James Simmons
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:06 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

The server QoS penalty calculation uses the active target count, but
it should use the server count. This makes the penalty larger than
expected, so target weights are often 0, and as a result MDT0 is
frequently chosen during QoS allocation.

Fixes: 3f2a3e1d4 ("lustre: obdclass: lu_tgt_descs cleanup")
WC-bug-id: https://jira.whamcloud.com/browse/LU-13440
Lustre-commit: 0ccce7ecb72f847f ("LU-13440 obdclass: server qos penalty miscaculated")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43385
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdclass/lu_tgt_descs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/obdclass/lu_tgt_descs.c b/fs/lustre/obdclass/lu_tgt_descs.c
index cb62ce4..9f33d22 100644
--- a/fs/lustre/obdclass/lu_tgt_descs.c
+++ b/fs/lustre/obdclass/lu_tgt_descs.c
@@ -633,7 +633,7 @@ int ltd_qos_update(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt,
 	ltq->ltq_penalty += ltq->ltq_penalty_per_obj *
 			    ltd->ltd_lov_desc.ld_active_tgt_count;
 	svr->lsq_penalty += svr->lsq_penalty_per_obj *
-			    ltd->ltd_lov_desc.ld_active_tgt_count;
+			    qos->lq_active_svr_count;
 
 	/* Decrease all MDS penalties */
 	list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) {
-- 
1.8.3.1


* [lustre-devel] [PATCH 10/13] lustre: lmv: add default LMV inherit depth
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
                   ` (8 preceding siblings ...)
  2021-05-15 13:06 ` [lustre-devel] [PATCH 09/13] lustre: obdclass: server qos penalty miscalculated James Simmons
@ 2021-05-15 13:06 ` James Simmons
  2021-05-15 13:06 ` [lustre-devel] [PATCH 11/13] lustre: lmv: qos stay on current MDT if less full James Simmons
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:06 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

A new field "u8 lum_max_inherit" is added to struct lmv_user_md,
representing the inherit depth of the default LMV. It is decreased
by 1 for each level of subdirectory.

The valid values of lum_max_inherit are [0, 255]:
* 0 means unlimited inherit.
* 1 means inherit end.
* 250 is the max inherit depth.
* [251, 254] are reserved.
* 255 means it's not set.

A new field "u8 lum_max_inherit_rr" is added. If the default stripe
offset is -1, lum_max_inherit_rr is non-zero, and the system is
balanced, new directories are created in a round-robin manner;
otherwise they are created on the MDT where their parent is located,
to avoid creating remote directories. Similarly, this value is
decreased by 1 for each level of subdirectory.

The valid values of lum_max_inherit_rr are different:
* 0 means not set.
* 1 means inherit end.
* 250 is the max inherit depth.
* [251, 254] are reserved.
* 255 means unlimited inherit.

However, for the "lfs" user interface, the valid values are [-1, 250]:
* -1 means unlimited inherit.
* 0 means not set.
* others are the same.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13440
Lustre-commit: 01d34a6b3b2e34f7 ("LU-13440 lmv: add default LMV inherit depth")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43131
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lu_object.h           | 24 ++++++++++++-
 fs/lustre/include/lustre_lmv.h          |  8 +++--
 fs/lustre/llite/namei.c                 |  4 +++
 fs/lustre/lmv/lmv_obd.c                 | 62 +++++++++++++++++++++++++++------
 fs/lustre/obdclass/lu_tgt_descs.c       | 16 ---------
 fs/lustre/ptlrpc/pack_generic.c         |  5 ++-
 include/uapi/linux/lustre/lustre_user.h | 37 +++++++++++++++++++-
 7 files changed, 124 insertions(+), 32 deletions(-)

diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h
index a270631..3a71d6b 100644
--- a/fs/lustre/include/lu_object.h
+++ b/fs/lustre/include/lu_object.h
@@ -1537,11 +1537,33 @@ struct lu_tgt_descs {
 void lu_tgt_descs_fini(struct lu_tgt_descs *ltd);
 int ltd_add_tgt(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt);
 void ltd_del_tgt(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt);
-bool ltd_qos_is_usable(struct lu_tgt_descs *ltd);
 int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd);
 int ltd_qos_update(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt,
 		   u64 *total_wt);
 
+/**
+ * Whether MDT inode and space usages are balanced.
+ */
+static inline bool ltd_qos_is_balanced(struct lu_tgt_descs *ltd)
+{
+	return !test_bit(LQ_DIRTY, &ltd->ltd_qos.lq_flags) &&
+	       test_bit(LQ_SAME_SPACE, &ltd->ltd_qos.lq_flags);
+}
+
+/**
+ * Whether QoS data is up-to-date and QoS can be applied.
+ */
+static inline bool ltd_qos_is_usable(struct lu_tgt_descs *ltd)
+{
+	if (ltd_qos_is_balanced(ltd))
+		return false;
+
+	if (ltd->ltd_lov_desc.ld_active_tgt_count < 2)
+		return false;
+
+	return true;
+}
+
 static inline struct lu_tgt_desc *ltd_first_tgt(struct lu_tgt_descs *ltd)
 {
 	int index;
diff --git a/fs/lustre/include/lustre_lmv.h b/fs/lustre/include/lustre_lmv.h
index aee8342..a74f0a5 100644
--- a/fs/lustre/include/lustre_lmv.h
+++ b/fs/lustre/include/lustre_lmv.h
@@ -46,6 +46,8 @@ struct lmv_stripe_md {
 	u32	lsm_md_stripe_count;
 	u32	lsm_md_master_mdt_index;
 	u32	lsm_md_hash_type;
+	u8	lsm_md_max_inherit;
+	u8	lsm_md_max_inherit_rr;
 	u32	lsm_md_layout_version;
 	u32	lsm_md_migrate_offset;
 	u32	lsm_md_migrate_hash;
@@ -119,11 +121,11 @@ static inline void lsm_md_dump(int mask, const struct lmv_stripe_md *lsm)
 	 * terminated string so only print LOV_MAXPOOLNAME bytes.
 	 */
 	CDEBUG(mask,
-	       "magic %#x stripe count %d master mdt %d hash type %#x version %d migrate offset %d migrate hash %#x pool %.*s\n",
+	       "magic %#x stripe count %d master mdt %d hash type %#x max inherit %hhu version %d migrate offset %d migrate hash %#x pool %.*s\n",
 	       lsm->lsm_md_magic, lsm->lsm_md_stripe_count,
 	       lsm->lsm_md_master_mdt_index, lsm->lsm_md_hash_type,
-	       lsm->lsm_md_layout_version, lsm->lsm_md_migrate_offset,
-	       lsm->lsm_md_migrate_hash,
+	       lsm->lsm_md_max_inherit, lsm->lsm_md_layout_version,
+	       lsm->lsm_md_migrate_offset, lsm->lsm_md_migrate_hash,
 	       LOV_MAXPOOLNAME, lsm->lsm_md_pool_name);
 
 	if (!lmv_dir_striped(lsm))
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 658da49..6ed2943 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -1451,6 +1451,10 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry,
 			md.default_lmv->lsm_md_master_mdt_index =
 				lum->lum_stripe_offset;
 			md.default_lmv->lsm_md_hash_type = lum->lum_hash_type;
+			md.default_lmv->lsm_md_max_inherit =
+				lum->lum_max_inherit;
+			md.default_lmv->lsm_md_max_inherit_rr =
+				lum->lum_max_inherit_rr;
 
 			err = ll_update_inode(dir, &md);
 			md_free_lustre_md(sbi->ll_md_exp, &md);
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 4fa441e..552ef07 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1695,6 +1695,22 @@ int lmv_old_layout_lookup(struct lmv_obd *lmv, struct md_op_data *op_data)
 	return rc;
 }
 
+static inline bool lmv_op_user_qos_mkdir(const struct md_op_data *op_data)
+{
+	const struct lmv_user_md *lum = op_data->op_data;
+
+	return (op_data->op_cli_flags & CLI_SET_MEA) && lum &&
+	       le32_to_cpu(lum->lum_magic) == LMV_USER_MAGIC &&
+	       le32_to_cpu(lum->lum_stripe_offset) == LMV_OFFSET_DEFAULT;
+}
+
+static inline bool lmv_op_default_qos_mkdir(const struct md_op_data *op_data)
+{
+	const struct lmv_stripe_md *lsm = op_data->op_default_mea1;
+
+	return lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT;
+}
+
 /* mkdir by QoS in two cases:
  * 1. 'lfs mkdir -i -1'
  * 2. parent default LMV master_mdt_index is -1
@@ -1704,27 +1720,38 @@ int lmv_old_layout_lookup(struct lmv_obd *lmv, struct md_op_data *op_data)
  */
 static inline bool lmv_op_qos_mkdir(const struct md_op_data *op_data)
 {
-	const struct lmv_stripe_md *lsm = op_data->op_default_mea1;
-	const struct lmv_user_md *lum = op_data->op_data;
-
 	if (op_data->op_code != LUSTRE_OPC_MKDIR)
 		return false;
 
 	if (lmv_dir_striped(op_data->op_mea1))
 		return false;
 
-	if (op_data->op_cli_flags & CLI_SET_MEA && lum &&
-	    (le32_to_cpu(lum->lum_magic) == LMV_USER_MAGIC ||
-	     le32_to_cpu(lum->lum_magic) == LMV_USER_MAGIC_SPECIFIC) &&
-	    le32_to_cpu(lum->lum_stripe_offset) == LMV_OFFSET_DEFAULT)
+	if (lmv_op_user_qos_mkdir(op_data))
 		return true;
 
-	if (lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT)
+	if (lmv_op_default_qos_mkdir(op_data))
 		return true;
 
 	return false;
 }
 
+/* if default LMV is set, and its index is LMV_OFFSET_DEFAULT, and
+ * 1. max_inherit_rr is set and is not LMV_INHERIT_RR_NONE
+ * 2. or parent is ROOT
+ * mkdir roundrobin.
+ * NB, this also needs to check server is balanced, which is checked by caller.
+ */
+static inline bool lmv_op_default_rr_mkdir(const struct md_op_data *op_data)
+{
+	const struct lmv_stripe_md *lsm = op_data->op_default_mea1;
+
+	if (!lmv_op_default_qos_mkdir(op_data))
+		return false;
+
+	return lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE ||
+	       fid_is_root(&op_data->op_fid1);
+}
+
 /* 'lfs mkdir -i <specific_MDT>' */
 static inline bool lmv_op_user_specific_mkdir(const struct md_op_data *op_data)
 {
@@ -1746,6 +1773,7 @@ static inline bool lmv_op_user_specific_mkdir(const struct md_op_data *op_data)
 	       op_data->op_default_mea1->lsm_md_master_mdt_index !=
 			LMV_OFFSET_DEFAULT;
 }
+
 int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 		const void *data, size_t datalen, umode_t mode, uid_t uid,
 		gid_t gid, kernel_cap_t cap_effective, u64 rdev,
@@ -1793,11 +1821,23 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 		if (!tgt)
 			return -ENODEV;
 	} else if (lmv_op_qos_mkdir(op_data)) {
+		struct lmv_tgt_desc *tmp = tgt;
+
 		tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds);
-		if (tgt == ERR_PTR(-EAGAIN))
-			tgt = lmv_locate_tgt_rr(lmv, &op_data->op_mds);
+		if (tgt == ERR_PTR(-EAGAIN)) {
+			if (ltd_qos_is_balanced(&lmv->lmv_mdt_descs) &&
+			    !lmv_op_default_rr_mkdir(op_data) &&
+			    !lmv_op_user_qos_mkdir(op_data))
+				/* if it's not necessary, don't create remote
+				 * directory.
+				 */
+				tgt = tmp;
+			else
+				tgt = lmv_locate_tgt_rr(lmv, &op_data->op_mds);
+		}
 		if (IS_ERR(tgt))
 			return PTR_ERR(tgt);
+
 		/*
 		 * only update statfs after QoS mkdir, this means the cached
 		 * statfs may be stale, and current mkdir may not follow QoS
@@ -3110,6 +3150,8 @@ static inline int lmv_unpack_user_md(struct obd_export *exp,
 	lsm->lsm_md_stripe_count = le32_to_cpu(lmu->lum_stripe_count);
 	lsm->lsm_md_master_mdt_index = le32_to_cpu(lmu->lum_stripe_offset);
 	lsm->lsm_md_hash_type = le32_to_cpu(lmu->lum_hash_type);
+	lsm->lsm_md_max_inherit = lmu->lum_max_inherit;
+	lsm->lsm_md_max_inherit_rr = lmu->lum_max_inherit_rr;
 	lsm->lsm_md_pool_name[LOV_MAXPOOLNAME] = 0;
 
 	return 0;
diff --git a/fs/lustre/obdclass/lu_tgt_descs.c b/fs/lustre/obdclass/lu_tgt_descs.c
index 9f33d22..83f4675 100644
--- a/fs/lustre/obdclass/lu_tgt_descs.c
+++ b/fs/lustre/obdclass/lu_tgt_descs.c
@@ -403,22 +403,6 @@ void ltd_del_tgt(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt)
 EXPORT_SYMBOL(ltd_del_tgt);
 
 /**
- * Whether QoS data is up-to-date and QoS can be applied.
- */
-bool ltd_qos_is_usable(struct lu_tgt_descs *ltd)
-{
-	if (!test_bit(LQ_DIRTY, &ltd->ltd_qos.lq_flags) &&
-	    test_bit(LQ_SAME_SPACE, &ltd->ltd_qos.lq_flags))
-		return false;
-
-	if (ltd->ltd_lov_desc.ld_active_tgt_count < 2)
-		return false;
-
-	return true;
-}
-EXPORT_SYMBOL(ltd_qos_is_usable);
-
-/**
  * Calculate penalties per-tgt and per-server
  *
  * Re-calculate penalties when the configuration changes, active targets
diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c
index 5dbab3d..047573a 100644
--- a/fs/lustre/ptlrpc/pack_generic.c
+++ b/fs/lustre/ptlrpc/pack_generic.c
@@ -2067,7 +2067,10 @@ void lustre_swab_lmv_user_md(struct lmv_user_md *lum)
 	__swab32s(&lum->lum_stripe_offset);
 	__swab32s(&lum->lum_hash_type);
 	__swab32s(&lum->lum_type);
-	BUILD_BUG_ON(!offsetof(typeof(*lum), lum_padding1));
+	/* lum_max_inherit and lum_max_inherit_rr do not need to be swabbed */
+	BUILD_BUG_ON(offsetof(typeof(*lum), lum_padding1) == 0);
+	BUILD_BUG_ON(offsetof(typeof(*lum), lum_padding2) == 0);
+	BUILD_BUG_ON(offsetof(typeof(*lum), lum_padding3) == 0);
 	switch (lum->lum_magic) {
 	case LMV_USER_MAGIC_SPECIFIC:
 		count = lum->lum_stripe_count;
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 542d2d3..bcb9f86 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -789,7 +789,11 @@ struct lmv_user_md_v1 {
 	__u32	lum_stripe_offset;	/* MDT idx for default dirstripe */
 	__u32	lum_hash_type;		/* Dir stripe policy */
 	__u32	lum_type;		/* LMV type: default */
-	__u32	lum_padding1;
+	__u8	lum_max_inherit;	/* inherit depth of default LMV */
+	__u8	lum_max_inherit_rr;	/* inherit depth of default LMV to
+					 * round-robin mkdir
+					 */
+	__u16	lum_padding1;
 	__u32	lum_padding2;
 	__u32	lum_padding3;
 	char	lum_pool_name[LOV_MAXPOOLNAME + 1];
@@ -815,6 +819,37 @@ enum lmv_type {
 	LMV_TYPE_DEFAULT = 0x0000,
 };
 
+/* lum_max_inherit will be decreased by 1 after each inheritance if it's not
+ * LMV_INHERIT_UNLIMITED or > LMV_INHERIT_MAX.
+ */
+enum {
+	/* for historical reason, 0 means unlimited inheritance */
+	LMV_INHERIT_UNLIMITED	= 0,
+	/* unlimited lum_max_inherit by default */
+	LMV_INHERIT_DEFAULT	= 0,
+	/* not inherit any more */
+	LMV_INHERIT_END		= 1,
+	/* max inherit depth */
+	LMV_INHERIT_MAX		= 250,
+	/* [251, 254] are reserved */
+	/* not set, or when inherit depth goes beyond end,  */
+	LMV_INHERIT_NONE	= 255,
+};
+
+enum {
+	/* not set, or when inherit_rr depth goes beyond end,  */
+	LMV_INHERIT_RR_NONE		= 0,
+	/* disable lum_max_inherit_rr by default */
+	LMV_INHERIT_RR_DEFAULT		= 0,
+	/* not inherit any more */
+	LMV_INHERIT_RR_END		= 1,
+	/* max inherit depth */
+	LMV_INHERIT_RR_MAX		= 250,
+	/* [251, 254] are reserved */
+	/* unlimited inheritance */
+	LMV_INHERIT_RR_UNLIMITED	= 255,
+};
+
 static inline int lmv_user_md_size(int stripes, int lmm_magic)
 {
 	int size = sizeof(struct lmv_user_md);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [lustre-devel] [PATCH 11/13] lustre: lmv: qos stay on current MDT if less full
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
                   ` (9 preceding siblings ...)
  2021-05-15 13:06 ` [lustre-devel] [PATCH 10/13] lustre: lmv: add default LMV inherit depth James Simmons
@ 2021-05-15 13:06 ` James Simmons
  2021-05-15 13:06 ` [lustre-devel] [PATCH 12/13] lnet: Correct the router ping interval calculation James Simmons
  2021-05-15 13:06 ` [lustre-devel] [PATCH 13/13] lustre: llite: Introduce inode open heat counter James Simmons
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:06 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Keep "space balanced" subdirectories on the parent MDT if it is less
full than average, since it doesn't make sense to select another MDT
which may occasionally be *more* full.  This also reduces random
"MDT jumping" and needless remote directories.

Reduce the QOS threshold for space balanced LMV layouts, so that the
MDTs don't become too imbalanced before trying to fix the problem.

Change the LUSTRE_OPC_MKDIR opcode to be 1 instead of 0, so it can
be seen that a valid opcode has been stored into the structure.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13439
Lustre-commit: 94da640afc0f ("LU-13439 lmv: qos stay on current MDT if less full")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43445
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lu_object.h     |  6 ++++++
 fs/lustre/include/obd.h           | 10 +++++-----
 fs/lustre/lmv/lmv_obd.c           | 22 +++++++++++++++++++---
 fs/lustre/obdclass/lu_tgt_descs.c | 18 +++++++++++++-----
 4 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h
index 3a71d6b..b1d7577 100644
--- a/fs/lustre/include/lu_object.h
+++ b/fs/lustre/include/lu_object.h
@@ -1457,6 +1457,12 @@ struct lu_tgt_qos {
 };
 
 /* target descriptor */
+#define LOV_QOS_DEF_THRESHOLD_RR_PCT	17
+#define LMV_QOS_DEF_THRESHOLD_RR_PCT	5
+
+#define LOV_QOS_DEF_PRIO_FREE		90
+#define LMV_QOS_DEF_PRIO_FREE		90
+
 struct lu_tgt_desc {
 	union {
 		struct dt_device	*ltd_tgt;
diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index efd4538..678953a 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -718,11 +718,11 @@ enum md_cli_flags {
 };
 
 enum md_op_code {
-	LUSTRE_OPC_MKDIR	= 0,
-	LUSTRE_OPC_SYMLINK	= 1,
-	LUSTRE_OPC_MKNOD	= 2,
-	LUSTRE_OPC_CREATE	= 3,
-	LUSTRE_OPC_ANY		= 5,
+	LUSTRE_OPC_MKDIR = 1,
+	LUSTRE_OPC_SYMLINK,
+	LUSTRE_OPC_MKNOD,
+	LUSTRE_OPC_CREATE,
+	LUSTRE_OPC_ANY,
 };
 
 /**
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 552ef07..fb89047 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1429,9 +1429,10 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data,
 
 static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt)
 {
-	struct lu_tgt_desc *tgt;
+	struct lu_tgt_desc *tgt, *cur = NULL;
 	u64 total_weight = 0;
 	u64 cur_weight = 0;
+	int total_usable = 0;
 	u64 rand;
 	int rc;
 
@@ -1452,15 +1453,30 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt)
 	}
 
 	lmv_foreach_tgt(lmv, tgt) {
-		tgt->ltd_qos.ltq_usable = 0;
-		if (!tgt->ltd_exp || !tgt->ltd_active)
+		if (!tgt->ltd_exp || !tgt->ltd_active) {
+			tgt->ltd_qos.ltq_usable = 0;
 			continue;
+		}
 
 		tgt->ltd_qos.ltq_usable = 1;
 		lu_tgt_qos_weight_calc(tgt);
+		if (tgt->ltd_index == *mdt) {
+			cur = tgt;
+			cur_weight = tgt->ltd_qos.ltq_weight;
+		}
 		total_weight += tgt->ltd_qos.ltq_weight;
+		total_usable++;
+	}
+
+	/* if current MDT has higher-than-average space, stay on same MDT */
+	rand = total_weight / total_usable;
+	if (cur_weight >= rand) {
+		tgt = cur;
+		rc = 0;
+		goto unlock;
 	}
 
+	cur_weight = 0;
 	rand = lu_prandom_u64_max(total_weight);
 
 	lmv_foreach_connected_tgt(lmv, tgt) {
diff --git a/fs/lustre/obdclass/lu_tgt_descs.c b/fs/lustre/obdclass/lu_tgt_descs.c
index 83f4675..2a2b30a 100644
--- a/fs/lustre/obdclass/lu_tgt_descs.c
+++ b/fs/lustre/obdclass/lu_tgt_descs.c
@@ -265,13 +265,21 @@ int lu_tgt_descs_init(struct lu_tgt_descs *ltd, bool is_mdt)
 	init_rwsem(&ltd->ltd_qos.lq_rw_sem);
 	set_bit(LQ_DIRTY, &ltd->ltd_qos.lq_flags);
 	set_bit(LQ_RESET, &ltd->ltd_qos.lq_flags);
-	/* Default priority is toward free space balance */
-	ltd->ltd_qos.lq_prio_free = 232;
-	/* Default threshold for rr (roughly 17%) */
-	ltd->ltd_qos.lq_threshold_rr = 43;
 	ltd->ltd_is_mdt = is_mdt;
-	if (is_mdt)
+	/* MDT imbalance threshold is low to balance across MDTs
+	 * relatively quickly, because each directory may result
+	 * in a large number of files/subdirs created therein.
+	 */
+	if (is_mdt) {
 		ltd->ltd_lmv_desc.ld_pattern = LMV_HASH_TYPE_DEFAULT;
+		ltd->ltd_qos.lq_prio_free = LMV_QOS_DEF_PRIO_FREE * 256 / 100;
+		ltd->ltd_qos.lq_threshold_rr =
+			LMV_QOS_DEF_THRESHOLD_RR_PCT * 256 / 100;
+	} else {
+		ltd->ltd_qos.lq_prio_free = LOV_QOS_DEF_PRIO_FREE * 256 / 100;
+		ltd->ltd_qos.lq_threshold_rr =
+			LOV_QOS_DEF_THRESHOLD_RR_PCT * 256 / 100;
+	}
 
 	return 0;
 }
-- 
1.8.3.1



* [lustre-devel] [PATCH 12/13] lnet: Correct the router ping interval calculation
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
                   ` (10 preceding siblings ...)
  2021-05-15 13:06 ` [lustre-devel] [PATCH 11/13] lustre: lmv: qos stay on current MDT if less full James Simmons
@ 2021-05-15 13:06 ` James Simmons
  2021-05-15 13:06 ` [lustre-devel] [PATCH 13/13] lustre: llite: Introduce inode open heat counter James Simmons
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:06 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

The router ping interval is being divided by the number of local nets,
which results in pings being sent more frequently than defined by the
alive_router_check_interval. In addition, the current code is structured
such that we may not find a peer net in need of a ping until after
inspecting the router list multiple times. Re-work the code so that the
loop that inspects a router's peer nets looks at all of them until it
either loops back around the list or finds one that actually needs to
be pinged.

We also move the check of LNET_PEER_RTR_DISCOVERY so that we avoid the
work of inspecting the router's peer nets if the router is already being
discovered.

HPE-bug-id: LUS-9237
WC-bug-id: https://jira.whamcloud.com/browse/LU-13912
Lustre-commit: 0131d39a622f1efc ("LU-13912 lnet: Correct the router ping interval calculation")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39694
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-types.h |  4 +--
 net/lnet/lnet/router.c         | 57 ++++++++++++++++++++++++++----------------
 2 files changed, 37 insertions(+), 24 deletions(-)

diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index f199b15..d898066 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -798,8 +798,8 @@ struct lnet_peer_net {
 	/* peer net health */
 	int			lpn_healthv;
 
-	/* time of last router net check attempt */
-	time64_t		lpn_rtrcheck_timestamp;
+	/* time of next router ping on this net */
+	time64_t		lpn_next_ping;
 
 	/* selection sequence number */
 	u32			lpn_seq;
diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index e179997..9003d47 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -603,6 +603,7 @@ static void lnet_shuffle_seed(void)
 	unsigned int offset = 0;
 	unsigned int len = 0;
 	struct list_head *e;
+	time64_t now;
 
 	lnet_shuffle_seed();
 
@@ -623,9 +624,10 @@ static void lnet_shuffle_seed(void)
 	/* force a router check on the gateway to make sure the route is
 	 * alive
 	 */
+	now = ktime_get_real_seconds();
 	list_for_each_entry(lpn, &route->lr_gateway->lp_peer_nets,
 			    lpn_peer_nets) {
-		lpn->lpn_rtrcheck_timestamp = 0;
+		lpn->lpn_next_ping = now;
 	}
 
 	the_lnet.ln_remote_nets_version++;
@@ -1105,11 +1107,12 @@ bool lnet_router_checker_active(void)
 void
 lnet_check_routers(void)
 {
-	struct lnet_peer_net *first_lpn = NULL;
+	struct lnet_peer_net *first_lpn;
 	struct lnet_peer_net *lpn;
 	struct lnet_peer_ni *lpni;
 	struct lnet_peer *rtr;
 	bool push = false;
+	bool needs_ping;
 	bool found_lpn;
 	u64 version;
 	u32 net_id;
@@ -1122,14 +1125,18 @@ bool lnet_router_checker_active(void)
 	version = the_lnet.ln_routers_version;
 
 	list_for_each_entry(rtr, &the_lnet.ln_routers, lp_rtr_list) {
+		/* If we're currently discovering the peer then don't
+		 * issue another discovery
+		 */
+		if (rtr->lp_state & LNET_PEER_RTR_DISCOVERY)
+			continue;
+
 		now = ktime_get_real_seconds();
 
-		/* only discover the router if we've passed
-		 * alive_router_check_interval seconds. Some of the router
-		 * interfaces could be down and in that case they would be
-		 * undergoing recovery separately from this discovery.
-		 */
-		/* find next peer net which is also local */
+		/* find the next local peer net which needs to be ping'd */
+		needs_ping = false;
+		first_lpn = NULL;
+		found_lpn = false;
 		net_id = rtr->lp_disc_net_id;
 		do {
 			lpn = lnet_get_next_peer_net_locked(rtr, net_id);
@@ -1138,13 +1145,27 @@ bool lnet_router_checker_active(void)
 				       libcfs_nid2str(rtr->lp_primary_nid));
 				break;
 			}
+
+			/* We looped back to the first peer net */
 			if (first_lpn == lpn)
 				break;
 			if (!first_lpn)
 				first_lpn = lpn;
-			found_lpn = lnet_islocalnet_locked(lpn->lpn_net_id);
+
 			net_id = lpn->lpn_net_id;
-		} while (!found_lpn);
+			if (!lnet_islocalnet_locked(net_id))
+				continue;
+
+			found_lpn = true;
+
+			CDEBUG(D_NET, "rtr %s(%p) %s(%p) next ping %lld\n",
+			       libcfs_nid2str(rtr->lp_primary_nid), rtr,
+			       libcfs_net2str(net_id), lpn,
+			       lpn->lpn_next_ping);
+
+			needs_ping = now >= lpn->lpn_next_ping;
+
+		} while (!needs_ping);
 
 		if (!found_lpn || !lpn) {
 			CERROR("no local network found for gateway %s\n",
@@ -1152,18 +1173,10 @@ bool lnet_router_checker_active(void)
 			continue;
 		}
 
-		if (now - lpn->lpn_rtrcheck_timestamp <
-		    alive_router_check_interval / lnet_current_net_count)
+		if (!needs_ping)
 			continue;
 
-		/* If we're currently discovering the peer then don't
-		 * issue another discovery
-		 */
 		spin_lock(&rtr->lp_lock);
-		if (rtr->lp_state & LNET_PEER_RTR_DISCOVERY) {
-			spin_unlock(&rtr->lp_lock);
-			continue;
-		}
 		/* make sure we fully discover the router */
 		rtr->lp_state &= ~LNET_PEER_NIDS_UPTODATE;
 		rtr->lp_state |= LNET_PEER_FORCE_PING | LNET_PEER_FORCE_PUSH |
@@ -1188,16 +1201,16 @@ bool lnet_router_checker_active(void)
 		       libcfs_nid2str(lpni->lpni_nid), cpt);
 		rc = lnet_discover_peer_locked(lpni, cpt, false);
 
-		/* decrement ref count acquired by find_peer_ni_locked() */
+		/* drop ref taken above */
 		lnet_peer_ni_decref_locked(lpni);
 
 		if (!rc)
-			lpn->lpn_rtrcheck_timestamp = now;
+			lpn->lpn_next_ping = now + alive_router_check_interval;
 		else
 			CERROR("Failed to discover router %s\n",
 			       libcfs_nid2str(rtr->lp_primary_nid));
 
-		/* NB dropped lock */
+		/* NB cpt lock was dropped in lnet_discover_peer_locked() */
 		if (version != the_lnet.ln_routers_version) {
 			/* the routers list has changed */
 			goto rescan;
-- 
1.8.3.1



* [lustre-devel] [PATCH 13/13] lustre: llite: Introduce inode open heat counter
  2021-05-15 13:05 [lustre-devel] [PATCH 00/13] lustre: sync to OpenSFS tree as of May 14, 2021 James Simmons
                   ` (11 preceding siblings ...)
  2021-05-15 13:06 ` [lustre-devel] [PATCH 12/13] lnet: Correct the router ping interval calculation James Simmons
@ 2021-05-15 13:06 ` James Simmons
  12 siblings, 0 replies; 14+ messages in thread
From: James Simmons @ 2021-05-15 13:06 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

Initial framework to support detection of naive apps that
assume open-closes are "free" and proceed to open/close the
same files between minute operations.

We will track the number of file opens per inode and the last time
the inode was closed.

Initially we'll expose these controls:
llite/opencache_threshold_count - enables the functionality and controls after
                                  how many opens an open lock is requested
llite/opencache_threshold_ms    - if any reopen happens within this time (in
                                  ms), the open triggers an open lock request
llite/opencache_max_ms          - if the last close was more than this many ms
                                  ago, start counting opens from zero again

Once enough useful data is collected, we can look into adding a heatmap
or another similar mechanism to better manage this and enable it
by default with sensible settings.

WC-bug-id: https://jira.whamcloud.com/browse/LU-10948
Lustre-commit: 41d99c4902836b726 ("LU-10948 llite: Introduce inode open heat counter")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/32158
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c           |  70 ++++++++++++++++++++----
 fs/lustre/llite/llite_internal.h |  32 +++++++++--
 fs/lustre/llite/llite_lib.c      |   5 ++
 fs/lustre/llite/lproc_llite.c    | 112 +++++++++++++++++++++++++++++++++++++--
 fs/lustre/llite/namei.c          |   7 +++
 5 files changed, 210 insertions(+), 16 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index ffddec6..26aa7be 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -43,6 +43,7 @@
 #include <linux/sched.h>
 #include <linux/mount.h>
 #include <linux/falloc.h>
+#include <linux/ktime.h>
 
 #include <uapi/linux/lustre/lustre_fiemap.h>
 #include <uapi/linux/lustre/lustre_ioctl.h>
@@ -414,6 +415,8 @@ int ll_file_release(struct inode *inode, struct file *file)
 		lli->lli_async_rc = 0;
 	}
 
+	lli->lli_close_fd_time = ktime_get();
+
 	rc = ll_md_close(inode, file);
 
 	if (CFS_FAIL_TIMEOUT_MS(OBD_FAIL_PTLRPC_DUMP_LOG, cfs_fail_val))
@@ -745,6 +748,29 @@ static int ll_local_open(struct file *file, struct lookup_intent *it,
 	return 0;
 }
 
+void ll_track_file_opens(struct inode *inode)
+{
+	struct ll_inode_info *lli = ll_i2info(inode);
+	struct ll_sb_info *sbi = ll_i2sbi(inode);
+
+	/* do not skew results with delays from never-opened inodes */
+	if (ktime_to_ns(lli->lli_close_fd_time))
+		ll_stats_ops_tally(sbi, LPROC_LL_INODE_OPCLTM,
+			   ktime_us_delta(ktime_get(), lli->lli_close_fd_time));
+
+	if (ktime_after(ktime_get(),
+			ktime_add_ms(lli->lli_close_fd_time,
+				     sbi->ll_oc_max_ms))) {
+		lli->lli_open_fd_count = 1;
+		lli->lli_close_fd_time = ns_to_ktime(0);
+	} else {
+		lli->lli_open_fd_count++;
+	}
+
+	ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_INODE_OCOUNT,
+			   lli->lli_open_fd_count);
+}
+
 /* Open a file, and (for the very first open) create objects on the OSTs at
  * this time.  If opened with O_LOV_DELAY_CREATE, then we don't do the object
  * creation or open until ll_lov_setstripe() ioctl is called.
@@ -791,6 +817,7 @@ int ll_file_open(struct inode *inode, struct file *file)
 	if (S_ISDIR(inode->i_mode))
 		ll_authorize_statahead(inode, fd);
 
+	ll_track_file_opens(inode);
 	if (is_root_inode(inode)) {
 		file->private_data = fd;
 		return 0;
@@ -868,6 +895,7 @@ int ll_file_open(struct inode *inode, struct file *file)
 		LASSERT(*och_usecount == 0);
 		if (!it->it_disposition) {
 			struct dentry *dentry = file_dentry(file);
+			struct ll_sb_info *sbi = ll_i2sbi(inode);
 			struct ll_dentry_data *ldd;
 
 			/* We cannot just request lock handle now, new ELC code
@@ -884,20 +912,42 @@ int ll_file_open(struct inode *inode, struct file *file)
 			 *    handle to be returned from LOOKUP|OPEN request,
 			 *    for example if the target entry was a symlink.
 			 *
-			 * Only fetch MDS_OPEN_LOCK if this is in NFS path,
-			 * marked by a bit set in ll_iget_for_nfs. Clear the
-			 * bit so that it's not confusing later callers.
+			 * In NFS path we know there's pathologic behavior
+			 * so we always enable open lock caching when coming
+			 * from there. It's detected by setting a flag in
+			 * ll_iget_for_nfs.
 			 *
-			 * NB; when ldd is NULL, it must have come via normal
-			 * lookup path only, since ll_iget_for_nfs always calls
-			 * ll_d_init().
+			 * After reaching number of opens of this inode
+			 * we always ask for an open lock on it to handle
+			 * bad userspace actors that open and close files
+			 * in a loop for absolutely no good reason
 			 */
 			ldd = ll_d2d(dentry);
-			if (ldd && ldd->lld_nfs_dentry) {
+			if (filename_is_volatile(dentry->d_name.name,
+						 dentry->d_name.len,
+						 NULL)) {
+				/* There really is nothing here, but this
+				 * make this more readable I think.
+				 * We do not want openlock for volatile
+				 * files under any circumstances
+				 */
+			} else if (ldd && ldd->lld_nfs_dentry) {
+				/* NFS path. This also happens to catch
+				 * open by fh files I guess
+				 */
+				it->it_flags |= MDS_OPEN_LOCK;
+				/* clear the flag for future lookups */
 				ldd->lld_nfs_dentry = 0;
-				if (!filename_is_volatile(dentry->d_name.name,
-							  dentry->d_name.len,
-							  NULL))
+			} else if (sbi->ll_oc_thrsh_count > 0) {
+				/* Take MDS_OPEN_LOCK with many opens */
+				if (lli->lli_open_fd_count >=
+				    sbi->ll_oc_thrsh_count)
+					it->it_flags |= MDS_OPEN_LOCK;
+
+				/* If this is open after we just closed */
+				else if (ktime_before(ktime_get(),
+						      ktime_add_ms(lli->lli_close_fd_time,
+								   sbi->ll_oc_thrsh_ms)))
 					it->it_flags |= MDS_OPEN_LOCK;
 			}
 
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 03d2796..72aa564 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -137,9 +137,15 @@ struct ll_inode_info {
 	struct obd_client_handle       *lli_mds_read_och;
 	struct obd_client_handle       *lli_mds_write_och;
 	struct obd_client_handle       *lli_mds_exec_och;
-	u64			   lli_open_fd_read_count;
-	u64			   lli_open_fd_write_count;
-	u64			   lli_open_fd_exec_count;
+	u64				lli_open_fd_read_count;
+	u64				lli_open_fd_write_count;
+	u64				lli_open_fd_exec_count;
+
+	/* Number of times this inode was opened */
+	u64				lli_open_fd_count;
+	/* When last close was performed on this inode */
+	ktime_t				lli_close_fd_time;
+
 	/* Protects access to och pointers and their usage counters */
 	struct mutex			lli_och_mutex;
 
@@ -765,6 +771,19 @@ struct ll_sb_info {
 	unsigned int		ll_heat_decay_weight;
 	unsigned int		ll_heat_period_second;
 
+	/* Opens of the same inode before we start requesting open lock */
+	u32			  ll_oc_thrsh_count;
+
+	/* Time in ms between last inode close and next open to be considered
+	 * instant back to back and would trigger an open lock request
+	 */
+	u32			  ll_oc_thrsh_ms;
+
+	/* Time in ms after last file close that we no longer count prior
+	 * opens
+	 */
+	u32			  ll_oc_max_ms;
+
 	/* filesystem fsname */
 	char			ll_fsname[LUSTRE_MAXFSNAME + 1];
 
@@ -788,6 +807,10 @@ struct ll_sb_info {
 #define SBI_DEFAULT_HEAT_DECAY_WEIGHT	((80 * 256 + 50) / 100)
 #define SBI_DEFAULT_HEAT_PERIOD_SECOND	(60)
 
+#define SBI_DEFAULT_OPENCACHE_THRESHOLD_COUNT	(5)
+#define SBI_DEFAULT_OPENCACHE_THRESHOLD_MS	(100) /* 0.1 second */
+#define SBI_DEFAULT_OPENCACHE_THRESHOLD_MAX_MS	(60000) /* 1 minute */
+
 /*
  * per file-descriptor read-ahead data.
  */
@@ -1029,6 +1052,8 @@ enum {
 	LPROC_LL_REMOVEXATTR,
 	LPROC_LL_INODE_PERM,
 	LPROC_LL_FALLOCATE,
+	LPROC_LL_INODE_OCOUNT,
+	LPROC_LL_INODE_OPCLTM,
 	LPROC_LL_FILE_OPCODES
 };
 
@@ -1088,6 +1113,7 @@ enum ldlm_mode ll_take_md_lock(struct inode *inode, u64 bits,
 int ll_file_release(struct inode *inode, struct file *file);
 int ll_release_openhandle(struct inode *inode, struct lookup_intent *it);
 int ll_md_real_close(struct inode *inode, fmode_t fmode);
+void ll_track_file_opens(struct inode *inode);
 int ll_getattr(const struct path *path, struct kstat *stat,
 	       u32 request_mask, unsigned int flags);
 int ll_getattr_dentry(struct dentry *de, struct kstat *stat, u32 request_mask,
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index ada2b625c..0c914c9 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -190,6 +190,11 @@ static struct ll_sb_info *ll_init_sbi(void)
 	/* Per-filesystem file heat */
 	sbi->ll_heat_decay_weight = SBI_DEFAULT_HEAT_DECAY_WEIGHT;
 	sbi->ll_heat_period_second = SBI_DEFAULT_HEAT_PERIOD_SECOND;
+
+	/* Per-fs open heat level before requesting open lock */
+	sbi->ll_oc_thrsh_count = SBI_DEFAULT_OPENCACHE_THRESHOLD_COUNT;
+	sbi->ll_oc_max_ms = SBI_DEFAULT_OPENCACHE_THRESHOLD_MAX_MS;
+	sbi->ll_oc_thrsh_ms = SBI_DEFAULT_OPENCACHE_THRESHOLD_MS;
 	return sbi;
 out_destroy_ra:
 	kfree(sbi->ll_foreign_symlink_upcall);
diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index 16d1497..cd8394c 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -1369,6 +1369,105 @@ static ssize_t heat_period_second_store(struct kobject *kobj,
 }
 LUSTRE_RW_ATTR(heat_period_second);
 
+static ssize_t opencache_threshold_count_show(struct kobject *kobj,
+					      struct attribute *attr,
+					      char *buf)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+
+	if (sbi->ll_oc_thrsh_count)
+		return snprintf(buf, PAGE_SIZE, "%u\n",
+				sbi->ll_oc_thrsh_count);
+	else
+		return snprintf(buf, PAGE_SIZE, "off\n");
+}
+
+static ssize_t opencache_threshold_count_store(struct kobject *kobj,
+					       struct attribute *attr,
+					       const char *buffer,
+					       size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+	unsigned int val;
+	int rc;
+
+	rc = kstrtouint(buffer, 10, &val);
+	if (rc) {
+		bool enable;
+		/* also accept "off" to disable and "on" to always cache */
+		rc = kstrtobool(buffer, &enable);
+		if (rc)
+			return rc;
+		val = enable;
+	}
+	sbi->ll_oc_thrsh_count = val;
+
+	return count;
+}
+LUSTRE_RW_ATTR(opencache_threshold_count);
+
+static ssize_t opencache_threshold_ms_show(struct kobject *kobj,
+					   struct attribute *attr,
+					   char *buf)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%u\n", sbi->ll_oc_thrsh_ms);
+}
+
+static ssize_t opencache_threshold_ms_store(struct kobject *kobj,
+					    struct attribute *attr,
+					    const char *buffer,
+					    size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+	unsigned int val;
+	int rc;
+
+	rc = kstrtouint(buffer, 10, &val);
+	if (rc)
+		return rc;
+
+	sbi->ll_oc_thrsh_ms = val;
+
+	return count;
+}
+LUSTRE_RW_ATTR(opencache_threshold_ms);
+
+static ssize_t opencache_max_ms_show(struct kobject *kobj,
+				     struct attribute *attr,
+				     char *buf)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+
+	return snprintf(buf, PAGE_SIZE, "%u\n", sbi->ll_oc_max_ms);
+}
+
+static ssize_t opencache_max_ms_store(struct kobject *kobj,
+				      struct attribute *attr,
+				      const char *buffer,
+				      size_t count)
+{
+	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
+					      ll_kset.kobj);
+	unsigned int val;
+	int rc;
+
+	rc = kstrtouint(buffer, 10, &val);
+	if (rc)
+		return rc;
+
+	sbi->ll_oc_max_ms = val;
+
+	return count;
+}
+LUSTRE_RW_ATTR(opencache_max_ms);
+
 static int ll_unstable_stats_seq_show(struct seq_file *m, void *v)
 {
 	struct super_block *sb = m->private;
@@ -1568,6 +1667,8 @@ struct ldebugfs_vars lprocfs_llite_obd_vars[] = {
 	&lustre_attr_max_read_ahead_mb.attr,
 	&lustre_attr_max_read_ahead_per_file_mb.attr,
 	&lustre_attr_max_read_ahead_whole_mb.attr,
+	&lustre_attr_max_read_ahead_async_active.attr,
+	&lustre_attr_read_ahead_async_file_threshold_mb.attr,
 	&lustre_attr_read_ahead_range_kb.attr,
 	&lustre_attr_checksums.attr,
 	&lustre_attr_checksum_pages.attr,
@@ -1587,8 +1688,9 @@ struct ldebugfs_vars lprocfs_llite_obd_vars[] = {
 	&lustre_attr_file_heat.attr,
 	&lustre_attr_heat_decay_percentage.attr,
 	&lustre_attr_heat_period_second.attr,
-	&lustre_attr_max_read_ahead_async_active.attr,
-	&lustre_attr_read_ahead_async_file_threshold_mb.attr,
+	&lustre_attr_opencache_threshold_count.attr,
+	&lustre_attr_opencache_threshold_ms.attr,
+	&lustre_attr_opencache_max_ms.attr,
 	NULL,
 };
 
@@ -1624,12 +1726,16 @@ static void sbi_kobj_release(struct kobject *kobj)
 	{ LPROC_LL_LLSEEK,	LPROCFS_TYPE_LATENCY,	"seek" },
 	{ LPROC_LL_FSYNC,	LPROCFS_TYPE_LATENCY,	"fsync" },
 	{ LPROC_LL_READDIR,	LPROCFS_TYPE_LATENCY,	"readdir" },
+	{ LPROC_LL_INODE_OCOUNT, LPROCFS_TYPE_REQS |
+				 LPROCFS_CNTR_AVGMINMAX |
+				 LPROCFS_CNTR_STDDEV,	"opencount" },
+	{ LPROC_LL_INODE_OPCLTM, LPROCFS_TYPE_LATENCY,	"openclosetime" },
 	/* inode operation */
 	{ LPROC_LL_SETATTR,	LPROCFS_TYPE_LATENCY,	"setattr" },
 	{ LPROC_LL_TRUNC,	LPROCFS_TYPE_LATENCY,	"truncate" },
 	{ LPROC_LL_FLOCK,	LPROCFS_TYPE_LATENCY,	"flock" },
 	{ LPROC_LL_GETATTR,	LPROCFS_TYPE_LATENCY,	"getattr" },
-	{ LPROC_LL_FALLOCATE,	 LPROCFS_TYPE_LATENCY,	"fallocate" },
+	{ LPROC_LL_FALLOCATE,	LPROCFS_TYPE_LATENCY,	"fallocate" },
 	/* dir inode operation */
 	{ LPROC_LL_CREATE,	LPROCFS_TYPE_LATENCY,	"create" },
 	{ LPROC_LL_LINK,	LPROCFS_TYPE_LATENCY,	"link" },
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 6ed2943..f5f34b0 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -1148,6 +1148,13 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 
 	OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_CREATE_FILE_PAUSE2, cfs_fail_val);
 
+	/* We can only arrive at this path when we have no inode, so
+	 * we only need to request the open lock if it is requested
+	 * on every open
+	 */
+	if (ll_i2sbi(dir)->ll_oc_thrsh_count == 1)
+		it->it_flags |= MDS_OPEN_LOCK;
+
 	/* Dentry added to dcache tree in ll_lookup_it */
 	de = ll_lookup_it(dir, dentry, it, &secctx, &secctxlen, &pca, encrypt,
 			  &encctx, &encctxlen);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
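For readers following the lproc_llite.c hunk: opencache_threshold_count_store() tries kstrtouint() first and falls back to kstrtobool(), so the sysfs file accepts either a decimal open-count threshold or an on/off toggle (where "on" means always request the open lock and "off" disables the cache heuristic). A rough userspace sketch of that parsing logic, with a hypothetical helper name, just to illustrate the accepted inputs:

```python
def parse_opencache_count(buf: str) -> int:
    """Illustrative stand-in for the kstrtouint()/kstrtobool() fallback
    in opencache_threshold_count_store(); not kernel code."""
    s = buf.strip().lower()
    try:
        # kstrtouint(buffer, 10, &val): plain decimal threshold
        return int(s, 10)
    except ValueError:
        pass
    # kstrtobool() also accepts on/off, y/n, 1/0 style booleans;
    # a true value becomes a threshold of 1 (cache on every open)
    if s in ("on", "y", "yes", "true"):
        return 1
    if s in ("off", "n", "no", "false"):
        return 0
    raise ValueError(f"invalid opencache threshold: {buf!r}")
```

So `echo 5`, `echo on`, and `echo off` into the sysfs file would all be accepted, mapping to thresholds of 5, 1, and 0 respectively.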
