All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/18] NFS/RDMA server for v4.17
@ 2018-03-27 14:49 Chuck Lever
  2018-03-27 14:49 ` [PATCH v2 01/18] sunrpc: Remove unneeded pointer dereference Chuck Lever
                   ` (17 more replies)
  0 siblings, 18 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:49 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Hi Bruce-

Here's a revision of the first two "for v4.17" patchsets I sent you
a couple of weeks ago. This series includes fixes for NFSD and
server-side RPC trace points, and the write and symlink XDR decoder
clean up patches.


Changes since v1:
- svc_process latency report no longer includes transport latency
- new separate report of transport queuing latency
- fix bug when reading holes over NFSv4
- NFSv4-related hunks removed from the XDR decoder clean up patches

---

Chuck Lever (18):
      sunrpc: Remove unneeded pointer dereference
      svc: Simplify ->xpo_secure_port
      sunrpc: Update show_svc_xprt_flags() to include recently added flags
      sunrpc: Move trace_svc_xprt_dequeue()
      sunrpc: Simplify do_enqueue tracing
      sunrpc: Simplify trace_svc_recv
      sunrpc: Save remote presentation address in svc_xprt for trace events
      sunrpc: Re-purpose trace_svc_process
      sunrpc: Report per-RPC execution stats
      svc: Report xprt dequeue latency
      nfsd: Fix NFSD trace points
      nfsd: Record request byte count, not count of vectors
      nfsd: Add "nfsd_" to trace point names
      nfsd: Add I/O trace points in the NFSv4 write path
      nfsd: Add I/O trace points in the NFSv4 read proc
      nfsd: Trace NFSv4 COMPOUND execution
      NFSD: Clean up legacy NFS WRITE argument XDR decoders
      NFSD: Clean up legacy NFS SYMLINK argument XDR decoders


 fs/nfsd/nfs3proc.c                       |   18 ++-
 fs/nfsd/nfs3xdr.c                        |   67 ++--------
 fs/nfsd/nfs4layouts.c                    |   16 +-
 fs/nfsd/nfs4proc.c                       |   25 ++--
 fs/nfsd/nfs4xdr.c                        |   56 ++++++--
 fs/nfsd/nfsproc.c                        |   23 ++-
 fs/nfsd/nfsxdr.c                         |   63 +++++----
 fs/nfsd/trace.h                          |   98 +++++++++++++-
 fs/nfsd/vfs.c                            |   85 ++++++-------
 fs/nfsd/vfs.h                            |    8 +
 fs/nfsd/xdr.h                            |    3 
 fs/nfsd/xdr3.h                           |    3 
 fs/nfsd/xdr4.h                           |    2 
 include/linux/sunrpc/svc.h               |    6 +
 include/linux/sunrpc/svc_xprt.h          |    6 +
 include/trace/events/sunrpc.h            |  204 ++++++++++++++++++------------
 net/sunrpc/svc.c                         |  118 ++++++++++++++++-
 net/sunrpc/svc_xprt.c                    |   34 ++---
 net/sunrpc/svcsock.c                     |    8 +
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   10 +
 20 files changed, 545 insertions(+), 308 deletions(-)

--
Chuck Lever

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 01/18] sunrpc: Remove unneeded pointer dereference
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
@ 2018-03-27 14:49 ` Chuck Lever
  2018-03-27 14:49 ` [PATCH v2 02/18] svc: Simplify ->xpo_secure_port Chuck Lever
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:49 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Clean up: Noticed during code inspection that there is already a
local automatic variable "xprt" so dereferencing rqst->rq_xprt
again is unnecessary.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/svc_xprt.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index f9307bd..6dca0f5 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -492,7 +492,7 @@ static void svc_xprt_release(struct svc_rqst *rqstp)
 {
 	struct svc_xprt	*xprt = rqstp->rq_xprt;
 
-	rqstp->rq_xprt->xpt_ops->xpo_release_rqst(rqstp);
+	xprt->xpt_ops->xpo_release_rqst(rqstp);
 
 	kfree(rqstp->rq_deferred);
 	rqstp->rq_deferred = NULL;
@@ -889,7 +889,7 @@ int svc_send(struct svc_rqst *rqstp)
 		goto out;
 
 	/* release the receive skb before sending the reply */
-	rqstp->rq_xprt->xpt_ops->xpo_release_rqst(rqstp);
+	xprt->xpt_ops->xpo_release_rqst(rqstp);
 
 	/* calculate over-all length */
 	xb = &rqstp->rq_res;


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 02/18] svc: Simplify ->xpo_secure_port
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
  2018-03-27 14:49 ` [PATCH v2 01/18] sunrpc: Remove unneeded pointer dereference Chuck Lever
@ 2018-03-27 14:49 ` Chuck Lever
  2018-03-27 14:49 ` [PATCH v2 03/18] sunrpc: Update show_svc_xprt_flags() to include recently added flags Chuck Lever
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:49 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Clean up: Instead of returning a value that is used to set or clear
a bit, just make ->xpo_secure_port mangle that bit, and return void.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_xprt.h          |    2 +-
 net/sunrpc/svc_xprt.c                    |    5 +----
 net/sunrpc/svcsock.c                     |    7 +++++--
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    6 +++---
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 1caf7bc..19475ac 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -25,7 +25,7 @@ struct svc_xprt_ops {
 	void		(*xpo_release_rqst)(struct svc_rqst *);
 	void		(*xpo_detach)(struct svc_xprt *);
 	void		(*xpo_free)(struct svc_xprt *);
-	int		(*xpo_secure_port)(struct svc_rqst *);
+	void		(*xpo_secure_port)(struct svc_rqst *rqstp);
 	void		(*xpo_kill_temp_xprt)(struct svc_xprt *);
 };
 
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 6dca0f5..4e3b4c5 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -844,10 +844,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
 
 	clear_bit(XPT_OLD, &xprt->xpt_flags);
 
-	if (xprt->xpt_ops->xpo_secure_port(rqstp))
-		set_bit(RQ_SECURE, &rqstp->rq_flags);
-	else
-		clear_bit(RQ_SECURE, &rqstp->rq_flags);
+	xprt->xpt_ops->xpo_secure_port(rqstp);
 	rqstp->rq_chandle.defer = svc_defer;
 	rqstp->rq_xid = svc_getu32(&rqstp->rq_arg.head[0]);
 
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 943f2a7..9b67035 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -391,9 +391,12 @@ static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
 	release_sock(sock->sk);
 }
 
-static int svc_sock_secure_port(struct svc_rqst *rqstp)
+static void svc_sock_secure_port(struct svc_rqst *rqstp)
 {
-	return svc_port_is_privileged(svc_addr(rqstp));
+	if (svc_port_is_privileged(svc_addr(rqstp)))
+		set_bit(RQ_SECURE, &rqstp->rq_flags);
+	else
+		clear_bit(RQ_SECURE, &rqstp->rq_flags);
 }
 
 /*
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 7b2f4d3..17da06d 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -69,7 +69,7 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
 static void svc_rdma_detach(struct svc_xprt *xprt);
 static void svc_rdma_free(struct svc_xprt *xprt);
 static int svc_rdma_has_wspace(struct svc_xprt *xprt);
-static int svc_rdma_secure_port(struct svc_rqst *);
+static void svc_rdma_secure_port(struct svc_rqst *);
 static void svc_rdma_kill_temp_xprt(struct svc_xprt *);
 
 static const struct svc_xprt_ops svc_rdma_ops = {
@@ -988,9 +988,9 @@ static int svc_rdma_has_wspace(struct svc_xprt *xprt)
 	return 1;
 }
 
-static int svc_rdma_secure_port(struct svc_rqst *rqstp)
+static void svc_rdma_secure_port(struct svc_rqst *rqstp)
 {
-	return 1;
+	set_bit(RQ_SECURE, &rqstp->rq_flags);
 }
 
 static void svc_rdma_kill_temp_xprt(struct svc_xprt *xprt)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 03/18] sunrpc: Update show_svc_xprt_flags() to include recently added flags
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
  2018-03-27 14:49 ` [PATCH v2 01/18] sunrpc: Remove unneeded pointer dereference Chuck Lever
  2018-03-27 14:49 ` [PATCH v2 02/18] svc: Simplify ->xpo_secure_port Chuck Lever
@ 2018-03-27 14:49 ` Chuck Lever
  2018-03-27 14:50 ` [PATCH v2 04/18] sunrpc: Move trace_svc_xprt_dequeue() Chuck Lever
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:49 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

XPT_KILL_TEMP was added by commit 546125d16142 ("sunrpc: don't call
sleeping functions from the notifier block callbacks"), and
XPT_CONG_CTRL was added by commit 362142b25843 ("sunrpc: flag
transports as having congestion control") .

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/trace/events/sunrpc.h |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 970c91a..a231444 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -591,7 +591,9 @@
 		{ (1UL << XPT_OLD),		"XPT_OLD"},		\
 		{ (1UL << XPT_LISTENER),	"XPT_LISTENER"},	\
 		{ (1UL << XPT_CACHE_AUTH),	"XPT_CACHE_AUTH"},	\
-		{ (1UL << XPT_LOCAL),		"XPT_LOCAL"})
+		{ (1UL << XPT_LOCAL),		"XPT_LOCAL"},		\
+		{ (1UL << XPT_KILL_TEMP),	"XPT_KILL_TEMP"},	\
+		{ (1UL << XPT_CONG_CTRL),	"XPT_CONG_CTRL"})
 
 TRACE_EVENT(svc_xprt_do_enqueue,
 	TP_PROTO(struct svc_xprt *xprt, struct svc_rqst *rqst),


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 04/18] sunrpc: Move trace_svc_xprt_dequeue()
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (2 preceding siblings ...)
  2018-03-27 14:49 ` [PATCH v2 03/18] sunrpc: Update show_svc_xprt_flags() to include recently added flags Chuck Lever
@ 2018-03-27 14:50 ` Chuck Lever
  2018-03-27 14:50 ` [PATCH v2 05/18] sunrpc: Simplify do_enqueue tracing Chuck Lever
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:50 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Reduce the amount of noise generated by trace_svc_xprt_dequeue by
moving it to the end of svc_get_next_xprt. This generates exactly
one trace event when a ready xprt is found, rather than spurious
events when there is no work to do. The empty events contain no
information that can't be obtained simply by tracing function calls
to svc_xprt_dequeue.

A small additional benefit is simplification of the svc_xprt_event
trace class, which no longer has to handle the case when the @xprt
parameter is NULL.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/trace/events/sunrpc.h |   16 +++++-----------
 net/sunrpc/svc_xprt.c         |    5 +----
 2 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index a231444..9bba307 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -634,24 +634,18 @@
 	TP_STRUCT__entry(
 		__field(struct svc_xprt *, xprt)
 		__field(unsigned long, flags)
-		__dynamic_array(unsigned char, addr, xprt != NULL ?
-			xprt->xpt_remotelen : 0)
+		__dynamic_array(unsigned char, addr, xprt->xpt_remotelen)
 	),
 
 	TP_fast_assign(
 		__entry->xprt = xprt;
-		if (xprt) {
-			memcpy(__get_dynamic_array(addr),
-					&xprt->xpt_remote,
-					xprt->xpt_remotelen);
-			__entry->flags = xprt->xpt_flags;
-		} else
-			__entry->flags = 0;
+		__entry->flags = xprt->xpt_flags;
+		memcpy(__get_dynamic_array(addr),
+		       &xprt->xpt_remote, xprt->xpt_remotelen);
 	),
 
 	TP_printk("xprt=0x%p addr=%pIScp flags=%s", __entry->xprt,
-		__get_dynamic_array_len(addr) != 0 ?
-			(struct sockaddr *)__get_dynamic_array(addr) : NULL,
+		(struct sockaddr *)__get_dynamic_array(addr),
 		show_svc_xprt_flags(__entry->flags))
 );
 
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 4e3b4c5..71f4718 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -454,13 +454,9 @@ static struct svc_xprt *svc_xprt_dequeue(struct svc_pool *pool)
 					struct svc_xprt, xpt_ready);
 		list_del_init(&xprt->xpt_ready);
 		svc_xprt_get(xprt);
-
-		dprintk("svc: transport %p dequeued, inuse=%d\n",
-			xprt, kref_read(&xprt->xpt_ref));
 	}
 	spin_unlock_bh(&pool->sp_lock);
 out:
-	trace_svc_xprt_dequeue(xprt);
 	return xprt;
 }
 
@@ -734,6 +730,7 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout)
 		rqstp->rq_chandle.thread_wait = 5*HZ;
 	else
 		rqstp->rq_chandle.thread_wait = 1*HZ;
+	trace_svc_xprt_dequeue(rqstp->rq_xprt);
 	return rqstp->rq_xprt;
 }
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 05/18] sunrpc: Simplify do_enqueue tracing
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (3 preceding siblings ...)
  2018-03-27 14:50 ` [PATCH v2 04/18] sunrpc: Move trace_svc_xprt_dequeue() Chuck Lever
@ 2018-03-27 14:50 ` Chuck Lever
  2018-03-27 14:50 ` [PATCH v2 06/18] sunrpc: Simplify trace_svc_recv Chuck Lever
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:50 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

There are three cases where svc_xprt_do_enqueue() returns without
waking an nfsd thread:

1. There is no work to do

2. The transport is already busy

3. There are no available nfsd threads

Only 3. is truly interesting. Move the trace point so it records
that there was work to do and either an nfsd thread was awoken, or
a free one could not found.

As an additional clean up, remove a redundant comment and a couple
of dprintk call sites.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/svc_xprt.c |   11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 71f4718..5fe150c 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -382,25 +382,21 @@ void svc_xprt_do_enqueue(struct svc_xprt *xprt)
 	int cpu;
 
 	if (!svc_xprt_has_something_to_do(xprt))
-		goto out;
+		return;
 
 	/* Mark transport as busy. It will remain in this state until
 	 * the provider calls svc_xprt_received. We update XPT_BUSY
 	 * atomically because it also guards against trying to enqueue
 	 * the transport twice.
 	 */
-	if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags)) {
-		/* Don't enqueue transport while already enqueued */
-		dprintk("svc: transport %p busy, not enqueued\n", xprt);
-		goto out;
-	}
+	if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags))
+		return;
 
 	cpu = get_cpu();
 	pool = svc_pool_for_cpu(xprt->xpt_server, cpu);
 
 	atomic_long_inc(&pool->sp_stats.packets);
 
-	dprintk("svc: transport %p put into queue\n", xprt);
 	spin_lock_bh(&pool->sp_lock);
 	list_add_tail(&xprt->xpt_ready, &pool->sp_sockets);
 	pool->sp_stats.sockets_queued++;
@@ -420,7 +416,6 @@ void svc_xprt_do_enqueue(struct svc_xprt *xprt)
 out_unlock:
 	rcu_read_unlock();
 	put_cpu();
-out:
 	trace_svc_xprt_do_enqueue(xprt, rqstp);
 }
 EXPORT_SYMBOL_GPL(svc_xprt_do_enqueue);


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 06/18] sunrpc: Simplify trace_svc_recv
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (4 preceding siblings ...)
  2018-03-27 14:50 ` [PATCH v2 05/18] sunrpc: Simplify do_enqueue tracing Chuck Lever
@ 2018-03-27 14:50 ` Chuck Lever
  2018-03-27 14:51 ` [PATCH v2 07/18] sunrpc: Save remote presentation address in svc_xprt for trace events Chuck Lever
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:50 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

There doesn't seem to be a lot of value in calling trace_svc_recv
in the failing case.

1. There are two very common cases: one is the transport is not
ready, and the other is shutdown. Neither is terribly interesting.

2. The trace record for the failing case contains nothing but
the status code.

Therefore the trace point call site in the error exit is removed.
Since the trace point is now recording a length instead of a
status, rename the status field and remove the case that records a
zero XID.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/trace/events/sunrpc.h |   14 +++++++-------
 net/sunrpc/svc_xprt.c         |    1 -
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 9bba307..5849bfb 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -485,28 +485,28 @@
 		{ (1UL << RQ_BUSY),		"RQ_BUSY"})
 
 TRACE_EVENT(svc_recv,
-	TP_PROTO(struct svc_rqst *rqst, int status),
+	TP_PROTO(struct svc_rqst *rqst, int len),
 
-	TP_ARGS(rqst, status),
+	TP_ARGS(rqst, len),
 
 	TP_STRUCT__entry(
 		__field(u32, xid)
-		__field(int, status)
+		__field(int, len)
 		__field(unsigned long, flags)
 		__dynamic_array(unsigned char, addr, rqst->rq_addrlen)
 	),
 
 	TP_fast_assign(
-		__entry->xid = status > 0 ? be32_to_cpu(rqst->rq_xid) : 0;
-		__entry->status = status;
+		__entry->xid = be32_to_cpu(rqst->rq_xid);
+		__entry->len = len;
 		__entry->flags = rqst->rq_flags;
 		memcpy(__get_dynamic_array(addr),
 			&rqst->rq_addr, rqst->rq_addrlen);
 	),
 
-	TP_printk("addr=%pIScp xid=0x%08x status=%d flags=%s",
+	TP_printk("addr=%pIScp xid=0x%08x len=%d flags=%s",
 			(struct sockaddr *)__get_dynamic_array(addr),
-			__entry->xid, __entry->status,
+			__entry->xid, __entry->len,
 			show_rqstp_flags(__entry->flags))
 );
 
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 5fe150c..47384d0 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -848,7 +848,6 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
 	rqstp->rq_res.len = 0;
 	svc_xprt_release(rqstp);
 out:
-	trace_svc_recv(rqstp, err);
 	return err;
 }
 EXPORT_SYMBOL_GPL(svc_recv);


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 07/18] sunrpc: Save remote presentation address in svc_xprt for trace events
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (5 preceding siblings ...)
  2018-03-27 14:50 ` [PATCH v2 06/18] sunrpc: Simplify trace_svc_recv Chuck Lever
@ 2018-03-27 14:51 ` Chuck Lever
  2018-03-27 14:51 ` [PATCH v2 08/18] sunrpc: Re-purpose trace_svc_process Chuck Lever
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:51 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

TP_printk defines a format string that is passed to user space for
converting raw trace event records to something human-readable.

My user space's printf (Oracle Linux 7), however, does not have a
%pI format specifier. The result is that what is supposed to be an
IP address in the output of "trace-cmd report" is just a string that
says the field couldn't be displayed.

To fix this, adopt the same approach as the client: maintain a pre-
formated presentation address for occasions when %pI is not
available.

The location of the trace_svc_send trace point is adjusted so that
rqst->rq_xprt is not NULL when the trace event is recorded.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_xprt.h          |    4 +
 include/trace/events/sunrpc.h            |   89 +++++++++++-------------------
 net/sunrpc/svc_xprt.c                    |    3 +
 net/sunrpc/svcsock.c                     |    1 
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    4 +
 5 files changed, 43 insertions(+), 58 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 19475ac..c3d7206 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -83,6 +83,7 @@ struct svc_xprt {
 	size_t			xpt_locallen;	/* length of address */
 	struct sockaddr_storage	xpt_remote;	/* remote peer's address */
 	size_t			xpt_remotelen;	/* length of address */
+	char			xpt_remotebuf[INET6_ADDRSTRLEN + 10];
 	struct rpc_wait_queue	xpt_bc_pending;	/* backchannel wait queue */
 	struct list_head	xpt_users;	/* callbacks on free */
 
@@ -152,7 +153,10 @@ static inline void svc_xprt_set_remote(struct svc_xprt *xprt,
 {
 	memcpy(&xprt->xpt_remote, sa, salen);
 	xprt->xpt_remotelen = salen;
+	snprintf(xprt->xpt_remotebuf, sizeof(xprt->xpt_remotebuf) - 1,
+		 "%pISpc", sa);
 }
+
 static inline unsigned short svc_addr_port(const struct sockaddr *sa)
 {
 	const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 5849bfb..1ec8c4c 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -493,20 +493,18 @@
 		__field(u32, xid)
 		__field(int, len)
 		__field(unsigned long, flags)
-		__dynamic_array(unsigned char, addr, rqst->rq_addrlen)
+		__string(addr, rqst->rq_xprt->xpt_remotebuf)
 	),
 
 	TP_fast_assign(
 		__entry->xid = be32_to_cpu(rqst->rq_xid);
 		__entry->len = len;
 		__entry->flags = rqst->rq_flags;
-		memcpy(__get_dynamic_array(addr),
-			&rqst->rq_addr, rqst->rq_addrlen);
+		__assign_str(addr, rqst->rq_xprt->xpt_remotebuf);
 	),
 
-	TP_printk("addr=%pIScp xid=0x%08x len=%d flags=%s",
-			(struct sockaddr *)__get_dynamic_array(addr),
-			__entry->xid, __entry->len,
+	TP_printk("addr=%s xid=0x%08x len=%d flags=%s",
+			__get_str(addr), __entry->xid, __entry->len,
 			show_rqstp_flags(__entry->flags))
 );
 
@@ -519,20 +517,18 @@
 	TP_STRUCT__entry(
 		__field(u32, xid)
 		__field(unsigned long, flags)
-		__dynamic_array(unsigned char, addr, rqst->rq_addrlen)
+		__string(addr, rqst->rq_xprt->xpt_remotebuf)
 	),
 
 	TP_fast_assign(
 		__entry->xid = be32_to_cpu(rqst->rq_xid);
 		__entry->flags = rqst->rq_flags;
-		memcpy(__get_dynamic_array(addr),
-			&rqst->rq_addr, rqst->rq_addrlen);
+		__assign_str(addr, rqst->rq_xprt->xpt_remotebuf);
 	),
 
-	TP_printk("addr=%pIScp rq_xid=0x%08x flags=%s",
-		(struct sockaddr *)__get_dynamic_array(addr),
-		__entry->xid,
-		show_rqstp_flags(__entry->flags))
+	TP_printk("addr=%s xid=0x%08x flags=%s",
+			__get_str(addr), __entry->xid,
+			show_rqstp_flags(__entry->flags))
 );
 
 DEFINE_EVENT(svc_rqst_event, svc_defer,
@@ -553,21 +549,19 @@
 		__field(u32, xid)
 		__field(int, status)
 		__field(unsigned long, flags)
-		__dynamic_array(unsigned char, addr, rqst->rq_addrlen)
+		__string(addr, rqst->rq_xprt->xpt_remotebuf)
 	),
 
 	TP_fast_assign(
 		__entry->xid = be32_to_cpu(rqst->rq_xid);
 		__entry->status = status;
 		__entry->flags = rqst->rq_flags;
-		memcpy(__get_dynamic_array(addr),
-			&rqst->rq_addr, rqst->rq_addrlen);
+		__assign_str(addr, rqst->rq_xprt->xpt_remotebuf);
 	),
 
-	TP_printk("addr=%pIScp rq_xid=0x%08x status=%d flags=%s",
-		(struct sockaddr *)__get_dynamic_array(addr),
-		__entry->xid,
-		__entry->status, show_rqstp_flags(__entry->flags))
+	TP_printk("addr=%s xid=0x%08x status=%d flags=%s",
+		  __get_str(addr), __entry->xid,
+		  __entry->status, show_rqstp_flags(__entry->flags))
 );
 
 DEFINE_EVENT(svc_rqst_status, svc_process,
@@ -604,26 +598,19 @@
 		__field(struct svc_xprt *, xprt)
 		__field(int, pid)
 		__field(unsigned long, flags)
-		__dynamic_array(unsigned char, addr, xprt != NULL ?
-			xprt->xpt_remotelen : 0)
+		__string(addr, xprt->xpt_remotebuf)
 	),
 
 	TP_fast_assign(
 		__entry->xprt = xprt;
 		__entry->pid = rqst? rqst->rq_task->pid : 0;
-		if (xprt) {
-			memcpy(__get_dynamic_array(addr),
-				&xprt->xpt_remote,
-				xprt->xpt_remotelen);
-			__entry->flags = xprt->xpt_flags;
-		} else
-			__entry->flags = 0;
+		__entry->flags = xprt->xpt_flags;
+		__assign_str(addr, xprt->xpt_remotebuf);
 	),
 
-	TP_printk("xprt=0x%p addr=%pIScp pid=%d flags=%s", __entry->xprt,
-		__get_dynamic_array_len(addr) != 0 ?
-			(struct sockaddr *)__get_dynamic_array(addr) : NULL,
-		__entry->pid, show_svc_xprt_flags(__entry->flags))
+	TP_printk("xprt=%p addr=%s pid=%d flags=%s",
+			__entry->xprt, __get_str(addr),
+			__entry->pid, show_svc_xprt_flags(__entry->flags))
 );
 
 DECLARE_EVENT_CLASS(svc_xprt_event,
@@ -634,19 +621,18 @@
 	TP_STRUCT__entry(
 		__field(struct svc_xprt *, xprt)
 		__field(unsigned long, flags)
-		__dynamic_array(unsigned char, addr, xprt->xpt_remotelen)
+		__string(addr, xprt->xpt_remotebuf)
 	),
 
 	TP_fast_assign(
 		__entry->xprt = xprt;
 		__entry->flags = xprt->xpt_flags;
-		memcpy(__get_dynamic_array(addr),
-		       &xprt->xpt_remote, xprt->xpt_remotelen);
+		__assign_str(addr, xprt->xpt_remotebuf);
 	),
 
-	TP_printk("xprt=0x%p addr=%pIScp flags=%s", __entry->xprt,
-		(struct sockaddr *)__get_dynamic_array(addr),
-		show_svc_xprt_flags(__entry->flags))
+	TP_printk("xprt=%p addr=%s flags=%s",
+			__entry->xprt, __get_str(addr),
+			show_svc_xprt_flags(__entry->flags))
 );
 
 DEFINE_EVENT(svc_xprt_event, svc_xprt_dequeue,
@@ -682,25 +668,18 @@
 		__field(struct svc_xprt *, xprt)
 		__field(int, len)
 		__field(unsigned long, flags)
-		__dynamic_array(unsigned char, addr, xprt != NULL ?
-			xprt->xpt_remotelen : 0)
+		__string(addr, xprt->xpt_remotebuf)
 	),
 
 	TP_fast_assign(
 		__entry->xprt = xprt;
 		__entry->len = len;
-		if (xprt) {
-			memcpy(__get_dynamic_array(addr),
-					&xprt->xpt_remote,
-					xprt->xpt_remotelen);
-			__entry->flags = xprt->xpt_flags;
-		} else
-			__entry->flags = 0;
+		__entry->flags = xprt->xpt_flags;
+		__assign_str(addr, xprt->xpt_remotebuf);
 	),
 
-	TP_printk("xprt=0x%p addr=%pIScp len=%d flags=%s", __entry->xprt,
-		__get_dynamic_array_len(addr) != 0 ?
-			(struct sockaddr *)__get_dynamic_array(addr) : NULL,
+	TP_printk("xprt=%p addr=%s len=%d flags=%s",
+		__entry->xprt, __get_str(addr),
 		__entry->len, show_svc_xprt_flags(__entry->flags))
 );
 
@@ -712,18 +691,16 @@
 
 	TP_STRUCT__entry(
 		__field(u32, xid)
-		__dynamic_array(unsigned char, addr, dr->addrlen)
+		__string(addr, dr->xprt->xpt_remotebuf)
 	),
 
 	TP_fast_assign(
 		__entry->xid = be32_to_cpu(*(__be32 *)(dr->args +
 						       (dr->xprt_hlen>>2)));
-		memcpy(__get_dynamic_array(addr), &dr->addr, dr->addrlen);
+		__assign_str(addr, dr->xprt->xpt_remotebuf);
 	),
 
-	TP_printk("addr=%pIScp xid=0x%08x",
-		(struct sockaddr *)__get_dynamic_array(addr),
-		__entry->xid)
+	TP_printk("addr=%s xid=0x%08x", __get_str(addr), __entry->xid)
 );
 
 DEFINE_EVENT(svc_deferred_event, svc_drop_deferred,
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 47384d0..f745754 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -173,6 +173,7 @@ void svc_xprt_init(struct net *net, struct svc_xprt_class *xcl,
 	set_bit(XPT_BUSY, &xprt->xpt_flags);
 	rpc_init_wait_queue(&xprt->xpt_bc_pending, "xpt_bc_pending");
 	xprt->xpt_net = get_net(net);
+	strcpy(xprt->xpt_remotebuf, "uninitialized");
 }
 EXPORT_SYMBOL_GPL(svc_xprt_init);
 
@@ -894,12 +895,12 @@ int svc_send(struct svc_rqst *rqstp)
 		len = xprt->xpt_ops->xpo_sendto(rqstp);
 	mutex_unlock(&xprt->xpt_mutex);
 	rpc_wake_up(&xprt->xpt_bc_pending);
+	trace_svc_send(rqstp, len);
 	svc_xprt_release(rqstp);
 
 	if (len == -ECONNREFUSED || len == -ENOTCONN || len == -EAGAIN)
 		len = 0;
 out:
-	trace_svc_send(rqstp, len);
 	return len;
 }
 
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 9b67035..4ca1d92 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1310,6 +1310,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
 	set_bit(XPT_CONG_CTRL, &svsk->sk_xprt.xpt_flags);
 	if (sk->sk_state == TCP_LISTEN) {
 		dprintk("setting up TCP socket for listening\n");
+		strcpy(svsk->sk_xprt.xpt_remotebuf, "listener");
 		set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
 		sk->sk_data_ready = svc_tcp_listen_data_ready;
 		set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 17da06d..96cc8f6 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -401,8 +401,10 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
 	 */
 	set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags);
 
-	if (listener)
+	if (listener) {
+		strcpy(cma_xprt->sc_xprt.xpt_remotebuf, "listener");
 		set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
+	}
 
 	return cma_xprt;
 }


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 08/18] sunrpc: Re-purpose trace_svc_process
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (6 preceding siblings ...)
  2018-03-27 14:51 ` [PATCH v2 07/18] sunrpc: Save remote presentation address in svc_xprt for trace events Chuck Lever
@ 2018-03-27 14:51 ` Chuck Lever
  2018-03-27 14:51 ` [PATCH v2 09/18] sunrpc: Report per-RPC execution stats Chuck Lever
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:51 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Currently, trace_svc_process has two call sites:

1. Just after a call to svc_send. svc_send already invokes
   trace_svc_send with the same arguments just before returning

2. Just before a call to svc_drop. svc_drop already invokes
   trace_svc_drop with the same arguments just after it is called

Therefore trace_svc_process does not provide any additional
information not already provided by these other trace points.

However, it would be useful to record the incoming RPC procedure.
So reuse trace_svc_process for this purpose.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/trace/events/sunrpc.h |   30 ++++++++++++++++++++++++++----
 net/sunrpc/svc.c              |    9 +++------
 2 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 1ec8c4c..5a8157c 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -508,6 +508,32 @@
 			show_rqstp_flags(__entry->flags))
 );
 
+TRACE_EVENT(svc_process,
+	TP_PROTO(const struct svc_rqst *rqst, const char *name),
+
+	TP_ARGS(rqst, name),
+
+	TP_STRUCT__entry(
+		__field(u32, xid)
+		__field(u32, vers)
+		__field(u32, proc)
+		__string(service, name)
+		__string(addr, rqst->rq_xprt->xpt_remotebuf)
+	),
+
+	TP_fast_assign(
+		__entry->xid = be32_to_cpu(rqst->rq_xid);
+		__entry->vers = rqst->rq_vers;
+		__entry->proc = rqst->rq_proc;
+		__assign_str(service, name);
+		__assign_str(addr, rqst->rq_xprt->xpt_remotebuf);
+	),
+
+	TP_printk("addr=%s xid=0x%08x service=%s vers=%u proc=%u",
+			__get_str(addr), __entry->xid,
+			__get_str(service), __entry->vers, __entry->proc)
+);
+
 DECLARE_EVENT_CLASS(svc_rqst_event,
 
 	TP_PROTO(struct svc_rqst *rqst),
@@ -564,10 +590,6 @@
 		  __entry->status, show_rqstp_flags(__entry->flags))
 );
 
-DEFINE_EVENT(svc_rqst_status, svc_process,
-	TP_PROTO(struct svc_rqst *rqst, int status),
-	TP_ARGS(rqst, status));
-
 DEFINE_EVENT(svc_rqst_status, svc_send,
 	TP_PROTO(struct svc_rqst *rqst, int status),
 	TP_ARGS(rqst, status));
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 387cc4a..f19987f 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1255,6 +1255,7 @@ static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ..
 
 	/* Syntactic check complete */
 	serv->sv_stats->rpccnt++;
+	trace_svc_process(rqstp, progp->pg_name);
 
 	/* Build the reply header. */
 	statp = resv->iov_base +resv->iov_len;
@@ -1431,14 +1432,10 @@ static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ..
 	}
 
 	/* Returns 1 for send, 0 for drop */
-	if (likely(svc_process_common(rqstp, argv, resv))) {
-		int ret = svc_send(rqstp);
+	if (likely(svc_process_common(rqstp, argv, resv)))
+		return svc_send(rqstp);
 
-		trace_svc_process(rqstp, ret);
-		return ret;
-	}
 out_drop:
-	trace_svc_process(rqstp, 0);
 	svc_drop(rqstp);
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 09/18] sunrpc: Report per-RPC execution stats
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (7 preceding siblings ...)
  2018-03-27 14:51 ` [PATCH v2 08/18] sunrpc: Re-purpose trace_svc_process Chuck Lever
@ 2018-03-27 14:51 ` Chuck Lever
  2018-03-27 14:52 ` [PATCH v2 10/18] svc: Report xprt dequeue latency Chuck Lever
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:51 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Introduce a mechanism to report the server-side execution latency of
each RPC. The goal is to enable user space to filter the trace
record for latency outliers, build histograms, etc.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc.h    |    1 +
 include/trace/events/sunrpc.h |   21 +++++++++++++++++++++
 net/sunrpc/svc_xprt.c         |    3 ++-
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 786ae22..3bd7504 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -283,6 +283,7 @@ struct svc_rqst {
 	int			rq_reserved;	/* space on socket outq
 						 * reserved for this request
 						 */
+	ktime_t			rq_stime;	/* start time */
 
 	struct cache_req	rq_chandle;	/* handle passed to caches for 
 						 * request delaying 
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 5a8157c..1b383f7 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -705,6 +705,27 @@
 		__entry->len, show_svc_xprt_flags(__entry->flags))
 );
 
+TRACE_EVENT(svc_stats_latency,
+	TP_PROTO(const struct svc_rqst *rqst),
+
+	TP_ARGS(rqst),
+
+	TP_STRUCT__entry(
+		__field(u32, xid)
+		__field(unsigned long, execute)
+		__string(addr, rqst->rq_xprt->xpt_remotebuf)
+	),
+
+	TP_fast_assign(
+		__entry->xid = be32_to_cpu(rqst->rq_xid);
+		__entry->execute = ktime_to_us(ktime_sub(ktime_get(),
+							 rqst->rq_stime));
+		__assign_str(addr, rqst->rq_xprt->xpt_remotebuf);
+	),
+
+	TP_printk("addr=%s xid=0x%08x execute-us=%lu",
+		__get_str(addr), __entry->xid, __entry->execute)
+);
 
 DECLARE_EVENT_CLASS(svc_deferred_event,
 	TP_PROTO(struct svc_deferred_req *dr),
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index f745754..a7425da 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -782,7 +782,7 @@ static int svc_handle_xprt(struct svc_rqst *rqstp, struct svc_xprt *xprt)
 			len = svc_deferred_recv(rqstp);
 		else
 			len = xprt->xpt_ops->xpo_recvfrom(rqstp);
-		dprintk("svc: got len=%d\n", len);
+		rqstp->rq_stime = ktime_get();
 		rqstp->rq_reserved = serv->sv_max_mesg;
 		atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
 	}
@@ -888,6 +888,7 @@ int svc_send(struct svc_rqst *rqstp)
 
 	/* Grab mutex to serialize outgoing data. */
 	mutex_lock(&xprt->xpt_mutex);
+	trace_svc_stats_latency(rqstp);
 	if (test_bit(XPT_DEAD, &xprt->xpt_flags)
 			|| test_bit(XPT_CLOSE, &xprt->xpt_flags))
 		len = -ENOTCONN;


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 10/18] svc: Report xprt dequeue latency
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (8 preceding siblings ...)
  2018-03-27 14:51 ` [PATCH v2 09/18] sunrpc: Report per-RPC execution stats Chuck Lever
@ 2018-03-27 14:52 ` Chuck Lever
  2018-03-27 14:52 ` [PATCH v2 11/18] nfsd: Fix NFSD trace points Chuck Lever
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:52 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Record the time between when a rqstp is enqueued on a transport
and when it is dequeued. This includes how long the rqstp waits on
the queue and how long it takes the kernel scheduler to wake a
nfsd thread to service it.

The svc_xprt_dequeue trace point is altered to include the number
of microseconds between xprt_enqueue and xprt_dequeue.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc.h    |    1 +
 include/trace/events/sunrpc.h |   30 ++++++++++++++++++++++++++----
 net/sunrpc/svc_xprt.c         |    4 ++--
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 3bd7504..dc4c009 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -272,6 +272,7 @@ struct svc_rqst {
 #define	RQ_BUSY		(6)			/* request is busy */
 #define	RQ_DATA		(7)			/* request has data */
 	unsigned long		rq_flags;	/* flags field */
+	ktime_t			rq_qtime;	/* enqueue time */
 
 	void *			rq_argp;	/* decoded arguments */
 	void *			rq_resp;	/* xdr'd results */
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 1b383f7..922cb89 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -657,14 +657,36 @@
 			show_svc_xprt_flags(__entry->flags))
 );
 
-DEFINE_EVENT(svc_xprt_event, svc_xprt_dequeue,
-	TP_PROTO(struct svc_xprt *xprt),
-	TP_ARGS(xprt));
-
 DEFINE_EVENT(svc_xprt_event, svc_xprt_no_write_space,
 	TP_PROTO(struct svc_xprt *xprt),
 	TP_ARGS(xprt));
 
+TRACE_EVENT(svc_xprt_dequeue,
+	TP_PROTO(struct svc_rqst *rqst),
+
+	TP_ARGS(rqst),
+
+	TP_STRUCT__entry(
+		__field(struct svc_xprt *, xprt)
+		__field(unsigned long, flags)
+		__field(unsigned long, wakeup)
+		__string(addr, rqst->rq_xprt->xpt_remotebuf)
+	),
+
+	TP_fast_assign(
+		__entry->xprt = rqst->rq_xprt;
+		__entry->flags = rqst->rq_xprt->xpt_flags;
+		__entry->wakeup = ktime_to_us(ktime_sub(ktime_get(),
+							rqst->rq_qtime));
+		__assign_str(addr, rqst->rq_xprt->xpt_remotebuf);
+	),
+
+	TP_printk("xprt=%p addr=%s flags=%s wakeup-us=%lu",
+			__entry->xprt, __get_str(addr),
+			show_svc_xprt_flags(__entry->flags),
+			__entry->wakeup)
+);
+
 TRACE_EVENT(svc_wake_up,
 	TP_PROTO(int pid),
 
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index a7425da..5185efb 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -409,6 +409,7 @@ void svc_xprt_do_enqueue(struct svc_xprt *xprt)
 		if (test_and_set_bit(RQ_BUSY, &rqstp->rq_flags))
 			continue;
 		atomic_long_inc(&pool->sp_stats.threads_woken);
+		rqstp->rq_qtime = ktime_get();
 		wake_up_process(rqstp->rq_task);
 		goto out_unlock;
 	}
@@ -530,7 +531,6 @@ void svc_wake_up(struct svc_serv *serv)
 		if (test_bit(RQ_BUSY, &rqstp->rq_flags))
 			continue;
 		rcu_read_unlock();
-		dprintk("svc: daemon %p woken up.\n", rqstp);
 		wake_up_process(rqstp->rq_task);
 		trace_svc_wake_up(rqstp->rq_task->pid);
 		return;
@@ -726,7 +726,7 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout)
 		rqstp->rq_chandle.thread_wait = 5*HZ;
 	else
 		rqstp->rq_chandle.thread_wait = 1*HZ;
-	trace_svc_xprt_dequeue(rqstp->rq_xprt);
+	trace_svc_xprt_dequeue(rqstp);
 	return rqstp->rq_xprt;
 }
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 11/18] nfsd: Fix NFSD trace points
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (9 preceding siblings ...)
  2018-03-27 14:52 ` [PATCH v2 10/18] svc: Report xprt dequeue latency Chuck Lever
@ 2018-03-27 14:52 ` Chuck Lever
  2018-03-27 14:52 ` [PATCH v2 12/18] nfsd: Record request byte count, not count of vectors Chuck Lever
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:52 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

nfsd-1915  [003] 77915.780959: write_opened:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
nfsd-1915  [003] 77915.780960: write_io_done:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
nfsd-1915  [003] 77915.780964: write_done:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1

Byte swapping and knfsd_fh_hash() are not available in "trace-cmd
report", where the print format string is actually used. These
data transformations have to be done during the TP_fast_assign step.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/trace.h |   17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 8b2f1d9..3aa3f3b 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -18,19 +18,24 @@
 		 int		len),
 	TP_ARGS(rqstp, fhp, offset, len),
 	TP_STRUCT__entry(
-		__field(__be32, xid)
-		__field_struct(struct knfsd_fh, fh)
+		__field(u32, xid)
+		__field(u32, fh_hash)
 		__field(loff_t, offset)
 		__field(int, len)
 	),
 	TP_fast_assign(
-		__entry->xid = rqstp->rq_xid,
-		fh_copy_shallow(&__entry->fh, &fhp->fh_handle);
+		__entry->xid = be32_to_cpu(rqstp->rq_xid);
+		do {
+			struct knfsd_fh fh;
+
+			fh_copy_shallow(&fh, &fhp->fh_handle);
+			__entry->fh_hash = knfsd_fh_hash(&fh);
+		} while (0);
 		__entry->offset = offset;
 		__entry->len = len;
 	),
-	TP_printk("xid=0x%x fh=0x%x offset=%lld len=%d",
-		  __be32_to_cpu(__entry->xid), knfsd_fh_hash(&__entry->fh),
+	TP_printk("xid=0x%08x fh_hash=0x%08x offset=%lld len=%d",
+		  __entry->xid, __entry->fh_hash,
 		  __entry->offset, __entry->len)
 )
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 12/18] nfsd: Record request byte count, not count of vectors
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (10 preceding siblings ...)
  2018-03-27 14:52 ` [PATCH v2 11/18] nfsd: Fix NFSD trace points Chuck Lever
@ 2018-03-27 14:52 ` Chuck Lever
  2018-03-27 14:53 ` [PATCH v2 13/18] nfsd: Add "nfsd_" to trace point names Chuck Lever
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:52 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Byte count is more helpful to know than vector count.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/trace.h |   15 +++++----------
 fs/nfsd/vfs.c   |   16 ++++++++--------
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 3aa3f3b..9b79506 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -15,26 +15,21 @@
 	TP_PROTO(struct svc_rqst *rqstp,
 		 struct svc_fh	*fhp,
 		 loff_t		offset,
-		 int		len),
+		 unsigned long	len),
 	TP_ARGS(rqstp, fhp, offset, len),
 	TP_STRUCT__entry(
 		__field(u32, xid)
 		__field(u32, fh_hash)
 		__field(loff_t, offset)
-		__field(int, len)
+		__field(unsigned long, len)
 	),
 	TP_fast_assign(
 		__entry->xid = be32_to_cpu(rqstp->rq_xid);
-		do {
-			struct knfsd_fh fh;
-
-			fh_copy_shallow(&fh, &fhp->fh_handle);
-			__entry->fh_hash = knfsd_fh_hash(&fh);
-		} while (0);
+		__entry->fh_hash = knfsd_fh_hash(&fhp->fh_handle);
 		__entry->offset = offset;
 		__entry->len = len;
 	),
-	TP_printk("xid=0x%08x fh_hash=0x%08x offset=%lld len=%d",
+	TP_printk("xid=0x%08x fh_hash=0x%08x offset=%lld len=%lu",
 		  __entry->xid, __entry->fh_hash,
 		  __entry->offset, __entry->len)
 )
@@ -44,7 +39,7 @@
 	TP_PROTO(struct svc_rqst *rqstp,	\
 		 struct svc_fh	*fhp,		\
 		 loff_t		offset,		\
-		 int		len),		\
+		 unsigned long	len),		\
 	TP_ARGS(rqstp, fhp, offset, len))
 
 DEFINE_NFSD_IO_EVENT(read_start);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index a3c9bfa..e2eab4c 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1024,27 +1024,27 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	struct raparms	*ra;
 	__be32 err;
 
-	trace_read_start(rqstp, fhp, offset, vlen);
+	trace_read_start(rqstp, fhp, offset, *count);
 	err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
 	if (err)
 		return err;
 
 	ra = nfsd_init_raparms(file);
 
-	trace_read_opened(rqstp, fhp, offset, vlen);
+	trace_read_opened(rqstp, fhp, offset, *count);
 
 	if (file->f_op->splice_read && test_bit(RQ_SPLICE_OK, &rqstp->rq_flags))
 		err = nfsd_splice_read(rqstp, file, offset, count);
 	else
 		err = nfsd_readv(file, offset, vec, vlen, count);
 
-	trace_read_io_done(rqstp, fhp, offset, vlen);
+	trace_read_io_done(rqstp, fhp, offset, *count);
 
 	if (ra)
 		nfsd_put_raparams(file, ra);
 	fput(file);
 
-	trace_read_done(rqstp, fhp, offset, vlen);
+	trace_read_done(rqstp, fhp, offset, *count);
 
 	return err;
 }
@@ -1061,18 +1061,18 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	struct file *file = NULL;
 	__be32 err = 0;
 
-	trace_write_start(rqstp, fhp, offset, vlen);
+	trace_write_start(rqstp, fhp, offset, *cnt);
 
 	err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
 	if (err)
 		goto out;
 
-	trace_write_opened(rqstp, fhp, offset, vlen);
+	trace_write_opened(rqstp, fhp, offset, *cnt);
 	err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen, cnt, stable);
-	trace_write_io_done(rqstp, fhp, offset, vlen);
+	trace_write_io_done(rqstp, fhp, offset, *cnt);
 	fput(file);
 out:
-	trace_write_done(rqstp, fhp, offset, vlen);
+	trace_write_done(rqstp, fhp, offset, *cnt);
 	return err;
 }
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 13/18] nfsd: Add "nfsd_" to trace point names
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (11 preceding siblings ...)
  2018-03-27 14:52 ` [PATCH v2 12/18] nfsd: Record request byte count, not count of vectors Chuck Lever
@ 2018-03-27 14:53 ` Chuck Lever
  2018-03-27 14:53 ` [PATCH v2 14/18] nfsd: Add I/O trace points in the NFSv4 write path Chuck Lever
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:53 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Follow naming convention used in client and in sunrpc layers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4layouts.c |   16 ++++++++--------
 fs/nfsd/nfs4proc.c    |    4 ++--
 fs/nfsd/trace.h       |    4 ++--
 fs/nfsd/vfs.c         |   16 ++++++++--------
 4 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 7d88836..228faf0 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -165,7 +165,7 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
 	struct nfs4_client *clp = ls->ls_stid.sc_client;
 	struct nfs4_file *fp = ls->ls_stid.sc_file;
 
-	trace_layoutstate_free(&ls->ls_stid.sc_stateid);
+	trace_nfsd_layoutstate_free(&ls->ls_stid.sc_stateid);
 
 	spin_lock(&clp->cl_lock);
 	list_del_init(&ls->ls_perclnt);
@@ -264,7 +264,7 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
 	list_add(&ls->ls_perfile, &fp->fi_lo_states);
 	spin_unlock(&fp->fi_lock);
 
-	trace_layoutstate_alloc(&ls->ls_stid.sc_stateid);
+	trace_nfsd_layoutstate_alloc(&ls->ls_stid.sc_stateid);
 	return ls;
 }
 
@@ -334,7 +334,7 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
 	if (list_empty(&ls->ls_layouts))
 		goto out_unlock;
 
-	trace_layout_recall(&ls->ls_stid.sc_stateid);
+	trace_nfsd_layout_recall(&ls->ls_stid.sc_stateid);
 
 	refcount_inc(&ls->ls_stid.sc_count);
 	nfsd4_run_cb(&ls->ls_recall);
@@ -507,7 +507,7 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
 						false, lrp->lr_layout_type,
 						&ls);
 	if (nfserr) {
-		trace_layout_return_lookup_fail(&lrp->lr_sid);
+		trace_nfsd_layout_return_lookup_fail(&lrp->lr_sid);
 		return nfserr;
 	}
 
@@ -523,7 +523,7 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
 			nfs4_inc_and_copy_stateid(&lrp->lr_sid, &ls->ls_stid);
 		lrp->lrs_present = 1;
 	} else {
-		trace_layoutstate_unhash(&ls->ls_stid.sc_stateid);
+		trace_nfsd_layoutstate_unhash(&ls->ls_stid.sc_stateid);
 		nfs4_unhash_stid(&ls->ls_stid);
 		lrp->lrs_present = 0;
 	}
@@ -694,7 +694,7 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
 		/*
 		 * Unknown error or non-responding client, we'll need to fence.
 		 */
-		trace_layout_recall_fail(&ls->ls_stid.sc_stateid);
+		trace_nfsd_layout_recall_fail(&ls->ls_stid.sc_stateid);
 
 		ops = nfsd4_layout_ops[ls->ls_layout_type];
 		if (ops->fence_client)
@@ -703,7 +703,7 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
 			nfsd4_cb_layout_fail(ls);
 		return -1;
 	case -NFS4ERR_NOMATCHING_LAYOUT:
-		trace_layout_recall_done(&ls->ls_stid.sc_stateid);
+		trace_nfsd_layout_recall_done(&ls->ls_stid.sc_stateid);
 		task->tk_status = 0;
 		return 1;
 	}
@@ -716,7 +716,7 @@ void nfsd4_setup_layout_type(struct svc_export *exp)
 		container_of(cb, struct nfs4_layout_stateid, ls_recall);
 	LIST_HEAD(reaplist);
 
-	trace_layout_recall_release(&ls->ls_stid.sc_stateid);
+	trace_nfsd_layout_recall_release(&ls->ls_stid.sc_stateid);
 
 	nfsd4_return_all_layouts(ls, &reaplist);
 	nfsd4_free_layouts(&reaplist);
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a0bed2b..10c7620 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1412,7 +1412,7 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
 	nfserr = nfsd4_preprocess_layout_stateid(rqstp, cstate, &lgp->lg_sid,
 						true, lgp->lg_layout_type, &ls);
 	if (nfserr) {
-		trace_layout_get_lookup_fail(&lgp->lg_sid);
+		trace_nfsd_layout_get_lookup_fail(&lgp->lg_sid);
 		goto out;
 	}
 
@@ -1481,7 +1481,7 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
 						false, lcp->lc_layout_type,
 						&ls);
 	if (nfserr) {
-		trace_layout_commit_lookup_fail(&lcp->lc_sid);
+		trace_nfsd_layout_commit_lookup_fail(&lcp->lc_sid);
 		/* fixup error code as per RFC5661 */
 		if (nfserr == nfserr_bad_stateid)
 			nfserr = nfserr_badlayout;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 9b79506..30e0485 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -35,7 +35,7 @@
 )
 
 #define DEFINE_NFSD_IO_EVENT(name)		\
-DEFINE_EVENT(nfsd_io_class, name,		\
+DEFINE_EVENT(nfsd_io_class, nfsd_##name,	\
 	TP_PROTO(struct svc_rqst *rqstp,	\
 		 struct svc_fh	*fhp,		\
 		 loff_t		offset,		\
@@ -76,7 +76,7 @@
 )
 
 #define DEFINE_STATEID_EVENT(name) \
-DEFINE_EVENT(nfsd_stateid_class, name, \
+DEFINE_EVENT(nfsd_stateid_class, nfsd_##name, \
 	TP_PROTO(stateid_t *stp), \
 	TP_ARGS(stp))
 DEFINE_STATEID_EVENT(layoutstate_alloc);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index e2eab4c..8570319 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1024,27 +1024,27 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	struct raparms	*ra;
 	__be32 err;
 
-	trace_read_start(rqstp, fhp, offset, *count);
+	trace_nfsd_read_start(rqstp, fhp, offset, *count);
 	err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
 	if (err)
 		return err;
 
 	ra = nfsd_init_raparms(file);
 
-	trace_read_opened(rqstp, fhp, offset, *count);
+	trace_nfsd_read_opened(rqstp, fhp, offset, *count);
 
 	if (file->f_op->splice_read && test_bit(RQ_SPLICE_OK, &rqstp->rq_flags))
 		err = nfsd_splice_read(rqstp, file, offset, count);
 	else
 		err = nfsd_readv(file, offset, vec, vlen, count);
 
-	trace_read_io_done(rqstp, fhp, offset, *count);
+	trace_nfsd_read_io_done(rqstp, fhp, offset, *count);
 
 	if (ra)
 		nfsd_put_raparams(file, ra);
 	fput(file);
 
-	trace_read_done(rqstp, fhp, offset, *count);
+	trace_nfsd_read_done(rqstp, fhp, offset, *count);
 
 	return err;
 }
@@ -1061,18 +1061,18 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	struct file *file = NULL;
 	__be32 err = 0;
 
-	trace_write_start(rqstp, fhp, offset, *cnt);
+	trace_nfsd_write_start(rqstp, fhp, offset, *cnt);
 
 	err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
 	if (err)
 		goto out;
 
-	trace_write_opened(rqstp, fhp, offset, *cnt);
+	trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
 	err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen, cnt, stable);
-	trace_write_io_done(rqstp, fhp, offset, *cnt);
+	trace_nfsd_write_io_done(rqstp, fhp, offset, *cnt);
 	fput(file);
 out:
-	trace_write_done(rqstp, fhp, offset, *cnt);
+	trace_nfsd_write_done(rqstp, fhp, offset, *cnt);
 	return err;
 }
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 14/18] nfsd: Add I/O trace points in the NFSv4 write path
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (12 preceding siblings ...)
  2018-03-27 14:53 ` [PATCH v2 13/18] nfsd: Add "nfsd_" to trace point names Chuck Lever
@ 2018-03-27 14:53 ` Chuck Lever
  2018-03-27 14:53 ` [PATCH v2 15/18] nfsd: Add I/O trace points in the NFSv4 read proc Chuck Lever
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:53 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

NFSv4 write compound processing invokes nfsd_vfs_write directly. The
trace points currently in nfsd_write are not effective for NFSv4
writes.

Move the trace points into the shared nfsd_vfs_write() helper.

After the I/O, we also want to record any local I/O error that
might have occurred, and the total count of bytes that were actually
moved (rather than the requested number).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4proc.c |    7 +++++--
 fs/nfsd/trace.h    |   33 +++++++++++++++++++++++++++++++++
 fs/nfsd/vfs.c      |   23 ++++++++++++-----------
 3 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 10c7620..b93673e 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1001,6 +1001,9 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
 	if (write->wr_offset >= OFFSET_MAX)
 		return nfserr_inval;
 
+	cnt = write->wr_buflen;
+	trace_nfsd_write_start(rqstp, &cstate->current_fh,
+			       write->wr_offset, cnt);
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
 						stateid, WR_STATE, &filp, NULL);
 	if (status) {
@@ -1008,7 +1011,6 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
 		return status;
 	}
 
-	cnt = write->wr_buflen;
 	write->wr_how_written = write->wr_stable_how;
 	gen_boot_verifier(&write->wr_verifier, SVC_NET(rqstp));
 
@@ -1021,7 +1023,8 @@ static int fill_in_write_vector(struct kvec *vec, struct nfsd4_write *write)
 	fput(filp);
 
 	write->wr_bytes_written = cnt;
-
+	trace_nfsd_write_done(rqstp, &cstate->current_fh,
+			      write->wr_offset, cnt);
 	return status;
 }
 
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 30e0485..653e9ee 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -51,6 +51,39 @@
 DEFINE_NFSD_IO_EVENT(write_io_done);
 DEFINE_NFSD_IO_EVENT(write_done);
 
+DECLARE_EVENT_CLASS(nfsd_err_class,
+	TP_PROTO(struct svc_rqst *rqstp,
+		 struct svc_fh	*fhp,
+		 loff_t		offset,
+		 int		status),
+	TP_ARGS(rqstp, fhp, offset, status),
+	TP_STRUCT__entry(
+		__field(u32, xid)
+		__field(u32, fh_hash)
+		__field(loff_t, offset)
+		__field(int, status)
+	),
+	TP_fast_assign(
+		__entry->xid = be32_to_cpu(rqstp->rq_xid);
+		__entry->fh_hash = knfsd_fh_hash(&fhp->fh_handle);
+		__entry->offset = offset;
+		__entry->status = status;
+	),
+	TP_printk("xid=0x%08x fh_hash=0x%08x offset=%lld status=%d",
+		  __entry->xid, __entry->fh_hash,
+		  __entry->offset, __entry->status)
+)
+
+#define DEFINE_NFSD_ERR_EVENT(name)		\
+DEFINE_EVENT(nfsd_err_class, nfsd_##name,	\
+	TP_PROTO(struct svc_rqst *rqstp,	\
+		 struct svc_fh	*fhp,		\
+		 loff_t		offset,		\
+		 int		len),		\
+	TP_ARGS(rqstp, fhp, offset, len))
+
+DEFINE_NFSD_ERR_EVENT(write_err);
+
 #include "state.h"
 
 DECLARE_EVENT_CLASS(nfsd_stateid_class,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 8570319..ee59a0b 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -965,13 +965,15 @@ static int wait_for_concurrent_writes(struct file *file)
 {
 	struct svc_export	*exp;
 	struct iov_iter		iter;
-	__be32			err = 0;
+	__be32			nfserr;
 	int			host_err;
 	int			use_wgather;
 	loff_t			pos = offset;
 	unsigned int		pflags = current->flags;
 	rwf_t			flags = 0;
 
+	trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
+
 	if (test_bit(RQ_LOCAL, &rqstp->rq_flags))
 		/*
 		 * We want less throttling in balance_dirty_pages()
@@ -994,22 +996,23 @@ static int wait_for_concurrent_writes(struct file *file)
 	host_err = vfs_iter_write(file, &iter, &pos, flags);
 	if (host_err < 0)
 		goto out_nfserr;
-	*cnt = host_err;
-	nfsdstats.io_write += host_err;
+	nfsdstats.io_write += *cnt;
 	fsnotify_modify(file);
 
 	if (stable && use_wgather)
 		host_err = wait_for_concurrent_writes(file);
 
 out_nfserr:
-	dprintk("nfsd: write complete host_err=%d\n", host_err);
-	if (host_err >= 0)
-		err = 0;
-	else
-		err = nfserrno(host_err);
+	if (host_err >= 0) {
+		trace_nfsd_write_io_done(rqstp, fhp, offset, *cnt);
+		nfserr = nfs_ok;
+	} else {
+		trace_nfsd_write_err(rqstp, fhp, offset, host_err);
+		nfserr = nfserrno(host_err);
+	}
 	if (test_bit(RQ_LOCAL, &rqstp->rq_flags))
 		current_restore_flags(pflags, PF_LESS_THROTTLE);
-	return err;
+	return nfserr;
 }
 
 /*
@@ -1067,9 +1070,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (err)
 		goto out;
 
-	trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
 	err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen, cnt, stable);
-	trace_nfsd_write_io_done(rqstp, fhp, offset, *cnt);
 	fput(file);
 out:
 	trace_nfsd_write_done(rqstp, fhp, offset, *cnt);


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 15/18] nfsd: Add I/O trace points in the NFSv4 read proc
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (13 preceding siblings ...)
  2018-03-27 14:53 ` [PATCH v2 14/18] nfsd: Add I/O trace points in the NFSv4 write path Chuck Lever
@ 2018-03-27 14:53 ` Chuck Lever
  2018-03-27 16:57   ` Chuck Lever
  2018-03-27 14:53 ` [PATCH v2 16/18] nfsd: Trace NFSv4 COMPOUND execution Chuck Lever
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:53 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

NFSv4 read compound processing invokes nfsd_splice_read and
nfs_readv directly, so the trace points currently in nfsd_read are
not effective for NFSv4 reads.

Move and copy the trace points so that NFSv4 reads are captured.
Also, we want to record any local I/O error that occurs, and
the total count of bytes that were actually moved. And, also
whether splice or vectored read was used.

The svc_fh is not passed to the read helpers, so some code
duplication is necessary.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4proc.c |    5 +++++
 fs/nfsd/nfs4xdr.c  |   56 +++++++++++++++++++++++++++++++++++++---------------
 fs/nfsd/trace.h    |    4 +++-
 fs/nfsd/vfs.c      |   54 ++++++++++++++++++++++----------------------------
 fs/nfsd/vfs.h      |    8 ++++---
 5 files changed, 76 insertions(+), 51 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index b93673e..39016b6 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -751,6 +751,9 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 	if (read->rd_offset >= OFFSET_MAX)
 		return nfserr_inval;
 
+	trace_nfsd_read_start(rqstp, &cstate->current_fh,
+			      read->rd_offset, read->rd_length);
+
 	/*
 	 * If we do a zero copy read, then a client will see read data
 	 * that reflects the state of the file *after* performing the
@@ -783,6 +786,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 {
 	if (u->read.rd_filp)
 		fput(u->read.rd_filp);
+	trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
+			     u->read.rd_offset, u->read.rd_length);
 }
 
 static __be32
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index e502fd1..d03059a 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -37,6 +37,7 @@
 #include <linux/file.h>
 #include <linux/slab.h>
 #include <linux/namei.h>
+#include <linux/fsnotify.h>
 #include <linux/statfs.h>
 #include <linux/utsname.h>
 #include <linux/pagemap.h>
@@ -50,6 +51,7 @@
 #include "cache.h"
 #include "netns.h"
 #include "pnfs.h"
+#include "trace.h"
 
 #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
 #include <linux/security.h>
@@ -3416,28 +3418,28 @@ static __be32 nfsd4_encode_splice_read(
 {
 	struct xdr_stream *xdr = &resp->xdr;
 	struct xdr_buf *buf = xdr->buf;
+	int host_err;
 	u32 eof;
 	long len;
 	int space_left;
-	__be32 nfserr;
 	__be32 *p = xdr->p - 2;
 
 	/* Make sure there will be room for padding if needed */
 	if (xdr->end - xdr->p < 1)
 		return nfserr_resource;
 
+	trace_nfsd_read_splice(resp->rqstp, read->rd_fhp,
+			       read->rd_offset, maxcount);
 	len = maxcount;
-	nfserr = nfsd_splice_read(read->rd_rqstp, file,
+	host_err = nfsd_splice_read(read->rd_rqstp, file,
 				  read->rd_offset, &maxcount);
-	if (nfserr) {
-		/*
-		 * nfsd_splice_actor may have already messed with the
-		 * page length; reset it so as not to confuse
-		 * xdr_truncate_encode:
-		 */
-		buf->page_len = 0;
-		return nfserr;
-	}
+	if (host_err < 0)
+		goto err;
+	trace_nfsd_read_io_done(read->rd_rqstp, read->rd_fhp,
+				read->rd_offset, maxcount);
+	maxcount = host_err;
+	nfsdstats.io_read += maxcount;
+	fsnotify_access(file);
 
 	eof = nfsd_eof_on_read(len, maxcount, read->rd_offset,
 				d_inode(read->rd_fhp->fh_dentry)->i_size);
@@ -3470,6 +3472,17 @@ static __be32 nfsd4_encode_splice_read(
 	xdr->end = (__be32 *)((void *)xdr->end + space_left);
 
 	return 0;
+
+err:
+	/*
+	 * nfsd_splice_actor may have already messed with the
+	 * page length; reset it so as not to confuse
+	 * xdr_truncate_encode:
+	 */
+	buf->page_len = 0;
+	trace_nfsd_read_err(read->rd_rqstp, read->rd_fhp,
+			    read->rd_offset, host_err);
+	return nfserrno(host_err);
 }
 
 static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
@@ -3477,12 +3490,12 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 				 struct file *file, unsigned long maxcount)
 {
 	struct xdr_stream *xdr = &resp->xdr;
+	int host_err;
 	u32 eof;
 	int v;
 	int starting_len = xdr->buf->len - 8;
 	long len;
 	int thislen;
-	__be32 nfserr;
 	__be32 tmp;
 	__be32 *p;
 	u32 zzz = 0;
@@ -3510,11 +3523,18 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 	}
 	read->rd_vlen = v;
 
+	trace_nfsd_read_vector(resp->rqstp, read->rd_fhp,
+			       read->rd_offset, maxcount);
 	len = maxcount;
-	nfserr = nfsd_readv(file, read->rd_offset, resp->rqstp->rq_vec,
-			read->rd_vlen, &maxcount);
-	if (nfserr)
-		return nfserr;
+	host_err = nfsd_readv(file, read->rd_offset, resp->rqstp->rq_vec,
+			      read->rd_vlen, &maxcount);
+	if (host_err < 0)
+		goto err;
+	trace_nfsd_read_io_done(resp->rqstp, read->rd_fhp,
+				read->rd_offset, maxcount);
+	nfsdstats.io_read += maxcount;
+	fsnotify_access(file);
+
 	xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));
 
 	eof = nfsd_eof_on_read(len, maxcount, read->rd_offset,
@@ -3530,6 +3550,10 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 								&zzz, pad);
 	return 0;
 
+err:
+	trace_nfsd_read_err(resp->rqstp, read->rd_fhp,
+			    read->rd_offset, host_err);
+	return nfserrno(host_err);
 }
 
 static __be32
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 653e9ee..a8bbd9d 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -43,7 +43,8 @@
 	TP_ARGS(rqstp, fhp, offset, len))
 
 DEFINE_NFSD_IO_EVENT(read_start);
-DEFINE_NFSD_IO_EVENT(read_opened);
+DEFINE_NFSD_IO_EVENT(read_splice);
+DEFINE_NFSD_IO_EVENT(read_vector);
 DEFINE_NFSD_IO_EVENT(read_io_done);
 DEFINE_NFSD_IO_EVENT(read_done);
 DEFINE_NFSD_IO_EVENT(write_start);
@@ -82,6 +83,7 @@
 		 int		len),		\
 	TP_ARGS(rqstp, fhp, offset, len))
 
+DEFINE_NFSD_ERR_EVENT(read_err);
 DEFINE_NFSD_ERR_EVENT(write_err);
 
 #include "state.h"
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ee59a0b..0fa54b5 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -881,19 +881,7 @@ static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
 	return __splice_from_pipe(pipe, sd, nfsd_splice_actor);
 }
 
-static __be32
-nfsd_finish_read(struct file *file, unsigned long *count, int host_err)
-{
-	if (host_err >= 0) {
-		nfsdstats.io_read += host_err;
-		*count = host_err;
-		fsnotify_access(file);
-		return 0;
-	} else 
-		return nfserrno(host_err);
-}
-
-__be32 nfsd_splice_read(struct svc_rqst *rqstp,
+int nfsd_splice_read(struct svc_rqst *rqstp,
 		     struct file *file, loff_t offset, unsigned long *count)
 {
 	struct splice_desc sd = {
@@ -902,23 +890,18 @@ __be32 nfsd_splice_read(struct svc_rqst *rqstp,
 		.pos		= offset,
 		.u.data		= rqstp,
 	};
-	int host_err;
 
 	rqstp->rq_next_page = rqstp->rq_respages + 1;
-	host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
-	return nfsd_finish_read(file, count, host_err);
+	return splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
 }
 
-__be32 nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
-		unsigned long *count)
+int nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
+	       unsigned long *count)
 {
 	struct iov_iter iter;
-	int host_err;
 
 	iov_iter_kvec(&iter, READ | ITER_KVEC, vec, vlen, *count);
-	host_err = vfs_iter_read(file, &iter, &offset, 0);
-
-	return nfsd_finish_read(file, count, host_err);
+	return vfs_iter_read(file, &iter, &offset, 0);
 }
 
 /*
@@ -1025,6 +1008,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 {
 	struct file *file;
 	struct raparms	*ra;
+	int host_err;
 	__be32 err;
 
 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
@@ -1034,14 +1018,24 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	ra = nfsd_init_raparms(file);
 
-	trace_nfsd_read_opened(rqstp, fhp, offset, *count);
-
-	if (file->f_op->splice_read && test_bit(RQ_SPLICE_OK, &rqstp->rq_flags))
-		err = nfsd_splice_read(rqstp, file, offset, count);
-	else
-		err = nfsd_readv(file, offset, vec, vlen, count);
-
-	trace_nfsd_read_io_done(rqstp, fhp, offset, *count);
+	if (file->f_op->splice_read &&
+	    test_bit(RQ_SPLICE_OK, &rqstp->rq_flags)) {
+		trace_nfsd_read_splice(rqstp, fhp, offset, *count);
+		host_err = nfsd_splice_read(rqstp, file, offset, count);
+	} else {
+		trace_nfsd_read_vector(rqstp, fhp, offset, *count);
+		host_err = nfsd_readv(file, offset, vec, vlen, count);
+	}
+	if (host_err >= 0) {
+		trace_nfsd_read_io_done(rqstp, fhp, offset, host_err);
+		nfsdstats.io_read += host_err;
+		*count = host_err;
+		fsnotify_access(file);
+		err = nfs_ok;
+	} else  {
+		trace_nfsd_read_err(rqstp, fhp, offset, host_err);
+		err = nfserrno(host_err);
+	}
 
 	if (ra)
 		nfsd_put_raparams(file, ra);
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index be6d8e0..d9131c3 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -78,10 +78,10 @@ __be32		nfsd_commit(struct svc_rqst *, struct svc_fh *,
 __be32		nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
 				int, struct file **);
 struct raparms;
-__be32		nfsd_splice_read(struct svc_rqst *,
-				struct file *, loff_t, unsigned long *);
-__be32		nfsd_readv(struct file *, loff_t, struct kvec *, int,
-				unsigned long *);
+int		nfsd_splice_read(struct svc_rqst *rqstp, struct file *file,
+				loff_t offset, unsigned long *count);
+int		nfsd_readv(struct file *file, loff_t offset, struct kvec *vec,
+				int vlen, unsigned long *count);
 __be32 		nfsd_read(struct svc_rqst *, struct svc_fh *,
 				loff_t, struct kvec *, int, unsigned long *);
 __be32 		nfsd_write(struct svc_rqst *, struct svc_fh *, loff_t,


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 16/18] nfsd: Trace NFSv4 COMPOUND execution
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (14 preceding siblings ...)
  2018-03-27 14:53 ` [PATCH v2 15/18] nfsd: Add I/O trace points in the NFSv4 read proc Chuck Lever
@ 2018-03-27 14:53 ` Chuck Lever
  2018-03-27 14:54 ` [PATCH v2 17/18] NFSD: Clean up legacy NFS WRITE argument XDR decoders Chuck Lever
  2018-03-27 14:54 ` [PATCH v2 18/18] NFSD: Clean up legacy NFS SYMLINK " Chuck Lever
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:53 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

This helps record the identity and timing of the ops in each NFSv4
COMPOUND, replacing dprintk calls that did much the same thing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4proc.c |    9 +++------
 fs/nfsd/trace.h    |   39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 39016b6..0df37e0 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1722,12 +1722,10 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
 		goto encode_op;
 	}
 
+	trace_nfsd_compound(rqstp, args->opcnt);
 	while (!status && resp->opcnt < args->opcnt) {
 		op = &args->ops[resp->opcnt++];
 
-		dprintk("nfsv4 compound op #%d/%d: %d (%s)\n",
-			resp->opcnt, args->opcnt, op->opnum,
-			nfsd4_op_name(op->opnum));
 		/*
 		 * The XDR decode routines may have pre-set op->status;
 		 * for example, if there is a miscellaneous XDR error
@@ -1801,9 +1799,8 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
 			status = op->status;
 		}
 
-		dprintk("nfsv4 compound op %p opcnt %d #%d: %d: status %d\n",
-			args->ops, args->opcnt, resp->opcnt, op->opnum,
-			be32_to_cpu(status));
+		trace_nfsd_compound_status(args->opcnt, resp->opcnt, status,
+					   nfsd4_op_name(op->opnum));
 
 		nfsd4_cstate_clear_replay(cstate);
 		nfsd4_increment_op_stats(op->opnum);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index a8bbd9d..80933e4 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -11,6 +11,45 @@
 #include <linux/tracepoint.h>
 #include "nfsfh.h"
 
+TRACE_EVENT(nfsd_compound,
+	TP_PROTO(const struct svc_rqst *rqst,
+		 u32 args_opcnt),
+	TP_ARGS(rqst, args_opcnt),
+	TP_STRUCT__entry(
+		__field(u32, xid)
+		__field(u32, args_opcnt)
+	),
+	TP_fast_assign(
+		__entry->xid = be32_to_cpu(rqst->rq_xid);
+		__entry->args_opcnt = args_opcnt;
+	),
+	TP_printk("xid=0x%08x opcnt=%u",
+		__entry->xid, __entry->args_opcnt)
+)
+
+TRACE_EVENT(nfsd_compound_status,
+	TP_PROTO(u32 args_opcnt,
+		 u32 resp_opcnt,
+		 __be32 status,
+		 const char *name),
+	TP_ARGS(args_opcnt, resp_opcnt, status, name),
+	TP_STRUCT__entry(
+		__field(u32, args_opcnt)
+		__field(u32, resp_opcnt)
+		__field(int, status)
+		__string(name, name)
+	),
+	TP_fast_assign(
+		__entry->args_opcnt = args_opcnt;
+		__entry->resp_opcnt = resp_opcnt;
+		__entry->status = be32_to_cpu(status);
+		__assign_str(name, name);
+	),
+	TP_printk("op=%u/%u %s status=%d",
+		__entry->resp_opcnt, __entry->args_opcnt,
+		__get_str(name), __entry->status)
+)
+
 DECLARE_EVENT_CLASS(nfsd_io_class,
 	TP_PROTO(struct svc_rqst *rqstp,
 		 struct svc_fh	*fhp,


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 17/18] NFSD: Clean up legacy NFS WRITE argument XDR decoders
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (15 preceding siblings ...)
  2018-03-27 14:53 ` [PATCH v2 16/18] nfsd: Trace NFSv4 COMPOUND execution Chuck Lever
@ 2018-03-27 14:54 ` Chuck Lever
  2018-03-27 14:54 ` [PATCH v2 18/18] NFSD: Clean up legacy NFS SYMLINK " Chuck Lever
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:54 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Move common code in NFSD's legacy NFS WRITE decoders into a helper.
The immediate benefit is reduction of code duplication and some nice
micro-optimizations (see below).

In the long term, this helper can perform a per-transport call-out
to fill the rq_vec (say, using RDMA Reads).

The legacy WRITE decoders and procs are changed to work like NFSv4,
which constructs the rq_vec just before it is about to call
vfs_writev.

Why? Calling a transport call-out from the proc instead of the XDR
decoder means that the incoming FH can be resolved to a particular
filesystem and file. This would allow pages from the backing file to
be presented to the transport to be filled, rather than presenting
anonymous pages and copying or flipping them into the file's page
cache later.

I also prefer using the pages in rq_arg.pages, instead of pulling
the data pages directly out of the rqstp::rq_pages array. This is
currently the way the NFSv3 write decoder works, but the other two
do not seem to take this approach. Fixing this removes the only
reference to rq_pages found in NFSD, eliminating an NFSD assumption
about how transports use the pages in rq_pages.

Lastly, avoid setting up the first element of rq_vec as a zero-
length buffer. This happens with an RDMA transport when a normal
Read chunk is present because the data payload is in rq_arg's
page list (none of it is in the head buffer).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs3proc.c         |    8 ++++++--
 fs/nfsd/nfs3xdr.c          |   16 ++++------------
 fs/nfsd/nfsproc.c          |    9 +++++++--
 fs/nfsd/nfsxdr.c           |   14 ++------------
 fs/nfsd/xdr.h              |    2 +-
 fs/nfsd/xdr3.h             |    2 +-
 include/linux/sunrpc/svc.h |    2 ++
 net/sunrpc/svc.c           |   42 ++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 65 insertions(+), 30 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 1d0ce3c..2dd95eb 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -192,6 +192,7 @@
 	struct nfsd3_writeres *resp = rqstp->rq_resp;
 	__be32	nfserr;
 	unsigned long cnt = argp->len;
+	unsigned int nvecs;
 
 	dprintk("nfsd: WRITE(3)    %s %d bytes at %Lu%s\n",
 				SVCFH_fmt(&argp->fh),
@@ -201,9 +202,12 @@
 
 	fh_copy(&resp->fh, &argp->fh);
 	resp->committed = argp->stable;
+	nvecs = svc_fill_write_vector(rqstp, &argp->first, cnt);
+	if (!nvecs)
+		RETURN_STATUS(nfserr_io);
 	nfserr = nfsd_write(rqstp, &resp->fh, argp->offset,
-				rqstp->rq_vec, argp->vlen,
-				&cnt, resp->committed);
+			    rqstp->rq_vec, nvecs, &cnt,
+			    resp->committed);
 	resp->count = cnt;
 	RETURN_STATUS(nfserr);
 }
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 1a70581..e19fc5d 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -391,7 +391,7 @@ void fill_post_wcc(struct svc_fh *fhp)
 nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p)
 {
 	struct nfsd3_writeargs *args = rqstp->rq_argp;
-	unsigned int len, v, hdr, dlen;
+	unsigned int len, hdr, dlen;
 	u32 max_blocksize = svc_max_payload(rqstp);
 	struct kvec *head = rqstp->rq_arg.head;
 	struct kvec *tail = rqstp->rq_arg.tail;
@@ -433,17 +433,9 @@ void fill_post_wcc(struct svc_fh *fhp)
 		args->count = max_blocksize;
 		len = args->len = max_blocksize;
 	}
-	rqstp->rq_vec[0].iov_base = (void*)p;
-	rqstp->rq_vec[0].iov_len = head->iov_len - hdr;
-	v = 0;
-	while (len > rqstp->rq_vec[v].iov_len) {
-		len -= rqstp->rq_vec[v].iov_len;
-		v++;
-		rqstp->rq_vec[v].iov_base = page_address(rqstp->rq_pages[v]);
-		rqstp->rq_vec[v].iov_len = PAGE_SIZE;
-	}
-	rqstp->rq_vec[v].iov_len = len;
-	args->vlen = v + 1;
+
+	args->first.iov_base = (void *)p;
+	args->first.iov_len = head->iov_len - hdr;
 	return 1;
 }
 
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 43c0419..1995ea6 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -212,13 +212,18 @@
 	struct nfsd_attrstat *resp = rqstp->rq_resp;
 	__be32	nfserr;
 	unsigned long cnt = argp->len;
+	unsigned int nvecs;
 
 	dprintk("nfsd: WRITE    %s %d bytes at %d\n",
 		SVCFH_fmt(&argp->fh),
 		argp->len, argp->offset);
 
-	nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh), argp->offset,
-				rqstp->rq_vec, argp->vlen, &cnt, NFS_DATA_SYNC);
+	nvecs = svc_fill_write_vector(rqstp, &argp->first, cnt);
+	if (!nvecs)
+		return nfserr_io;
+	nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh),
+			    argp->offset, rqstp->rq_vec, nvecs,
+			    &cnt, NFS_DATA_SYNC);
 	return nfsd_return_attrs(nfserr, resp);
 }
 
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
index 79b6064..db24ae8 100644
--- a/fs/nfsd/nfsxdr.c
+++ b/fs/nfsd/nfsxdr.c
@@ -287,7 +287,6 @@ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *f
 	struct nfsd_writeargs *args = rqstp->rq_argp;
 	unsigned int len, hdr, dlen;
 	struct kvec *head = rqstp->rq_arg.head;
-	int v;
 
 	p = decode_fh(p, &args->fh);
 	if (!p)
@@ -323,17 +322,8 @@ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *f
 	if (dlen < XDR_QUADLEN(len)*4)
 		return 0;
 
-	rqstp->rq_vec[0].iov_base = (void*)p;
-	rqstp->rq_vec[0].iov_len = head->iov_len - hdr;
-	v = 0;
-	while (len > rqstp->rq_vec[v].iov_len) {
-		len -= rqstp->rq_vec[v].iov_len;
-		v++;
-		rqstp->rq_vec[v].iov_base = page_address(rqstp->rq_pages[v]);
-		rqstp->rq_vec[v].iov_len = PAGE_SIZE;
-	}
-	rqstp->rq_vec[v].iov_len = len;
-	args->vlen = v + 1;
+	args->first.iov_base = (void *)p;
+	args->first.iov_len = head->iov_len - hdr;
 	return 1;
 }
 
diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h
index 2f4f22e..a765c41 100644
--- a/fs/nfsd/xdr.h
+++ b/fs/nfsd/xdr.h
@@ -34,7 +34,7 @@ struct nfsd_writeargs {
 	svc_fh			fh;
 	__u32			offset;
 	int			len;
-	int			vlen;
+	struct kvec		first;
 };
 
 struct nfsd_createargs {
diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h
index 056bf8a..deccf7f 100644
--- a/fs/nfsd/xdr3.h
+++ b/fs/nfsd/xdr3.h
@@ -41,7 +41,7 @@ struct nfsd3_writeargs {
 	__u32			count;
 	int			stable;
 	__u32			len;
-	int			vlen;
+	struct kvec		first;
 };
 
 struct nfsd3_createargs {
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index dc4c009..fb3fcac 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -495,6 +495,8 @@ int		   svc_register(const struct svc_serv *, struct net *, const int,
 void		   svc_reserve(struct svc_rqst *rqstp, int space);
 struct svc_pool *  svc_pool_for_cpu(struct svc_serv *serv, int cpu);
 char *		   svc_print_addr(struct svc_rqst *, char *, size_t);
+unsigned int	   svc_fill_write_vector(struct svc_rqst *rqstp,
+					 struct kvec *first, size_t total);
 
 #define	RPC_MAX_ADDRBUFLEN	(63U)
 
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index f19987f..a155e2d 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1533,3 +1533,45 @@ u32 svc_max_payload(const struct svc_rqst *rqstp)
 	return max;
 }
 EXPORT_SYMBOL_GPL(svc_max_payload);
+
+/**
+ * svc_fill_write_vector - Construct data argument for VFS write call
+ * @rqstp: svc_rqst to operate on
+ * @first: buffer containing first section of write payload
+ * @total: total number of bytes of write payload
+ *
+ * Returns the number of elements populated in the data argument array.
+ */
+unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, struct kvec *first,
+				   size_t total)
+{
+	struct kvec *vec = rqstp->rq_vec;
+	struct page **pages;
+	unsigned int i;
+
+	/* Some types of transport can present the write payload
+	 * entirely in rq_arg.pages. In this case, @first is empty.
+	 */
+	i = 0;
+	if (first->iov_len) {
+		vec[i].iov_base = first->iov_base;
+		vec[i].iov_len = min_t(size_t, total, first->iov_len);
+		total -= vec[i].iov_len;
+		++i;
+	}
+
+	WARN_ON_ONCE(rqstp->rq_arg.page_base != 0);
+	pages = rqstp->rq_arg.pages;
+	while (total) {
+		vec[i].iov_base = page_address(*pages);
+		vec[i].iov_len = min_t(size_t, total, PAGE_SIZE);
+		total -= vec[i].iov_len;
+		++i;
+
+		++pages;
+	}
+
+	WARN_ON_ONCE(i > ARRAY_SIZE(rqstp->rq_vec));
+	return i;
+}
+EXPORT_SYMBOL_GPL(svc_fill_write_vector);


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 18/18] NFSD: Clean up legacy NFS SYMLINK argument XDR decoders
  2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
                   ` (16 preceding siblings ...)
  2018-03-27 14:54 ` [PATCH v2 17/18] NFSD: Clean up legacy NFS WRITE argument XDR decoders Chuck Lever
@ 2018-03-27 14:54 ` Chuck Lever
  17 siblings, 0 replies; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 14:54 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Move common code in NFSD's legacy SYMLINK decoders into a helper.
The immediate benefits include:

 - one fewer data copies on transports that support DDP
 - consistent error checking across all versions
 - reduction of code duplication
 - support for both legal forms of SYMLINK requests on RDMA
   transports for all versions of NFS (in particular, NFSv2, for
   completeness)

In the long term, this helper is an appropriate spot to perform a
per-transport call-out to fill the pathname argument using, say,
RDMA Reads.

Filling the pathname in the proc function also means that eventually
the incoming filehandle can be interpreted so that filesystem-
specific memory can be allocated as a sink for the pathname
argument, rather than using anonymous pages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs3proc.c         |   10 +++++++
 fs/nfsd/nfs3xdr.c          |   51 ++++++++-------------------------
 fs/nfsd/nfsproc.c          |   14 +++++----
 fs/nfsd/nfsxdr.c           |   49 +++++++++++++++++++-------------
 fs/nfsd/xdr.h              |    1 +
 fs/nfsd/xdr3.h             |    1 +
 fs/nfsd/xdr4.h             |    2 +
 include/linux/sunrpc/svc.h |    2 +
 net/sunrpc/svc.c           |   67 ++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 132 insertions(+), 65 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 2dd95eb..6259a4b 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -283,6 +283,16 @@
 	struct nfsd3_diropres *resp = rqstp->rq_resp;
 	__be32	nfserr;
 
+	if (argp->tlen == 0)
+		RETURN_STATUS(nfserr_inval);
+	if (argp->tlen > NFS3_MAXPATHLEN)
+		RETURN_STATUS(nfserr_nametoolong);
+
+	argp->tname = svc_fill_symlink_pathname(rqstp, &argp->first,
+						argp->tlen);
+	if (IS_ERR(argp->tname))
+		RETURN_STATUS(nfserrno(PTR_ERR(argp->tname)));
+
 	dprintk("nfsd: SYMLINK(3)  %s %.*s -> %.*s\n",
 				SVCFH_fmt(&argp->ffh),
 				argp->flen, argp->fname,
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index e19fc5d..3192b54 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -481,51 +481,24 @@ void fill_post_wcc(struct svc_fh *fhp)
 nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p)
 {
 	struct nfsd3_symlinkargs *args = rqstp->rq_argp;
-	unsigned int len, avail;
-	char *old, *new;
-	struct kvec *vec;
+	char *base = (char *)p;
+	size_t dlen;
 
 	if (!(p = decode_fh(p, &args->ffh)) ||
-	    !(p = decode_filename(p, &args->fname, &args->flen))
-		)
+	    !(p = decode_filename(p, &args->fname, &args->flen)))
 		return 0;
 	p = decode_sattr3(p, &args->attrs);
 
-	/* now decode the pathname, which might be larger than the first page.
-	 * As we have to check for nul's anyway, we copy it into a new page
-	 * This page appears in the rq_res.pages list, but as pages_len is always
-	 * 0, it won't get in the way
-	 */
-	len = ntohl(*p++);
-	if (len == 0 || len > NFS3_MAXPATHLEN || len >= PAGE_SIZE)
-		return 0;
-	args->tname = new = page_address(*(rqstp->rq_next_page++));
-	args->tlen = len;
-	/* first copy and check from the first page */
-	old = (char*)p;
-	vec = &rqstp->rq_arg.head[0];
-	if ((void *)old > vec->iov_base + vec->iov_len)
-		return 0;
-	avail = vec->iov_len - (old - (char*)vec->iov_base);
-	while (len && avail && *old) {
-		*new++ = *old++;
-		len--;
-		avail--;
-	}
-	/* now copy next page if there is one */
-	if (len && !avail && rqstp->rq_arg.page_len) {
-		avail = min_t(unsigned int, rqstp->rq_arg.page_len, PAGE_SIZE);
-		old = page_address(rqstp->rq_arg.pages[0]);
-	}
-	while (len && avail && *old) {
-		*new++ = *old++;
-		len--;
-		avail--;
-	}
-	*new = '\0';
-	if (len)
-		return 0;
+	args->tlen = ntohl(*p++);
 
+	args->first.iov_base = p;
+	args->first.iov_len = rqstp->rq_arg.head[0].iov_len;
+	args->first.iov_len -= (char *)p - base;
+
+	dlen = args->first.iov_len + rqstp->rq_arg.page_len +
+	       rqstp->rq_arg.tail[0].iov_len;
+	if (dlen < XDR_QUADLEN(args->tlen) << 2)
+		return 0;
 	return 1;
 }
 
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 1995ea6..f107f9f 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -449,17 +449,19 @@
 	struct svc_fh	newfh;
 	__be32		nfserr;
 
+	if (argp->tlen > NFS_MAXPATHLEN)
+		return nfserr_nametoolong;
+
+	argp->tname = svc_fill_symlink_pathname(rqstp, &argp->first,
+						argp->tlen);
+	if (IS_ERR(argp->tname))
+		return nfserrno(PTR_ERR(argp->tname));
+
 	dprintk("nfsd: SYMLINK  %s %.*s -> %.*s\n",
 		SVCFH_fmt(&argp->ffh), argp->flen, argp->fname,
 		argp->tlen, argp->tname);
 
 	fh_init(&newfh, NFS_FHSIZE);
-	/*
-	 * Crazy hack: the request fits in a page, and already-decoded
-	 * attributes follow argp->tname, so it's safe to just write a
-	 * null to ensure it's null-terminated:
-	 */
-	argp->tname[argp->tlen] = '\0';
 	nfserr = nfsd_symlink(rqstp, &argp->ffh, argp->fname, argp->flen,
 						 argp->tname, &newfh);
 
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
index db24ae8..a43e826 100644
--- a/fs/nfsd/nfsxdr.c
+++ b/fs/nfsd/nfsxdr.c
@@ -71,22 +71,6 @@ __be32 *nfs2svc_decode_fh(__be32 *p, struct svc_fh *fhp)
 }
 
 static __be32 *
-decode_pathname(__be32 *p, char **namp, unsigned int *lenp)
-{
-	char		*name;
-	unsigned int	i;
-
-	if ((p = xdr_decode_string_inplace(p, namp, lenp, NFS_MAXPATHLEN)) != NULL) {
-		for (i = 0, name = *namp; i < *lenp; i++, name++) {
-			if (*name == '\0')
-				return NULL;
-		}
-	}
-
-	return p;
-}
-
-static __be32 *
 decode_sattr(__be32 *p, struct iattr *iap)
 {
 	u32	tmp, tmp1;
@@ -384,14 +368,39 @@ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *f
 nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p)
 {
 	struct nfsd_symlinkargs *args = rqstp->rq_argp;
+	char *base = (char *)p;
+	size_t xdrlen;
 
 	if (   !(p = decode_fh(p, &args->ffh))
-	    || !(p = decode_filename(p, &args->fname, &args->flen))
-	    || !(p = decode_pathname(p, &args->tname, &args->tlen)))
+	    || !(p = decode_filename(p, &args->fname, &args->flen)))
 		return 0;
-	p = decode_sattr(p, &args->attrs);
 
-	return xdr_argsize_check(rqstp, p);
+	args->tlen = ntohl(*p++);
+	if (args->tlen == 0)
+		return 0;
+
+	args->first.iov_base = p;
+	args->first.iov_len = rqstp->rq_arg.head[0].iov_len;
+	args->first.iov_len -= (char *)p - base;
+
+	/* This request is never larger than a page. Therefore,
+	 * transport will deliver either:
+	 * 1. pathname in the pagelist -> sattr is in the tail.
+	 * 2. everything in the head buffer -> sattr is in the head.
+	 */
+	if (rqstp->rq_arg.page_len) {
+		if (args->tlen != rqstp->rq_arg.page_len)
+			return 0;
+		p = rqstp->rq_arg.tail[0].iov_base;
+	} else {
+		xdrlen = XDR_QUADLEN(args->tlen);
+		if (xdrlen > args->first.iov_len - (8 * sizeof(__be32)))
+			return 0;
+		p += xdrlen;
+	}
+	decode_sattr(p, &args->attrs);
+
+	return 1;
 }
 
 int
diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h
index a765c41..ea7cca3 100644
--- a/fs/nfsd/xdr.h
+++ b/fs/nfsd/xdr.h
@@ -72,6 +72,7 @@ struct nfsd_symlinkargs {
 	char *			tname;
 	unsigned int		tlen;
 	struct iattr		attrs;
+	struct kvec		first;
 };
 
 struct nfsd_readdirargs {
diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h
index deccf7f..2cb29e9 100644
--- a/fs/nfsd/xdr3.h
+++ b/fs/nfsd/xdr3.h
@@ -90,6 +90,7 @@ struct nfsd3_symlinkargs {
 	char *			tname;
 	unsigned int		tlen;
 	struct iattr		attrs;
+	struct kvec		first;
 };
 
 struct nfsd3_readdirargs {
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index bc29511..fdab2a2 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -110,6 +110,7 @@ struct nfsd4_create {
 		struct {
 			u32 datalen;
 			char *data;
+			struct kvec first;
 		} link;   /* NF4LNK */
 		struct {
 			u32 specdata1;
@@ -124,6 +125,7 @@ struct nfsd4_create {
 };
 #define cr_datalen	u.link.datalen
 #define cr_data		u.link.data
+#define cr_first	u.link.first
 #define cr_specdata1	u.dev.specdata1
 #define cr_specdata2	u.dev.specdata2
 
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index fb3fcac..574368e 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -497,6 +497,8 @@ int		   svc_register(const struct svc_serv *, struct net *, const int,
 char *		   svc_print_addr(struct svc_rqst *, char *, size_t);
 unsigned int	   svc_fill_write_vector(struct svc_rqst *rqstp,
 					 struct kvec *first, size_t total);
+char		  *svc_fill_symlink_pathname(struct svc_rqst *rqstp,
+					     struct kvec *first, size_t total);
 
 #define	RPC_MAX_ADDRBUFLEN	(63U)
 
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index a155e2d..30a4226 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1575,3 +1575,70 @@ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, struct kvec *first,
 	return i;
 }
 EXPORT_SYMBOL_GPL(svc_fill_write_vector);
+
+/**
+ * svc_fill_symlink_pathname - Construct pathname argument for VFS symlink call
+ * @rqstp: svc_rqst to operate on
+ * @first: buffer containing first section of pathname
+ * @total: total length of the pathname argument
+ *
+ * Returns pointer to a NUL-terminated string, or an ERR_PTR. The buffer is
+ * released automatically when @rqstp is recycled.
+ */
+char *svc_fill_symlink_pathname(struct svc_rqst *rqstp, struct kvec *first,
+				size_t total)
+{
+	struct xdr_buf *arg = &rqstp->rq_arg;
+	struct page **pages;
+	char *result;
+
+	/* VFS API demands a NUL-terminated pathname. This function
+	 * uses a page from @rqstp as the pathname buffer, to enable
+	 * direct placement. Thus the total buffer size is PAGE_SIZE.
+	 * Space in this buffer for NUL-termination requires that we
+	 * cap the size of the returned symlink pathname just a
+	 * little early.
+	 */
+	if (total > PAGE_SIZE - 1)
+		return ERR_PTR(-ENAMETOOLONG);
+
+	/* Some types of transport can present the pathname entirely
+	 * in rq_arg.pages. If not, then copy the pathname into one
+	 * page.
+	 */
+	pages = arg->pages;
+	WARN_ON_ONCE(arg->page_base != 0);
+	if (first->iov_base == 0) {
+		result = page_address(*pages);
+		result[total] = '\0';
+	} else {
+		size_t len, remaining;
+		char *dst;
+
+		result = page_address(*(rqstp->rq_next_page++));
+		dst = result;
+		remaining = total;
+
+		len = min_t(size_t, total, first->iov_len);
+		memcpy(dst, first->iov_base, len);
+		dst += len;
+		remaining -= len;
+
+		/* No more than one page left */
+		if (remaining) {
+			len = min_t(size_t, remaining, PAGE_SIZE);
+			memcpy(dst, page_address(*pages), len);
+			dst += len;
+		}
+
+		*dst = '\0';
+	}
+
+	/* Sanity check: we don't allow the pathname argument to
+	 * contain a NUL byte.
+	 */
+	if (strlen(result) != total)
+		return ERR_PTR(-EINVAL);
+	return result;
+}
+EXPORT_SYMBOL_GPL(svc_fill_symlink_pathname);


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 15/18] nfsd: Add I/O trace points in the NFSv4 read proc
  2018-03-27 14:53 ` [PATCH v2 15/18] nfsd: Add I/O trace points in the NFSv4 read proc Chuck Lever
@ 2018-03-27 16:57   ` Chuck Lever
  2018-03-27 20:14     ` Bruce Fields
  0 siblings, 1 reply; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 16:57 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Linux NFS Mailing List



> On Mar 27, 2018, at 10:53 AM, Chuck Lever <chuck.lever@oracle.com> =
wrote:
>=20
> NFSv4 read compound processing invokes nfsd_splice_read and
> nfs_readv directly, so the trace points currently in nfsd_read are
> not effective for NFSv4 reads.
>=20
> Move and copy the trace points so that NFSv4 reads are captured.
> Also, we want to record any local I/O error that occurs, and
> the total count of bytes that were actually moved. And, also
> whether splice or vectored read was used.
>=20
> The svc_fh is not passed to the read helpers, so some code
> duplication is necessary.
>=20
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/nfsd/nfs4proc.c |    5 +++++
> fs/nfsd/nfs4xdr.c  |   56 =
+++++++++++++++++++++++++++++++++++++---------------
> fs/nfsd/trace.h    |    4 +++-
> fs/nfsd/vfs.c      |   54 =
++++++++++++++++++++++----------------------------
> fs/nfsd/vfs.h      |    8 ++++---
> 5 files changed, 76 insertions(+), 51 deletions(-)
>=20
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index b93673e..39016b6 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -751,6 +751,9 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst =
*rqstp, struct svc_fh *fh)
> 	if (read->rd_offset >=3D OFFSET_MAX)
> 		return nfserr_inval;
>=20
> +	trace_nfsd_read_start(rqstp, &cstate->current_fh,
> +			      read->rd_offset, read->rd_length);
> +
> 	/*
> 	 * If we do a zero copy read, then a client will see read data
> 	 * that reflects the state of the file *after* performing the
> @@ -783,6 +786,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst =
*rqstp, struct svc_fh *fh)
> {
> 	if (u->read.rd_filp)
> 		fput(u->read.rd_filp);
> +	trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> +			     u->read.rd_offset, u->read.rd_length);
> }
>=20
> static __be32
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index e502fd1..d03059a 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -37,6 +37,7 @@
> #include <linux/file.h>
> #include <linux/slab.h>
> #include <linux/namei.h>
> +#include <linux/fsnotify.h>
> #include <linux/statfs.h>
> #include <linux/utsname.h>
> #include <linux/pagemap.h>
> @@ -50,6 +51,7 @@
> #include "cache.h"
> #include "netns.h"
> #include "pnfs.h"
> +#include "trace.h"
>=20
> #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
> #include <linux/security.h>
> @@ -3416,28 +3418,28 @@ static __be32 nfsd4_encode_splice_read(
> {
> 	struct xdr_stream *xdr =3D &resp->xdr;
> 	struct xdr_buf *buf =3D xdr->buf;
> +	int host_err;
> 	u32 eof;
> 	long len;
> 	int space_left;
> -	__be32 nfserr;
> 	__be32 *p =3D xdr->p - 2;
>=20
> 	/* Make sure there will be room for padding if needed */
> 	if (xdr->end - xdr->p < 1)
> 		return nfserr_resource;
>=20
> +	trace_nfsd_read_splice(resp->rqstp, read->rd_fhp,
> +			       read->rd_offset, maxcount);
> 	len =3D maxcount;
> -	nfserr =3D nfsd_splice_read(read->rd_rqstp, file,
> +	host_err =3D nfsd_splice_read(read->rd_rqstp, file,
> 				  read->rd_offset, &maxcount);
> -	if (nfserr) {
> -		/*
> -		 * nfsd_splice_actor may have already messed with the
> -		 * page length; reset it so as not to confuse
> -		 * xdr_truncate_encode:
> -		 */
> -		buf->page_len =3D 0;
> -		return nfserr;
> -	}
> +	if (host_err < 0)
> +		goto err;
> +	trace_nfsd_read_io_done(read->rd_rqstp, read->rd_fhp,
> +				read->rd_offset, maxcount);
> +	maxcount =3D host_err;
> +	nfsdstats.io_read +=3D maxcount;
> +	fsnotify_access(file);
>=20
> 	eof =3D nfsd_eof_on_read(len, maxcount, read->rd_offset,
> 				=
d_inode(read->rd_fhp->fh_dentry)->i_size);
> @@ -3470,6 +3472,17 @@ static __be32 nfsd4_encode_splice_read(
> 	xdr->end =3D (__be32 *)((void *)xdr->end + space_left);
>=20
> 	return 0;
> +
> +err:
> +	/*
> +	 * nfsd_splice_actor may have already messed with the
> +	 * page length; reset it so as not to confuse
> +	 * xdr_truncate_encode:
> +	 */
> +	buf->page_len =3D 0;
> +	trace_nfsd_read_err(read->rd_rqstp, read->rd_fhp,
> +			    read->rd_offset, host_err);
> +	return nfserrno(host_err);
> }
>=20
> static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
> @@ -3477,12 +3490,12 @@ static __be32 nfsd4_encode_readv(struct =
nfsd4_compoundres *resp,
> 				 struct file *file, unsigned long =
maxcount)
> {
> 	struct xdr_stream *xdr =3D &resp->xdr;
> +	int host_err;
> 	u32 eof;
> 	int v;
> 	int starting_len =3D xdr->buf->len - 8;
> 	long len;
> 	int thislen;
> -	__be32 nfserr;
> 	__be32 tmp;
> 	__be32 *p;
> 	u32 zzz =3D 0;
> @@ -3510,11 +3523,18 @@ static __be32 nfsd4_encode_readv(struct =
nfsd4_compoundres *resp,
> 	}
> 	read->rd_vlen =3D v;
>=20
> +	trace_nfsd_read_vector(resp->rqstp, read->rd_fhp,
> +			       read->rd_offset, maxcount);
> 	len =3D maxcount;
> -	nfserr =3D nfsd_readv(file, read->rd_offset, =
resp->rqstp->rq_vec,
> -			read->rd_vlen, &maxcount);
> -	if (nfserr)
> -		return nfserr;
> +	host_err =3D nfsd_readv(file, read->rd_offset, =
resp->rqstp->rq_vec,
> +			      read->rd_vlen, &maxcount);
> +	if (host_err < 0)
> +		goto err;
> +	trace_nfsd_read_io_done(resp->rqstp, read->rd_fhp,
> +				read->rd_offset, maxcount);

I missed a spot.

+	maxcount =3D host_err;

> +	nfsdstats.io_read +=3D maxcount;
> +	fsnotify_access(file);
> +
> 	xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));
>=20
> 	eof =3D nfsd_eof_on_read(len, maxcount, read->rd_offset,
> @@ -3530,6 +3550,10 @@ static __be32 nfsd4_encode_readv(struct =
nfsd4_compoundres *resp,
> 								&zzz, =
pad);
> 	return 0;
>=20
> +err:
> +	trace_nfsd_read_err(resp->rqstp, read->rd_fhp,
> +			    read->rd_offset, host_err);
> +	return nfserrno(host_err);
> }
>=20
> static __be32
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 653e9ee..a8bbd9d 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -43,7 +43,8 @@
> 	TP_ARGS(rqstp, fhp, offset, len))
>=20
> DEFINE_NFSD_IO_EVENT(read_start);
> -DEFINE_NFSD_IO_EVENT(read_opened);
> +DEFINE_NFSD_IO_EVENT(read_splice);
> +DEFINE_NFSD_IO_EVENT(read_vector);
> DEFINE_NFSD_IO_EVENT(read_io_done);
> DEFINE_NFSD_IO_EVENT(read_done);
> DEFINE_NFSD_IO_EVENT(write_start);
> @@ -82,6 +83,7 @@
> 		 int		len),		\
> 	TP_ARGS(rqstp, fhp, offset, len))
>=20
> +DEFINE_NFSD_ERR_EVENT(read_err);
> DEFINE_NFSD_ERR_EVENT(write_err);
>=20
> #include "state.h"
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index ee59a0b..0fa54b5 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -881,19 +881,7 @@ static int nfsd_direct_splice_actor(struct =
pipe_inode_info *pipe,
> 	return __splice_from_pipe(pipe, sd, nfsd_splice_actor);
> }
>=20
> -static __be32
> -nfsd_finish_read(struct file *file, unsigned long *count, int =
host_err)
> -{
> -	if (host_err >=3D 0) {
> -		nfsdstats.io_read +=3D host_err;
> -		*count =3D host_err;
> -		fsnotify_access(file);
> -		return 0;
> -	} else=20
> -		return nfserrno(host_err);
> -}
> -
> -__be32 nfsd_splice_read(struct svc_rqst *rqstp,
> +int nfsd_splice_read(struct svc_rqst *rqstp,
> 		     struct file *file, loff_t offset, unsigned long =
*count)
> {
> 	struct splice_desc sd =3D {
> @@ -902,23 +890,18 @@ __be32 nfsd_splice_read(struct svc_rqst *rqstp,
> 		.pos		=3D offset,
> 		.u.data		=3D rqstp,
> 	};
> -	int host_err;
>=20
> 	rqstp->rq_next_page =3D rqstp->rq_respages + 1;
> -	host_err =3D splice_direct_to_actor(file, &sd, =
nfsd_direct_splice_actor);
> -	return nfsd_finish_read(file, count, host_err);
> +	return splice_direct_to_actor(file, &sd, =
nfsd_direct_splice_actor);
> }
>=20
> -__be32 nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, =
int vlen,
> -		unsigned long *count)
> +int nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, =
int vlen,
> +	       unsigned long *count)
> {
> 	struct iov_iter iter;
> -	int host_err;
>=20
> 	iov_iter_kvec(&iter, READ | ITER_KVEC, vec, vlen, *count);
> -	host_err =3D vfs_iter_read(file, &iter, &offset, 0);
> -
> -	return nfsd_finish_read(file, count, host_err);
> +	return vfs_iter_read(file, &iter, &offset, 0);
> }
>=20
> /*
> @@ -1025,6 +1008,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct =
svc_fh *fhp,
> {
> 	struct file *file;
> 	struct raparms	*ra;
> +	int host_err;
> 	__be32 err;
>=20
> 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
> @@ -1034,14 +1018,24 @@ __be32 nfsd_read(struct svc_rqst *rqstp, =
struct svc_fh *fhp,
>=20
> 	ra =3D nfsd_init_raparms(file);
>=20
> -	trace_nfsd_read_opened(rqstp, fhp, offset, *count);
> -
> -	if (file->f_op->splice_read && test_bit(RQ_SPLICE_OK, =
&rqstp->rq_flags))
> -		err =3D nfsd_splice_read(rqstp, file, offset, count);
> -	else
> -		err =3D nfsd_readv(file, offset, vec, vlen, count);
> -
> -	trace_nfsd_read_io_done(rqstp, fhp, offset, *count);
> +	if (file->f_op->splice_read &&
> +	    test_bit(RQ_SPLICE_OK, &rqstp->rq_flags)) {
> +		trace_nfsd_read_splice(rqstp, fhp, offset, *count);
> +		host_err =3D nfsd_splice_read(rqstp, file, offset, =
count);
> +	} else {
> +		trace_nfsd_read_vector(rqstp, fhp, offset, *count);
> +		host_err =3D nfsd_readv(file, offset, vec, vlen, count);
> +	}
> +	if (host_err >=3D 0) {
> +		trace_nfsd_read_io_done(rqstp, fhp, offset, host_err);
> +		nfsdstats.io_read +=3D host_err;
> +		*count =3D host_err;
> +		fsnotify_access(file);
> +		err =3D nfs_ok;
> +	} else  {
> +		trace_nfsd_read_err(rqstp, fhp, offset, host_err);
> +		err =3D nfserrno(host_err);
> +	}
>=20
> 	if (ra)
> 		nfsd_put_raparams(file, ra);
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index be6d8e0..d9131c3 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -78,10 +78,10 @@ __be32		nfsd_commit(struct svc_rqst *, =
struct svc_fh *,
> __be32		nfsd_open(struct svc_rqst *, struct svc_fh *, =
umode_t,
> 				int, struct file **);
> struct raparms;
> -__be32		nfsd_splice_read(struct svc_rqst *,
> -				struct file *, loff_t, unsigned long *);
> -__be32		nfsd_readv(struct file *, loff_t, struct kvec *, =
int,
> -				unsigned long *);
> +int		nfsd_splice_read(struct svc_rqst *rqstp, struct file =
*file,
> +				loff_t offset, unsigned long *count);
> +int		nfsd_readv(struct file *file, loff_t offset, struct kvec =
*vec,
> +				int vlen, unsigned long *count);
> __be32 		nfsd_read(struct svc_rqst *, struct svc_fh *,
> 				loff_t, struct kvec *, int, unsigned =
long *);
> __be32 		nfsd_write(struct svc_rqst *, struct svc_fh *, =
loff_t,
>=20
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 15/18] nfsd: Add I/O trace points in the NFSv4 read proc
  2018-03-27 16:57   ` Chuck Lever
@ 2018-03-27 20:14     ` Bruce Fields
  2018-03-27 21:22       ` Chuck Lever
  0 siblings, 1 reply; 23+ messages in thread
From: Bruce Fields @ 2018-03-27 20:14 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List

Could you check that I got this right in

	git://linux-nfs.org/~bfields/linux.git nfsd-next

--b.

On Tue, Mar 27, 2018 at 12:57:17PM -0400, Chuck Lever wrote:
> 
> 
> > On Mar 27, 2018, at 10:53 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> > 
> > NFSv4 read compound processing invokes nfsd_splice_read and
> > nfs_readv directly, so the trace points currently in nfsd_read are
> > not effective for NFSv4 reads.
> > 
> > Move and copy the trace points so that NFSv4 reads are captured.
> > Also, we want to record any local I/O error that occurs, and
> > the total count of bytes that were actually moved. And, also
> > whether splice or vectored read was used.
> > 
> > The svc_fh is not passed to the read helpers, so some code
> > duplication is necessary.
> > 
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> > fs/nfsd/nfs4proc.c |    5 +++++
> > fs/nfsd/nfs4xdr.c  |   56 +++++++++++++++++++++++++++++++++++++---------------
> > fs/nfsd/trace.h    |    4 +++-
> > fs/nfsd/vfs.c      |   54 ++++++++++++++++++++++----------------------------
> > fs/nfsd/vfs.h      |    8 ++++---
> > 5 files changed, 76 insertions(+), 51 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index b93673e..39016b6 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -751,6 +751,9 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> > 	if (read->rd_offset >= OFFSET_MAX)
> > 		return nfserr_inval;
> > 
> > +	trace_nfsd_read_start(rqstp, &cstate->current_fh,
> > +			      read->rd_offset, read->rd_length);
> > +
> > 	/*
> > 	 * If we do a zero copy read, then a client will see read data
> > 	 * that reflects the state of the file *after* performing the
> > @@ -783,6 +786,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> > {
> > 	if (u->read.rd_filp)
> > 		fput(u->read.rd_filp);
> > +	trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> > +			     u->read.rd_offset, u->read.rd_length);
> > }
> > 
> > static __be32
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index e502fd1..d03059a 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -37,6 +37,7 @@
> > #include <linux/file.h>
> > #include <linux/slab.h>
> > #include <linux/namei.h>
> > +#include <linux/fsnotify.h>
> > #include <linux/statfs.h>
> > #include <linux/utsname.h>
> > #include <linux/pagemap.h>
> > @@ -50,6 +51,7 @@
> > #include "cache.h"
> > #include "netns.h"
> > #include "pnfs.h"
> > +#include "trace.h"
> > 
> > #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
> > #include <linux/security.h>
> > @@ -3416,28 +3418,28 @@ static __be32 nfsd4_encode_splice_read(
> > {
> > 	struct xdr_stream *xdr = &resp->xdr;
> > 	struct xdr_buf *buf = xdr->buf;
> > +	int host_err;
> > 	u32 eof;
> > 	long len;
> > 	int space_left;
> > -	__be32 nfserr;
> > 	__be32 *p = xdr->p - 2;
> > 
> > 	/* Make sure there will be room for padding if needed */
> > 	if (xdr->end - xdr->p < 1)
> > 		return nfserr_resource;
> > 
> > +	trace_nfsd_read_splice(resp->rqstp, read->rd_fhp,
> > +			       read->rd_offset, maxcount);
> > 	len = maxcount;
> > -	nfserr = nfsd_splice_read(read->rd_rqstp, file,
> > +	host_err = nfsd_splice_read(read->rd_rqstp, file,
> > 				  read->rd_offset, &maxcount);
> > -	if (nfserr) {
> > -		/*
> > -		 * nfsd_splice_actor may have already messed with the
> > -		 * page length; reset it so as not to confuse
> > -		 * xdr_truncate_encode:
> > -		 */
> > -		buf->page_len = 0;
> > -		return nfserr;
> > -	}
> > +	if (host_err < 0)
> > +		goto err;
> > +	trace_nfsd_read_io_done(read->rd_rqstp, read->rd_fhp,
> > +				read->rd_offset, maxcount);
> > +	maxcount = host_err;
> > +	nfsdstats.io_read += maxcount;
> > +	fsnotify_access(file);
> > 
> > 	eof = nfsd_eof_on_read(len, maxcount, read->rd_offset,
> > 				d_inode(read->rd_fhp->fh_dentry)->i_size);
> > @@ -3470,6 +3472,17 @@ static __be32 nfsd4_encode_splice_read(
> > 	xdr->end = (__be32 *)((void *)xdr->end + space_left);
> > 
> > 	return 0;
> > +
> > +err:
> > +	/*
> > +	 * nfsd_splice_actor may have already messed with the
> > +	 * page length; reset it so as not to confuse
> > +	 * xdr_truncate_encode:
> > +	 */
> > +	buf->page_len = 0;
> > +	trace_nfsd_read_err(read->rd_rqstp, read->rd_fhp,
> > +			    read->rd_offset, host_err);
> > +	return nfserrno(host_err);
> > }
> > 
> > static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
> > @@ -3477,12 +3490,12 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
> > 				 struct file *file, unsigned long maxcount)
> > {
> > 	struct xdr_stream *xdr = &resp->xdr;
> > +	int host_err;
> > 	u32 eof;
> > 	int v;
> > 	int starting_len = xdr->buf->len - 8;
> > 	long len;
> > 	int thislen;
> > -	__be32 nfserr;
> > 	__be32 tmp;
> > 	__be32 *p;
> > 	u32 zzz = 0;
> > @@ -3510,11 +3523,18 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
> > 	}
> > 	read->rd_vlen = v;
> > 
> > +	trace_nfsd_read_vector(resp->rqstp, read->rd_fhp,
> > +			       read->rd_offset, maxcount);
> > 	len = maxcount;
> > -	nfserr = nfsd_readv(file, read->rd_offset, resp->rqstp->rq_vec,
> > -			read->rd_vlen, &maxcount);
> > -	if (nfserr)
> > -		return nfserr;
> > +	host_err = nfsd_readv(file, read->rd_offset, resp->rqstp->rq_vec,
> > +			      read->rd_vlen, &maxcount);
> > +	if (host_err < 0)
> > +		goto err;
> > +	trace_nfsd_read_io_done(resp->rqstp, read->rd_fhp,
> > +				read->rd_offset, maxcount);
> 
> I missed a spot.
> 
> +	maxcount = host_err;
> 
> > +	nfsdstats.io_read += maxcount;
> > +	fsnotify_access(file);
> > +
> > 	xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));
> > 
> > 	eof = nfsd_eof_on_read(len, maxcount, read->rd_offset,
> > @@ -3530,6 +3550,10 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
> > 								&zzz, pad);
> > 	return 0;
> > 
> > +err:
> > +	trace_nfsd_read_err(resp->rqstp, read->rd_fhp,
> > +			    read->rd_offset, host_err);
> > +	return nfserrno(host_err);
> > }
> > 
> > static __be32
> > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > index 653e9ee..a8bbd9d 100644
> > --- a/fs/nfsd/trace.h
> > +++ b/fs/nfsd/trace.h
> > @@ -43,7 +43,8 @@
> > 	TP_ARGS(rqstp, fhp, offset, len))
> > 
> > DEFINE_NFSD_IO_EVENT(read_start);
> > -DEFINE_NFSD_IO_EVENT(read_opened);
> > +DEFINE_NFSD_IO_EVENT(read_splice);
> > +DEFINE_NFSD_IO_EVENT(read_vector);
> > DEFINE_NFSD_IO_EVENT(read_io_done);
> > DEFINE_NFSD_IO_EVENT(read_done);
> > DEFINE_NFSD_IO_EVENT(write_start);
> > @@ -82,6 +83,7 @@
> > 		 int		len),		\
> > 	TP_ARGS(rqstp, fhp, offset, len))
> > 
> > +DEFINE_NFSD_ERR_EVENT(read_err);
> > DEFINE_NFSD_ERR_EVENT(write_err);
> > 
> > #include "state.h"
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index ee59a0b..0fa54b5 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -881,19 +881,7 @@ static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
> > 	return __splice_from_pipe(pipe, sd, nfsd_splice_actor);
> > }
> > 
> > -static __be32
> > -nfsd_finish_read(struct file *file, unsigned long *count, int host_err)
> > -{
> > -	if (host_err >= 0) {
> > -		nfsdstats.io_read += host_err;
> > -		*count = host_err;
> > -		fsnotify_access(file);
> > -		return 0;
> > -	} else 
> > -		return nfserrno(host_err);
> > -}
> > -
> > -__be32 nfsd_splice_read(struct svc_rqst *rqstp,
> > +int nfsd_splice_read(struct svc_rqst *rqstp,
> > 		     struct file *file, loff_t offset, unsigned long *count)
> > {
> > 	struct splice_desc sd = {
> > @@ -902,23 +890,18 @@ __be32 nfsd_splice_read(struct svc_rqst *rqstp,
> > 		.pos		= offset,
> > 		.u.data		= rqstp,
> > 	};
> > -	int host_err;
> > 
> > 	rqstp->rq_next_page = rqstp->rq_respages + 1;
> > -	host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
> > -	return nfsd_finish_read(file, count, host_err);
> > +	return splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
> > }
> > 
> > -__be32 nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
> > -		unsigned long *count)
> > +int nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
> > +	       unsigned long *count)
> > {
> > 	struct iov_iter iter;
> > -	int host_err;
> > 
> > 	iov_iter_kvec(&iter, READ | ITER_KVEC, vec, vlen, *count);
> > -	host_err = vfs_iter_read(file, &iter, &offset, 0);
> > -
> > -	return nfsd_finish_read(file, count, host_err);
> > +	return vfs_iter_read(file, &iter, &offset, 0);
> > }
> > 
> > /*
> > @@ -1025,6 +1008,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > {
> > 	struct file *file;
> > 	struct raparms	*ra;
> > +	int host_err;
> > 	__be32 err;
> > 
> > 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
> > @@ -1034,14 +1018,24 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > 
> > 	ra = nfsd_init_raparms(file);
> > 
> > -	trace_nfsd_read_opened(rqstp, fhp, offset, *count);
> > -
> > -	if (file->f_op->splice_read && test_bit(RQ_SPLICE_OK, &rqstp->rq_flags))
> > -		err = nfsd_splice_read(rqstp, file, offset, count);
> > -	else
> > -		err = nfsd_readv(file, offset, vec, vlen, count);
> > -
> > -	trace_nfsd_read_io_done(rqstp, fhp, offset, *count);
> > +	if (file->f_op->splice_read &&
> > +	    test_bit(RQ_SPLICE_OK, &rqstp->rq_flags)) {
> > +		trace_nfsd_read_splice(rqstp, fhp, offset, *count);
> > +		host_err = nfsd_splice_read(rqstp, file, offset, count);
> > +	} else {
> > +		trace_nfsd_read_vector(rqstp, fhp, offset, *count);
> > +		host_err = nfsd_readv(file, offset, vec, vlen, count);
> > +	}
> > +	if (host_err >= 0) {
> > +		trace_nfsd_read_io_done(rqstp, fhp, offset, host_err);
> > +		nfsdstats.io_read += host_err;
> > +		*count = host_err;
> > +		fsnotify_access(file);
> > +		err = nfs_ok;
> > +	} else  {
> > +		trace_nfsd_read_err(rqstp, fhp, offset, host_err);
> > +		err = nfserrno(host_err);
> > +	}
> > 
> > 	if (ra)
> > 		nfsd_put_raparams(file, ra);
> > diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> > index be6d8e0..d9131c3 100644
> > --- a/fs/nfsd/vfs.h
> > +++ b/fs/nfsd/vfs.h
> > @@ -78,10 +78,10 @@ __be32		nfsd_commit(struct svc_rqst *, struct svc_fh *,
> > __be32		nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
> > 				int, struct file **);
> > struct raparms;
> > -__be32		nfsd_splice_read(struct svc_rqst *,
> > -				struct file *, loff_t, unsigned long *);
> > -__be32		nfsd_readv(struct file *, loff_t, struct kvec *, int,
> > -				unsigned long *);
> > +int		nfsd_splice_read(struct svc_rqst *rqstp, struct file *file,
> > +				loff_t offset, unsigned long *count);
> > +int		nfsd_readv(struct file *file, loff_t offset, struct kvec *vec,
> > +				int vlen, unsigned long *count);
> > __be32 		nfsd_read(struct svc_rqst *, struct svc_fh *,
> > 				loff_t, struct kvec *, int, unsigned long *);
> > __be32 		nfsd_write(struct svc_rqst *, struct svc_fh *, loff_t,
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> Chuck Lever
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 15/18] nfsd: Add I/O trace points in the NFSv4 read proc
  2018-03-27 20:14     ` Bruce Fields
@ 2018-03-27 21:22       ` Chuck Lever
  2018-03-27 21:51         ` Bruce Fields
  0 siblings, 1 reply; 23+ messages in thread
From: Chuck Lever @ 2018-03-27 21:22 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Linux NFS Mailing List



> On Mar 27, 2018, at 4:14 PM, Bruce Fields <bfields@fieldses.org> =
wrote:
>=20
> Could you check that I got this right in
>=20
> 	git://linux-nfs.org/~bfields/linux.git nfsd-next

You changed it correctly.

However, the patch description says "Also, we want to record any local =
I/O error that occurs, and the total count of bytes that were actually =
moved."

Therefore I think we want to pass host_err rather than maxcount to =
trace_nfsd_read_io_done, in both NFSv4 read helpers.

Do you want me to resend with these fixes?


> --b.
>=20
> On Tue, Mar 27, 2018 at 12:57:17PM -0400, Chuck Lever wrote:
>>=20
>>=20
>>> On Mar 27, 2018, at 10:53 AM, Chuck Lever <chuck.lever@oracle.com> =
wrote:
>>>=20
>>> NFSv4 read compound processing invokes nfsd_splice_read and
>>> nfs_readv directly, so the trace points currently in nfsd_read are
>>> not effective for NFSv4 reads.
>>>=20
>>> Move and copy the trace points so that NFSv4 reads are captured.
>>> Also, we want to record any local I/O error that occurs, and
>>> the total count of bytes that were actually moved. And, also
>>> whether splice or vectored read was used.
>>>=20
>>> The svc_fh is not passed to the read helpers, so some code
>>> duplication is necessary.
>>>=20
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>> fs/nfsd/nfs4proc.c |    5 +++++
>>> fs/nfsd/nfs4xdr.c  |   56 =
+++++++++++++++++++++++++++++++++++++---------------
>>> fs/nfsd/trace.h    |    4 +++-
>>> fs/nfsd/vfs.c      |   54 =
++++++++++++++++++++++----------------------------
>>> fs/nfsd/vfs.h      |    8 ++++---
>>> 5 files changed, 76 insertions(+), 51 deletions(-)
>>>=20
>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>> index b93673e..39016b6 100644
>>> --- a/fs/nfsd/nfs4proc.c
>>> +++ b/fs/nfsd/nfs4proc.c
>>> @@ -751,6 +751,9 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst =
*rqstp, struct svc_fh *fh)
>>> 	if (read->rd_offset >=3D OFFSET_MAX)
>>> 		return nfserr_inval;
>>>=20
>>> +	trace_nfsd_read_start(rqstp, &cstate->current_fh,
>>> +			      read->rd_offset, read->rd_length);
>>> +
>>> 	/*
>>> 	 * If we do a zero copy read, then a client will see read data
>>> 	 * that reflects the state of the file *after* performing the
>>> @@ -783,6 +786,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst =
*rqstp, struct svc_fh *fh)
>>> {
>>> 	if (u->read.rd_filp)
>>> 		fput(u->read.rd_filp);
>>> +	trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
>>> +			     u->read.rd_offset, u->read.rd_length);
>>> }
>>>=20
>>> static __be32
>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>>> index e502fd1..d03059a 100644
>>> --- a/fs/nfsd/nfs4xdr.c
>>> +++ b/fs/nfsd/nfs4xdr.c
>>> @@ -37,6 +37,7 @@
>>> #include <linux/file.h>
>>> #include <linux/slab.h>
>>> #include <linux/namei.h>
>>> +#include <linux/fsnotify.h>
>>> #include <linux/statfs.h>
>>> #include <linux/utsname.h>
>>> #include <linux/pagemap.h>
>>> @@ -50,6 +51,7 @@
>>> #include "cache.h"
>>> #include "netns.h"
>>> #include "pnfs.h"
>>> +#include "trace.h"
>>>=20
>>> #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
>>> #include <linux/security.h>
>>> @@ -3416,28 +3418,28 @@ static __be32 nfsd4_encode_splice_read(
>>> {
>>> 	struct xdr_stream *xdr =3D &resp->xdr;
>>> 	struct xdr_buf *buf =3D xdr->buf;
>>> +	int host_err;
>>> 	u32 eof;
>>> 	long len;
>>> 	int space_left;
>>> -	__be32 nfserr;
>>> 	__be32 *p =3D xdr->p - 2;
>>>=20
>>> 	/* Make sure there will be room for padding if needed */
>>> 	if (xdr->end - xdr->p < 1)
>>> 		return nfserr_resource;
>>>=20
>>> +	trace_nfsd_read_splice(resp->rqstp, read->rd_fhp,
>>> +			       read->rd_offset, maxcount);
>>> 	len =3D maxcount;
>>> -	nfserr =3D nfsd_splice_read(read->rd_rqstp, file,
>>> +	host_err =3D nfsd_splice_read(read->rd_rqstp, file,
>>> 				  read->rd_offset, &maxcount);
>>> -	if (nfserr) {
>>> -		/*
>>> -		 * nfsd_splice_actor may have already messed with the
>>> -		 * page length; reset it so as not to confuse
>>> -		 * xdr_truncate_encode:
>>> -		 */
>>> -		buf->page_len =3D 0;
>>> -		return nfserr;
>>> -	}
>>> +	if (host_err < 0)
>>> +		goto err;
>>> +	trace_nfsd_read_io_done(read->rd_rqstp, read->rd_fhp,
>>> +				read->rd_offset, maxcount);
>>> +	maxcount =3D host_err;
>>> +	nfsdstats.io_read +=3D maxcount;
>>> +	fsnotify_access(file);
>>>=20
>>> 	eof =3D nfsd_eof_on_read(len, maxcount, read->rd_offset,
>>> 				=
d_inode(read->rd_fhp->fh_dentry)->i_size);
>>> @@ -3470,6 +3472,17 @@ static __be32 nfsd4_encode_splice_read(
>>> 	xdr->end =3D (__be32 *)((void *)xdr->end + space_left);
>>>=20
>>> 	return 0;
>>> +
>>> +err:
>>> +	/*
>>> +	 * nfsd_splice_actor may have already messed with the
>>> +	 * page length; reset it so as not to confuse
>>> +	 * xdr_truncate_encode:
>>> +	 */
>>> +	buf->page_len =3D 0;
>>> +	trace_nfsd_read_err(read->rd_rqstp, read->rd_fhp,
>>> +			    read->rd_offset, host_err);
>>> +	return nfserrno(host_err);
>>> }
>>>=20
>>> static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
>>> @@ -3477,12 +3490,12 @@ static __be32 nfsd4_encode_readv(struct =
nfsd4_compoundres *resp,
>>> 				 struct file *file, unsigned long =
maxcount)
>>> {
>>> 	struct xdr_stream *xdr =3D &resp->xdr;
>>> +	int host_err;
>>> 	u32 eof;
>>> 	int v;
>>> 	int starting_len =3D xdr->buf->len - 8;
>>> 	long len;
>>> 	int thislen;
>>> -	__be32 nfserr;
>>> 	__be32 tmp;
>>> 	__be32 *p;
>>> 	u32 zzz =3D 0;
>>> @@ -3510,11 +3523,18 @@ static __be32 nfsd4_encode_readv(struct =
nfsd4_compoundres *resp,
>>> 	}
>>> 	read->rd_vlen =3D v;
>>>=20
>>> +	trace_nfsd_read_vector(resp->rqstp, read->rd_fhp,
>>> +			       read->rd_offset, maxcount);
>>> 	len =3D maxcount;
>>> -	nfserr =3D nfsd_readv(file, read->rd_offset, =
resp->rqstp->rq_vec,
>>> -			read->rd_vlen, &maxcount);
>>> -	if (nfserr)
>>> -		return nfserr;
>>> +	host_err =3D nfsd_readv(file, read->rd_offset, =
resp->rqstp->rq_vec,
>>> +			      read->rd_vlen, &maxcount);
>>> +	if (host_err < 0)
>>> +		goto err;
>>> +	trace_nfsd_read_io_done(resp->rqstp, read->rd_fhp,
>>> +				read->rd_offset, maxcount);
>>=20
>> I missed a spot.
>>=20
>> +	maxcount =3D host_err;
>>=20
>>> +	nfsdstats.io_read +=3D maxcount;
>>> +	fsnotify_access(file);
>>> +
>>> 	xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));
>>>=20
>>> 	eof =3D nfsd_eof_on_read(len, maxcount, read->rd_offset,
>>> @@ -3530,6 +3550,10 @@ static __be32 nfsd4_encode_readv(struct =
nfsd4_compoundres *resp,
>>> 								&zzz, =
pad);
>>> 	return 0;
>>>=20
>>> +err:
>>> +	trace_nfsd_read_err(resp->rqstp, read->rd_fhp,
>>> +			    read->rd_offset, host_err);
>>> +	return nfserrno(host_err);
>>> }
>>>=20
>>> static __be32
>>> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
>>> index 653e9ee..a8bbd9d 100644
>>> --- a/fs/nfsd/trace.h
>>> +++ b/fs/nfsd/trace.h
>>> @@ -43,7 +43,8 @@
>>> 	TP_ARGS(rqstp, fhp, offset, len))
>>>=20
>>> DEFINE_NFSD_IO_EVENT(read_start);
>>> -DEFINE_NFSD_IO_EVENT(read_opened);
>>> +DEFINE_NFSD_IO_EVENT(read_splice);
>>> +DEFINE_NFSD_IO_EVENT(read_vector);
>>> DEFINE_NFSD_IO_EVENT(read_io_done);
>>> DEFINE_NFSD_IO_EVENT(read_done);
>>> DEFINE_NFSD_IO_EVENT(write_start);
>>> @@ -82,6 +83,7 @@
>>> 		 int		len),		\
>>> 	TP_ARGS(rqstp, fhp, offset, len))
>>>=20
>>> +DEFINE_NFSD_ERR_EVENT(read_err);
>>> DEFINE_NFSD_ERR_EVENT(write_err);
>>>=20
>>> #include "state.h"
>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>> index ee59a0b..0fa54b5 100644
>>> --- a/fs/nfsd/vfs.c
>>> +++ b/fs/nfsd/vfs.c
>>> @@ -881,19 +881,7 @@ static int nfsd_direct_splice_actor(struct =
pipe_inode_info *pipe,
>>> 	return __splice_from_pipe(pipe, sd, nfsd_splice_actor);
>>> }
>>>=20
>>> -static __be32
>>> -nfsd_finish_read(struct file *file, unsigned long *count, int =
host_err)
>>> -{
>>> -	if (host_err >=3D 0) {
>>> -		nfsdstats.io_read +=3D host_err;
>>> -		*count =3D host_err;
>>> -		fsnotify_access(file);
>>> -		return 0;
>>> -	} else=20
>>> -		return nfserrno(host_err);
>>> -}
>>> -
>>> -__be32 nfsd_splice_read(struct svc_rqst *rqstp,
>>> +int nfsd_splice_read(struct svc_rqst *rqstp,
>>> 		     struct file *file, loff_t offset, unsigned long =
*count)
>>> {
>>> 	struct splice_desc sd =3D {
>>> @@ -902,23 +890,18 @@ __be32 nfsd_splice_read(struct svc_rqst =
*rqstp,
>>> 		.pos		=3D offset,
>>> 		.u.data		=3D rqstp,
>>> 	};
>>> -	int host_err;
>>>=20
>>> 	rqstp->rq_next_page =3D rqstp->rq_respages + 1;
>>> -	host_err =3D splice_direct_to_actor(file, &sd, =
nfsd_direct_splice_actor);
>>> -	return nfsd_finish_read(file, count, host_err);
>>> +	return splice_direct_to_actor(file, &sd, =
nfsd_direct_splice_actor);
>>> }
>>>=20
>>> -__be32 nfsd_readv(struct file *file, loff_t offset, struct kvec =
*vec, int vlen,
>>> -		unsigned long *count)
>>> +int nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, =
int vlen,
>>> +	       unsigned long *count)
>>> {
>>> 	struct iov_iter iter;
>>> -	int host_err;
>>>=20
>>> 	iov_iter_kvec(&iter, READ | ITER_KVEC, vec, vlen, *count);
>>> -	host_err =3D vfs_iter_read(file, &iter, &offset, 0);
>>> -
>>> -	return nfsd_finish_read(file, count, host_err);
>>> +	return vfs_iter_read(file, &iter, &offset, 0);
>>> }
>>>=20
>>> /*
>>> @@ -1025,6 +1008,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, =
struct svc_fh *fhp,
>>> {
>>> 	struct file *file;
>>> 	struct raparms	*ra;
>>> +	int host_err;
>>> 	__be32 err;
>>>=20
>>> 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
>>> @@ -1034,14 +1018,24 @@ __be32 nfsd_read(struct svc_rqst *rqstp, =
struct svc_fh *fhp,
>>>=20
>>> 	ra =3D nfsd_init_raparms(file);
>>>=20
>>> -	trace_nfsd_read_opened(rqstp, fhp, offset, *count);
>>> -
>>> -	if (file->f_op->splice_read && test_bit(RQ_SPLICE_OK, =
&rqstp->rq_flags))
>>> -		err =3D nfsd_splice_read(rqstp, file, offset, count);
>>> -	else
>>> -		err =3D nfsd_readv(file, offset, vec, vlen, count);
>>> -
>>> -	trace_nfsd_read_io_done(rqstp, fhp, offset, *count);
>>> +	if (file->f_op->splice_read &&
>>> +	    test_bit(RQ_SPLICE_OK, &rqstp->rq_flags)) {
>>> +		trace_nfsd_read_splice(rqstp, fhp, offset, *count);
>>> +		host_err =3D nfsd_splice_read(rqstp, file, offset, =
count);
>>> +	} else {
>>> +		trace_nfsd_read_vector(rqstp, fhp, offset, *count);
>>> +		host_err =3D nfsd_readv(file, offset, vec, vlen, count);
>>> +	}
>>> +	if (host_err >=3D 0) {
>>> +		trace_nfsd_read_io_done(rqstp, fhp, offset, host_err);
>>> +		nfsdstats.io_read +=3D host_err;
>>> +		*count =3D host_err;
>>> +		fsnotify_access(file);
>>> +		err =3D nfs_ok;
>>> +	} else  {
>>> +		trace_nfsd_read_err(rqstp, fhp, offset, host_err);
>>> +		err =3D nfserrno(host_err);
>>> +	}
>>>=20
>>> 	if (ra)
>>> 		nfsd_put_raparams(file, ra);
>>> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
>>> index be6d8e0..d9131c3 100644
>>> --- a/fs/nfsd/vfs.h
>>> +++ b/fs/nfsd/vfs.h
>>> @@ -78,10 +78,10 @@ __be32		nfsd_commit(struct svc_rqst *, =
struct svc_fh *,
>>> __be32		nfsd_open(struct svc_rqst *, struct svc_fh *, =
umode_t,
>>> 				int, struct file **);
>>> struct raparms;
>>> -__be32		nfsd_splice_read(struct svc_rqst *,
>>> -				struct file *, loff_t, unsigned long *);
>>> -__be32		nfsd_readv(struct file *, loff_t, struct kvec *, =
int,
>>> -				unsigned long *);
>>> +int		nfsd_splice_read(struct svc_rqst *rqstp, struct =
file *file,
>>> +				loff_t offset, unsigned long *count);
>>> +int		nfsd_readv(struct file *file, loff_t offset, =
struct kvec *vec,
>>> +				int vlen, unsigned long *count);
>>> __be32 		nfsd_read(struct svc_rqst *, struct svc_fh *,
>>> 				loff_t, struct kvec *, int, unsigned =
long *);
>>> __be32 		nfsd_write(struct svc_rqst *, struct svc_fh *, =
loff_t,
>>>=20
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>=20
>> --
>> Chuck Lever
>>=20
>>=20

--
Chuck Lever




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 15/18] nfsd: Add I/O trace points in the NFSv4 read proc
  2018-03-27 21:22       ` Chuck Lever
@ 2018-03-27 21:51         ` Bruce Fields
  0 siblings, 0 replies; 23+ messages in thread
From: Bruce Fields @ 2018-03-27 21:51 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List

On Tue, Mar 27, 2018 at 05:22:28PM -0400, Chuck Lever wrote:
> 
> 
> > On Mar 27, 2018, at 4:14 PM, Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > Could you check that I got this right in
> > 
> > 	git://linux-nfs.org/~bfields/linux.git nfsd-next
> 
> You changed it correctly.
> 
> However, the patch description says "Also, we want to record any local I/O error that occurs, and the total count of bytes that were actually moved."
> 
> Therefore I think we want to pass host_err rather than maxcount to trace_nfsd_read_io_done, in both NFSv4 read helpers.
> 
> Do you want me to resend with these fixes?

Sure, thanks!--b.

> 
> 
> > --b.
> > 
> > On Tue, Mar 27, 2018 at 12:57:17PM -0400, Chuck Lever wrote:
> >> 
> >> 
> >>> On Mar 27, 2018, at 10:53 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >>> 
> >>> NFSv4 read compound processing invokes nfsd_splice_read and
> >>> nfs_readv directly, so the trace points currently in nfsd_read are
> >>> not effective for NFSv4 reads.
> >>> 
> >>> Move and copy the trace points so that NFSv4 reads are captured.
> >>> Also, we want to record any local I/O error that occurs, and
> >>> the total count of bytes that were actually moved. And, also
> >>> whether splice or vectored read was used.
> >>> 
> >>> The svc_fh is not passed to the read helpers, so some code
> >>> duplication is necessary.
> >>> 
> >>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >>> ---
> >>> fs/nfsd/nfs4proc.c |    5 +++++
> >>> fs/nfsd/nfs4xdr.c  |   56 +++++++++++++++++++++++++++++++++++++---------------
> >>> fs/nfsd/trace.h    |    4 +++-
> >>> fs/nfsd/vfs.c      |   54 ++++++++++++++++++++++----------------------------
> >>> fs/nfsd/vfs.h      |    8 ++++---
> >>> 5 files changed, 76 insertions(+), 51 deletions(-)
> >>> 
> >>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >>> index b93673e..39016b6 100644
> >>> --- a/fs/nfsd/nfs4proc.c
> >>> +++ b/fs/nfsd/nfs4proc.c
> >>> @@ -751,6 +751,9 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> >>> 	if (read->rd_offset >= OFFSET_MAX)
> >>> 		return nfserr_inval;
> >>> 
> >>> +	trace_nfsd_read_start(rqstp, &cstate->current_fh,
> >>> +			      read->rd_offset, read->rd_length);
> >>> +
> >>> 	/*
> >>> 	 * If we do a zero copy read, then a client will see read data
> >>> 	 * that reflects the state of the file *after* performing the
> >>> @@ -783,6 +786,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> >>> {
> >>> 	if (u->read.rd_filp)
> >>> 		fput(u->read.rd_filp);
> >>> +	trace_nfsd_read_done(u->read.rd_rqstp, u->read.rd_fhp,
> >>> +			     u->read.rd_offset, u->read.rd_length);
> >>> }
> >>> 
> >>> static __be32
> >>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> >>> index e502fd1..d03059a 100644
> >>> --- a/fs/nfsd/nfs4xdr.c
> >>> +++ b/fs/nfsd/nfs4xdr.c
> >>> @@ -37,6 +37,7 @@
> >>> #include <linux/file.h>
> >>> #include <linux/slab.h>
> >>> #include <linux/namei.h>
> >>> +#include <linux/fsnotify.h>
> >>> #include <linux/statfs.h>
> >>> #include <linux/utsname.h>
> >>> #include <linux/pagemap.h>
> >>> @@ -50,6 +51,7 @@
> >>> #include "cache.h"
> >>> #include "netns.h"
> >>> #include "pnfs.h"
> >>> +#include "trace.h"
> >>> 
> >>> #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
> >>> #include <linux/security.h>
> >>> @@ -3416,28 +3418,28 @@ static __be32 nfsd4_encode_splice_read(
> >>> {
> >>> 	struct xdr_stream *xdr = &resp->xdr;
> >>> 	struct xdr_buf *buf = xdr->buf;
> >>> +	int host_err;
> >>> 	u32 eof;
> >>> 	long len;
> >>> 	int space_left;
> >>> -	__be32 nfserr;
> >>> 	__be32 *p = xdr->p - 2;
> >>> 
> >>> 	/* Make sure there will be room for padding if needed */
> >>> 	if (xdr->end - xdr->p < 1)
> >>> 		return nfserr_resource;
> >>> 
> >>> +	trace_nfsd_read_splice(resp->rqstp, read->rd_fhp,
> >>> +			       read->rd_offset, maxcount);
> >>> 	len = maxcount;
> >>> -	nfserr = nfsd_splice_read(read->rd_rqstp, file,
> >>> +	host_err = nfsd_splice_read(read->rd_rqstp, file,
> >>> 				  read->rd_offset, &maxcount);
> >>> -	if (nfserr) {
> >>> -		/*
> >>> -		 * nfsd_splice_actor may have already messed with the
> >>> -		 * page length; reset it so as not to confuse
> >>> -		 * xdr_truncate_encode:
> >>> -		 */
> >>> -		buf->page_len = 0;
> >>> -		return nfserr;
> >>> -	}
> >>> +	if (host_err < 0)
> >>> +		goto err;
> >>> +	trace_nfsd_read_io_done(read->rd_rqstp, read->rd_fhp,
> >>> +				read->rd_offset, maxcount);
> >>> +	maxcount = host_err;
> >>> +	nfsdstats.io_read += maxcount;
> >>> +	fsnotify_access(file);
> >>> 
> >>> 	eof = nfsd_eof_on_read(len, maxcount, read->rd_offset,
> >>> 				d_inode(read->rd_fhp->fh_dentry)->i_size);
> >>> @@ -3470,6 +3472,17 @@ static __be32 nfsd4_encode_splice_read(
> >>> 	xdr->end = (__be32 *)((void *)xdr->end + space_left);
> >>> 
> >>> 	return 0;
> >>> +
> >>> +err:
> >>> +	/*
> >>> +	 * nfsd_splice_actor may have already messed with the
> >>> +	 * page length; reset it so as not to confuse
> >>> +	 * xdr_truncate_encode:
> >>> +	 */
> >>> +	buf->page_len = 0;
> >>> +	trace_nfsd_read_err(read->rd_rqstp, read->rd_fhp,
> >>> +			    read->rd_offset, host_err);
> >>> +	return nfserrno(host_err);
> >>> }
> >>> 
> >>> static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
> >>> @@ -3477,12 +3490,12 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
> >>> 				 struct file *file, unsigned long maxcount)
> >>> {
> >>> 	struct xdr_stream *xdr = &resp->xdr;
> >>> +	int host_err;
> >>> 	u32 eof;
> >>> 	int v;
> >>> 	int starting_len = xdr->buf->len - 8;
> >>> 	long len;
> >>> 	int thislen;
> >>> -	__be32 nfserr;
> >>> 	__be32 tmp;
> >>> 	__be32 *p;
> >>> 	u32 zzz = 0;
> >>> @@ -3510,11 +3523,18 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
> >>> 	}
> >>> 	read->rd_vlen = v;
> >>> 
> >>> +	trace_nfsd_read_vector(resp->rqstp, read->rd_fhp,
> >>> +			       read->rd_offset, maxcount);
> >>> 	len = maxcount;
> >>> -	nfserr = nfsd_readv(file, read->rd_offset, resp->rqstp->rq_vec,
> >>> -			read->rd_vlen, &maxcount);
> >>> -	if (nfserr)
> >>> -		return nfserr;
> >>> +	host_err = nfsd_readv(file, read->rd_offset, resp->rqstp->rq_vec,
> >>> +			      read->rd_vlen, &maxcount);
> >>> +	if (host_err < 0)
> >>> +		goto err;
> >>> +	trace_nfsd_read_io_done(resp->rqstp, read->rd_fhp,
> >>> +				read->rd_offset, maxcount);
> >> 
> >> I missed a spot.
> >> 
> >> +	maxcount = host_err;
> >> 
> >>> +	nfsdstats.io_read += maxcount;
> >>> +	fsnotify_access(file);
> >>> +
> >>> 	xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));
> >>> 
> >>> 	eof = nfsd_eof_on_read(len, maxcount, read->rd_offset,
> >>> @@ -3530,6 +3550,10 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
> >>> 								&zzz, pad);
> >>> 	return 0;
> >>> 
> >>> +err:
> >>> +	trace_nfsd_read_err(resp->rqstp, read->rd_fhp,
> >>> +			    read->rd_offset, host_err);
> >>> +	return nfserrno(host_err);
> >>> }
> >>> 
> >>> static __be32
> >>> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> >>> index 653e9ee..a8bbd9d 100644
> >>> --- a/fs/nfsd/trace.h
> >>> +++ b/fs/nfsd/trace.h
> >>> @@ -43,7 +43,8 @@
> >>> 	TP_ARGS(rqstp, fhp, offset, len))
> >>> 
> >>> DEFINE_NFSD_IO_EVENT(read_start);
> >>> -DEFINE_NFSD_IO_EVENT(read_opened);
> >>> +DEFINE_NFSD_IO_EVENT(read_splice);
> >>> +DEFINE_NFSD_IO_EVENT(read_vector);
> >>> DEFINE_NFSD_IO_EVENT(read_io_done);
> >>> DEFINE_NFSD_IO_EVENT(read_done);
> >>> DEFINE_NFSD_IO_EVENT(write_start);
> >>> @@ -82,6 +83,7 @@
> >>> 		 int		len),		\
> >>> 	TP_ARGS(rqstp, fhp, offset, len))
> >>> 
> >>> +DEFINE_NFSD_ERR_EVENT(read_err);
> >>> DEFINE_NFSD_ERR_EVENT(write_err);
> >>> 
> >>> #include "state.h"
> >>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >>> index ee59a0b..0fa54b5 100644
> >>> --- a/fs/nfsd/vfs.c
> >>> +++ b/fs/nfsd/vfs.c
> >>> @@ -881,19 +881,7 @@ static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
> >>> 	return __splice_from_pipe(pipe, sd, nfsd_splice_actor);
> >>> }
> >>> 
> >>> -static __be32
> >>> -nfsd_finish_read(struct file *file, unsigned long *count, int host_err)
> >>> -{
> >>> -	if (host_err >= 0) {
> >>> -		nfsdstats.io_read += host_err;
> >>> -		*count = host_err;
> >>> -		fsnotify_access(file);
> >>> -		return 0;
> >>> -	} else 
> >>> -		return nfserrno(host_err);
> >>> -}
> >>> -
> >>> -__be32 nfsd_splice_read(struct svc_rqst *rqstp,
> >>> +int nfsd_splice_read(struct svc_rqst *rqstp,
> >>> 		     struct file *file, loff_t offset, unsigned long *count)
> >>> {
> >>> 	struct splice_desc sd = {
> >>> @@ -902,23 +890,18 @@ __be32 nfsd_splice_read(struct svc_rqst *rqstp,
> >>> 		.pos		= offset,
> >>> 		.u.data		= rqstp,
> >>> 	};
> >>> -	int host_err;
> >>> 
> >>> 	rqstp->rq_next_page = rqstp->rq_respages + 1;
> >>> -	host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
> >>> -	return nfsd_finish_read(file, count, host_err);
> >>> +	return splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
> >>> }
> >>> 
> >>> -__be32 nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
> >>> -		unsigned long *count)
> >>> +int nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
> >>> +	       unsigned long *count)
> >>> {
> >>> 	struct iov_iter iter;
> >>> -	int host_err;
> >>> 
> >>> 	iov_iter_kvec(&iter, READ | ITER_KVEC, vec, vlen, *count);
> >>> -	host_err = vfs_iter_read(file, &iter, &offset, 0);
> >>> -
> >>> -	return nfsd_finish_read(file, count, host_err);
> >>> +	return vfs_iter_read(file, &iter, &offset, 0);
> >>> }
> >>> 
> >>> /*
> >>> @@ -1025,6 +1008,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>> {
> >>> 	struct file *file;
> >>> 	struct raparms	*ra;
> >>> +	int host_err;
> >>> 	__be32 err;
> >>> 
> >>> 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
> >>> @@ -1034,14 +1018,24 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>> 
> >>> 	ra = nfsd_init_raparms(file);
> >>> 
> >>> -	trace_nfsd_read_opened(rqstp, fhp, offset, *count);
> >>> -
> >>> -	if (file->f_op->splice_read && test_bit(RQ_SPLICE_OK, &rqstp->rq_flags))
> >>> -		err = nfsd_splice_read(rqstp, file, offset, count);
> >>> -	else
> >>> -		err = nfsd_readv(file, offset, vec, vlen, count);
> >>> -
> >>> -	trace_nfsd_read_io_done(rqstp, fhp, offset, *count);
> >>> +	if (file->f_op->splice_read &&
> >>> +	    test_bit(RQ_SPLICE_OK, &rqstp->rq_flags)) {
> >>> +		trace_nfsd_read_splice(rqstp, fhp, offset, *count);
> >>> +		host_err = nfsd_splice_read(rqstp, file, offset, count);
> >>> +	} else {
> >>> +		trace_nfsd_read_vector(rqstp, fhp, offset, *count);
> >>> +		host_err = nfsd_readv(file, offset, vec, vlen, count);
> >>> +	}
> >>> +	if (host_err >= 0) {
> >>> +		trace_nfsd_read_io_done(rqstp, fhp, offset, host_err);
> >>> +		nfsdstats.io_read += host_err;
> >>> +		*count = host_err;
> >>> +		fsnotify_access(file);
> >>> +		err = nfs_ok;
> >>> +	} else  {
> >>> +		trace_nfsd_read_err(rqstp, fhp, offset, host_err);
> >>> +		err = nfserrno(host_err);
> >>> +	}
> >>> 
> >>> 	if (ra)
> >>> 		nfsd_put_raparams(file, ra);
> >>> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> >>> index be6d8e0..d9131c3 100644
> >>> --- a/fs/nfsd/vfs.h
> >>> +++ b/fs/nfsd/vfs.h
> >>> @@ -78,10 +78,10 @@ __be32		nfsd_commit(struct svc_rqst *, struct svc_fh *,
> >>> __be32		nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
> >>> 				int, struct file **);
> >>> struct raparms;
> >>> -__be32		nfsd_splice_read(struct svc_rqst *,
> >>> -				struct file *, loff_t, unsigned long *);
> >>> -__be32		nfsd_readv(struct file *, loff_t, struct kvec *, int,
> >>> -				unsigned long *);
> >>> +int		nfsd_splice_read(struct svc_rqst *rqstp, struct file *file,
> >>> +				loff_t offset, unsigned long *count);
> >>> +int		nfsd_readv(struct file *file, loff_t offset, struct kvec *vec,
> >>> +				int vlen, unsigned long *count);
> >>> __be32 		nfsd_read(struct svc_rqst *, struct svc_fh *,
> >>> 				loff_t, struct kvec *, int, unsigned long *);
> >>> __be32 		nfsd_write(struct svc_rqst *, struct svc_fh *, loff_t,
> >>> 
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> 
> >> --
> >> Chuck Lever
> >> 
> >> 
> 
> --
> Chuck Lever
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-03-27 21:51 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-27 14:49 [PATCH v2 00/18] NFS/RDMA server for v4.17 Chuck Lever
2018-03-27 14:49 ` [PATCH v2 01/18] sunrpc: Remove unneeded pointer dereference Chuck Lever
2018-03-27 14:49 ` [PATCH v2 02/18] svc: Simplify ->xpo_secure_port Chuck Lever
2018-03-27 14:49 ` [PATCH v2 03/18] sunrpc: Update show_svc_xprt_flags() to include recently added flags Chuck Lever
2018-03-27 14:50 ` [PATCH v2 04/18] sunrpc: Move trace_svc_xprt_dequeue() Chuck Lever
2018-03-27 14:50 ` [PATCH v2 05/18] sunrpc: Simplify do_enqueue tracing Chuck Lever
2018-03-27 14:50 ` [PATCH v2 06/18] sunrpc: Simplify trace_svc_recv Chuck Lever
2018-03-27 14:51 ` [PATCH v2 07/18] sunrpc: Save remote presentation address in svc_xprt for trace events Chuck Lever
2018-03-27 14:51 ` [PATCH v2 08/18] sunrpc: Re-purpose trace_svc_process Chuck Lever
2018-03-27 14:51 ` [PATCH v2 09/18] sunrpc: Report per-RPC execution stats Chuck Lever
2018-03-27 14:52 ` [PATCH v2 10/18] svc: Report xprt dequeue latency Chuck Lever
2018-03-27 14:52 ` [PATCH v2 11/18] nfsd: Fix NFSD trace points Chuck Lever
2018-03-27 14:52 ` [PATCH v2 12/18] nfsd: Record request byte count, not count of vectors Chuck Lever
2018-03-27 14:53 ` [PATCH v2 13/18] nfsd: Add "nfsd_" to trace point names Chuck Lever
2018-03-27 14:53 ` [PATCH v2 14/18] nfsd: Add I/O trace points in the NFSv4 write path Chuck Lever
2018-03-27 14:53 ` [PATCH v2 15/18] nfsd: Add I/O trace points in the NFSv4 read proc Chuck Lever
2018-03-27 16:57   ` Chuck Lever
2018-03-27 20:14     ` Bruce Fields
2018-03-27 21:22       ` Chuck Lever
2018-03-27 21:51         ` Bruce Fields
2018-03-27 14:53 ` [PATCH v2 16/18] nfsd: Trace NFSv4 COMPOUND execution Chuck Lever
2018-03-27 14:54 ` [PATCH v2 17/18] NFSD: Clean up legacy NFS WRITE argument XDR decoders Chuck Lever
2018-03-27 14:54 ` [PATCH v2 18/18] NFSD: Clean up legacy NFS SYMLINK " Chuck Lever

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.