* [00/05] Add zero copy capability to virtio transport
@ 2010-08-17 17:27 Venkateswararao Jujjuri (JV)
  2010-08-17 17:27 ` [PATCH 1/5] [net/9p] Add capability() to p9_trans_module Venkateswararao Jujjuri (JV)
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-08-17 17:27 UTC (permalink / raw)
  To: v9fs-developer; +Cc: linux-fsdevel, Venkateswararao Jujjuri

The newly added capability() operation in p9_trans_module lets
transport-agnostic code determine a transport's capabilities and act
accordingly. No changes are needed for the other transports.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/5] [net/9p] Add capability() to p9_trans_module
  2010-08-17 17:27 [00/05] Add zero copy capability to virtio transport Venkateswararao Jujjuri (JV)
@ 2010-08-17 17:27 ` Venkateswararao Jujjuri (JV)
  2010-08-17 20:43   ` [V9fs-developer] " Eric Van Hensbergen
  2010-08-17 17:27 ` [PATCH 2/5] [net/9p] Pass p9_client structure to pdu preparation routines Venkateswararao Jujjuri (JV)
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-08-17 17:27 UTC (permalink / raw)
  To: v9fs-developer
  Cc: linux-fsdevel, Venkateswararao Jujjuri (JV), Badari Pulavarty

Every transport layer may have unique capabilities. This function is employed
to query and make use of those special/unique capabilities.
To start with, we define P9_CAP_GET_MAX_SG_PAGES.
If capability(P9_CAP_GET_MAX_SG_PAGES) exists AND returns a value greater
than 0, it means that the transport can support transferring
that many mapped pages between the client and server.
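
For example, transport-agnostic code would be expected to probe the hook
along these lines (a sketch; clnt is a struct p9_client pointer, as used by
the later patches in this series):

	int max_pages = 0;

	/* the hook is optional; transports that lack it leave it NULL */
	if (clnt->trans_mod->capability)
		max_pages = clnt->trans_mod->capability(P9_CAP_GET_MAX_SG_PAGES);
	/* max_pages > 0: the transport can carry that many mapped pages */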

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
---
 include/net/9p/transport.h |    2 ++
 net/9p/trans_virtio.c      |   13 +++++++++++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/include/net/9p/transport.h b/include/net/9p/transport.h
index 6d5886e..495a118 100644
--- a/include/net/9p/transport.h
+++ b/include/net/9p/transport.h
@@ -25,6 +25,7 @@
 
 #ifndef NET_9P_TRANSPORT_H
 #define NET_9P_TRANSPORT_H
+#define P9_CAP_GET_MAX_SG_PAGES 0x01
 
 /**
  * struct p9_trans_module - transport module interface
@@ -53,6 +54,7 @@ struct p9_trans_module {
 	void (*close) (struct p9_client *);
 	int (*request) (struct p9_client *, struct p9_req_t *req);
 	int (*cancel) (struct p9_client *, struct p9_req_t *req);
+	int (*capability) (int req);
 };
 
 void v9fs_register_trans(struct p9_trans_module *m);
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index dcfbe99..762c19f 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -374,6 +374,18 @@ static void p9_virtio_remove(struct virtio_device *vdev)
 
 }
 
+/**
+ * p9_virtio_capability - Return information for the queried capability.
+ * @req: the capability being queried
+ */
+static int p9_virtio_capability(int req)
+{
+	int ret = 0;
+	if (req == P9_CAP_GET_MAX_SG_PAGES)
+		ret = VIRTQUEUE_NUM - 8; /* Keep 8 for in/out headers */
+	return ret;
+}
+
 static struct virtio_device_id id_table[] = {
 	{ VIRTIO_ID_9P, VIRTIO_DEV_ANY_ID },
 	{ 0 },
@@ -400,6 +412,7 @@ static struct p9_trans_module p9_virtio_trans = {
 	.close = p9_virtio_close,
 	.request = p9_virtio_request,
 	.cancel = p9_virtio_cancel,
+	.capability = p9_virtio_capability,
 	.maxsize = PAGE_SIZE*16,
 	.def = 0,
 	.owner = THIS_MODULE,
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/5] [net/9p] Pass p9_client structure to pdu preparation routines.
  2010-08-17 17:27 [00/05] Add zero copy capability to virtio transport Venkateswararao Jujjuri (JV)
  2010-08-17 17:27 ` [PATCH 1/5] [net/9p] Add capability() to p9_trans_module Venkateswararao Jujjuri (JV)
@ 2010-08-17 17:27 ` Venkateswararao Jujjuri (JV)
  2010-08-17 17:27 ` [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list Venkateswararao Jujjuri (JV)
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-08-17 17:27 UTC (permalink / raw)
  To: v9fs-developer
  Cc: linux-fsdevel, Venkateswararao Jujjuri (JV), Badari Pulavarty

This patch passes struct p9_client to p9pdu_vwritef(), p9pdu_prepare(),
p9pdu_finalize(), and p9pdu_writef() in preparation for adding zero-copy
support to the virtio transport.
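
For example, the marshalling sequence in p9_client_rpc() becomes (as in the
diff below):

	p9pdu_prepare(c, req->tc, tag, type);
	va_start(ap, fmt);
	err = p9pdu_vwritef(req->tc, c, fmt, ap);
	va_end(ap);
	p9pdu_finalize(c, req->tc);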

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
---
 net/9p/client.c   |    6 +++---
 net/9p/protocol.c |   49 ++++++++++++++++++++++---------------------------
 net/9p/protocol.h |    7 ++++---
 3 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index dc6f2f2..29bbbbd 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -562,11 +562,11 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 		return req;
 
 	/* marshall the data */
-	p9pdu_prepare(req->tc, tag, type);
+	p9pdu_prepare(c, req->tc, tag, type);
 	va_start(ap, fmt);
-	err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap);
+	err = p9pdu_vwritef(req->tc, c, fmt, ap);
 	va_end(ap);
-	p9pdu_finalize(req->tc);
+	p9pdu_finalize(c, req->tc);
 
 	err = c->trans_mod->request(c, req);
 	if (err < 0) {
diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index 3acd3af..ca63aff 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -53,7 +53,7 @@
 #endif
 
 static int
-p9pdu_writef(struct p9_fcall *pdu, int proto_version, const char *fmt, ...);
+p9pdu_writef(struct p9_fcall *pdu, struct p9_client *c, const char *fmt, ...);
 
 #ifdef CONFIG_NET_9P_DEBUG
 void
@@ -386,11 +386,12 @@ p9pdu_vreadf(struct p9_fcall *pdu, int proto_version, const char *fmt,
 }
 
 int
-p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
+p9pdu_vwritef(struct p9_fcall *pdu, struct p9_client *c, const char *fmt,
 	va_list ap)
 {
 	const char *ptr;
 	int errcode = 0;
+	int proto_version = c->proto_version;
 
 	for (ptr = fmt; *ptr; ptr++) {
 		switch (*ptr) {
@@ -424,8 +425,7 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
 				if (sptr)
 					len = MIN(strlen(sptr), USHRT_MAX);
 
-				errcode = p9pdu_writef(pdu, proto_version,
-								"w", len);
+				errcode = p9pdu_writef(pdu, c, "w", len);
 				if (!errcode && pdu_write(pdu, sptr, len))
 					errcode = -EFAULT;
 			}
@@ -434,7 +434,7 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
 				const struct p9_qid *qid =
 				    va_arg(ap, const struct p9_qid *);
 				errcode =
-				    p9pdu_writef(pdu, proto_version, "bdq",
+				    p9pdu_writef(pdu, c, "bdq",
 						 qid->type, qid->version,
 						 qid->path);
 			} break;
@@ -442,7 +442,7 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
 				const struct p9_wstat *stbuf =
 				    va_arg(ap, const struct p9_wstat *);
 				errcode =
-				    p9pdu_writef(pdu, proto_version,
+				    p9pdu_writef(pdu, c,
 						 "wwdQdddqssss?sddd",
 						 stbuf->size, stbuf->type,
 						 stbuf->dev, &stbuf->qid,
@@ -457,8 +457,7 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
 				int32_t count = va_arg(ap, int32_t);
 				const void *data = va_arg(ap, const void *);
 
-				errcode = p9pdu_writef(pdu, proto_version, "d",
-									count);
+				errcode = p9pdu_writef(pdu, c, "d", count);
 				if (!errcode && pdu_write(pdu, data, count))
 					errcode = -EFAULT;
 			}
@@ -467,8 +466,7 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
 				int32_t count = va_arg(ap, int32_t);
 				const char __user *udata =
 						va_arg(ap, const void __user *);
-				errcode = p9pdu_writef(pdu, proto_version, "d",
-									count);
+				errcode = p9pdu_writef(pdu, c, "d", count);
 				if (!errcode && pdu_write_u(pdu, udata, count))
 					errcode = -EFAULT;
 			}
@@ -477,17 +475,15 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
 				int16_t nwname = va_arg(ap, int);
 				const char **wnames = va_arg(ap, const char **);
 
-				errcode = p9pdu_writef(pdu, proto_version, "w",
-									nwname);
+				errcode = p9pdu_writef(pdu, c, "w", nwname);
 				if (!errcode) {
 					int i;
 
 					for (i = 0; i < nwname; i++) {
 						errcode =
 						    p9pdu_writef(pdu,
-								proto_version,
-								 "s",
-								 wnames[i]);
+								c, "s",
+								wnames[i]);
 						if (errcode)
 							break;
 					}
@@ -499,17 +495,15 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
 				struct p9_qid *wqids =
 				    va_arg(ap, struct p9_qid *);
 
-				errcode = p9pdu_writef(pdu, proto_version, "w",
-									nwqid);
+				errcode = p9pdu_writef(pdu, c, "w", nwqid);
 				if (!errcode) {
 					int i;
 
 					for (i = 0; i < nwqid; i++) {
 						errcode =
 						    p9pdu_writef(pdu,
-								proto_version,
-								 "Q",
-								 &wqids[i]);
+								c, "Q",
+								&wqids[i]);
 						if (errcode)
 							break;
 					}
@@ -520,7 +514,7 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
 				struct p9_iattr_dotl *p9attr = va_arg(ap,
 							struct p9_iattr_dotl *);
 
-				errcode = p9pdu_writef(pdu, proto_version,
+				errcode = p9pdu_writef(pdu, c,
 							"ddddqqqqq",
 							p9attr->valid,
 							p9attr->mode,
@@ -563,13 +557,13 @@ int p9pdu_readf(struct p9_fcall *pdu, int proto_version, const char *fmt, ...)
 }
 
 static int
-p9pdu_writef(struct p9_fcall *pdu, int proto_version, const char *fmt, ...)
+p9pdu_writef(struct p9_fcall *pdu, struct p9_client *c, const char *fmt, ...)
 {
 	va_list ap;
 	int ret;
 
 	va_start(ap, fmt);
-	ret = p9pdu_vwritef(pdu, proto_version, fmt, ap);
+	ret = p9pdu_vwritef(pdu, c, fmt, ap);
 	va_end(ap);
 
 	return ret;
@@ -595,18 +589,19 @@ int p9stat_read(char *buf, int len, struct p9_wstat *st, int proto_version)
 }
 EXPORT_SYMBOL(p9stat_read);
 
-int p9pdu_prepare(struct p9_fcall *pdu, int16_t tag, int8_t type)
+int p9pdu_prepare(struct p9_client *c, struct p9_fcall *pdu, int16_t tag,
+								int8_t type)
 {
-	return p9pdu_writef(pdu, 0, "dbw", 0, type, tag);
+	return p9pdu_writef(pdu, c, "dbw", 0, type, tag);
 }
 
-int p9pdu_finalize(struct p9_fcall *pdu)
+int p9pdu_finalize(struct p9_client *c, struct p9_fcall *pdu)
 {
 	int size = pdu->size;
 	int err;
 
 	pdu->size = 0;
-	err = p9pdu_writef(pdu, 0, "d", size);
+	err = p9pdu_writef(pdu, c, "d", size);
 	pdu->size = size;
 
 #ifdef CONFIG_NET_9P_DEBUG
diff --git a/net/9p/protocol.h b/net/9p/protocol.h
index 2431c0f..6d059a6 100644
--- a/net/9p/protocol.h
+++ b/net/9p/protocol.h
@@ -25,10 +25,11 @@
  *
  */
 
-int p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
+int p9pdu_vwritef(struct p9_fcall *pdu, struct p9_client *c, const char *fmt,
 								va_list ap);
 int p9pdu_readf(struct p9_fcall *pdu, int proto_version, const char *fmt, ...);
-int p9pdu_prepare(struct p9_fcall *pdu, int16_t tag, int8_t type);
-int p9pdu_finalize(struct p9_fcall *pdu);
+int p9pdu_prepare(struct p9_client *c, struct p9_fcall *pdu, int16_t tag,
+								int8_t type);
+int p9pdu_finalize(struct p9_client *c, struct p9_fcall *pdu);
 void p9pdu_dump(int, struct p9_fcall *);
 void p9pdu_reset(struct p9_fcall *pdu);
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list.
  2010-08-17 17:27 [00/05] Add zero copy capability to virtio transport Venkateswararao Jujjuri (JV)
  2010-08-17 17:27 ` [PATCH 1/5] [net/9p] Add capability() to p9_trans_module Venkateswararao Jujjuri (JV)
  2010-08-17 17:27 ` [PATCH 2/5] [net/9p] Pass p9_client structure to pdu preparation routines Venkateswararao Jujjuri (JV)
@ 2010-08-17 17:27 ` Venkateswararao Jujjuri (JV)
  2010-08-18 20:50   ` [V9fs-developer] " Latchesar Ionkov
  2010-08-17 17:27 ` [PATCH 4/5] [net/9p] Achieve zero copy on write path Venkateswararao Jujjuri (JV)
  2010-08-17 17:27 ` [PATCH 5/5] [net/9p] Achieve zero copy on read path Venkateswararao Jujjuri (JV)
  4 siblings, 1 reply; 22+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-08-17 17:27 UTC (permalink / raw)
  To: v9fs-developer
  Cc: linux-fsdevel, Venkateswararao Jujjuri (JV), Badari Pulavarty

This patch adds necessary infrastructure for placing page addresses
directly on the sg list for the server to consume.

The newly added routine pack_sg_list_p() is just like pack_sg_list()
except that it takes a page array as input and places the pages directly
on the sg list after taking care of the first page offset.
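
For example (assuming 4 KiB pages), a 9000-byte buffer starting at page
offset 256 would pack into three sg entries:

	sg[0]: page 0, length 3840, offset 256	/* PAGE_SIZE - pdata_off */
	sg[1]: page 1, length 4096, offset 0
	sg[2]: page 2, length 1064, offset 0	/* 9000 - 3840 - 4096 */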

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
---
 include/net/9p/9p.h   |    6 ++++-
 net/9p/client.c       |    4 +++
 net/9p/trans_virtio.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
index a8de812..382ef22 100644
--- a/include/net/9p/9p.h
+++ b/include/net/9p/9p.h
@@ -651,7 +651,11 @@ struct p9_fcall {
 
 	size_t offset;
 	size_t capacity;
-
+	struct page **pdata;
+	uint32_t pdata_mapped_pages;
+	uint32_t pdata_off;
+	uint32_t pdata_write_len;
+	uint32_t pdata_read_len;
 	uint8_t *sdata;
 };
 
diff --git a/net/9p/client.c b/net/9p/client.c
index 29bbbbd..5487896 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag)
 		}
 		req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
 		req->tc->capacity = c->msize;
+		req->tc->pdata_write_len = 0;
+		req->tc->pdata_read_len = 0;
 		req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
 		req->rc->capacity = c->msize;
+		req->rc->pdata_write_len = 0;
+		req->rc->pdata_read_len = 0;
 	}
 
 	p9pdu_reset(req->tc);
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 762c19f..8f86cb5 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data,
 	return index-start;
 }
 
+/**
+ * pack_sg_list_p - Pack a scatter gather list from an array of pages.
+ * @sg: scatter/gather list to pack into
+ * @start: which segment of the sg_list to start at
+ * @limit: maximum segment to pack data to
+ * @pdu: pdu prepared to put on the wire.
+ * @count: amount of data to pack into the scatter/gather list
+ *
+ * This is just like pack_sg_list() except that it takes page array
+ * as an input and directly places them on the sg list after taking
+ * care of the first page offset.
+ */
+
+static int
+pack_sg_list_p(struct scatterlist *sg, int start, int limit,
+		struct p9_fcall *pdu, int count)
+{
+	int s;
+	int i = 0;
+	int index = start;
+
+	if (pdu->pdata_off) {
+		s = min((int)(PAGE_SIZE - pdu->pdata_off), count);
+		sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off);
+		count -= s;
+	}
+
+	while (count) {
+		BUG_ON(index > limit);
+		s = min((int)PAGE_SIZE, count);
+		sg_set_page(&sg[index++], pdu->pdata[i++], s, 0);
+		count -= s;
+	}
+
+	return index-start;
+}
+
+
 /* We don't currently allow canceling of virtio requests */
 static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
 {
@@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
 static int
 p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
 {
-	int in, out;
+	int in, out, outp, inp;
 	struct virtio_chan *chan = client->trans;
 	char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
 
 	P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
 
 	out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
-								req->tc->size);
-	in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
+			req->tc->size);
+
+	BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len);
+
+	if (req->tc->pdata_write_len) {
+		outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
+					req->tc, req->tc->pdata_write_len);
+		out += outp;
+	}
+	if (req->tc->pdata_read_len) {
+		inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11); /* 11-byte Rread header */
+		in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
+					req->tc, req->tc->pdata_read_len);
+		in += inp;
+	} else {
+		in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
 								client->msize);
+	}
 
 	req->status = REQ_STATUS_SENT;
 
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/5] [net/9p] Achieve zero copy on write path.
  2010-08-17 17:27 [00/05] Add zero copy capability to virtio transport Venkateswararao Jujjuri (JV)
                   ` (2 preceding siblings ...)
  2010-08-17 17:27 ` [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list Venkateswararao Jujjuri (JV)
@ 2010-08-17 17:27 ` Venkateswararao Jujjuri (JV)
  2010-08-19 19:30   ` [V9fs-developer] " Latchesar Ionkov
  2010-08-17 17:27 ` [PATCH 5/5] [net/9p] Achieve zero copy on read path Venkateswararao Jujjuri (JV)
  4 siblings, 1 reply; 22+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-08-17 17:27 UTC (permalink / raw)
  To: v9fs-developer
  Cc: linux-fsdevel, Venkateswararao Jujjuri (JV), Badari Pulavarty

This patch avoids copy_from_user by employing get_user_pages_fast() on the
udata buffer. This eliminates an additional copy of the user buffer into a
kernel buffer before placing it on the virtio ring.
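
In outline (a sketch following pdu_fill_pages() and p9_release_req_pages()
below; error handling elided):

	/* pin the user buffer's pages; rw is 0 on the write path */
	n = get_user_pages_fast((unsigned long)udata, nr_pages, rw, pdu->pdata);

	/* ... the pinned pages go directly onto the virtio sg list ... */

	/* once the server has replied, drop the page references */
	for (i = 0; i < n; i++)
		put_page(pdu->pdata[i]);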

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
---
 net/9p/client.c   |   32 ++++++++++++++++++++-
 net/9p/protocol.c |   83 +++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index 5487896..7ce58fb 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -36,6 +36,7 @@
 #include <linux/parser.h>
 #include <net/9p/client.h>
 #include <net/9p/transport.h>
+#include <linux/mm.h>
 #include "protocol.h"
 
 /*
@@ -521,6 +522,25 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
 }
 
 /**
+ * p9_release_req_pages - Release pages after the transaction.
+ * @req - Request buffer.
+ *
+ */
+static void
+p9_release_req_pages(struct p9_req_t *req)
+{
+	int i = 0;
+	while (req->tc->pdata[i] && req->tc->pdata_mapped_pages--) {
+		put_page(req->tc->pdata[i]);
+		req->tc->pdata[i] = NULL;
+		i++;
+	}
+	req->tc->pdata_write_len = 0;
+	req->tc->pdata_read_len = 0;
+}
+
+
+/**
  * p9_client_rpc - issue a request and wait for a response
  * @c: client session
  * @type: type of request
@@ -575,6 +595,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	err = c->trans_mod->request(c, req);
 	if (err < 0) {
 		c->status = Disconnected;
+		if (req->tc->pdata_write_len || req->tc->pdata_read_len)
+			p9_release_req_pages(req);
 		goto reterr;
 	}
 
@@ -583,6 +605,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 						req->status >= REQ_STATUS_RCVD);
 	P9_DPRINTK(P9_DEBUG_MUX, "wait %p tag: %d returned %d\n",
 						req->wq, tag, err);
+	if (req->tc->pdata_write_len || req->tc->pdata_read_len)
+		p9_release_req_pages(req);
 
 	if (req->status == REQ_STATUS_ERROR) {
 		P9_DPRINTK(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
@@ -1331,9 +1355,15 @@ p9_client_write(struct p9_fid *fid, char *data, const char __user *udata,
 	if (data)
 		req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid, offset,
 								rsize, data);
-	else
+	else {
+		if (clnt->trans_mod->capability &&
+			clnt->trans_mod->capability(P9_CAP_GET_MAX_SG_PAGES)) {
+
+			rsize = count;
+		}
 		req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid, offset,
 								rsize, udata);
+	}
 	if (IS_ERR(req)) {
 		err = PTR_ERR(req);
 		goto error;
diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index ca63aff..97f313d 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -31,9 +31,12 @@
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/types.h>
+#include <linux/parser.h>
 #include <net/9p/9p.h>
 #include <net/9p/client.h>
 #include "protocol.h"
+#include <net/9p/transport.h>
+#include <linux/pagemap.h>
 
 #ifndef MIN
 #define MIN(a, b) (((a) < (b)) ? (a) : (b))
@@ -110,6 +113,51 @@ static size_t pdu_read(struct p9_fcall *pdu, void *data, size_t size)
 	return size - len;
 }
 
+static int
+pdu_fill_pages(struct p9_fcall *pdu, const char __user *udata, size_t size,
+		int rw, int max_sg_pages)
+{
+	int nr_pages;
+	uint32_t first_page_bytes = 0;
+	int pdata_len;
+
+	nr_pages = size >> PAGE_SHIFT;
+	pdu->pdata_off = (size_t)udata & (PAGE_SIZE-1);
+	if (pdu->pdata_off)
+		first_page_bytes = PAGE_SIZE - pdu->pdata_off;
+	if (size - (first_page_bytes + (nr_pages << PAGE_SHIFT))) {
+		/* trailing partial page */
+		nr_pages++;
+	}
+	if (first_page_bytes) {
+		/* leading partial page */
+		nr_pages++;
+	}
+	nr_pages = min(max_sg_pages, nr_pages);
+	pdu->pdata = (struct page **)(pdu->sdata + pdu->size);
+	pdu->pdata_write_len = 0;
+	pdu->pdata_read_len = 0;
+	pdu->pdata_mapped_pages = get_user_pages_fast((unsigned long)udata,
+			nr_pages, rw, pdu->pdata);
+	if (pdu->pdata_mapped_pages < 0) {
+		printk(KERN_WARNING "get_user_pages_fast failed:%d udata:%p"
+				"nr_pages:%d\n", pdu->pdata_mapped_pages,
+				udata, nr_pages);
+		pdu->pdata_mapped_pages = 0;
+		return -1;
+	}
+	if (pdu->pdata_off) {
+		pdata_len = first_page_bytes;
+		pdata_len += min((size - pdata_len),
+				((size_t)pdu->pdata_mapped_pages - 1) <<
+								PAGE_SHIFT);
+	} else {
+		pdata_len = min(size, (size_t)pdu->pdata_mapped_pages <<
+								PAGE_SHIFT);
+	}
+	return pdata_len;
+}
+
 static size_t pdu_write(struct p9_fcall *pdu, const void *data, size_t size)
 {
 	size_t len = MIN(pdu->capacity - pdu->size, size);
@@ -119,15 +167,31 @@ static size_t pdu_write(struct p9_fcall *pdu, const void *data, size_t size)
 }
 
 static size_t
-pdu_write_u(struct p9_fcall *pdu, const char __user *udata, size_t size)
+pdu_write_u(struct p9_fcall *pdu, struct p9_client *c, const char __user *udata,
+		size_t size)
 {
-	size_t len = MIN(pdu->capacity - pdu->size, size);
-	int err = copy_from_user(&pdu->sdata[pdu->size], udata, len);
-	if (err)
-		printk(KERN_WARNING "pdu_write_u returning: %d\n", err);
+	size_t len;
+	int err;
+	int max_req_sg_pages = 0;
 
-	pdu->size += len;
-	return size - len;
+	if (c->trans_mod->capability &&
+			(udata && !segment_eq(get_fs(), KERNEL_DS))) {
+		max_req_sg_pages =
+			c->trans_mod->capability(P9_CAP_GET_MAX_SG_PAGES);
+	}
+	if (max_req_sg_pages) {
+		len = pdu_fill_pages(pdu, udata, size, 0, max_req_sg_pages);
+		if (len < 0)
+			return len;
+		pdu->pdata_write_len = len;
+	} else {
+		len = MIN(pdu->capacity - pdu->size, size);
+		err = copy_from_user(&pdu->sdata[pdu->size], udata, len);
+		if (err)
+			printk(KERN_WARNING "pdu_write_u returning: %d\n", err);
+		pdu->size += len;
+	}
+	return len;
 }
 
 /*
@@ -467,7 +531,8 @@ p9pdu_vwritef(struct p9_fcall *pdu, struct p9_client *c, const char *fmt,
 				const char __user *udata =
 						va_arg(ap, const void __user *);
 				errcode = p9pdu_writef(pdu, c, "d", count);
-				if (!errcode && pdu_write_u(pdu, udata, count))
+				if (!errcode &&
+					pdu_write_u(pdu, c, udata, count) < 0)
 					errcode = -EFAULT;
 			}
 			break;
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/5] [net/9p] Achieve zero copy on read path.
  2010-08-17 17:27 [00/05] Add zero copy capability to virtio transport Venkateswararao Jujjuri (JV)
                   ` (3 preceding siblings ...)
  2010-08-17 17:27 ` [PATCH 4/5] [net/9p] Achieve zero copy on write path Venkateswararao Jujjuri (JV)
@ 2010-08-17 17:27 ` Venkateswararao Jujjuri (JV)
  4 siblings, 0 replies; 22+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-08-17 17:27 UTC (permalink / raw)
  To: v9fs-developer
  Cc: linux-fsdevel, Venkateswararao Jujjuri (JV), Badari Pulavarty

This patch avoids copy_to_user by employing get_user_pages_fast() on the
udata buffer. This eliminates an additional copy from a kernel buffer into
the user buffer. We filter out kernel buffers (kernel_read()) by comparing
segments.
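
The filter is the segment comparison used in p9_client_read() below; only a
genuine user-space buffer is considered for pinning:

	if (udata && !segment_eq(get_fs(), KERNEL_DS))
		page_direct = 1;	/* safe for get_user_pages_fast() */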

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
---
 net/9p/client.c   |   18 ++++++++++++++++--
 net/9p/protocol.c |   33 +++++++++++++++++++++++++++++++--
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index 7ce58fb..d11c7dd 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1282,6 +1282,7 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
 	struct p9_client *clnt;
 	struct p9_req_t *req;
 	char *dataptr;
+	int page_direct = 0;
 
 	P9_DPRINTK(P9_DEBUG_9P, ">>> TREAD fid %d offset %llu %d\n", fid->fid,
 					(long long unsigned) offset, count);
@@ -1296,7 +1297,20 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
 	if (count < rsize)
 		rsize = count;
 
-	req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
+	if (clnt->trans_mod->capability &&
+		clnt->trans_mod->capability(P9_CAP_GET_MAX_SG_PAGES) &&
+			(udata && !segment_eq(get_fs(), KERNEL_DS)))
+		page_direct = 1;
+
+	if (page_direct) {
+		rsize = count;
+		req = p9_client_rpc(clnt, P9_TREAD, "dqE", fid->fid, offset,
+				rsize, udata);
+	} else {
+		req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset,
+				rsize);
+	}
+
 	if (IS_ERR(req)) {
 		err = PTR_ERR(req);
 		goto error;
@@ -1314,7 +1328,7 @@ p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset,
 		memmove(data, dataptr, count);
 	}
 
-	if (udata) {
+	if (udata && !page_direct) {
 		err = copy_to_user(udata, dataptr, count);
 		if (err) {
 			err = -EFAULT;
diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index 97f313d..b82d117 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -194,6 +194,25 @@ pdu_write_u(struct p9_fcall *pdu, struct p9_client *c, const char __user *udata,
 	return len;
 }
 
+static size_t
+pdu_write_ur(struct p9_fcall *pdu, struct p9_client *c,
+		const char __user *udata, size_t size)
+{
+	size_t len = size;
+	int max_req_sg_pages = 0;
+
+	if (c->trans_mod->capability)
+		max_req_sg_pages =
+			c->trans_mod->capability(P9_CAP_GET_MAX_SG_PAGES);
+	if (max_req_sg_pages) {
+		len = pdu_fill_pages(pdu, udata, size, 1, max_req_sg_pages);
+		if (len < 0)
+			return len;
+		pdu->pdata_read_len = len;
+	}
+	return len;
+}
+
 /*
 	b - int8_t
 	w - int16_t
@@ -534,8 +553,18 @@ p9pdu_vwritef(struct p9_fcall *pdu, struct p9_client *c, const char *fmt,
 				if (!errcode &&
 					pdu_write_u(pdu, c, udata, count) < 0)
 					errcode = -EFAULT;
-			}
-			break;
+			 }
+			 break;
+		case 'E':{
+				 int32_t count = va_arg(ap, int32_t);
+				 const char __user *udata =
+					 va_arg(ap, const void __user *);
+				 errcode = p9pdu_writef(pdu, c, "d", count);
+				 if (!errcode &&
+					pdu_write_ur(pdu, c, udata, count) < 0)
+					errcode = -EFAULT;
+			 }
+			 break;
 		case 'T':{
 				int16_t nwname = va_arg(ap, int);
 				const char **wnames = va_arg(ap, const char **);
-- 
1.6.5.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 1/5] [net/9p] Add capability() to p9_trans_module
  2010-08-17 17:27 ` [PATCH 1/5] [net/9p] Add capability() to p9_trans_module Venkateswararao Jujjuri (JV)
@ 2010-08-17 20:43   ` Eric Van Hensbergen
  2010-08-17 20:46     ` Latchesar Ionkov
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Van Hensbergen @ 2010-08-17 20:43 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV)
  Cc: v9fs-developer, linux-fsdevel, Badari Pulavarty

On Tue, Aug 17, 2010 at 12:27 PM, Venkateswararao Jujjuri (JV)
<jvrao@linux.vnet.ibm.com> wrote:
> Every transport layer may have unique capabilities. This function is employed
> to query and make use of those special/unique capabilities.
> To start with, we define P9_CAP_GET_MAX_SG_PAGES.
> If capability(P9_CAP_GET_MAX_SG_PAGES) exists AND returns a value greater
> than 0, it means that the transport can support transferring
> that many mapped pages between the client and server.
>

Is there a good reason to make this a function versus a flag/bit-mask
in the transport structure?  Having to call an extra function
(hopefully it gets inlined, but it's a function pointer so I'm thinking
it won't) every client_rpc call seems like a mistake.

Additionally, given we know the page size constant, couldn't we infer
this from a negotiated msize?  Transports already have a maxsize
field which limits the msize selections (and defaults? if not maybe it
should?) -- why not just use that?

      -eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 1/5] [net/9p] Add capability() to p9_trans_module
  2010-08-17 20:43   ` [V9fs-developer] " Eric Van Hensbergen
@ 2010-08-17 20:46     ` Latchesar Ionkov
  2010-08-17 23:31       ` Venkateswararao Jujjuri (JV)
  0 siblings, 1 reply; 22+ messages in thread
From: Latchesar Ionkov @ 2010-08-17 20:46 UTC (permalink / raw)
  To: Eric Van Hensbergen
  Cc: Venkateswararao Jujjuri (JV),
	linux-fsdevel, v9fs-developer, Badari Pulavarty

Or call the function once when the client is created and store the
value in the client struct. I don't think the call will be inlined.

Thanks,
    Lucho

On Tue, Aug 17, 2010 at 2:43 PM, Eric Van Hensbergen <ericvh@gmail.com> wrote:
> On Tue, Aug 17, 2010 at 12:27 PM, Venkateswararao Jujjuri (JV)
> <jvrao@linux.vnet.ibm.com> wrote:
>> Every transport layer may have unique capabilities. This function is employed
>> to query and make use of those special/unique capabilities.
>> To start with, we define P9_CAP_GET_MAX_SG_PAGES.
>> If capability(P9_CAP_GET_MAX_SG_PAGES) exists AND returns a value greater
>> than 0, it means that the transport can support transferring
>> that many mapped pages between the client and server.
>>
>
> Is there a good reason to make this a function versus a flag/bit-mask
> in the transport structure?  Having to call an extra function
> (hopefully it gets inlined, but it's a function pointer so I'm thinking
> it won't) every client_rpc call seems like a mistake.
>
> Additionally, given we know the page size constant, couldn't we infer
> this from a negotiated msize?  Transports already have a maxsize
> field which limits the msize selections (and defaults? if not maybe it
> should?) -- why not just use that?
>
>      -eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 1/5] [net/9p] Add capability() to p9_trans_module
  2010-08-17 20:46     ` Latchesar Ionkov
@ 2010-08-17 23:31       ` Venkateswararao Jujjuri (JV)
  2010-08-18 15:16         ` Eric Van Hensbergen
  0 siblings, 1 reply; 22+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-08-17 23:31 UTC (permalink / raw)
  To: Latchesar Ionkov
  Cc: Eric Van Hensbergen, linux-fsdevel, v9fs-developer, Badari Pulavarty

Latchesar Ionkov wrote:
> Or call the function once when the client is created and store the
> value in the client struct. I don't think the call will be inlined.

I think we can do this. Add a field in the p9_client structure and do this
call during client initialization.


> 
> Thanks,
>     Lucho
> 
> On Tue, Aug 17, 2010 at 2:43 PM, Eric Van Hensbergen <ericvh@gmail.com> wrote:
>> On Tue, Aug 17, 2010 at 12:27 PM, Venkateswararao Jujjuri (JV)
>> <jvrao@linux.vnet.ibm.com> wrote:
>>> Every transport layer may have unique capabilities. This function is employed
>>> to query and make use of those special/unique capabilities.
>>> To start with, we define P9_CAP_GET_MAX_SG_PAGES.
>>> If capability(P9_CAP_GET_MAX_SG_PAGES) exists AND returns a value greater
>>> than 0, it means that the transport can support transferring
>>> that many mapped pages between the client and server.
>>>
>> Is there a good reason to make this a function versus a flag/bit-mask
>> in the transport structure?  Having to call an extra function
>> (hopefully it gets inlined, but it's a function pointer so I'm thinking
>> it won't) every client_rpc call seems like a mistake.
>>
>> Additionally, given we know the page size constant, couldn't we infer
>> this from a negotiated msize?  Transports already have a maxsize
>> field which limits the msize selections (and defaults? if not maybe it
>> should?) -- why not just use that?

The IO size in this mode is derived from the virtio ring size rather than
from msize. msize is restricted to the maximum amount of data that gets
copied into the kmalloc'd buffer; in this case that is just the header.

I think that by moving the number of pages that can be accommodated on the
virtio ring into p9_client, we can use it along with the page size to
determine how much data we are passing. With this we can tune our
calculations on the read/write side so that we can derive better logic to
handle short reads.
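
Concretely, that would amount to something like (hypothetical field name,
filled once at client create time as Lucho suggests):

	struct p9_client {
		...
		int max_sg_pages; /* cached capability(P9_CAP_GET_MAX_SG_PAGES) */
	};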

Thanks,
JV

>>
>>      -eric



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 1/5] [net/9p] Add capability() to p9_trans_module
  2010-08-17 23:31       ` Venkateswararao Jujjuri (JV)
@ 2010-08-18 15:16         ` Eric Van Hensbergen
  2010-08-18 16:56           ` Venkateswararao Jujjuri (JV)
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Van Hensbergen @ 2010-08-18 15:16 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV)
  Cc: Latchesar Ionkov, linux-fsdevel, Badari Pulavarty, v9fs-developer

On Tue, Aug 17, 2010 at 6:31 PM, Venkateswararao Jujjuri (JV)
<jvrao@linux.vnet.ibm.com> wrote:
>>>
>>> Additionally, given we know the page size constant, couldn't we infer
>>> this from a negotiated msize?  Transports already have a maxsize
>>> field which limits the msize selections (and defaults? if not maybe it
>>> should?) -- why not just use that?
>
> The IO size in this mode is derived from the virtio ring size rather than
> from msize. msize is restricted to the maximum amount of data that gets
> copied into the kmalloc'd buffer; in this case that is just the header.
>

Actually, msize is supposed to be the maximum amount of data that gets
copied over the transport which should be a factor determined by the
client and server and the underlying technology which links them.
From that aspect I think its a perfectly valid way to set this up and
is already part of the mechanisms.

>
> I think that by moving the number of pages that can be accommodated on the
> virtio ring into p9_client, we can use it along with the page size to
> determine how much data we are passing. With this we can tune our
> calculations on the read/write side so that we can derive better logic to
> handle short reads.
>

I don't know if I saw such a use in the existing patch.  Can you give
me a concrete use case where we would use multiple "short" pages
instead of mostly full pages and potentially a single short page at
the end?

           -eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 1/5] [net/9p] Add capability() to p9_trans_module
  2010-08-18 15:16         ` Eric Van Hensbergen
@ 2010-08-18 16:56           ` Venkateswararao Jujjuri (JV)
  2010-08-18 18:26             ` Eric Van Hensbergen
  0 siblings, 1 reply; 22+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-08-18 16:56 UTC (permalink / raw)
  To: Eric Van Hensbergen
  Cc: Latchesar Ionkov, linux-fsdevel, Badari Pulavarty, v9fs-developer

Eric Van Hensbergen wrote:
> On Tue, Aug 17, 2010 at 6:31 PM, Venkateswararao Jujjuri (JV)
> <jvrao@linux.vnet.ibm.com> wrote:
>>>> Additionally, given we know the page size constant, couldn't we infer
>>>> this from a negotiated msize?  Transports already have a maxsize
>>>> field which limits the msize selections (and defaults? if not maybe it
>>>> should?) -- why not just use that?
>> The IO size in this mode is derived from the virtio ring size rather than
>> from msize. msize is restricted to the maximum amount of data that gets
>> copied into the kmalloc'd buffer; in this case that is just the header.
>>
> 
> Actually, msize is supposed to be the maximum amount of data that gets
> copied over the transport which should be a factor determined by the
> client and server and the underlying technology which links them.
> From that aspect I think its a perfectly valid way to set this up and
> is already part of the mechanisms.
Our intention with this patch was to achieve zero copy for reads/writes with
minimal changes, without affecting other transports or other vfs transactions.

In this method, if the transport has the capability to send user pages
directly, we treat msize as the maximum allowed size for the header, and the
actual payload is limited by the virtio ring size.  With this minor set of
changes we achieved zero copy on the most important code path, read/write,
without any changes to other sections of the code.

This could be a step in the right direction; in the future, when we are ready
to modify other code paths like readdir and xattr, we can take a broader set
of changes.  Currently we are using 8k as the msize, and kmalloc() on anything
beyond 1 page (4k) may fail under high load, so we may have to revisit our
entire logic somewhere down the line on how we segregate the header from the
actual payload.


> 
>> I think that by moving the number of pages that can be accommodated on the
>> virtio ring into p9_client, we can use it along with the page size to
>> determine how much data we are passing. With this we can tune our
>> calculations on the read/write side so that we can derive better logic to
>> handle short reads.
>>
> 
> I don't know if I saw such a use in the existing patch.  Can you give
> me a concrete use case where we would use multiple "short" pages
> instead of mostly full pages and potentially a single short page at
> the end?

I am talking about short reads/writes, not short pages.
Yes, at most you will have two partial pages (an unaligned start, and maybe
the last page).

The case is where the request into VFS is larger than the actual read/write
we could do, either because of a transport restriction/msize or something
else.

Say a read of 512k comes into vfs, and we can put only 480k into one request
because of the virtio ring limitation ((128 - 8 header slots) * 4k = 480k).
At the VFS/client level we need to retry the read for the rest of the IO
before we return to the user.

This change is not in the patch yet; I am proposing to have it in my next
version.

> 
>            -eric



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 1/5] [net/9p] Add capability() to p9_trans_module
  2010-08-18 16:56           ` Venkateswararao Jujjuri (JV)
@ 2010-08-18 18:26             ` Eric Van Hensbergen
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Van Hensbergen @ 2010-08-18 18:26 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV)
  Cc: Latchesar Ionkov, linux-fsdevel, Badari Pulavarty, v9fs-developer

On Wed, Aug 18, 2010 at 11:56 AM, Venkateswararao Jujjuri (JV)
<jvrao@linux.vnet.ibm.com> wrote:
> Eric Van Hensbergen wrote:
>> On Tue, Aug 17, 2010 at 6:31 PM, Venkateswararao Jujjuri (JV)
>> <jvrao@linux.vnet.ibm.com> wrote:
>>>>> Additionally, given we know the page size constant, couldn't we infer
>>>>> this from a negotiated msize?  Transports already have a maxsize
>>>>> field which limits the msize selections (and defaults? if not maybe it
>>>>> should?) -- why not just use that?
>>> The IO size in this mode is derived from the virtio ring size rather than
>>> from msize. msize is restricted to the maximum amount of data that gets
>>> copied into the kmalloc'd buffer; in this case that is just the header.
>>>
>>
>> Actually, msize is supposed to be the maximum amount of data that gets
>> copied over the transport which should be a factor determined by the
>> client and server and the underlying technology which links them.
>> From that aspect I think its a perfectly valid way to set this up and
>> is already part of the mechanisms.
> Our intention with this patch was to achieve zero copy for reads/writes with
> minimal changes, without affecting other transports or other vfs transactions.
>
> In this method, if the transport has the capability to send user pages
> directly, we treat msize as the maximum allowed size for the header, and the
> actual payload is limited by the virtio ring size.  With this minor set of
> changes we achieved zero copy on the most important code path, read/write,
> without any changes to other sections of the code.
>

I think this is a misuse of msize.  iohdrsize is the maximum allowed
header size; it is a component of msize, and msize is always greater
than iohdrsize.  It just feels to me like we are duplicating
information.

> This could be a step in the right direction; in the future, when we are ready
> to modify other code paths like readdir and xattr, we can take a broader set
> of changes.  Currently we are using 8k as the msize, and kmalloc() on anything
> beyond 1 page (4k) may fail under high load, so we may have to revisit our
> entire logic somewhere down the line on how we segregate the header from the
> actual payload.

This is a separate issue IMHO, but an important one to track.

>>
>> I don't know if I saw such a use in the existing patch.  Can you give
>> me a concrete use case where we would use multiple "short" pages
>> instead of mostly full pages and potentially a single short page at
>> the end?
>
> I am talking about short reads/writes, not short pages.
> Yes, at most you will have two partial pages (an unaligned start, and maybe
> the last page).
>
> The case is where the request into VFS is larger than the actual read/write
> we could do, either because of a transport restriction/msize or something
> else.
>
> Say a read of 512k comes into vfs, and we can put only 480k into one request
> because of the virtio ring limitation ((128 - 8 header slots) * 4k = 480k).
> At the VFS/client level we need to retry the read for the rest of the IO
> before we return to the user.
>

Okay - so the concern is that an unaligned start of the buffer might
lead to an oversubscription of the ring elements because msize is just
a static limit and does not take into consideration the alignment of
the initial buffer.  However, given that we are checking this in the
client code anyways, it still seems like we can use the msize as an
indication of how many elements we have in the scatter-gather ring
since it'll always be msize/pagesize = number of ring elements when we
are using an sg_array.  However, that may be a bit of an overloading
of the msize field, so I think I'm willing to accept a field in a
structure which represents the maximum number of elements in a
scatter/gather array.  I guess the only remaining question is whether
there would ever be a case where this would need to be per client or
if we could just make it per transport.  My feeling right now is that
per-transport is fine as it's all constants anyways - so why not just
add a field to the transport structure?
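
(For scale, with the numbers used elsewhere in this thread: 128 ring slots
with 8 reserved for headers gives (128 - 8) * 4k = 480k of payload per
request, i.e. 120 scatter/gather data elements.)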

          -eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list.
  2010-08-17 17:27 ` [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list Venkateswararao Jujjuri (JV)
@ 2010-08-18 20:50   ` Latchesar Ionkov
  2010-08-19 18:28     ` Venkateswararao Jujjuri (JV)
  0 siblings, 1 reply; 22+ messages in thread
From: Latchesar Ionkov @ 2010-08-18 20:50 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV)
  Cc: v9fs-developer, linux-fsdevel, Badari Pulavarty

Is there any particular reason p9_fcall to have pointers to pages
instead of a scatterlist?

Thanks,
    Lucho

On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
<jvrao@linux.vnet.ibm.com> wrote:
> [PATCH 3/5 quoted in full; trimmed]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list.
  2010-08-18 20:50   ` [V9fs-developer] " Latchesar Ionkov
@ 2010-08-19 18:28     ` Venkateswararao Jujjuri (JV)
  2010-08-19 18:49       ` Latchesar Ionkov
  0 siblings, 1 reply; 22+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-08-19 18:28 UTC (permalink / raw)
  To: Latchesar Ionkov; +Cc: v9fs-developer, linux-fsdevel, Badari Pulavarty

Latchesar Ionkov wrote:
> Is there any particular reason p9_fcall to have pointers to pages
> instead of a scatterlist?

Given that page sizes are constant, all we need is the offset into the first
page; the IO size determines the last page. So we decided there was no need
to put an sg list in p9_fcall.

Thanks,
JV

> 
> Thanks,
>     Lucho
> 
> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
> <jvrao@linux.vnet.ibm.com> wrote:
>> This patch adds necessary infrastructure for placing page addresses
>> directly on the sg list for the server to consume.
>>
>> The newly added routine pack_sg_list_p() is just like pack_sg_list()
>> except that it takes page array as an input and directly places them on
>> the sg list after taking care of the first page offset.
>>
>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
>> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
>> ---
>>  include/net/9p/9p.h   |    6 ++++-
>>  net/9p/client.c       |    4 +++
>>  net/9p/trans_virtio.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++--
>>  3 files changed, 65 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
>> index a8de812..382ef22 100644
>> --- a/include/net/9p/9p.h
>> +++ b/include/net/9p/9p.h
>> @@ -651,7 +651,11 @@ struct p9_fcall {
>>
>>        size_t offset;
>>        size_t capacity;
>> -
>> +       struct page **pdata;
>> +       uint32_t pdata_mapped_pages;
>> +       uint32_t pdata_off;
>> +       uint32_t pdata_write_len;
>> +       uint32_t pdata_read_len;
>>        uint8_t *sdata;
>>  };
>>
>> diff --git a/net/9p/client.c b/net/9p/client.c
>> index 29bbbbd..5487896 100644
>> --- a/net/9p/client.c
>> +++ b/net/9p/client.c
>> @@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag)
>>                }
>>                req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
>>                req->tc->capacity = c->msize;
>> +               req->tc->pdata_write_len = 0;
>> +               req->tc->pdata_read_len = 0;
>>                req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
>>                req->rc->capacity = c->msize;
>> +               req->rc->pdata_write_len = 0;
>> +               req->rc->pdata_read_len = 0;
>>        }
>>
>>        p9pdu_reset(req->tc);
>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>> index 762c19f..8f86cb5 100644
>> --- a/net/9p/trans_virtio.c
>> +++ b/net/9p/trans_virtio.c
>> @@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data,
>>        return index-start;
>>  }
>>
>> +/**
>> + * pack_sg_list_p - Pack a scatter gather list from an array of pages.
>> + * @sg: scatter/gather list to pack into
>> + * @start: which segment of the sg_list to start at
>> + * @limit: maximum segment to pack data to
>> + * @pdu: pdu prepared to put on the wire.
>> + * @count: amount of data to pack into the scatter/gather list
>> + *
>> + * This is just like pack_sg_list() except that it takes page array
>> + * as an input and directly places them on the sg list after taking
>> + * care of the first page offset.
>> + */
>> +
>> +static int
>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit,
>> +               struct p9_fcall *pdu, int count)
>> +{
>> +       int s;
>> +       int i = 0;
>> +       int index = start;
>> +
>> +       if (pdu->pdata_off) {
>> +               s = min((int)(PAGE_SIZE - pdu->pdata_off), count);
>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off);
>> +               count -= s;
>> +       }
>> +
>> +       while (count) {
>> +               BUG_ON(index > limit);
>> +               s = min((int)PAGE_SIZE, count);
>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, 0);
>> +               count -= s;
>> +       }
>> +
>> +       return index-start;
>> +}
>> +
>> +
>>  /* We don't currently allow canceling of virtio requests */
>>  static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>  {
>> @@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>  static int
>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>  {
>> -       int in, out;
>> +       int in, out, outp, inp;
>>        struct virtio_chan *chan = client->trans;
>>        char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>
>>        P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>
>>        out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>> -                                                               req->tc->size);
>> -       in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>> +                       req->tc->size);
>> +
>> +       BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len);
>> +
>> +       if (req->tc->pdata_write_len) {
>> +               outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>> +                                       req->tc, req->tc->pdata_write_len);
>> +               out += outp;
>> +       }
>> +       if (req->tc->pdata_read_len) {
>> +               inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>> +               in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>> +                                       req->tc, req->tc->pdata_read_len);
>> +               in += inp;
>> +       } else {
>> +               in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>>                                                                client->msize);
>> +       }
>>
>>        req->status = REQ_STATUS_SENT;
>>
>> --
>> 1.6.5.2
>>
>>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list.
  2010-08-19 18:28     ` Venkateswararao Jujjuri (JV)
@ 2010-08-19 18:49       ` Latchesar Ionkov
  2010-08-19 20:47         ` Venkateswararao Jujjuri (JV)
  0 siblings, 1 reply; 22+ messages in thread
From: Latchesar Ionkov @ 2010-08-19 18:49 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV)
  Cc: v9fs-developer, linux-fsdevel, Badari Pulavarty

It is kind of strange that part of the fcall packet (everything other
than the Rread/Twrite data) is located in a buffer pointed to by sdata,
while the rest is in pages pointed to by pdata. A scatterlist would make
it tidier and easier to figure out what is where.
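
Something along these lines, say (just a sketch, untested; the struct
and field names are made up):

    #include <linux/types.h>
    #include <linux/scatterlist.h>

    /*
     * Hypothetical layout: one scatterlist describes the whole PDU,
     * the header segment first and then any Twrite/Rread payload
     * pages, so every consumer sees a single, uniform description
     * of what is where.
     */
    struct p9_fcall_sg {
            u32 size;
            u8 id;
            u16 tag;
            int nr_segs;                    /* used entries in sg[] */
            struct scatterlist sg[16];      /* header + payload */
    };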

Thanks,
    Lucho

On Thu, Aug 19, 2010 at 12:28 PM, Venkateswararao Jujjuri (JV)
<jvrao@linux.vnet.ibm.com> wrote:
> Latchesar Ionkov wrote:
>> Is there any particular reason for p9_fcall to have pointers to pages
>> instead of a scatterlist?
>
> Given that page sizes are constant, all we need is the offset into the
> first page; the IO size determines the last page. So we decided there was
> no need to put an sg list in the p9_fcall.
>
> Thanks,
> JV
>
>>
>> Thanks,
>>     Lucho
>>
>> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
>> <jvrao@linux.vnet.ibm.com> wrote:
>>> This patch adds necessary infrastructure for placing page addresses
>>> directly on the sg list for the server to consume.
>>>
>>> The newly added routine pack_sg_list_p() is just like pack_sg_list()
>>> except that it takes page array as an input and directly places them on
>>> the sg list after taking care of the first page offset.
>>>
>>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
>>> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
>>> ---
>>>  include/net/9p/9p.h   |    6 ++++-
>>>  net/9p/client.c       |    4 +++
>>>  net/9p/trans_virtio.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++--
>>>  3 files changed, 65 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
>>> index a8de812..382ef22 100644
>>> --- a/include/net/9p/9p.h
>>> +++ b/include/net/9p/9p.h
>>> @@ -651,7 +651,11 @@ struct p9_fcall {
>>>
>>>        size_t offset;
>>>        size_t capacity;
>>> -
>>> +       struct page **pdata;
>>> +       uint32_t pdata_mapped_pages;
>>> +       uint32_t pdata_off;
>>> +       uint32_t pdata_write_len;
>>> +       uint32_t pdata_read_len;
>>>        uint8_t *sdata;
>>>  };
>>>
>>> diff --git a/net/9p/client.c b/net/9p/client.c
>>> index 29bbbbd..5487896 100644
>>> --- a/net/9p/client.c
>>> +++ b/net/9p/client.c
>>> @@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag)
>>>                }
>>>                req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
>>>                req->tc->capacity = c->msize;
>>> +               req->tc->pdata_write_len = 0;
>>> +               req->tc->pdata_read_len = 0;
>>>                req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
>>>                req->rc->capacity = c->msize;
>>> +               req->rc->pdata_write_len = 0;
>>> +               req->rc->pdata_read_len = 0;
>>>        }
>>>
>>>        p9pdu_reset(req->tc);
>>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>>> index 762c19f..8f86cb5 100644
>>> --- a/net/9p/trans_virtio.c
>>> +++ b/net/9p/trans_virtio.c
>>> @@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data,
>>>        return index-start;
>>>  }
>>>
>>> +/**
>>> + * pack_sg_list_p - Pack a scatter gather list from an array of pages.
>>> + * @sg: scatter/gather list to pack into
>>> + * @start: which segment of the sg_list to start at
>>> + * @limit: maximum segment to pack data to
>>> + * @pdu: pdu prepared to put on the wire.
>>> + * @count: amount of data to pack into the scatter/gather list
>>> + *
>>> + * This is just like pack_sg_list() except that it takes page array
>>> + * as an input and directly places them on the sg list after taking
>>> + * care of the first page offset.
>>> + */
>>> +
>>> +static int
>>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit,
>>> +               struct p9_fcall *pdu, int count)
>>> +{
>>> +       int s;
>>> +       int i = 0;
>>> +       int index = start;
>>> +
>>> +       if (pdu->pdata_off) {
>>> +               s = min((int)(PAGE_SIZE - pdu->pdata_off), count);
>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off);
>>> +               count -= s;
>>> +       }
>>> +
>>> +       while (count) {
>>> +               BUG_ON(index > limit);
>>> +               s = min((int)PAGE_SIZE, count);
>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, 0);
>>> +               count -= s;
>>> +       }
>>> +
>>> +       return index-start;
>>> +}
>>> +
>>> +
>>>  /* We don't currently allow canceling of virtio requests */
>>>  static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>  {
>>> @@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>  static int
>>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>>  {
>>> -       int in, out;
>>> +       int in, out, outp, inp;
>>>        struct virtio_chan *chan = client->trans;
>>>        char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>>
>>>        P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>>
>>>        out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>>> -                                                               req->tc->size);
>>> -       in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>>> +                       req->tc->size);
>>> +
>>> +       BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len);
>>> +
>>> +       if (req->tc->pdata_write_len) {
>>> +               outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>>> +                                       req->tc, req->tc->pdata_write_len);
>>> +               out += outp;
>>> +       }
>>> +       if (req->tc->pdata_read_len) {
>>> +               inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>>> +               in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>>> +                                       req->tc, req->tc->pdata_read_len);
>>> +               in += inp;
>>> +       } else {
>>> +               in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>>>                                                                client->msize);
>>> +       }
>>>
>>>        req->status = REQ_STATUS_SENT;
>>>
>>> --
>>> 1.6.5.2
>>>
>>>
>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 4/5] [net/9p] Achieve zero copy on write path.
  2010-08-17 17:27 ` [PATCH 4/5] [net/9p] Achieve zero copy on write path Venkateswararao Jujjuri (JV)
@ 2010-08-19 19:30   ` Latchesar Ionkov
  2010-08-19 20:55     ` Venkateswararao Jujjuri (JV)
  0 siblings, 1 reply; 22+ messages in thread
From: Latchesar Ionkov @ 2010-08-19 19:30 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV)
  Cc: v9fs-developer, linux-fsdevel, Badari Pulavarty

pdu_fill_pages() doesn't calculate pdata_len correctly: first_page_bytes
should be min(PAGE_SIZE - pdu->pdata_off, size).
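
Something like this, I mean (untested):

    if (pdu->pdata_off)
            first_page_bytes = min_t(size_t,
                                     PAGE_SIZE - pdu->pdata_off, size);

Otherwise a request that fits entirely within the first page still counts
the whole remainder of that page, and pdata_len comes out too large.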

Thanks,
    Lucho

On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
<jvrao@linux.vnet.ibm.com> wrote:
> This patch avoids copy_from_user by employing get_user_pages_fast() on the
> udata buffer. This will eliminate an additional copy of the user buffer
> into a kernel buffer before placing it on the virtio ring.
>
> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
> ---
>  net/9p/client.c   |   32 ++++++++++++++++++++-
>  net/9p/protocol.c |   83 +++++++++++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 105 insertions(+), 10 deletions(-)
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 5487896..7ce58fb 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -36,6 +36,7 @@
>  #include <linux/parser.h>
>  #include <net/9p/client.h>
>  #include <net/9p/transport.h>
> +#include <linux/mm.h>
>  #include "protocol.h"
>
>  /*
> @@ -521,6 +522,25 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
>  }
>
>  /**
> + * p9_release_req_pages - Release pages after the transaction.
> + * @req - Request buffer.
> + *
> + */
> +static void
> +p9_release_req_pages(struct p9_req_t *req)
> +{
> +       int i = 0;
> +       while (req->tc->pdata[i] && req->tc->pdata_mapped_pages--) {
> +               put_page(req->tc->pdata[i]);
> +               req->tc->pdata[i] = NULL;
> +               i++;
> +       }
> +       req->tc->pdata_write_len = 0;
> +       req->tc->pdata_read_len = 0;
> +}
> +
> +
> +/**
>  * p9_client_rpc - issue a request and wait for a response
>  * @c: client session
>  * @type: type of request
> @@ -575,6 +595,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
>        err = c->trans_mod->request(c, req);
>        if (err < 0) {
>                c->status = Disconnected;
> +               if (req->tc->pdata_write_len || req->tc->pdata_read_len)
> +                       p9_release_req_pages(req);
>                goto reterr;
>        }
>
> @@ -583,6 +605,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
>                                                req->status >= REQ_STATUS_RCVD);
>        P9_DPRINTK(P9_DEBUG_MUX, "wait %p tag: %d returned %d\n",
>                                                req->wq, tag, err);
> +       if (req->tc->pdata_write_len || req->tc->pdata_read_len)
> +               p9_release_req_pages(req);
>
>        if (req->status == REQ_STATUS_ERROR) {
>                P9_DPRINTK(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
> @@ -1331,9 +1355,15 @@ p9_client_write(struct p9_fid *fid, char *data, const char __user *udata,
>        if (data)
>                req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid, offset,
>                                                                rsize, data);
> -       else
> +       else {
> +               if (clnt->trans_mod->capability &&
> +                       clnt->trans_mod->capability(P9_CAP_GET_MAX_SG_PAGES)) {
> +
> +                       rsize = count;
> +               }
>                req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid, offset,
>                                                                rsize, udata);
> +       }
>        if (IS_ERR(req)) {
>                err = PTR_ERR(req);
>                goto error;
> diff --git a/net/9p/protocol.c b/net/9p/protocol.c
> index ca63aff..97f313d 100644
> --- a/net/9p/protocol.c
> +++ b/net/9p/protocol.c
> @@ -31,9 +31,12 @@
>  #include <linux/slab.h>
>  #include <linux/sched.h>
>  #include <linux/types.h>
> +#include <linux/parser.h>
>  #include <net/9p/9p.h>
>  #include <net/9p/client.h>
>  #include "protocol.h"
> +#include <net/9p/transport.h>
> +#include <linux/pagemap.h>
>
>  #ifndef MIN
>  #define MIN(a, b) (((a) < (b)) ? (a) : (b))
> @@ -110,6 +113,51 @@ static size_t pdu_read(struct p9_fcall *pdu, void *data, size_t size)
>        return size - len;
>  }
>
> +static int
> +pdu_fill_pages(struct p9_fcall *pdu, const char __user *udata, size_t size,
> +               int rw, int max_sg_pages)
> +{
> +       int nr_pages;
> +       uint32_t first_page_bytes = 0;
> +       int pdata_len;
> +
> +       nr_pages = size >> PAGE_SHIFT;
> +       pdu->pdata_off = (size_t)udata & (PAGE_SIZE-1);
> +       if (pdu->pdata_off)
> +               first_page_bytes = PAGE_SIZE - pdu->pdata_off;
> +       if (size - (first_page_bytes + (nr_pages << PAGE_SHIFT))) {
> +               /* trailing partial page */
> +               nr_pages++;
> +       }
> +       if (first_page_bytes) {
> +               /* leading partial page */
> +               nr_pages++;
> +       }
> +       nr_pages = min(max_sg_pages, nr_pages);
> +       pdu->pdata = (struct page **)(pdu->sdata + pdu->size);
> +       pdu->pdata_write_len = 0;
> +       pdu->pdata_read_len = 0;
> +       pdu->pdata_mapped_pages = get_user_pages_fast((unsigned long)udata,
> +                       nr_pages, rw, pdu->pdata);
> +       if (pdu->pdata_mapped_pages < 0) {
> +               printk(KERN_WARNING "get_user_pages_fast failed:%d udata:%p"
> +                               "nr_pages:%d\n", pdu->pdata_mapped_pages,
> +                               udata, nr_pages);
> +               pdu->pdata_mapped_pages = 0;
> +               return -1;
> +       }
> +       if (pdu->pdata_off) {
> +               pdata_len = first_page_bytes;
> +               pdata_len += min((size - pdata_len),
> +                               ((size_t)pdu->pdata_mapped_pages - 1) <<
> +                                                               PAGE_SHIFT);
> +       } else {
> +               pdata_len = min(size, (size_t)pdu->pdata_mapped_pages <<
> +                                                               PAGE_SHIFT);
> +       }
> +       return pdata_len;
> +}
> +
>  static size_t pdu_write(struct p9_fcall *pdu, const void *data, size_t size)
>  {
>        size_t len = MIN(pdu->capacity - pdu->size, size);
> @@ -119,15 +167,31 @@ static size_t pdu_write(struct p9_fcall *pdu, const void *data, size_t size)
>  }
>
>  static size_t
> -pdu_write_u(struct p9_fcall *pdu, const char __user *udata, size_t size)
> +pdu_write_u(struct p9_fcall *pdu, struct p9_client *c, const char __user *udata,
> +               size_t size)
>  {
> -       size_t len = MIN(pdu->capacity - pdu->size, size);
> -       int err = copy_from_user(&pdu->sdata[pdu->size], udata, len);
> -       if (err)
> -               printk(KERN_WARNING "pdu_write_u returning: %d\n", err);
> +       size_t len;
> +       int err;
> +       int max_req_sg_pages = 0;
>
> -       pdu->size += len;
> -       return size - len;
> +       if (c->trans_mod->capability &&
> +                       (udata && !segment_eq(get_fs(), KERNEL_DS))) {
> +               max_req_sg_pages =
> +                       c->trans_mod->capability(P9_CAP_GET_MAX_SG_PAGES);
> +       }
> +       if (max_req_sg_pages) {
> +               len = pdu_fill_pages(pdu, udata, size, 0, max_req_sg_pages);
> +               if (len < 0)
> +                       return len;
> +               pdu->pdata_write_len = len;
> +       } else {
> +               len = MIN(pdu->capacity - pdu->size, size);
> +               err = copy_from_user(&pdu->sdata[pdu->size], udata, len);
> +               if (err)
> +                       printk(KERN_WARNING "pdu_write_u returning: %d\n", err);
> +               pdu->size += len;
> +       }
> +       return len;
>  }
>
>  /*
> @@ -467,7 +531,8 @@ p9pdu_vwritef(struct p9_fcall *pdu, struct p9_client *c, const char *fmt,
>                                const char __user *udata =
>                                                va_arg(ap, const void __user *);
>                                errcode = p9pdu_writef(pdu, c, "d", count);
> -                               if (!errcode && pdu_write_u(pdu, udata, count))
> +                               if (!errcode &&
> +                                       pdu_write_u(pdu, c, udata, count) < 0)
>                                        errcode = -EFAULT;
>                        }
>                        break;
> --
> 1.6.5.2
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list.
  2010-08-19 18:49       ` Latchesar Ionkov
@ 2010-08-19 20:47         ` Venkateswararao Jujjuri (JV)
  2010-08-19 21:07           ` Latchesar Ionkov
  0 siblings, 1 reply; 22+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-08-19 20:47 UTC (permalink / raw)
  To: Latchesar Ionkov; +Cc: v9fs-developer, linux-fsdevel, Badari Pulavarty

Latchesar Ionkov wrote:
> It is kind of strange that part of the fcall packet (everything other
> than the Rread/Twrite data) is located in a buffer pointed to by sdata,
> while the rest is in pages pointed to by pdata. A scatterlist would make
> it tidier and easier to figure out what is where.

A separate sg list would further increase the size of the PDU. Initially I
had an sg list sitting in the fcall structure. But when we use the page
addresses directly, we are not making full use of sdata; only the initial
portion is used for the header. To use kernel memory efficiently, we came up
with this plan of overloading sdata with page pointers during user-initiated
Rread/Twrite calls.

The major advantage we saw with this method is that the changes are very
modular: other parts of the code, and the other transports, work without
change.

If needed, we can easily have a separate sg list vector in the fcall, but it
may not use kernel memory efficiently, as the whole sdata buffer would be
allocated but not used.
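
Concretely, the overloading amounts to this (lifted from pdu_fill_pages()
in patch 4, comments added):

    /*
     * Reuse the tail of sdata, past the already-packed header, as the
     * array of page pointers for the payload:
     *
     *   sdata: [ 9P header | struct page *pdata[] | unused ]
     */
    pdu->pdata = (struct page **)(pdu->sdata + pdu->size);
    pdu->pdata_mapped_pages = get_user_pages_fast((unsigned long)udata,
                    nr_pages, rw, pdu->pdata);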

Thanks,
JV

> 
> Thanks,
>     Lucho
> 
> On Thu, Aug 19, 2010 at 12:28 PM, Venkateswararao Jujjuri (JV)
> <jvrao@linux.vnet.ibm.com> wrote:
>> Latchesar Ionkov wrote:
>>> Is there any particular reason for p9_fcall to have pointers to pages
>>> instead of a scatterlist?
>> Given that page sizes are constant, all we need is the offset into the
>> first page; the IO size determines the last page. So we decided there was
>> no need to put an sg list in the p9_fcall.
>>
>> Thanks,
>> JV
>>
>>> Thanks,
>>>     Lucho
>>>
>>> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
>>> <jvrao@linux.vnet.ibm.com> wrote:
>>>> This patch adds necessary infrastructure for placing page addresses
>>>> directly on the sg list for the server to consume.
>>>>
>>>> The newly added routine pack_sg_list_p() is just like pack_sg_list()
>>>> except that it takes page array as an input and directly places them on
>>>> the sg list after taking care of the first page offset.
>>>>
>>>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
>>>> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
>>>> ---
>>>>  include/net/9p/9p.h   |    6 ++++-
>>>>  net/9p/client.c       |    4 +++
>>>>  net/9p/trans_virtio.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++--
>>>>  3 files changed, 65 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
>>>> index a8de812..382ef22 100644
>>>> --- a/include/net/9p/9p.h
>>>> +++ b/include/net/9p/9p.h
>>>> @@ -651,7 +651,11 @@ struct p9_fcall {
>>>>
>>>>        size_t offset;
>>>>        size_t capacity;
>>>> -
>>>> +       struct page **pdata;
>>>> +       uint32_t pdata_mapped_pages;
>>>> +       uint32_t pdata_off;
>>>> +       uint32_t pdata_write_len;
>>>> +       uint32_t pdata_read_len;
>>>>        uint8_t *sdata;
>>>>  };
>>>>
>>>> diff --git a/net/9p/client.c b/net/9p/client.c
>>>> index 29bbbbd..5487896 100644
>>>> --- a/net/9p/client.c
>>>> +++ b/net/9p/client.c
>>>> @@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag)
>>>>                }
>>>>                req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
>>>>                req->tc->capacity = c->msize;
>>>> +               req->tc->pdata_write_len = 0;
>>>> +               req->tc->pdata_read_len = 0;
>>>>                req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
>>>>                req->rc->capacity = c->msize;
>>>> +               req->rc->pdata_write_len = 0;
>>>> +               req->rc->pdata_read_len = 0;
>>>>        }
>>>>
>>>>        p9pdu_reset(req->tc);
>>>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>>>> index 762c19f..8f86cb5 100644
>>>> --- a/net/9p/trans_virtio.c
>>>> +++ b/net/9p/trans_virtio.c
>>>> @@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data,
>>>>        return index-start;
>>>>  }
>>>>
>>>> +/**
>>>> + * pack_sg_list_p - Pack a scatter gather list from an array of pages.
>>>> + * @sg: scatter/gather list to pack into
>>>> + * @start: which segment of the sg_list to start at
>>>> + * @limit: maximum segment to pack data to
>>>> + * @pdu: pdu prepared to put on the wire.
>>>> + * @count: amount of data to pack into the scatter/gather list
>>>> + *
>>>> + * This is just like pack_sg_list() except that it takes page array
>>>> + * as an input and directly places them on the sg list after taking
>>>> + * care of the first page offset.
>>>> + */
>>>> +
>>>> +static int
>>>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit,
>>>> +               struct p9_fcall *pdu, int count)
>>>> +{
>>>> +       int s;
>>>> +       int i = 0;
>>>> +       int index = start;
>>>> +
>>>> +       if (pdu->pdata_off) {
>>>> +               s = min((int)(PAGE_SIZE - pdu->pdata_off), count);
>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off);
>>>> +               count -= s;
>>>> +       }
>>>> +
>>>> +       while (count) {
>>>> +               BUG_ON(index > limit);
>>>> +               s = min((int)PAGE_SIZE, count);
>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, 0);
>>>> +               count -= s;
>>>> +       }
>>>> +
>>>> +       return index-start;
>>>> +}
>>>> +
>>>> +
>>>>  /* We don't currently allow canceling of virtio requests */
>>>>  static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>  {
>>>> @@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>  static int
>>>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>>>  {
>>>> -       int in, out;
>>>> +       int in, out, outp, inp;
>>>>        struct virtio_chan *chan = client->trans;
>>>>        char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>>>
>>>>        P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>>>
>>>>        out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>>>> -                                                               req->tc->size);
>>>> -       in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>>>> +                       req->tc->size);
>>>> +
>>>> +       BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len);
>>>> +
>>>> +       if (req->tc->pdata_write_len) {
>>>> +               outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>>>> +                                       req->tc, req->tc->pdata_write_len);
>>>> +               out += outp;
>>>> +       }
>>>> +       if (req->tc->pdata_read_len) {
>>>> +               inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>>>> +               in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>>>> +                                       req->tc, req->tc->pdata_read_len);
>>>> +               in += inp;
>>>> +       } else {
>>>> +               in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>>>>                                                                client->msize);
>>>> +       }
>>>>
>>>>        req->status = REQ_STATUS_SENT;
>>>>
>>>> --
>>>> 1.6.5.2
>>>>
>>>>
>>
>>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 4/5] [net/9p] Achieve zero copy on write path.
  2010-08-19 19:30   ` [V9fs-developer] " Latchesar Ionkov
@ 2010-08-19 20:55     ` Venkateswararao Jujjuri (JV)
  0 siblings, 0 replies; 22+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-08-19 20:55 UTC (permalink / raw)
  To: Latchesar Ionkov; +Cc: v9fs-developer, linux-fsdevel, Badari Pulavarty

Latchesar Ionkov wrote:
> pdu_fill_pages() doesn't calculate pdata_len correctly: first_page_bytes
> should be min(PAGE_SIZE - pdu->pdata_off, size).

Good catch, thanks for pointing it out. Will correct it in my next version.

Thanks,
JV

> 
> Thanks,
>     Lucho
> 
> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
> <jvrao@linux.vnet.ibm.com> wrote:
>> This patch avoids copy_from_user by employing get_user_pages_fast() on the
>> udata buffer. This will eliminate an additional copy of the user buffer
>> into a kernel buffer before placing it on the virtio ring.
>>
>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
>> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
>> ---
>>  net/9p/client.c   |   32 ++++++++++++++++++++-
>>  net/9p/protocol.c |   83 +++++++++++++++++++++++++++++++++++++++++++++++------
>>  2 files changed, 105 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/9p/client.c b/net/9p/client.c
>> index 5487896..7ce58fb 100644
>> --- a/net/9p/client.c
>> +++ b/net/9p/client.c
>> @@ -36,6 +36,7 @@
>>  #include <linux/parser.h>
>>  #include <net/9p/client.h>
>>  #include <net/9p/transport.h>
>> +#include <linux/mm.h>
>>  #include "protocol.h"
>>
>>  /*
>> @@ -521,6 +522,25 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
>>  }
>>
>>  /**
>> + * p9_release_req_pages - Release pages after the transaction.
>> + * @req - Request buffer.
>> + *
>> + */
>> +static void
>> +p9_release_req_pages(struct p9_req_t *req)
>> +{
>> +       int i = 0;
>> +       while (req->tc->pdata[i] && req->tc->pdata_mapped_pages--) {
>> +               put_page(req->tc->pdata[i]);
>> +               req->tc->pdata[i] = NULL;
>> +               i++;
>> +       }
>> +       req->tc->pdata_write_len = 0;
>> +       req->tc->pdata_read_len = 0;
>> +}
>> +
>> +
>> +/**
>>  * p9_client_rpc - issue a request and wait for a response
>>  * @c: client session
>>  * @type: type of request
>> @@ -575,6 +595,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
>>        err = c->trans_mod->request(c, req);
>>        if (err < 0) {
>>                c->status = Disconnected;
>> +               if (req->tc->pdata_write_len || req->tc->pdata_read_len)
>> +                       p9_release_req_pages(req);
>>                goto reterr;
>>        }
>>
>> @@ -583,6 +605,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
>>                                                req->status >= REQ_STATUS_RCVD);
>>        P9_DPRINTK(P9_DEBUG_MUX, "wait %p tag: %d returned %d\n",
>>                                                req->wq, tag, err);
>> +       if (req->tc->pdata_write_len || req->tc->pdata_read_len)
>> +               p9_release_req_pages(req);
>>
>>        if (req->status == REQ_STATUS_ERROR) {
>>                P9_DPRINTK(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
>> @@ -1331,9 +1355,15 @@ p9_client_write(struct p9_fid *fid, char *data, const char __user *udata,
>>        if (data)
>>                req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid, offset,
>>                                                                rsize, data);
>> -       else
>> +       else {
>> +               if (clnt->trans_mod->capability &&
>> +                       clnt->trans_mod->capability(P9_CAP_GET_MAX_SG_PAGES)) {
>> +
>> +                       rsize = count;
>> +               }
>>                req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid, offset,
>>                                                                rsize, udata);
>> +       }
>>        if (IS_ERR(req)) {
>>                err = PTR_ERR(req);
>>                goto error;
>> diff --git a/net/9p/protocol.c b/net/9p/protocol.c
>> index ca63aff..97f313d 100644
>> --- a/net/9p/protocol.c
>> +++ b/net/9p/protocol.c
>> @@ -31,9 +31,12 @@
>>  #include <linux/slab.h>
>>  #include <linux/sched.h>
>>  #include <linux/types.h>
>> +#include <linux/parser.h>
>>  #include <net/9p/9p.h>
>>  #include <net/9p/client.h>
>>  #include "protocol.h"
>> +#include <net/9p/transport.h>
>> +#include <linux/pagemap.h>
>>
>>  #ifndef MIN
>>  #define MIN(a, b) (((a) < (b)) ? (a) : (b))
>> @@ -110,6 +113,51 @@ static size_t pdu_read(struct p9_fcall *pdu, void *data, size_t size)
>>        return size - len;
>>  }
>>
>> +static int
>> +pdu_fill_pages(struct p9_fcall *pdu, const char __user *udata, size_t size,
>> +               int rw, int max_sg_pages)
>> +{
>> +       int nr_pages;
>> +       uint32_t first_page_bytes = 0;
>> +       int pdata_len;
>> +
>> +       nr_pages = size >> PAGE_SHIFT;
>> +       pdu->pdata_off = (size_t)udata & (PAGE_SIZE-1);
>> +       if (pdu->pdata_off)
>> +               first_page_bytes = PAGE_SIZE - pdu->pdata_off;
>> +       if (size - (first_page_bytes + (nr_pages << PAGE_SHIFT))) {
>> +               /* trailing partial page */
>> +               nr_pages++;
>> +       }
>> +       if (first_page_bytes) {
>> +               /* leading partial page */
>> +               nr_pages++;
>> +       }
>> +       nr_pages = min(max_sg_pages, nr_pages);
>> +       pdu->pdata = (struct page **)(pdu->sdata + pdu->size);
>> +       pdu->pdata_write_len = 0;
>> +       pdu->pdata_read_len = 0;
>> +       pdu->pdata_mapped_pages = get_user_pages_fast((unsigned long)udata,
>> +                       nr_pages, rw, pdu->pdata);
>> +       if (pdu->pdata_mapped_pages < 0) {
>> +               printk(KERN_WARNING "get_user_pages_fast failed:%d udata:%p"
>> +                               "nr_pages:%d\n", pdu->pdata_mapped_pages,
>> +                               udata, nr_pages);
>> +               pdu->pdata_mapped_pages = 0;
>> +               return -1;
>> +       }
>> +       if (pdu->pdata_off) {
>> +               pdata_len = first_page_bytes;
>> +               pdata_len += min((size - pdata_len),
>> +                               ((size_t)pdu->pdata_mapped_pages - 1) <<
>> +                                                               PAGE_SHIFT);
>> +       } else {
>> +               pdata_len = min(size, (size_t)pdu->pdata_mapped_pages <<
>> +                                                               PAGE_SHIFT);
>> +       }
>> +       return pdata_len;
>> +}
>> +
>>  static size_t pdu_write(struct p9_fcall *pdu, const void *data, size_t size)
>>  {
>>        size_t len = MIN(pdu->capacity - pdu->size, size);
>> @@ -119,15 +167,31 @@ static size_t pdu_write(struct p9_fcall *pdu, const void *data, size_t size)
>>  }
>>
>>  static size_t
>> -pdu_write_u(struct p9_fcall *pdu, const char __user *udata, size_t size)
>> +pdu_write_u(struct p9_fcall *pdu, struct p9_client *c, const char __user *udata,
>> +               size_t size)
>>  {
>> -       size_t len = MIN(pdu->capacity - pdu->size, size);
>> -       int err = copy_from_user(&pdu->sdata[pdu->size], udata, len);
>> -       if (err)
>> -               printk(KERN_WARNING "pdu_write_u returning: %d\n", err);
>> +       size_t len;
>> +       int err;
>> +       int max_req_sg_pages = 0;
>>
>> -       pdu->size += len;
>> -       return size - len;
>> +       if (c->trans_mod->capability &&
>> +                       (udata && !segment_eq(get_fs(), KERNEL_DS))) {
>> +               max_req_sg_pages =
>> +                       c->trans_mod->capability(P9_CAP_GET_MAX_SG_PAGES);
>> +       }
>> +       if (max_req_sg_pages) {
>> +               len = pdu_fill_pages(pdu, udata, size, 0, max_req_sg_pages);
>> +               if (len < 0)
>> +                       return len;
>> +               pdu->pdata_write_len = len;
>> +       } else {
>> +               len = MIN(pdu->capacity - pdu->size, size);
>> +               err = copy_from_user(&pdu->sdata[pdu->size], udata, len);
>> +               if (err)
>> +                       printk(KERN_WARNING "pdu_write_u returning: %d\n", err);
>> +               pdu->size += len;
>> +       }
>> +       return len;
>>  }
>>
>>  /*
>> @@ -467,7 +531,8 @@ p9pdu_vwritef(struct p9_fcall *pdu, struct p9_client *c, const char *fmt,
>>                                const char __user *udata =
>>                                                va_arg(ap, const void __user *);
>>                                errcode = p9pdu_writef(pdu, c, "d", count);
>> -                               if (!errcode && pdu_write_u(pdu, udata, count))
>> +                               if (!errcode &&
>> +                                       pdu_write_u(pdu, c, udata, count) < 0)
>>                                        errcode = -EFAULT;
>>                        }
>>                        break;
>> --
>> 1.6.5.2
>>
>>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list.
  2010-08-19 20:47         ` Venkateswararao Jujjuri (JV)
@ 2010-08-19 21:07           ` Latchesar Ionkov
  2010-08-19 21:26             ` Eric Van Hensbergen
  0 siblings, 1 reply; 22+ messages in thread
From: Latchesar Ionkov @ 2010-08-19 21:07 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV)
  Cc: v9fs-developer, linux-fsdevel, Badari Pulavarty

Frankly, I don't like the way the zero copy is implemented much.

1. In case of Twrite, the size[4] field of the 9P message contains the
size of the message without the data field.

2. In case of Tread message, the pages to receive the data are
contained in req->tc instead of req->rc.

What is the point of having the page data in p9_fcall if only the one
in req->tc is ever going to be used? I am trying to implement a
transport that uses your patches, and I have to fill my code with
explanations of why I am doing crazy things.
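
For example, on the receive side a transport ends up with something like
this (a sketch; post_recv_page() is a made-up helper for whatever the
transport does to post a receive buffer):

    /*
     * Tread: the pages that will *receive* the reply payload hang off
     * the T-message fcall, req->tc, not off req->rc where any reader
     * of the code will expect them.
     */
    if (req->tc->pdata_read_len) {
            int i;

            for (i = 0; i < req->tc->pdata_mapped_pages; i++)
                    post_recv_page(conn, req->tc->pdata[i]);
    }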

Thanks,
    Lucho

On Thu, Aug 19, 2010 at 2:47 PM, Venkateswararao Jujjuri (JV)
<jvrao@linux.vnet.ibm.com> wrote:
> Latchesar Ionkov wrote:
>> It is kind of strange that part of the fcall packet (everything other
>> than the Rread/Twrite data) is located in a buffer pointed to by sdata,
>> while the rest is in pages pointed to by pdata. A scatterlist would make
>> it tidier and easier to figure out what is where.
>
> A separate sg list would further increase the size of the PDU. Initially I
> had an sg list sitting in the fcall structure. But when we use the page
> addresses directly, we are not making full use of sdata; only the initial
> portion is used for the header. To use kernel memory efficiently, we came up
> with this plan of overloading sdata with page pointers during user-initiated
> Rread/Twrite calls.
>
> The major advantage we saw with this method is that the changes are very
> modular: other parts of the code, and the other transports, work without
> change.
>
> If needed, we can easily have a separate sg list vector in the fcall, but it
> may not use kernel memory efficiently, as the whole sdata buffer would be
> allocated but not used.
>
> Thanks,
> JV
>
>>
>> Thanks,
>>     Lucho
>>
>> On Thu, Aug 19, 2010 at 12:28 PM, Venkateswararao Jujjuri (JV)
>> <jvrao@linux.vnet.ibm.com> wrote:
>>> Latchesar Ionkov wrote:
>>>> Is there any particular reason for p9_fcall to have pointers to pages
>>>> instead of a scatterlist?
>>> Given that page sizes are constant, all we need is the offset into the
>>> first page; the IO size determines the last page. So we decided there was
>>> no need to put an sg list in the p9_fcall.
>>>
>>> Thanks,
>>> JV
>>>
>>>> Thanks,
>>>>     Lucho
>>>>
>>>> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
>>>> <jvrao@linux.vnet.ibm.com> wrote:
>>>>> This patch adds necessary infrastructure for placing page addresses
>>>>> directly on the sg list for the server to consume.
>>>>>
>>>>> The newly added routine pack_sg_list_p() is just like pack_sg_list()
>>>>> except that it takes page array as an input and directly places them on
>>>>> the sg list after taking care of the first page offset.
>>>>>
>>>>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
>>>>> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
>>>>> ---
>>>>>  include/net/9p/9p.h   |    6 ++++-
>>>>>  net/9p/client.c       |    4 +++
>>>>>  net/9p/trans_virtio.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>  3 files changed, 65 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
>>>>> index a8de812..382ef22 100644
>>>>> --- a/include/net/9p/9p.h
>>>>> +++ b/include/net/9p/9p.h
>>>>> @@ -651,7 +651,11 @@ struct p9_fcall {
>>>>>
>>>>>        size_t offset;
>>>>>        size_t capacity;
>>>>> -
>>>>> +       struct page **pdata;
>>>>> +       uint32_t pdata_mapped_pages;
>>>>> +       uint32_t pdata_off;
>>>>> +       uint32_t pdata_write_len;
>>>>> +       uint32_t pdata_read_len;
>>>>>        uint8_t *sdata;
>>>>>  };
>>>>>
>>>>> diff --git a/net/9p/client.c b/net/9p/client.c
>>>>> index 29bbbbd..5487896 100644
>>>>> --- a/net/9p/client.c
>>>>> +++ b/net/9p/client.c
>>>>> @@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag)
>>>>>                }
>>>>>                req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
>>>>>                req->tc->capacity = c->msize;
>>>>> +               req->tc->pdata_write_len = 0;
>>>>> +               req->tc->pdata_read_len = 0;
>>>>>                req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
>>>>>                req->rc->capacity = c->msize;
>>>>> +               req->rc->pdata_write_len = 0;
>>>>> +               req->rc->pdata_read_len = 0;
>>>>>        }
>>>>>
>>>>>        p9pdu_reset(req->tc);
>>>>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>>>>> index 762c19f..8f86cb5 100644
>>>>> --- a/net/9p/trans_virtio.c
>>>>> +++ b/net/9p/trans_virtio.c
>>>>> @@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data,
>>>>>        return index-start;
>>>>>  }
>>>>>
>>>>> +/**
>>>>> + * pack_sg_list_p - Pack a scatter gather list from an array of pages.
>>>>> + * @sg: scatter/gather list to pack into
>>>>> + * @start: which segment of the sg_list to start at
>>>>> + * @limit: maximum segment to pack data to
>>>>> + * @pdu: pdu prepared to put on the wire.
>>>>> + * @count: amount of data to pack into the scatter/gather list
>>>>> + *
>>>>> + * This is just like pack_sg_list() except that it takes page array
>>>>> + * as an input and directly places them on the sg list after taking
>>>>> + * care of the first page offset.
>>>>> + */
>>>>> +
>>>>> +static int
>>>>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit,
>>>>> +               struct p9_fcall *pdu, int count)
>>>>> +{
>>>>> +       int s;
>>>>> +       int i = 0;
>>>>> +       int index = start;
>>>>> +
>>>>> +       if (pdu->pdata_off) {
>>>>> +               s = min((int)(PAGE_SIZE - pdu->pdata_off), count);
>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off);
>>>>> +               count -= s;
>>>>> +       }
>>>>> +
>>>>> +       while (count) {
>>>>> +               BUG_ON(index > limit);
>>>>> +               s = min((int)PAGE_SIZE, count);
>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, 0);
>>>>> +               count -= s;
>>>>> +       }
>>>>> +
>>>>> +       return index-start;
>>>>> +}
>>>>> +
>>>>> +
>>>>>  /* We don't currently allow canceling of virtio requests */
>>>>>  static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>  {
>>>>> @@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>  static int
>>>>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>>>>  {
>>>>> -       int in, out;
>>>>> +       int in, out, outp, inp;
>>>>>        struct virtio_chan *chan = client->trans;
>>>>>        char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>>>>
>>>>>        P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>>>>
>>>>>        out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>>>>> -                                                               req->tc->size);
>>>>> -       in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>>>>> +                       req->tc->size);
>>>>> +
>>>>> +       BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len);
>>>>> +
>>>>> +       if (req->tc->pdata_write_len) {
>>>>> +               outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>>>>> +                                       req->tc, req->tc->pdata_write_len);
>>>>> +               out += outp;
>>>>> +       }
>>>>> +       if (req->tc->pdata_read_len) {
>>>>> +               inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>>>>> +               in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>>>>> +                                       req->tc, req->tc->pdata_read_len);
>>>>> +               in += inp;
>>>>> +       } else {
>>>>> +               in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>>>>>                                                                client->msize);
>>>>> +       }
>>>>>
>>>>>        req->status = REQ_STATUS_SENT;
>>>>>
>>>>> --
>>>>> 1.6.5.2
>>>>>
>>>>>
>>>
>>>
>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list.
  2010-08-19 21:07           ` Latchesar Ionkov
@ 2010-08-19 21:26             ` Eric Van Hensbergen
  2010-08-19 23:35               ` Venkateswararao Jujjuri (JV)
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Van Hensbergen @ 2010-08-19 21:26 UTC (permalink / raw)
  To: Latchesar Ionkov
  Cc: Venkateswararao Jujjuri (JV),
	linux-fsdevel, v9fs-developer, Badari Pulavarty

Yeah, I haven't done a thorough critique of the code yet as I'm
overloaded with other things, but my overall impression is similar to
Lucho's.  Last time I looked at doing this, my approach was to keep
everything zero copy in the fs and client layers, and if I needed to
allocate/copy in the transport because it didn't support
scatter/gather, I did it in the transport.

Of course, this touches more than just the virtio transport, and so I
can understand the trepidation at broader changes, but it seems like
we are doing backflips to try and avoid them, and the result is
somewhat awkward due to the overloaded nature of many of the fields --
in the end it is likely to lead to more problems.
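
Roughly what I had in mind, for a transport without scatter/gather
support (a sketch only; the function name is hypothetical, and the caller
is assumed to pass a bounce buffer of req->tc->size +
req->tc->pdata_write_len bytes):

    /*
     * Linearize the header plus the payload pages into a bounce buffer
     * inside the transport, so the fs and client layers can stay zero
     * copy unconditionally.
     */
    static int dumb_trans_send(struct p9_req_t *req, char *bounce)
    {
            size_t off = req->tc->pdata_off;
            size_t left = req->tc->pdata_write_len;
            char *p = bounce + req->tc->size;
            int i;

            memcpy(bounce, req->tc->sdata, req->tc->size);
            for (i = 0; left && i < req->tc->pdata_mapped_pages; i++) {
                    size_t n = min_t(size_t, PAGE_SIZE - off, left);
                    void *kaddr = kmap(req->tc->pdata[i]);

                    memcpy(p, kaddr + off, n);
                    kunmap(req->tc->pdata[i]);
                    p += n;
                    left -= n;
                    off = 0;
            }
            /* ... put 'bounce' on the wire ... */
            return 0;
    }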

          -eric


On Thu, Aug 19, 2010 at 4:07 PM, Latchesar Ionkov <lucho@ionkov.net> wrote:
> Frankly, I don't like the way the zero copy is implemented much.
>
> 1. In case of Twrite, the size[4] field of the 9P message contains the
> size of the message without the data field.
>
> 2. In case of Tread message, the pages to receive the data are
> contained in req->tc instead of req->rc.
>
> What is the point of having the page data in p9_fcall if only the one
> in req->tc is ever going to be used? I am trying to implement a
> transport that uses your patches, and I have to fill my code with
> explanations of why I am doing crazy things.
>
> Thanks,
>    Lucho
>
> On Thu, Aug 19, 2010 at 2:47 PM, Venkateswararao Jujjuri (JV)
> <jvrao@linux.vnet.ibm.com> wrote:
>> Latchesar Ionkov wrote:
>>> It is kind of strange that part of the fcall packet (everything other
>>> than the Rread/Twrite data) is located in a buffer pointed to by sdata,
>>> while the rest is in pages pointed to by pdata. A scatterlist would make
>>> it tidier and easier to figure out what is where.
>>
>> A separate sg list would further increase the size of the PDU. Initially I
>> had an sg list sitting in the fcall structure. But when we use the page
>> addresses directly, we are not making full use of sdata; only the initial
>> portion is used for the header. To use kernel memory efficiently, we came up
>> with this plan of overloading sdata with page pointers during user-initiated
>> Rread/Twrite calls.
>>
>> The major advantage we saw with this method is that the changes are very
>> modular: other parts of the code, and the other transports, work without
>> change.
>>
>> If needed, we can easily have a separate sg list vector in the fcall, but it
>> may not use kernel memory efficiently, as the whole sdata buffer would be
>> allocated but not used.
>>
>> Thanks,
>> JV
>>
>>>
>>> Thanks,
>>>     Lucho
>>>
>>> On Thu, Aug 19, 2010 at 12:28 PM, Venkateswararao Jujjuri (JV)
>>> <jvrao@linux.vnet.ibm.com> wrote:
>>>> Latchesar Ionkov wrote:
>>>>> Is there any particular reason for p9_fcall to have pointers to pages
>>>>> instead of a scatterlist?
>>>> Given that page sizes are constant, all we need is the offset into the
>>>> first page; the IO size determines the last page. So we decided there was
>>>> no need to put an sg list in the p9_fcall.
>>>>
>>>> Thanks,
>>>> JV
>>>>
>>>>> Thanks,
>>>>>     Lucho
>>>>>
>>>>> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
>>>>> <jvrao@linux.vnet.ibm.com> wrote:
>>>>>> This patch adds necessary infrastructure for placing page addresses
>>>>>> directly on the sg list for the server to consume.
>>>>>>
>>>>>> The newly added routine pack_sg_list_p() is just like pack_sg_list()
>>>>>> except that it takes page array as an input and directly places them on
>>>>>> the sg list after taking care of the first page offset.
>>>>>>
>>>>>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
>>>>>> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
>>>>>> ---
>>>>>>  include/net/9p/9p.h   |    6 ++++-
>>>>>>  net/9p/client.c       |    4 +++
>>>>>>  net/9p/trans_virtio.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>>  3 files changed, 65 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
>>>>>> index a8de812..382ef22 100644
>>>>>> --- a/include/net/9p/9p.h
>>>>>> +++ b/include/net/9p/9p.h
>>>>>> @@ -651,7 +651,11 @@ struct p9_fcall {
>>>>>>
>>>>>>        size_t offset;
>>>>>>        size_t capacity;
>>>>>> -
>>>>>> +       struct page **pdata;
>>>>>> +       uint32_t pdata_mapped_pages;
>>>>>> +       uint32_t pdata_off;
>>>>>> +       uint32_t pdata_write_len;
>>>>>> +       uint32_t pdata_read_len;
>>>>>>        uint8_t *sdata;
>>>>>>  };
>>>>>>
>>>>>> diff --git a/net/9p/client.c b/net/9p/client.c
>>>>>> index 29bbbbd..5487896 100644
>>>>>> --- a/net/9p/client.c
>>>>>> +++ b/net/9p/client.c
>>>>>> @@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag)
>>>>>>                }
>>>>>>                req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
>>>>>>                req->tc->capacity = c->msize;
>>>>>> +               req->tc->pdata_write_len = 0;
>>>>>> +               req->tc->pdata_read_len = 0;
>>>>>>                req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
>>>>>>                req->rc->capacity = c->msize;
>>>>>> +               req->rc->pdata_write_len = 0;
>>>>>> +               req->rc->pdata_read_len = 0;
>>>>>>        }
>>>>>>
>>>>>>        p9pdu_reset(req->tc);
>>>>>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>>>>>> index 762c19f..8f86cb5 100644
>>>>>> --- a/net/9p/trans_virtio.c
>>>>>> +++ b/net/9p/trans_virtio.c
>>>>>> @@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data,
>>>>>>        return index-start;
>>>>>>  }
>>>>>>
>>>>>> +/**
>>>>>> + * pack_sg_list_p - Pack a scatter gather list from an array of pages.
>>>>>> + * @sg: scatter/gather list to pack into
>>>>>> + * @start: which segment of the sg_list to start at
>>>>>> + * @limit: maximum segment to pack data to
>>>>>> + * @pdu: pdu prepared to put on the wire.
>>>>>> + * @count: amount of data to pack into the scatter/gather list
>>>>>> + *
>>>>>> + * This is just like pack_sg_list() except that it takes page array
>>>>>> + * as an input and directly places them on the sg list after taking
>>>>>> + * care of the first page offset.
>>>>>> + */
>>>>>> +
>>>>>> +static int
>>>>>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit,
>>>>>> +               struct p9_fcall *pdu, int count)
>>>>>> +{
>>>>>> +       int s;
>>>>>> +       int i = 0;
>>>>>> +       int index = start;
>>>>>> +
>>>>>> +       if (pdu->pdata_off) {
>>>>>> +               s = min((int)(PAGE_SIZE - pdu->pdata_off), count);
>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off);
>>>>>> +               count -= s;
>>>>>> +       }
>>>>>> +
>>>>>> +       while (count) {
>>>>>> +               BUG_ON(index > limit);
>>>>>> +               s = min((int)PAGE_SIZE, count);
>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, 0);
>>>>>> +               count -= s;
>>>>>> +       }
>>>>>> +
>>>>>> +       return index-start;
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>>  /* We don't currently allow canceling of virtio requests */
>>>>>>  static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>  {
>>>>>> @@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>  static int
>>>>>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>>>>>  {
>>>>>> -       int in, out;
>>>>>> +       int in, out, outp, inp;
>>>>>>        struct virtio_chan *chan = client->trans;
>>>>>>        char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>>>>>
>>>>>>        P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>>>>>
>>>>>>        out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>>>>>> -                                                               req->tc->size);
>>>>>> -       in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>>>>>> +                       req->tc->size);
>>>>>> +
>>>>>> +       BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len);
>>>>>> +
>>>>>> +       if (req->tc->pdata_write_len) {
>>>>>> +               outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>>>>>> +                                       req->tc, req->tc->pdata_write_len);
>>>>>> +               out += outp;
>>>>>> +       }
>>>>>> +       if (req->tc->pdata_read_len) {
>>>>>> +               inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>>>>>> +               in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>>>>>> +                                       req->tc, req->tc->pdata_read_len);
>>>>>> +               in += inp;
>>>>>> +       } else {
>>>>>> +               in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>>>>>>                                                                client->msize);
>>>>>> +       }
>>>>>>
>>>>>>        req->status = REQ_STATUS_SENT;
>>>>>>
>>>>>> --
>>>>>> 1.6.5.2
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>
>>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list.
  2010-08-19 21:26             ` Eric Van Hensbergen
@ 2010-08-19 23:35               ` Venkateswararao Jujjuri (JV)
  2010-08-20  0:27                 ` Eric Van Hensbergen
  0 siblings, 1 reply; 22+ messages in thread
From: Venkateswararao Jujjuri (JV) @ 2010-08-19 23:35 UTC (permalink / raw)
  To: Eric Van Hensbergen
  Cc: Latchesar Ionkov, linux-fsdevel, v9fs-developer, Badari Pulavarty

Eric Van Hensbergen wrote:
> Yeah, I haven't done a thorough critique of the code yet as I'm
> overloaded with other things, but my overall impression is similar to
> Lucho's.  Last time I looked at doing this, my approach was to keep
> everything zero copy in the fs and client layers, and if I needed to
> allocate/copy in the transport because it didn't support
> scatter/gather, I did it in the transport.
> 
> Of course, this touches more than just the virtio transport, and so I
> can understand the trepidation at broader changes, but it seems like
> we are doing backflips to try and avoid them, and the result is
> somewhat awkward due to the overloaded nature of many of the fields --
> in the end it is likely to lead to more problems.

I understand that it is not very pretty.. :) but the main goal was a modular,
limited change.
Unless we separate the header from the payload completely at the protocol level,
I think it may be difficult to achieve zero copy cleanly.

I will be more than happy to take your suggestions on how to get it right cleanly.
Increasing msize in the current model doesn't really solve the issue, as it forces
us into bigger kmalloc()s even for simple transactions like TLCREATE/RLCREATE.

Another area we need to look at is that the same structure, p9_fcall, is used
for both transmission and reception. Most of the time only one direction needs
the larger buffer, not the other.
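
(To make the asymmetry concrete -- a minimal sketch only, with an invented
helper; P9_IOHDRSZ, P9_TREAD and P9_TWRITE are existing constants, everything
else here is hypothetical and not part of this series:)

	/* Hypothetical: size each direction of an fcall independently
	 * instead of kmalloc()ing msize for both tc and rc. */
	static void p9_pick_buf_sizes(int8_t type, size_t msize,
				      size_t *tsize, size_t *rsize)
	{
		switch (type) {
		case P9_TREAD:		/* small request, large reply */
			*tsize = P9_IOHDRSZ;
			*rsize = msize;
			break;
		case P9_TWRITE:		/* large request, small reply */
			*tsize = msize;
			*rsize = P9_IOHDRSZ;
			break;
		default:		/* metadata ops: a small fixed size
					 * (placeholder value) suffices */
			*tsize = *rsize = 8192;
			break;
		}
	}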


> 
> On Thu, Aug 19, 2010 at 4:07 PM, Latchesar Ionkov <lucho@ionkov.net> wrote:
>> Frankly, I don't much like the way the zero copy is implemented.
>>
>> 1. In case of Twrite, the size[4] field of the 9P message contains the
>> size of the message without the data field.
>>
>> 2. In case of Tread message, the pages to receive the data are
>> contained in req->tc instead of req->rc.

We had some internal debate on this before choosing req->tc. :)
The argument was that whatever the client sends goes on TC, and the server
responds back on RC.
But I do understand your point too.
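
(Purely as a strawman for this discussion, reusing the pdata fields introduced
in this patch -- hanging the Tread receive pages off req->rc instead might look
like the sketch below; untested, illustrative only:)

	/* Hypothetical: let the receive fcall own the receive-side pages,
	 * so a transport builds its "in" sg entries from req->rc. */
	req->rc->pdata = pages;
	req->rc->pdata_off = first_page_offset;
	req->rc->pdata_read_len = count;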

Eric/Lucho,

Please let me know if you have any suggestions to get this right.

Thanks,
JV

>>
>> What is the point of having the page data in p9_fcall if only the one
>> in req->tc is ever going to be used? I am trying to implement a
>> transport that uses your patches, and I have to fill my code with
>> explanations of why I am doing crazy things.
>>
>> Thanks,
>>    Lucho
>>
>> On Thu, Aug 19, 2010 at 2:47 PM, Venkateswararao Jujjuri (JV)
>> <jvrao@linux.vnet.ibm.com> wrote:
>>> Latchesar Ionkov wrote:
>>>> It is kind of strange that part of the fcall packet (everything other
>>>> than the Rread/Twrite data) is located in a buffer pointed to by sdata,
>>>> and the rest is in pages pointed to by pdata. A scatterlist would make it
>>>> tidier and easier to figure out what is where.
>>> A separate sg list would further increase the size of the PDU. Initially I
>>> had an sg list sitting in the fcall structure. But when we use the page
>>> addresses directly, we do not make full use of sdata -- only the initial
>>> portion is used for the header.
>>> To use kernel memory efficiently, we came up with this plan of overloading
>>> sdata with page pointers during user-initiated Rread/Twrite calls.
>>>
>>> The major advantage we saw with this method is that the changes are very
>>> modular. Other parts of the code, and other transports, work without a change.
>>>
>>> If needed, we could easily have a separate sg list vector in the fcall, but
>>> it would not use kernel memory efficiently, as the whole sdata would be
>>> allocated but not used.
>>>
>>> Thanks,
>>> JV
>>>
>>>> Thanks,
>>>>     Lucho
>>>>
>>>> On Thu, Aug 19, 2010 at 12:28 PM, Venkateswararao Jujjuri (JV)
>>>> <jvrao@linux.vnet.ibm.com> wrote:
>>>>> Latchesar Ionkov wrote:
>>>>>> Is there any particular reason for p9_fcall to have pointers to pages
>>>>>> instead of a scatterlist?
>>>>> Given that page sizes are constant, all we need is the offset into the
>>>>> first page; the IO size determines the last page. So we decided there was
>>>>> no need to put an sg list in p9_fcall.
>>>>>
>>>>> Thanks,
>>>>> JV
>>>>>
>>>>>> Thanks,
>>>>>>     Lucho
>>>>>>
>>>>>> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
>>>>>> <jvrao@linux.vnet.ibm.com> wrote:
>>>>>>> This patch adds necessary infrastructure for placing page addresses
>>>>>>> directly on the sg list for the server to consume.
>>>>>>>
>>>>>>> The newly added routine pack_sg_list_p() is just like pack_sg_list()
>>>>>>> except that it takes page array as an input and directly places them on
>>>>>>> the sg list after taking care of the first page offset.
>>>>>>>
>>>>>>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
>>>>>>> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
>>>>>>> ---
>>>>>>>  include/net/9p/9p.h   |    6 ++++-
>>>>>>>  net/9p/client.c       |    4 +++
>>>>>>>  net/9p/trans_virtio.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>>>  3 files changed, 65 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
>>>>>>> index a8de812..382ef22 100644
>>>>>>> --- a/include/net/9p/9p.h
>>>>>>> +++ b/include/net/9p/9p.h
>>>>>>> @@ -651,7 +651,11 @@ struct p9_fcall {
>>>>>>>
>>>>>>>        size_t offset;
>>>>>>>        size_t capacity;
>>>>>>> -
>>>>>>> +       struct page **pdata;
>>>>>>> +       uint32_t pdata_mapped_pages;
>>>>>>> +       uint32_t pdata_off;
>>>>>>> +       uint32_t pdata_write_len;
>>>>>>> +       uint32_t pdata_read_len;
>>>>>>>        uint8_t *sdata;
>>>>>>>  };
>>>>>>>
>>>>>>> diff --git a/net/9p/client.c b/net/9p/client.c
>>>>>>> index 29bbbbd..5487896 100644
>>>>>>> --- a/net/9p/client.c
>>>>>>> +++ b/net/9p/client.c
>>>>>>> @@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag)
>>>>>>>                }
>>>>>>>                req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
>>>>>>>                req->tc->capacity = c->msize;
>>>>>>> +               req->tc->pdata_write_len = 0;
>>>>>>> +               req->tc->pdata_read_len = 0;
>>>>>>>                req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
>>>>>>>                req->rc->capacity = c->msize;
>>>>>>> +               req->rc->pdata_write_len = 0;
>>>>>>> +               req->rc->pdata_read_len = 0;
>>>>>>>        }
>>>>>>>
>>>>>>>        p9pdu_reset(req->tc);
>>>>>>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>>>>>>> index 762c19f..8f86cb5 100644
>>>>>>> --- a/net/9p/trans_virtio.c
>>>>>>> +++ b/net/9p/trans_virtio.c
>>>>>>> @@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data,
>>>>>>>        return index-start;
>>>>>>>  }
>>>>>>>
>>>>>>> +/**
>>>>>>> + * pack_sg_list_p - Pack a scatter gather list from an array of pages.
>>>>>>> + * @sg: scatter/gather list to pack into
>>>>>>> + * @start: which segment of the sg_list to start at
>>>>>>> + * @limit: maximum segment to pack data to
>>>>>>> + * @pdu: pdu prepared to put on the wire.
>>>>>>> + * @count: amount of data to pack into the scatter/gather list
>>>>>>> + *
>>>>>>> + * This is just like pack_sg_list() except that it takes page array
>>>>>>> + * as an input and directly places them on the sg list after taking
>>>>>>> + * care of the first page offset.
>>>>>>> + */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit,
>>>>>>> +               struct p9_fcall *pdu, int count)
>>>>>>> +{
>>>>>>> +       int s;
>>>>>>> +       int i = 0;
>>>>>>> +       int index = start;
>>>>>>> +
>>>>>>> +       if (pdu->pdata_off) {
>>>>>>> +               s = min((int)(PAGE_SIZE - pdu->pdata_off), count);
>>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off);
>>>>>>> +               count -= s;
>>>>>>> +       }
>>>>>>> +
>>>>>>> +       while (count) {
>>>>>>> +               BUG_ON(index > limit);
>>>>>>> +               s = min((int)PAGE_SIZE, count);
>>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, 0);
>>>>>>> +               count -= s;
>>>>>>> +       }
>>>>>>> +
>>>>>>> +       return index-start;
>>>>>>> +}
>>>>>>> +
>>>>>>> +
>>>>>>>  /* We don't currently allow canceling of virtio requests */
>>>>>>>  static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>>  {
>>>>>>> @@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>>  static int
>>>>>>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>>>>>>  {
>>>>>>> -       int in, out;
>>>>>>> +       int in, out, outp, inp;
>>>>>>>        struct virtio_chan *chan = client->trans;
>>>>>>>        char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>>>>>>
>>>>>>>        P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>>>>>>
>>>>>>>        out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>>>>>>> -                                                               req->tc->size);
>>>>>>> -       in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>>>>>>> +                       req->tc->size);
>>>>>>> +
>>>>>>> +       BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len);
>>>>>>> +
>>>>>>> +       if (req->tc->pdata_write_len) {
>>>>>>> +               outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>>>>>>> +                                       req->tc, req->tc->pdata_write_len);
>>>>>>> +               out += outp;
>>>>>>> +       }
>>>>>>> +       if (req->tc->pdata_read_len) {
>>>>>>> +               inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>>>>>>> +               in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>>>>>>> +                                       req->tc, req->tc->pdata_read_len);
>>>>>>> +               in += inp;
>>>>>>> +       } else {
>>>>>>> +               in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>>>>>>>                                                                client->msize);
>>>>>>> +       }
>>>>>>>
>>>>>>>        req->status = REQ_STATUS_SENT;
>>>>>>>
>>>>>>> --
>>>>>>> 1.6.5.2
>>>>>>>
>>>>>>>
>>>>>
>>>
>>>
>>
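
(An aside on the page arithmetic quoted earlier in this thread -- the
first-page offset plus the IO size pin down every sg segment, so no per-page
lengths are needed; a worked example, illustrative only:)

	/*
	 * Pages spanned by an IO of `count' bytes starting `off' bytes into
	 * the first page:
	 *
	 *	nr_pages = DIV_ROUND_UP(off + count, PAGE_SIZE);
	 *
	 * e.g. off = 1024, count = 8192, PAGE_SIZE = 4096:
	 *	DIV_ROUND_UP(9216, 4096) = 3 pages
	 *	(3072 bytes from page 0, 4096 from page 1, 1024 from page 2)
	 */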



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [V9fs-developer] [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list.
  2010-08-19 23:35               ` Venkateswararao Jujjuri (JV)
@ 2010-08-20  0:27                 ` Eric Van Hensbergen
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Van Hensbergen @ 2010-08-20  0:27 UTC (permalink / raw)
  To: Venkateswararao Jujjuri (JV)
  Cc: Latchesar Ionkov, linux-fsdevel, v9fs-developer, Badari Pulavarty

On Thu, Aug 19, 2010 at 6:35 PM, Venkateswararao Jujjuri (JV)
<jvrao@linux.vnet.ibm.com> wrote:
> Eric Van Hensbergen wrote:
>> Yeah, I haven't done a thorough critique of the code yet as I'm
>> overloaded with other things, but my overall impression is similar to
>> lucho's.  Last time I looked at doing this, my approach was to keep
>> everything zero copy in the fs and client layer, and if I needed to
>> allocate/copy in the transport because it didn't support
>> scatter/gather, I did it in the transport.
>>
>> Of course, this touches more than just the virtio transport and so I
>> can understand the trepidation at broader changes, but it seems like
>> we are doing backflips to try and avoid broader changes and the result
>> is somewhat awkward due to the overloaded nature of many of the fields --
>> in the end it is likely to lead to more problems.
>
> I understand that it is not very pretty.. :) but the main goal was a modular,
> limited change.
> Unless we separate the header from the payload completely at the protocol level,
> I think it may be difficult to achieve zero copy cleanly.
>

I somewhat disagree - the point is to separate the header from the payload
from the API perspective -- it has little to do with the protocol.
While keeping things modular and limiting change is a good goal, there
are certain things that are worth rethinking when we start jumping
through too many hoops.  I think zero-copy is something we need to
strive for in every transport where it's possible, and perhaps it
should be possible in all of them -- so it's worth doing right.  This
is why I was holding off on optimizations until after we got the
basics of virtfs and .L in place, because I knew they would involve
broader changes that would impact more of the code base and so would
take more time to crack.
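
(To make the API-level separation concrete -- a sketch only; the entry point
and its signature are invented, nothing like it exists in the tree:)

	/*
	 * Hypothetical client-layer call: the 9P header is marshalled into a
	 * small sdata buffer, while the payload is described separately and
	 * handed down untouched.  A transport with scatter/gather support
	 * maps the pages directly; one without does its own allocate/copy
	 * inside its request() handler.
	 */
	int p9_client_rdwr(struct p9_fid *fid, u64 offset, int dir,
			   struct page **pages, u32 pg_off, u32 count);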

>
> I will be more than happy to take your suggestions on how to get it right cleanly.
> Increasing msize in the current model doesn't really solve the issue, as it forces
> us into bigger kmalloc()s even for simple transactions like TLCREATE/RLCREATE.
>

In situations where we are using zero copy it may be possible to make
the per-packet allocated buffer IOHDRSIZE.  It's something that's worth
looking at carefully - IMHO it doesn't make sense to allocate msize
buffers if we are passing payloads by reference most of the time.
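
(A sketch of that, assuming the existing P9_IOHDRSZ constant and the current
p9_tag_alloc() allocation pattern; the zero_copy flag is a placeholder:)

	/* Hypothetical: shrink the per-request buffer when the payload is
	 * passed by reference and only the 9P header lands in sdata. */
	size_t bufsize = zero_copy ? P9_IOHDRSZ : c->msize;

	req->tc = kmalloc(sizeof(struct p9_fcall) + bufsize, GFP_NOFS);
	req->tc->capacity = bufsize;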

>
> Another area we need to look at is that the same structure, p9_fcall, is used
> for both transmission and reception. Most of the time only one direction needs
> the larger buffer, not the other.
>

This is almost definitely true, but it depends on what we are treating
as payload.  Essentially, we only need something other than IOHDRSIZE
for a subset of operations (read, write, stat, and some of the new
ones in .L) -- if those operations also use sg-lists then we should be
able to do a better job.
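
(One way to capture that subset -- a hypothetical helper, and the op list is
only illustrative of the operations named above:)

	/* Hypothetical: only these ops move bulk payloads; everything else
	 * fits in an IOHDRSIZE-ish buffer. */
	static bool p9_op_needs_big_buf(int8_t type)
	{
		switch (type) {
		case P9_TREAD:		/* large reply payload */
		case P9_TWRITE:		/* large request payload */
		case P9_TSTAT:		/* stat data can exceed the io header */
			return true;
		default:
			return false;
		}
	}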

        -eric


>
>>
>> On Thu, Aug 19, 2010 at 4:07 PM, Latchesar Ionkov <lucho@ionkov.net> wrote:
>>> Frankly, I don't much like the way the zero copy is implemented.
>>>
>>> 1. In case of Twrite, the size[4] field of the 9P message contains the
>>> size of the message without the data field.
>>>
>>> 2. In case of Tread message, the pages to receive the data are
>>> contained in req->tc instead of req->rc.
>
> We had some internal debate on this before choosing req->tc. :)
> The argument was that whatever the client sends goes on TC, and the server
> responds back on RC.
> But I do understand your point too.
>
> Eric/Lucho,
>
> Please let me know if you have any suggestions to get this right.
>
> Thanks,
> JV
>
>>>
>>> What is the point of having the page data in p9_fcall if only the one
>>> in req->tc is ever going to be used? I am trying to implement a
>>> transport that uses your patches, and I have to fill my code with
>>> explanations of why I am doing crazy things.
>>>
>>> Thanks,
>>>    Lucho
>>>
>>> On Thu, Aug 19, 2010 at 2:47 PM, Venkateswararao Jujjuri (JV)
>>> <jvrao@linux.vnet.ibm.com> wrote:
>>>> Latchesar Ionkov wrote:
>>>>> It is kind of strange that part of the fcall packet (everything other
>>>>> than the Rread/Twrite data) is located in a buffer pointed to by sdata,
>>>>> and the rest is in pages pointed to by pdata. A scatterlist would make it
>>>>> tidier and easier to figure out what is where.
>>>> A separate sg list would further increase the size of the PDU. Initially I
>>>> had an sg list sitting in the fcall structure. But when we use the page
>>>> addresses directly, we do not make full use of sdata -- only the initial
>>>> portion is used for the header.
>>>> To use kernel memory efficiently, we came up with this plan of overloading
>>>> sdata with page pointers during user-initiated Rread/Twrite calls.
>>>>
>>>> The major advantage we saw with this method is that the changes are very
>>>> modular. Other parts of the code, and other transports, work without a change.
>>>>
>>>> If needed, we could easily have a separate sg list vector in the fcall, but
>>>> it would not use kernel memory efficiently, as the whole sdata would be
>>>> allocated but not used.
>>>>
>>>> Thanks,
>>>> JV
>>>>
>>>>> Thanks,
>>>>>     Lucho
>>>>>
>>>>> On Thu, Aug 19, 2010 at 12:28 PM, Venkateswararao Jujjuri (JV)
>>>>> <jvrao@linux.vnet.ibm.com> wrote:
>>>>>> Latchesar Ionkov wrote:
>>>>>>> Is there any particular reason for p9_fcall to have pointers to pages
>>>>>>> instead of a scatterlist?
>>>>>> Given that page sizes are constant, all we need is the offset into the
>>>>>> first page; the IO size determines the last page. So we decided there was
>>>>>> no need to put an sg list in p9_fcall.
>>>>>>
>>>>>> Thanks,
>>>>>> JV
>>>>>>
>>>>>>> Thanks,
>>>>>>>     Lucho
>>>>>>>
>>>>>>> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
>>>>>>> <jvrao@linux.vnet.ibm.com> wrote:
>>>>>>>> This patch adds necessary infrastructure for placing page addresses
>>>>>>>> directly on the sg list for the server to consume.
>>>>>>>>
>>>>>>>> The newly added routine pack_sg_list_p() is just like pack_sg_list()
>>>>>>>> except that it takes page array as an input and directly places them on
>>>>>>>> the sg list after taking care of the first page offset.
>>>>>>>>
>>>>>>>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
>>>>>>>> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
>>>>>>>> ---
>>>>>>>>  include/net/9p/9p.h   |    6 ++++-
>>>>>>>>  net/9p/client.c       |    4 +++
>>>>>>>>  net/9p/trans_virtio.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>>>>  3 files changed, 65 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
>>>>>>>> index a8de812..382ef22 100644
>>>>>>>> --- a/include/net/9p/9p.h
>>>>>>>> +++ b/include/net/9p/9p.h
>>>>>>>> @@ -651,7 +651,11 @@ struct p9_fcall {
>>>>>>>>
>>>>>>>>        size_t offset;
>>>>>>>>        size_t capacity;
>>>>>>>> -
>>>>>>>> +       struct page **pdata;
>>>>>>>> +       uint32_t pdata_mapped_pages;
>>>>>>>> +       uint32_t pdata_off;
>>>>>>>> +       uint32_t pdata_write_len;
>>>>>>>> +       uint32_t pdata_read_len;
>>>>>>>>        uint8_t *sdata;
>>>>>>>>  };
>>>>>>>>
>>>>>>>> diff --git a/net/9p/client.c b/net/9p/client.c
>>>>>>>> index 29bbbbd..5487896 100644
>>>>>>>> --- a/net/9p/client.c
>>>>>>>> +++ b/net/9p/client.c
>>>>>>>> @@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag)
>>>>>>>>                }
>>>>>>>>                req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
>>>>>>>>                req->tc->capacity = c->msize;
>>>>>>>> +               req->tc->pdata_write_len = 0;
>>>>>>>> +               req->tc->pdata_read_len = 0;
>>>>>>>>                req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
>>>>>>>>                req->rc->capacity = c->msize;
>>>>>>>> +               req->rc->pdata_write_len = 0;
>>>>>>>> +               req->rc->pdata_read_len = 0;
>>>>>>>>        }
>>>>>>>>
>>>>>>>>        p9pdu_reset(req->tc);
>>>>>>>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>>>>>>>> index 762c19f..8f86cb5 100644
>>>>>>>> --- a/net/9p/trans_virtio.c
>>>>>>>> +++ b/net/9p/trans_virtio.c
>>>>>>>> @@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data,
>>>>>>>>        return index-start;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * pack_sg_list_p - Pack a scatter gather list from an array of pages.
>>>>>>>> + * @sg: scatter/gather list to pack into
>>>>>>>> + * @start: which segment of the sg_list to start at
>>>>>>>> + * @limit: maximum segment to pack data to
>>>>>>>> + * @pdu: pdu prepared to put on the wire.
>>>>>>>> + * @count: amount of data to pack into the scatter/gather list
>>>>>>>> + *
>>>>>>>> + * This is just like pack_sg_list() except that it takes page array
>>>>>>>> + * as an input and directly places them on the sg list after taking
>>>>>>>> + * care of the first page offset.
>>>>>>>> + */
>>>>>>>> +
>>>>>>>> +static int
>>>>>>>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit,
>>>>>>>> +               struct p9_fcall *pdu, int count)
>>>>>>>> +{
>>>>>>>> +       int s;
>>>>>>>> +       int i = 0;
>>>>>>>> +       int index = start;
>>>>>>>> +
>>>>>>>> +       if (pdu->pdata_off) {
>>>>>>>> +               s = min((int)(PAGE_SIZE - pdu->pdata_off), count);
>>>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off);
>>>>>>>> +               count -= s;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       while (count) {
>>>>>>>> +               BUG_ON(index > limit);
>>>>>>>> +               s = min((int)PAGE_SIZE, count);
>>>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, 0);
>>>>>>>> +               count -= s;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       return index-start;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +
>>>>>>>>  /* We don't currently allow canceling of virtio requests */
>>>>>>>>  static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>>>  {
>>>>>>>> @@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>>>  static int
>>>>>>>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>>>>>>>  {
>>>>>>>> -       int in, out;
>>>>>>>> +       int in, out, outp, inp;
>>>>>>>>        struct virtio_chan *chan = client->trans;
>>>>>>>>        char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>>>>>>>
>>>>>>>>        P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>>>>>>>
>>>>>>>>        out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>>>>>>>> -                                                               req->tc->size);
>>>>>>>> -       in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>>>>>>>> +                       req->tc->size);
>>>>>>>> +
>>>>>>>> +       BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len);
>>>>>>>> +
>>>>>>>> +       if (req->tc->pdata_write_len) {
>>>>>>>> +               outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>>>>>>>> +                                       req->tc, req->tc->pdata_write_len);
>>>>>>>> +               out += outp;
>>>>>>>> +       }
>>>>>>>> +       if (req->tc->pdata_read_len) {
>>>>>>>> +               inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>>>>>>>> +               in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>>>>>>>> +                                       req->tc, req->tc->pdata_read_len);
>>>>>>>> +               in += inp;
>>>>>>>> +       } else {
>>>>>>>> +               in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>>>>>>>>                                                                client->msize);
>>>>>>>> +       }
>>>>>>>>
>>>>>>>>        req->status = REQ_STATUS_SENT;
>>>>>>>>
>>>>>>>> --
>>>>>>>> 1.6.5.2
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>>
>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2010-08-20  0:27 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-17 17:27 [00/05] Add zero copy capability to virtio transport Venkateswararao Jujjuri (JV)
2010-08-17 17:27 ` [PATCH 1/5] [net/9p] Add capability() to p9_trans_module Venkateswararao Jujjuri (JV)
2010-08-17 20:43   ` [V9fs-developer] " Eric Van Hensbergen
2010-08-17 20:46     ` Latchesar Ionkov
2010-08-17 23:31       ` Venkateswararao Jujjuri (JV)
2010-08-18 15:16         ` Eric Van Hensbergen
2010-08-18 16:56           ` Venkateswararao Jujjuri (JV)
2010-08-18 18:26             ` Eric Van Hensbergen
2010-08-17 17:27 ` [PATCH 2/5] [net/9p] Pass p9_client structure to pdu preparation routines Venkateswararao Jujjuri (JV)
2010-08-17 17:27 ` [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list Venkateswararao Jujjuri (JV)
2010-08-18 20:50   ` [V9fs-developer] " Latchesar Ionkov
2010-08-19 18:28     ` Venkateswararao Jujjuri (JV)
2010-08-19 18:49       ` Latchesar Ionkov
2010-08-19 20:47         ` Venkateswararao Jujjuri (JV)
2010-08-19 21:07           ` Latchesar Ionkov
2010-08-19 21:26             ` Eric Van Hensbergen
2010-08-19 23:35               ` Venkateswararao Jujjuri (JV)
2010-08-20  0:27                 ` Eric Van Hensbergen
2010-08-17 17:27 ` [PATCH 4/5] [net/9p] Achieve zero copy on write path Venkateswararao Jujjuri (JV)
2010-08-19 19:30   ` [V9fs-developer] " Latchesar Ionkov
2010-08-19 20:55     ` Venkateswararao Jujjuri (JV)
2010-08-17 17:27 ` [PATCH 5/5] [net/9p] Achieve zero copy on read path Venkateswararao Jujjuri (JV)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.