linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch v2 00/15] CIFS: Add direct I/O support
@ 2018-05-30 19:47 Long Li
  2018-05-30 19:47 ` [Patch v2 01/15] CIFS: Introduce offset for the 1st page in data transfer structures Long Li
                   ` (14 more replies)
  0 siblings, 15 replies; 50+ messages in thread
From: Long Li @ 2018-05-30 19:47 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

This patch set implements direct I/O.

In normal code path (even with cache=none), CIFS copies I/O data from
user-space to kernel-space for security reasons of possible protocol
required signing and encryption on user data.

With this patch set, CIFS passes the I/O data directly from user-space
buffer to the transport layer, when file system is mounted with
"cache-none".

Patch v2 addressed comments from Christoph Hellwig <hch@lst.de> and
Tom Talpey <ttalpey@microsoft.com> to implement direct I/O for both
socket and RDMA.


Long Li (15):
  CIFS: Introduce offset for the 1st page in data transfer structures
  CIFS: Add support for direct pages in rdata
  CIFS: Use offset when reading pages
  CIFS: Add support for direct pages in wdata
  CIFS: Calculate the correct request length based on page offset and
    tail size
  CIFS: Introduce helper function to get page offset and length in
    smb_rqst
  CIFS: When sending data on socket, pass the correct page offset
  CIFS: SMBD: Support page offset in RDMA send
  CIFS: SMBD: Support page offset in RDMA recv
  CIFS: SMBD: Support page offset in memory registration
  CIFS: Pass page offset for calculating signature
  CIFS: Pass page offset for encrypting
  CIFS: Add support for direct I/O read
  CIFS: Add support for direct I/O write
  CIFS: Add direct I/O functions to file_operations

 fs/cifs/cifsencrypt.c |   9 +-
 fs/cifs/cifsfs.c      |   5 +-
 fs/cifs/cifsfs.h      |   2 +
 fs/cifs/cifsglob.h    |   7 +-
 fs/cifs/cifsproto.h   |   9 +-
 fs/cifs/cifssmb.c     |  17 ++-
 fs/cifs/connect.c     |   5 +-
 fs/cifs/file.c        | 389 +++++++++++++++++++++++++++++++++++++++++++++++---
 fs/cifs/misc.c        |  17 +++
 fs/cifs/smb2ops.c     |  22 +--
 fs/cifs/smb2pdu.c     |  20 ++-
 fs/cifs/smbdirect.c   | 121 ++++++++++------
 fs/cifs/smbdirect.h   |   2 +-
 fs/cifs/transport.c   |  34 +++--
 14 files changed, 554 insertions(+), 105 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Patch v2 01/15] CIFS: Introduce offset for the 1st page in data transfer structures
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
@ 2018-05-30 19:47 ` Long Li
  2018-05-30 19:47 ` [Patch v2 02/15] CIFS: Add support for direct pages in rdata Long Li
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-05-30 19:47 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

When direct I/O is used, the data buffer may not always align to page
boundaries. Introduce a page offset in transport data structures to
describe the location of the buffer within the page.

Also change the function to pass the page offset when sending data to
transport.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsglob.h | 3 +++
 fs/cifs/smb2pdu.c  | 1 +
 2 files changed, 4 insertions(+)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 4f674b7..8d16c3e 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -176,6 +176,7 @@ struct smb_rqst {
 	struct kvec	*rq_iov;	/* array of kvecs */
 	unsigned int	rq_nvec;	/* number of kvecs in array */
 	struct page	**rq_pages;	/* pointer to array of page ptrs */
+	unsigned int	rq_offset;	/* the offset to the 1st page */
 	unsigned int	rq_npages;	/* number pages in array */
 	unsigned int	rq_pagesz;	/* page size to use */
 	unsigned int	rq_tailsz;	/* length of last page */
@@ -1174,6 +1175,7 @@ struct cifs_readdata {
 	struct smbd_mr			*mr;
 #endif
 	unsigned int			pagesz;
+	unsigned int			page_offset;
 	unsigned int			tailsz;
 	unsigned int			credits;
 	unsigned int			nr_pages;
@@ -1199,6 +1201,7 @@ struct cifs_writedata {
 	struct smbd_mr			*mr;
 #endif
 	unsigned int			pagesz;
+	unsigned int			page_offset;
 	unsigned int			tailsz;
 	unsigned int			credits;
 	unsigned int			nr_pages;
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 47d5331..a02f6b6 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -3045,6 +3045,7 @@ smb2_async_writev(struct cifs_writedata *wdata,
 	rqst.rq_iov = iov;
 	rqst.rq_nvec = 2;
 	rqst.rq_pages = wdata->pages;
+	rqst.rq_offset = wdata->page_offset;
 	rqst.rq_npages = wdata->nr_pages;
 	rqst.rq_pagesz = wdata->pagesz;
 	rqst.rq_tailsz = wdata->tailsz;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 02/15] CIFS: Add support for direct pages in rdata
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
  2018-05-30 19:47 ` [Patch v2 01/15] CIFS: Introduce offset for the 1st page in data transfer structures Long Li
@ 2018-05-30 19:47 ` Long Li
  2018-05-30 20:27   ` Ruhl, Michael J
  2018-06-24  1:50   ` Tom Talpey
  2018-05-30 19:47 ` [Patch v2 03/15] CIFS: Use offset when reading pages Long Li
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 50+ messages in thread
From: Long Li @ 2018-05-30 19:47 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Add a function to allocate rdata without allocating pages for data
transfer. This gives the caller an option to pass a number of pages
that point to the data buffer.

rdata is still reponsible for free those pages after it's done.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsglob.h |  2 +-
 fs/cifs/file.c     | 23 ++++++++++++++++++++---
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 8d16c3e..56864a87 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1179,7 +1179,7 @@ struct cifs_readdata {
 	unsigned int			tailsz;
 	unsigned int			credits;
 	unsigned int			nr_pages;
-	struct page			*pages[];
+	struct page			**pages;
 };
 
 struct cifs_writedata;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 23fd430..1c98293 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2880,13 +2880,13 @@ cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from)
 }
 
 static struct cifs_readdata *
-cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
+cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
 {
 	struct cifs_readdata *rdata;
 
-	rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
-			GFP_KERNEL);
+	rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
 	if (rdata != NULL) {
+		rdata->pages = pages;
 		kref_init(&rdata->refcount);
 		INIT_LIST_HEAD(&rdata->list);
 		init_completion(&rdata->done);
@@ -2896,6 +2896,22 @@ cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
 	return rdata;
 }
 
+static struct cifs_readdata *
+cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
+{
+	struct page **pages =
+		kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
+	struct cifs_readdata *ret = NULL;
+
+	if (pages) {
+		ret = cifs_readdata_direct_alloc(pages, complete);
+		if (!ret)
+			kfree(pages);
+	}
+
+	return ret;
+}
+
 void
 cifs_readdata_release(struct kref *refcount)
 {
@@ -2910,6 +2926,7 @@ cifs_readdata_release(struct kref *refcount)
 	if (rdata->cfile)
 		cifsFileInfo_put(rdata->cfile);
 
+	kvfree(rdata->pages);
 	kfree(rdata);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 03/15] CIFS: Use offset when reading pages
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
  2018-05-30 19:47 ` [Patch v2 01/15] CIFS: Introduce offset for the 1st page in data transfer structures Long Li
  2018-05-30 19:47 ` [Patch v2 02/15] CIFS: Add support for direct pages in rdata Long Li
@ 2018-05-30 19:47 ` Long Li
  2018-06-24  1:58   ` Tom Talpey
  2018-05-30 19:47 ` [Patch v2 04/15] CIFS: Add support for direct pages in wdata Long Li
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-05-30 19:47 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

With offset defined in rdata, transport functions need to look at this
offset when reading data into the correct places in pages.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsproto.h |  4 +++-
 fs/cifs/connect.c   |  5 +++--
 fs/cifs/file.c      | 52 +++++++++++++++++++++++++++++++++++++---------------
 fs/cifs/smb2ops.c   |  2 +-
 fs/cifs/smb2pdu.c   |  1 +
 5 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index dc80f84..1f27d8e 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -203,7 +203,9 @@ extern void dequeue_mid(struct mid_q_entry *mid, bool malformed);
 extern int cifs_read_from_socket(struct TCP_Server_Info *server, char *buf,
 			         unsigned int to_read);
 extern int cifs_read_page_from_socket(struct TCP_Server_Info *server,
-				      struct page *page, unsigned int to_read);
+					struct page *page,
+					unsigned int page_offset,
+					unsigned int to_read);
 extern int cifs_setup_cifs_sb(struct smb_vol *pvolume_info,
 			       struct cifs_sb_info *cifs_sb);
 extern int cifs_match_super(struct super_block *, void *);
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 83b0234..8501da0 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -594,10 +594,11 @@ cifs_read_from_socket(struct TCP_Server_Info *server, char *buf,
 
 int
 cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page,
-		      unsigned int to_read)
+	unsigned int page_offset, unsigned int to_read)
 {
 	struct msghdr smb_msg;
-	struct bio_vec bv = {.bv_page = page, .bv_len = to_read};
+	struct bio_vec bv = {
+		.bv_page = page, .bv_len = to_read, .bv_offset = page_offset};
 	iov_iter_bvec(&smb_msg.msg_iter, READ | ITER_BVEC, &bv, 1, to_read);
 	return cifs_readv_from_socket(server, &smb_msg);
 }
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 1c98293..87eece6 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3026,12 +3026,20 @@ uncached_fill_pages(struct TCP_Server_Info *server,
 	int result = 0;
 	unsigned int i;
 	unsigned int nr_pages = rdata->nr_pages;
+	unsigned int page_offset = rdata->page_offset;
 
 	rdata->got_bytes = 0;
 	rdata->tailsz = PAGE_SIZE;
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page = rdata->pages[i];
 		size_t n;
+		unsigned int segment_size = rdata->pagesz;
+
+		if (i == 0)
+			segment_size -= page_offset;
+		else
+			page_offset = 0;
+
 
 		if (len <= 0) {
 			/* no need to hold page hostage */
@@ -3040,24 +3048,25 @@ uncached_fill_pages(struct TCP_Server_Info *server,
 			put_page(page);
 			continue;
 		}
+
 		n = len;
-		if (len >= PAGE_SIZE) {
+		if (len >= segment_size)
 			/* enough data to fill the page */
-			n = PAGE_SIZE;
-			len -= n;
-		} else {
-			zero_user(page, len, PAGE_SIZE - len);
+			n = segment_size;
+		else
 			rdata->tailsz = len;
-			len = 0;
-		}
+		len -= n;
+
 		if (iter)
-			result = copy_page_from_iter(page, 0, n, iter);
+			result = copy_page_from_iter(
+					page, page_offset, n, iter);
 #ifdef CONFIG_CIFS_SMB_DIRECT
 		else if (rdata->mr)
 			result = n;
 #endif
 		else
-			result = cifs_read_page_from_socket(server, page, n);
+			result = cifs_read_page_from_socket(
+					server, page, page_offset, n);
 		if (result < 0)
 			break;
 
@@ -3130,6 +3139,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 		rdata->bytes = cur_len;
 		rdata->pid = pid;
 		rdata->pagesz = PAGE_SIZE;
+		rdata->tailsz = PAGE_SIZE;
 		rdata->read_into_pages = cifs_uncached_read_into_pages;
 		rdata->copy_into_pages = cifs_uncached_copy_into_pages;
 		rdata->credits = credits;
@@ -3574,6 +3584,7 @@ readpages_fill_pages(struct TCP_Server_Info *server,
 	u64 eof;
 	pgoff_t eof_index;
 	unsigned int nr_pages = rdata->nr_pages;
+	unsigned int page_offset = rdata->page_offset;
 
 	/* determine the eof that the server (probably) has */
 	eof = CIFS_I(rdata->mapping->host)->server_eof;
@@ -3584,13 +3595,21 @@ readpages_fill_pages(struct TCP_Server_Info *server,
 	rdata->tailsz = PAGE_SIZE;
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page = rdata->pages[i];
-		size_t n = PAGE_SIZE;
+		unsigned int to_read = rdata->pagesz;
+		size_t n;
+
+		if (i == 0)
+			to_read -= page_offset;
+		else
+			page_offset = 0;
+
+		n = to_read;
 
-		if (len >= PAGE_SIZE) {
-			len -= PAGE_SIZE;
+		if (len >= to_read) {
+			len -= to_read;
 		} else if (len > 0) {
 			/* enough for partial page, fill and zero the rest */
-			zero_user(page, len, PAGE_SIZE - len);
+			zero_user(page, len + page_offset, to_read - len);
 			n = rdata->tailsz = len;
 			len = 0;
 		} else if (page->index > eof_index) {
@@ -3622,13 +3641,15 @@ readpages_fill_pages(struct TCP_Server_Info *server,
 		}
 
 		if (iter)
-			result = copy_page_from_iter(page, 0, n, iter);
+			result = copy_page_from_iter(
+					page, page_offset, n, iter);
 #ifdef CONFIG_CIFS_SMB_DIRECT
 		else if (rdata->mr)
 			result = n;
 #endif
 		else
-			result = cifs_read_page_from_socket(server, page, n);
+			result = cifs_read_page_from_socket(
+					server, page, page_offset, n);
 		if (result < 0)
 			break;
 
@@ -3807,6 +3828,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
 		rdata->bytes = bytes;
 		rdata->pid = pid;
 		rdata->pagesz = PAGE_SIZE;
+		rdata->tailsz = PAGE_SIZE;
 		rdata->read_into_pages = cifs_readpages_read_into_pages;
 		rdata->copy_into_pages = cifs_readpages_copy_into_pages;
 		rdata->credits = credits;
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 7c0edd2..1fa1c29 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2467,7 +2467,7 @@ read_data_into_pages(struct TCP_Server_Info *server, struct page **pages,
 			zero_user(page, len, PAGE_SIZE - len);
 			len = 0;
 		}
-		length = cifs_read_page_from_socket(server, page, n);
+		length = cifs_read_page_from_socket(server, page, 0, n);
 		if (length < 0)
 			return length;
 		server->total_read += length;
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index a02f6b6..f603fbe 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -2683,6 +2683,7 @@ smb2_readv_callback(struct mid_q_entry *mid)
 	struct smb_rqst rqst = { .rq_iov = rdata->iov,
 				 .rq_nvec = 2,
 				 .rq_pages = rdata->pages,
+				 .rq_offset = rdata->page_offset,
 				 .rq_npages = rdata->nr_pages,
 				 .rq_pagesz = rdata->pagesz,
 				 .rq_tailsz = rdata->tailsz };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 04/15] CIFS: Add support for direct pages in wdata
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
                   ` (2 preceding siblings ...)
  2018-05-30 19:47 ` [Patch v2 03/15] CIFS: Use offset when reading pages Long Li
@ 2018-05-30 19:47 ` Long Li
  2018-06-24  2:01   ` Tom Talpey
  2018-05-30 19:47 ` [Patch v2 05/15] CIFS: Calculate the correct request length based on page offset and tail size Long Li
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-05-30 19:47 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Add a function to allocate wdata without allocating pages for data
transfer. This gives the caller an option to pass a number of pages that
point to the data buffer to write to.

wdata is reponsible for free those pages after it's done.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsglob.h  |  2 +-
 fs/cifs/cifsproto.h |  2 ++
 fs/cifs/cifssmb.c   | 17 ++++++++++++++---
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 56864a87..7f62c98 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1205,7 +1205,7 @@ struct cifs_writedata {
 	unsigned int			tailsz;
 	unsigned int			credits;
 	unsigned int			nr_pages;
-	struct page			*pages[];
+	struct page			**pages;
 };
 
 /*
diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 1f27d8e..7933c5f 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -533,6 +533,8 @@ int cifs_async_writev(struct cifs_writedata *wdata,
 void cifs_writev_complete(struct work_struct *work);
 struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
 						work_func_t complete);
+struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages,
+						work_func_t complete);
 void cifs_writedata_release(struct kref *refcount);
 int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
 			  struct cifs_sb_info *cifs_sb,
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index c8e4278..5aca336 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -1952,6 +1952,7 @@ cifs_writedata_release(struct kref *refcount)
 	if (wdata->cfile)
 		cifsFileInfo_put(wdata->cfile);
 
+	kvfree(wdata->pages);
 	kfree(wdata);
 }
 
@@ -2075,12 +2076,22 @@ cifs_writev_complete(struct work_struct *work)
 struct cifs_writedata *
 cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
 {
+	struct page **pages =
+		kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);
+	if (pages)
+		return cifs_writedata_direct_alloc(pages, complete);
+
+	return NULL;
+}
+
+struct cifs_writedata *
+cifs_writedata_direct_alloc(struct page **pages, work_func_t complete)
+{
 	struct cifs_writedata *wdata;
 
-	/* writedata + number of page pointers */
-	wdata = kzalloc(sizeof(*wdata) +
-			sizeof(struct page *) * nr_pages, GFP_NOFS);
+	wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
 	if (wdata != NULL) {
+		wdata->pages = pages;
 		kref_init(&wdata->refcount);
 		INIT_LIST_HEAD(&wdata->list);
 		init_completion(&wdata->done);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 05/15] CIFS: Calculate the correct request length based on page offset and tail size
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
                   ` (3 preceding siblings ...)
  2018-05-30 19:47 ` [Patch v2 04/15] CIFS: Add support for direct pages in wdata Long Li
@ 2018-05-30 19:47 ` Long Li
  2018-06-24  2:07   ` Tom Talpey
  2018-05-30 19:47 ` [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst Long Li
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-05-30 19:47 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

It's possible that the page offset is non-zero in the pages in a request,
change the function to calculate the correct data buffer length.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/transport.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 927226a..d6b5523 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -212,10 +212,24 @@ rqst_len(struct smb_rqst *rqst)
 	for (i = 0; i < rqst->rq_nvec; i++)
 		buflen += iov[i].iov_len;
 
-	/* add in the page array if there is one */
+	/*
+	 * Add in the page array if there is one. The caller needs to make
+	 * sure rq_offset and rq_tailsz are set correctly. If a buffer of
+	 * multiple pages ends at page boundary, rq_tailsz needs to be set to
+	 * PAGE_SIZE.
+	 */
 	if (rqst->rq_npages) {
-		buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
-		buflen += rqst->rq_tailsz;
+		if (rqst->rq_npages == 1)
+			buflen += rqst->rq_tailsz;
+		else {
+			/*
+			 * If there is more than one page, calculate the
+			 * buffer length based on rq_offset and rq_tailsz
+			 */
+			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
+					rqst->rq_offset;
+			buflen += rqst->rq_tailsz;
+		}
 	}
 
 	return buflen;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
                   ` (4 preceding siblings ...)
  2018-05-30 19:47 ` [Patch v2 05/15] CIFS: Calculate the correct request length based on page offset and tail size Long Li
@ 2018-05-30 19:47 ` Long Li
  2018-06-24  2:09   ` Tom Talpey
  2018-05-30 19:47 ` [Patch v2 07/15] CIFS: When sending data on socket, pass the correct page offset Long Li
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-05-30 19:47 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Introduce a function rqst_page_get_length to return the page offset and
length for a given page in smb_rqst. This function is to be used by
following patches.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsproto.h |  3 +++
 fs/cifs/misc.c      | 17 +++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 7933c5f..89dda14 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -557,4 +557,7 @@ int cifs_alloc_hash(const char *name, struct crypto_shash **shash,
 		    struct sdesc **sdesc);
 void cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc);
 
+extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
+				unsigned int *len, unsigned int *offset);
+
 #endif			/* _CIFSPROTO_H */
diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
index 96849b5..e951417 100644
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -905,3 +905,20 @@ cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc)
 		crypto_free_shash(*shash);
 	*shash = NULL;
 }
+
+/**
+ * rqst_page_get_length - obtain the length and offset for a page in smb_rqst
+ * Input: rqst - a smb_rqst, page - a page index for rqst
+ * Output: *len - the length for this page, *offset - the offset for this page
+ */
+void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
+				unsigned int *len, unsigned int *offset)
+{
+	*len = rqst->rq_pagesz;
+	*offset = (page == 0) ? rqst->rq_offset : 0;
+
+	if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
+		*len = rqst->rq_tailsz;
+	else if (page == 0)
+		*len = rqst->rq_pagesz - rqst->rq_offset;
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 07/15] CIFS: When sending data on socket, pass the correct page offset
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
                   ` (5 preceding siblings ...)
  2018-05-30 19:47 ` [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst Long Li
@ 2018-05-30 19:47 ` Long Li
  2018-05-30 19:48 ` [Patch v2 08/15] CIFS: SMBD: Support page offset in RDMA send Long Li
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-05-30 19:47 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

It's possible that the offset is non-zero in the page to send, change the
function to pass this offset to socket.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/transport.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index d6b5523..5c96ee8 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -288,15 +288,13 @@ __smb_send_rqst(struct TCP_Server_Info *server, struct smb_rqst *rqst)
 
 	/* now walk the page array and send each page in it */
 	for (i = 0; i < rqst->rq_npages; i++) {
-		size_t len = i == rqst->rq_npages - 1
-				? rqst->rq_tailsz
-				: rqst->rq_pagesz;
-		struct bio_vec bvec = {
-			.bv_page = rqst->rq_pages[i],
-			.bv_len = len
-		};
+		struct bio_vec bvec;
+
+		bvec.bv_page = rqst->rq_pages[i];
+		rqst_page_get_length(rqst, i, &bvec.bv_len, &bvec.bv_offset);
+
 		iov_iter_bvec(&smb_msg.msg_iter, WRITE | ITER_BVEC,
-			      &bvec, 1, len);
+			      &bvec, 1, bvec.bv_len);
 		rc = smb_send_kvec(server, &smb_msg, &sent);
 		if (rc < 0)
 			break;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 08/15] CIFS: SMBD: Support page offset in RDMA send
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
                   ` (6 preceding siblings ...)
  2018-05-30 19:47 ` [Patch v2 07/15] CIFS: When sending data on socket, pass the correct page offset Long Li
@ 2018-05-30 19:48 ` Long Li
  2018-06-24  2:11   ` Tom Talpey
  2018-05-30 19:48 ` [Patch v2 09/15] CIFS: SMBD: Support page offset in RDMA recv Long Li
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-05-30 19:48 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

The RDMA send function needs to look at offset in the request pages, and
send data starting from there.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smbdirect.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index c62f7c9..6141e3c 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -17,6 +17,7 @@
 #include <linux/highmem.h>
 #include "smbdirect.h"
 #include "cifs_debug.h"
+#include "cifsproto.h"
 
 static struct smbd_response *get_empty_queue_buffer(
 		struct smbd_connection *info);
@@ -2082,7 +2083,7 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
 	struct kvec vec;
 	int nvecs;
 	int size;
-	int buflen = 0, remaining_data_length;
+	unsigned int buflen = 0, remaining_data_length;
 	int start, i, j;
 	int max_iov_size =
 		info->max_send_size - sizeof(struct smbd_data_transfer);
@@ -2113,10 +2114,17 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
 		buflen += iov[i].iov_len;
 	}
 
-	/* add in the page array if there is one */
+	/*
+	 * Add in the page array if there is one. The caller needs to set
+	 * rq_tailsz to PAGE_SIZE when the buffer has multiple pages and
+	 * ends at page boundary
+	 */
 	if (rqst->rq_npages) {
-		buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
-		buflen += rqst->rq_tailsz;
+		if (rqst->rq_npages == 1)
+			buflen += rqst->rq_tailsz;
+		else
+			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
+					rqst->rq_offset + rqst->rq_tailsz;
 	}
 
 	if (buflen + sizeof(struct smbd_data_transfer) >
@@ -2213,8 +2221,9 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
 
 	/* now sending pages if there are any */
 	for (i = 0; i < rqst->rq_npages; i++) {
-		buflen = (i == rqst->rq_npages-1) ?
-			rqst->rq_tailsz : rqst->rq_pagesz;
+		unsigned int offset;
+
+		rqst_page_get_length(rqst, i, &buflen, &offset);
 		nvecs = (buflen + max_iov_size - 1) / max_iov_size;
 		log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
 			buflen, nvecs);
@@ -2225,9 +2234,11 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
 			remaining_data_length -= size;
 			log_write(INFO, "sending pages i=%d offset=%d size=%d"
 				" remaining_data_length=%d\n",
-				i, j*max_iov_size, size, remaining_data_length);
+				i, j*max_iov_size+offset, size,
+				remaining_data_length);
 			rc = smbd_post_send_page(
-				info, rqst->rq_pages[i], j*max_iov_size,
+				info, rqst->rq_pages[i],
+				j*max_iov_size + offset,
 				size, remaining_data_length);
 			if (rc)
 				goto done;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 09/15] CIFS: SMBD: Support page offset in RDMA recv
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
                   ` (7 preceding siblings ...)
  2018-05-30 19:48 ` [Patch v2 08/15] CIFS: SMBD: Support page offset in RDMA send Long Li
@ 2018-05-30 19:48 ` Long Li
  2018-06-24  2:16   ` Tom Talpey
  2018-05-30 19:48 ` [Patch v2 10/15] CIFS: SMBD: Support page offset in memory registration Long Li
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-05-30 19:48 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

RDMA recv function needs to place data to the correct place starting at
page offset.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smbdirect.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 6141e3c..ba53c52 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2004,10 +2004,12 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
  * return value: actual data read
  */
 static int smbd_recv_page(struct smbd_connection *info,
-		struct page *page, unsigned int to_read)
+		struct page *page, unsigned int page_offset,
+		unsigned int to_read)
 {
 	int ret;
 	char *to_address;
+	void *page_address;
 
 	/* make sure we have the page ready for read */
 	ret = wait_event_interruptible(
@@ -2015,16 +2017,17 @@ static int smbd_recv_page(struct smbd_connection *info,
 		info->reassembly_data_length >= to_read ||
 			info->transport_status != SMBD_CONNECTED);
 	if (ret)
-		return 0;
+		return ret;
 
 	/* now we can read from reassembly queue and not sleep */
-	to_address = kmap_atomic(page);
+	page_address = kmap_atomic(page);
+	to_address = (char *) page_address + page_offset;
 
 	log_read(INFO, "reading from page=%p address=%p to_read=%d\n",
 		page, to_address, to_read);
 
 	ret = smbd_recv_buf(info, to_address, to_read);
-	kunmap_atomic(to_address);
+	kunmap_atomic(page_address);
 
 	return ret;
 }
@@ -2038,7 +2041,7 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
 {
 	char *buf;
 	struct page *page;
-	unsigned int to_read;
+	unsigned int to_read, page_offset;
 	int rc;
 
 	info->smbd_recv_pending++;
@@ -2052,15 +2055,16 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
 
 	case READ | ITER_BVEC:
 		page = msg->msg_iter.bvec->bv_page;
+		page_offset = msg->msg_iter.bvec->bv_offset;
 		to_read = msg->msg_iter.bvec->bv_len;
-		rc = smbd_recv_page(info, page, to_read);
+		rc = smbd_recv_page(info, page, page_offset, to_read);
 		break;
 
 	default:
 		/* It's a bug in upper layer to get there */
 		cifs_dbg(VFS, "CIFS: invalid msg type %d\n",
 			msg->msg_iter.type);
-		rc = -EIO;
+		rc = -EINVAL;
 	}
 
 	info->smbd_recv_pending--;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 10/15] CIFS: SMBD: Support page offset in memory registration
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
                   ` (8 preceding siblings ...)
  2018-05-30 19:48 ` [Patch v2 09/15] CIFS: SMBD: Support page offset in RDMA recv Long Li
@ 2018-05-30 19:48 ` Long Li
  2018-06-24  2:24   ` Tom Talpey
  2018-05-30 19:48 ` [Patch v2 11/15] CIFS: Pass page offset for calculating signature Long Li
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-05-30 19:48 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Change code to pass the correct page offset during memory registration for
RDMA read/write.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smb2pdu.c   | 18 ++++++++-----
 fs/cifs/smbdirect.c | 76 +++++++++++++++++++++++++++++++----------------------
 fs/cifs/smbdirect.h |  2 +-
 3 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index f603fbe..fc30774 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -2623,8 +2623,8 @@ smb2_new_read_req(void **buf, unsigned int *total_len,
 
 		rdata->mr = smbd_register_mr(
 				server->smbd_conn, rdata->pages,
-				rdata->nr_pages, rdata->tailsz,
-				true, need_invalidate);
+				rdata->nr_pages, rdata->page_offset,
+				rdata->tailsz, true, need_invalidate);
 		if (!rdata->mr)
 			return -ENOBUFS;
 
@@ -3013,16 +3013,22 @@ smb2_async_writev(struct cifs_writedata *wdata,
 
 		wdata->mr = smbd_register_mr(
 				server->smbd_conn, wdata->pages,
-				wdata->nr_pages, wdata->tailsz,
-				false, need_invalidate);
+				wdata->nr_pages, wdata->page_offset,
+				wdata->tailsz, false, need_invalidate);
 		if (!wdata->mr) {
 			rc = -ENOBUFS;
 			goto async_writev_out;
 		}
 		req->Length = 0;
 		req->DataOffset = 0;
-		req->RemainingBytes =
-			cpu_to_le32((wdata->nr_pages-1)*PAGE_SIZE + wdata->tailsz);
+		if (wdata->nr_pages > 1)
+			req->RemainingBytes =
+				cpu_to_le32(
+					(wdata->nr_pages - 1) * wdata->pagesz -
+					wdata->page_offset + wdata->tailsz
+				);
+		else
+			req->RemainingBytes = cpu_to_le32(wdata->tailsz);
 		req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE;
 		if (need_invalidate)
 			req->Channel = SMB2_CHANNEL_RDMA_V1;
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index ba53c52..e459c97 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2299,37 +2299,37 @@ static void smbd_mr_recovery_work(struct work_struct *work)
 		if (smbdirect_mr->state == MR_INVALIDATED ||
 			smbdirect_mr->state == MR_ERROR) {
 
-			if (smbdirect_mr->state == MR_INVALIDATED) {
+			/* recover this MR entry */
+			rc = ib_dereg_mr(smbdirect_mr->mr);
+			if (rc) {
+				log_rdma_mr(ERR,
+					"ib_dereg_mr failed rc=%x\n",
+					rc);
+				smbd_disconnect_rdma_connection(info);
+				continue;
+			}
+
+			smbdirect_mr->mr = ib_alloc_mr(
+				info->pd, info->mr_type,
+				info->max_frmr_depth);
+			if (IS_ERR(smbdirect_mr->mr)) {
+				log_rdma_mr(ERR,
+					"ib_alloc_mr failed mr_type=%x "
+					"max_frmr_depth=%x\n",
+					info->mr_type,
+					info->max_frmr_depth);
+				smbd_disconnect_rdma_connection(info);
+				continue;
+			}
+
+			if (smbdirect_mr->state == MR_INVALIDATED)
 				ib_dma_unmap_sg(
 					info->id->device, smbdirect_mr->sgl,
 					smbdirect_mr->sgl_count,
 					smbdirect_mr->dir);
-				smbdirect_mr->state = MR_READY;
-			} else if (smbdirect_mr->state == MR_ERROR) {
-
-				/* recover this MR entry */
-				rc = ib_dereg_mr(smbdirect_mr->mr);
-				if (rc) {
-					log_rdma_mr(ERR,
-						"ib_dereg_mr failed rc=%x\n",
-						rc);
-					smbd_disconnect_rdma_connection(info);
-				}
 
-				smbdirect_mr->mr = ib_alloc_mr(
-					info->pd, info->mr_type,
-					info->max_frmr_depth);
-				if (IS_ERR(smbdirect_mr->mr)) {
-					log_rdma_mr(ERR,
-						"ib_alloc_mr failed mr_type=%x "
-						"max_frmr_depth=%x\n",
-						info->mr_type,
-						info->max_frmr_depth);
-					smbd_disconnect_rdma_connection(info);
-				}
+			smbdirect_mr->state = MR_READY;
 
-				smbdirect_mr->state = MR_READY;
-			}
 			/* smbdirect_mr->state is updated by this function
 			 * and is read and updated by I/O issuing CPUs trying
 			 * to get a MR, the call to atomic_inc_return
@@ -2475,7 +2475,7 @@ static struct smbd_mr *get_mr(struct smbd_connection *info)
  */
 struct smbd_mr *smbd_register_mr(
 	struct smbd_connection *info, struct page *pages[], int num_pages,
-	int tailsz, bool writing, bool need_invalidate)
+	int offset, int tailsz, bool writing, bool need_invalidate)
 {
 	struct smbd_mr *smbdirect_mr;
 	int rc, i;
@@ -2498,17 +2498,31 @@ struct smbd_mr *smbd_register_mr(
 	smbdirect_mr->sgl_count = num_pages;
 	sg_init_table(smbdirect_mr->sgl, num_pages);
 
-	for (i = 0; i < num_pages - 1; i++)
-		sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
+	log_rdma_mr(INFO, "num_pages=0x%x offset=0x%x tailsz=0x%x\n",
+			num_pages, offset, tailsz);
 
+	if (num_pages == 1) {
+		sg_set_page(&smbdirect_mr->sgl[0], pages[0], tailsz, offset);
+		goto skip_multiple_pages;
+	}
+
+	/* We have at least two pages to register */
+	sg_set_page(
+		&smbdirect_mr->sgl[0], pages[0], PAGE_SIZE - offset, offset);
+	i = 1;
+	while (i < num_pages - 1) {
+		sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
+		i++;
+	}
 	sg_set_page(&smbdirect_mr->sgl[i], pages[i],
 		tailsz ? tailsz : PAGE_SIZE, 0);
 
+skip_multiple_pages:
 	dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
 	smbdirect_mr->dir = dir;
 	rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgl, num_pages, dir);
 	if (!rc) {
-		log_rdma_mr(INFO, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
+		log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
 			num_pages, dir, rc);
 		goto dma_map_error;
 	}
@@ -2516,8 +2530,8 @@ struct smbd_mr *smbd_register_mr(
 	rc = ib_map_mr_sg(smbdirect_mr->mr, smbdirect_mr->sgl, num_pages,
 		NULL, PAGE_SIZE);
 	if (rc != num_pages) {
-		log_rdma_mr(INFO,
-			"ib_map_mr_sg failed rc = %x num_pages = %x\n",
+		log_rdma_mr(ERR,
+			"ib_map_mr_sg failed rc = %d num_pages = %x\n",
 			rc, num_pages);
 		goto map_mr_error;
 	}
diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h
index f9038da..1e419c2 100644
--- a/fs/cifs/smbdirect.h
+++ b/fs/cifs/smbdirect.h
@@ -321,7 +321,7 @@ struct smbd_mr {
 /* Interfaces to register and deregister MR for RDMA read/write */
 struct smbd_mr *smbd_register_mr(
 	struct smbd_connection *info, struct page *pages[], int num_pages,
-	int tailsz, bool writing, bool need_invalidate);
+	int offset, int tailsz, bool writing, bool need_invalidate);
 int smbd_deregister_mr(struct smbd_mr *mr);
 
 #else
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 11/15] CIFS: Pass page offset for calculating signature
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
                   ` (9 preceding siblings ...)
  2018-05-30 19:48 ` [Patch v2 10/15] CIFS: SMBD: Support page offset in memory registration Long Li
@ 2018-05-30 19:48 ` Long Li
  2018-06-24  2:27   ` Tom Talpey
  2018-05-30 19:48 ` [Patch v2 12/15] CIFS: Pass page offset for encrypting Long Li
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-05-30 19:48 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

When calculating signature for the packet, it needs to read into the
correct page offset for the data.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsencrypt.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index a6ef088..e88303c 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -68,11 +68,12 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
 
 	/* now hash over the rq_pages array */
 	for (i = 0; i < rqst->rq_npages; i++) {
-		void *kaddr = kmap(rqst->rq_pages[i]);
-		size_t len = rqst->rq_pagesz;
+		void *kaddr;
+		unsigned int len, offset;
 
-		if (i == rqst->rq_npages - 1)
-			len = rqst->rq_tailsz;
+		rqst_page_get_length(rqst, i, &len, &offset);
+
+		kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;
 
 		crypto_shash_update(shash, kaddr, len);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 12/15] CIFS: Pass page offset for encrypting
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
                   ` (10 preceding siblings ...)
  2018-05-30 19:48 ` [Patch v2 11/15] CIFS: Pass page offset for calculating signature Long Li
@ 2018-05-30 19:48 ` Long Li
  2018-06-24  2:28   ` Tom Talpey
  2018-05-30 19:48 ` [Patch v2 13/15] CIFS: Add support for direct I/O read Long Li
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-05-30 19:48 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Encryption function needs to read data starting page offset from input
buffer.

This doesn't affect decryption path since it allocates its own page
buffers.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smb2ops.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 1fa1c29..38d19b6 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2189,9 +2189,10 @@ init_sg(struct smb_rqst *rqst, u8 *sign)
 		smb2_sg_set_buf(&sg[i], rqst->rq_iov[i].iov_base,
 						rqst->rq_iov[i].iov_len);
 	for (j = 0; i < sg_len - 1; i++, j++) {
-		unsigned int len = (j < rqst->rq_npages - 1) ? rqst->rq_pagesz
-							: rqst->rq_tailsz;
-		sg_set_page(&sg[i], rqst->rq_pages[j], len, 0);
+		unsigned int len, offset;
+
+		rqst_page_get_length(rqst, j, &len, &offset);
+		sg_set_page(&sg[i], rqst->rq_pages[j], len, offset);
 	}
 	smb2_sg_set_buf(&sg[sg_len - 1], sign, SMB2_SIGNATURE_SIZE);
 	return sg;
@@ -2332,6 +2333,7 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, struct smb_rqst *new_rq,
 		return rc;
 
 	new_rq->rq_pages = pages;
+	new_rq->rq_offset = old_rq->rq_offset;
 	new_rq->rq_npages = old_rq->rq_npages;
 	new_rq->rq_pagesz = old_rq->rq_pagesz;
 	new_rq->rq_tailsz = old_rq->rq_tailsz;
@@ -2363,10 +2365,14 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, struct smb_rqst *new_rq,
 
 	/* copy pages form the old */
 	for (i = 0; i < npages; i++) {
-		char *dst = kmap(new_rq->rq_pages[i]);
-		char *src = kmap(old_rq->rq_pages[i]);
-		unsigned int len = (i < npages - 1) ? new_rq->rq_pagesz :
-							new_rq->rq_tailsz;
+		char *dst, *src;
+		unsigned int offset, len;
+
+		rqst_page_get_length(new_rq, i, &len, &offset);
+
+		dst = (char *) kmap(new_rq->rq_pages[i]) + offset;
+		src = (char *) kmap(old_rq->rq_pages[i]) + offset;
+
 		memcpy(dst, src, len);
 		kunmap(new_rq->rq_pages[i]);
 		kunmap(old_rq->rq_pages[i]);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 13/15] CIFS: Add support for direct I/O read
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
                   ` (11 preceding siblings ...)
  2018-05-30 19:48 ` [Patch v2 12/15] CIFS: Pass page offset for encrypting Long Li
@ 2018-05-30 19:48 ` Long Li
  2018-06-02  5:51   ` kbuild test robot
                     ` (2 more replies)
  2018-05-30 19:48 ` [Patch v2 14/15] CIFS: Add support for direct I/O write Long Li
  2018-05-30 19:48 ` [Patch v2 15/15] CIFS: Add direct I/O functions to file_operations Long Li
  14 siblings, 3 replies; 50+ messages in thread
From: Long Li @ 2018-05-30 19:48 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Implement the function for direct I/O read. It doesn't support AIO, which
will be implemented in a follow up patch.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsfs.h |   1 +
 fs/cifs/file.c   | 149 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 150 insertions(+)

diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 5f02318..7fba9aa 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -102,6 +102,7 @@ extern int cifs_open(struct inode *inode, struct file *file);
 extern int cifs_close(struct inode *inode, struct file *file);
 extern int cifs_closedir(struct inode *inode, struct file *file);
 extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
+extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 87eece6..e6e6f24 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2955,6 +2955,18 @@ cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages)
 	return rc;
 }
 
+static void cifs_direct_readdata_release(struct kref *refcount)
+{
+	struct cifs_readdata *rdata = container_of(refcount,
+					struct cifs_readdata, refcount);
+	unsigned int i;
+
+	for (i = 0; i < rdata->nr_pages; i++)
+		put_page(rdata->pages[i]);
+
+	cifs_readdata_release(refcount);
+}
+
 static void
 cifs_uncached_readdata_release(struct kref *refcount)
 {
@@ -3267,6 +3279,143 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
 		complete(&ctx->done);
 }
 
+static void cifs_direct_readv_complete(struct work_struct *work)
+{
+	struct cifs_readdata *rdata =
+		container_of(work, struct cifs_readdata, work);
+
+	complete(&rdata->done);
+	kref_put(&rdata->refcount, cifs_direct_readdata_release);
+}
+
+ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
+{
+	size_t len, cur_len, start;
+	unsigned int npages, rsize, credits;
+	struct file *file;
+	struct cifs_sb_info *cifs_sb;
+	struct cifsFileInfo *cfile;
+	struct cifs_tcon *tcon;
+	struct page **pagevec;
+	ssize_t rc, total_read = 0;
+	struct TCP_Server_Info *server;
+	loff_t offset = iocb->ki_pos;
+	pid_t pid;
+	struct cifs_readdata *rdata;
+
+	/*
+	 * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
+	 * fall back to data copy read path
+	 */
+	if (to->type & ITER_KVEC) {
+		cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
+		return cifs_user_readv(iocb, to);
+	}
+
+	len = iov_iter_count(to);
+	if (!len)
+		return 0;
+
+	file = iocb->ki_filp;
+	cifs_sb = CIFS_FILE_SB(file);
+	cfile = file->private_data;
+	tcon = tlink_tcon(cfile->tlink);
+	server = tcon->ses->server;
+
+	if (!server->ops->async_readv)
+		return -ENOSYS;
+
+	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
+		pid = cfile->pid;
+	else
+		pid = current->tgid;
+
+	if ((file->f_flags & O_ACCMODE) == O_WRONLY)
+		cifs_dbg(FYI, "attempting read on write only file instance\n");
+
+	do {
+		rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
+					&rsize, &credits);
+		if (rc)
+			break;
+
+		cur_len = min_t(const size_t, len, rsize);
+
+		rc = iov_iter_get_pages_alloc(to, &pagevec, cur_len, &start);
+		if (rc < 0) {
+			cifs_dbg(VFS,
+				"couldn't get user pages (rc=%zd) iter type %d"
+				" iov_offset %lu count %lu\n",
+				rc, to->type, to->iov_offset, to->count);
+			dump_stack();
+			break;
+		}
+
+		rdata = cifs_readdata_direct_alloc(
+				pagevec, cifs_direct_readv_complete);
+		if (!rdata) {
+			add_credits_and_wake_if(server, credits, 0);
+			rc = -ENOMEM;
+			break;
+		}
+
+		npages = (rc + start + PAGE_SIZE-1) / PAGE_SIZE;
+		rdata->nr_pages = npages;
+		rdata->page_offset = start;
+		rdata->pagesz = PAGE_SIZE;
+		rdata->tailsz = npages > 1 ?
+				rc-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
+				rc;
+		cur_len = rc;
+
+		rdata->cfile = cifsFileInfo_get(cfile);
+		rdata->offset = offset;
+		rdata->bytes = rc;
+		rdata->pid = pid;
+		rdata->read_into_pages = cifs_uncached_read_into_pages;
+		rdata->copy_into_pages = cifs_uncached_copy_into_pages;
+		rdata->credits = credits;
+
+		rc = 0;
+		if (rdata->cfile->invalidHandle)
+			rc = cifs_reopen_file(rdata->cfile, true);
+
+		if (!rc)
+			rc = server->ops->async_readv(rdata);
+
+		if (rc) {
+			add_credits_and_wake_if(server, rdata->credits, 0);
+			kref_put(&rdata->refcount,
+				 cifs_direct_readdata_release);
+			if (rc == -EAGAIN)
+				continue;
+			break;
+		}
+
+		wait_for_completion(&rdata->done);
+		rc = rdata->result;
+		if (rc) {
+			kref_put(
+				&rdata->refcount,
+				cifs_direct_readdata_release);
+			if (rc == -EAGAIN)
+				continue;
+			break;
+		}
+
+		total_read += rdata->got_bytes;
+		kref_put(&rdata->refcount, cifs_direct_readdata_release);
+
+		iov_iter_advance(to, cur_len);
+		len -= cur_len;
+		offset += cur_len;
+	} while (len);
+
+	iocb->ki_pos += total_read;
+
+	return total_read;
+}
+
 ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
 {
 	struct file *file = iocb->ki_filp;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 14/15] CIFS: Add support for direct I/O write
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
                   ` (12 preceding siblings ...)
  2018-05-30 19:48 ` [Patch v2 13/15] CIFS: Add support for direct I/O read Long Li
@ 2018-05-30 19:48 ` Long Li
  2018-06-24  2:48   ` Tom Talpey
  2018-05-30 19:48 ` [Patch v2 15/15] CIFS: Add direct I/O functions to file_operations Long Li
  14 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-05-30 19:48 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Implement the function for direct I/O write. It doesn't support AIO, which
will be implemented in a follow up patch.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsfs.h |   1 +
 fs/cifs/file.c   | 165 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 166 insertions(+)

diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 7fba9aa..e9c5103 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -105,6 +105,7 @@ extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
+extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
 extern int cifs_lock(struct file *, int, struct file_lock *);
 extern int cifs_fsync(struct file *, loff_t, loff_t, int);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index e6e6f24..8c385b1 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2461,6 +2461,35 @@ cifs_uncached_writedata_release(struct kref *refcount)
 
 static void collect_uncached_write_data(struct cifs_aio_ctx *ctx);
 
+static void cifs_direct_writedata_release(struct kref *refcount)
+{
+	int i;
+	struct cifs_writedata *wdata = container_of(refcount,
+					struct cifs_writedata, refcount);
+
+	for (i = 0; i < wdata->nr_pages; i++)
+		put_page(wdata->pages[i]);
+
+	cifs_writedata_release(refcount);
+}
+
+static void cifs_direct_writev_complete(struct work_struct *work)
+{
+	struct cifs_writedata *wdata = container_of(work,
+					struct cifs_writedata, work);
+	struct inode *inode = d_inode(wdata->cfile->dentry);
+	struct cifsInodeInfo *cifsi = CIFS_I(inode);
+
+	spin_lock(&inode->i_lock);
+	cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
+	if (cifsi->server_eof > inode->i_size)
+		i_size_write(inode, cifsi->server_eof);
+	spin_unlock(&inode->i_lock);
+
+	complete(&wdata->done);
+	kref_put(&wdata->refcount, cifs_direct_writedata_release);
+}
+
 static void
 cifs_uncached_writev_complete(struct work_struct *work)
 {
@@ -2703,6 +2732,142 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx)
 		complete(&ctx->done);
 }
 
+ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	ssize_t total_written = 0;
+	struct cifsFileInfo *cfile;
+	struct cifs_tcon *tcon;
+	struct cifs_sb_info *cifs_sb;
+	struct TCP_Server_Info *server;
+	pid_t pid;
+	unsigned long nr_pages;
+	loff_t offset = iocb->ki_pos;
+	size_t len = iov_iter_count(from);
+	int rc;
+	struct cifs_writedata *wdata;
+
+	/*
+	 * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
+	 * In this case, fall back to non-direct write function.
+	 */
+	if (from->type & ITER_KVEC) {
+		cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec I/O\n");
+		return cifs_user_writev(iocb, from);
+	}
+
+	rc = generic_write_checks(iocb, from);
+	if (rc <= 0)
+		return rc;
+
+	cifs_sb = CIFS_FILE_SB(file);
+	cfile = file->private_data;
+	tcon = tlink_tcon(cfile->tlink);
+	server = tcon->ses->server;
+
+	if (!server->ops->async_writev)
+		return -ENOSYS;
+
+	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
+		pid = cfile->pid;
+	else
+		pid = current->tgid;
+
+	do {
+		unsigned int wsize, credits;
+		struct page **pagevec;
+		size_t start;
+		ssize_t cur_len;
+
+		rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
+						   &wsize, &credits);
+		if (rc)
+			break;
+
+		cur_len = iov_iter_get_pages_alloc(
+				from, &pagevec, wsize, &start);
+		if (cur_len < 0) {
+			cifs_dbg(VFS,
+				"direct_writev couldn't get user pages "
+				"(rc=%zd) iter type %d iov_offset %lu count"
+				" %lu\n",
+				cur_len, from->type,
+				from->iov_offset, from->count);
+			dump_stack();
+			break;
+		}
+		if (cur_len < 0)
+			break;
+
+		nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
+
+		wdata = cifs_writedata_direct_alloc(pagevec,
+					     cifs_direct_writev_complete);
+		if (!wdata) {
+			rc = -ENOMEM;
+			add_credits_and_wake_if(server, credits, 0);
+			break;
+		}
+
+		wdata->nr_pages = nr_pages;
+		wdata->page_offset = start;
+		wdata->pagesz = PAGE_SIZE;
+		wdata->tailsz =
+			nr_pages > 1 ?
+			cur_len - (PAGE_SIZE - start) -
+				(nr_pages - 2) * PAGE_SIZE :
+			cur_len;
+
+		wdata->sync_mode = WB_SYNC_ALL;
+		wdata->offset = (__u64)offset;
+		wdata->cfile = cifsFileInfo_get(cfile);
+		wdata->pid = pid;
+		wdata->bytes = cur_len;
+		wdata->credits = credits;
+
+		rc = 0;
+		if (wdata->cfile->invalidHandle)
+			rc = cifs_reopen_file(wdata->cfile, false);
+
+		if (!rc)
+			rc = server->ops->async_writev(wdata,
+					cifs_direct_writedata_release);
+
+		if (rc) {
+			add_credits_and_wake_if(server, wdata->credits, 0);
+			kref_put(&wdata->refcount,
+				 cifs_direct_writedata_release);
+			if (rc == -EAGAIN)
+				continue;
+			break;
+		}
+
+		wait_for_completion(&wdata->done);
+		if (wdata->result) {
+			rc = wdata->result;
+			kref_put(&wdata->refcount,
+					cifs_direct_writedata_release);
+			if (rc == -EAGAIN)
+				continue;
+			break;
+		}
+
+		kref_put(&wdata->refcount, cifs_direct_writedata_release);
+
+		iov_iter_advance(from, cur_len);
+		total_written += cur_len;
+		offset += cur_len;
+		len -= cur_len;
+	} while (len);
+
+	if (unlikely(!total_written))
+		return rc;
+
+	iocb->ki_pos += total_written;
+	return total_written;
+
+}
+
 ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Patch v2 15/15] CIFS: Add direct I/O functions to file_operations
  2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
                   ` (13 preceding siblings ...)
  2018-05-30 19:48 ` [Patch v2 14/15] CIFS: Add support for direct I/O write Long Li
@ 2018-05-30 19:48 ` Long Li
  2018-06-07 11:17   ` Pavel Shilovsky
  14 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-05-30 19:48 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

With direct read/write functions implemented, add them to file_operations.
When mounting with "cache=none", CIFS uses direct I/O.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsfs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 62f1662..e84f8c2 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1113,9 +1113,8 @@ const struct file_operations cifs_file_strict_ops = {
 };
 
 const struct file_operations cifs_file_direct_ops = {
-	/* BB reevaluate whether they can be done with directio, no cache */
-	.read_iter = cifs_user_readv,
-	.write_iter = cifs_user_writev,
+	.read_iter = cifs_direct_readv,
+	.write_iter = cifs_direct_writev,
 	.open = cifs_open,
 	.release = cifs_close,
 	.lock = cifs_lock,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* RE: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
  2018-05-30 19:47 ` [Patch v2 02/15] CIFS: Add support for direct pages in rdata Long Li
@ 2018-05-30 20:27   ` Ruhl, Michael J
  2018-05-30 20:57     ` Long Li
  2018-06-24  1:50   ` Tom Talpey
  1 sibling, 1 reply; 50+ messages in thread
From: Ruhl, Michael J @ 2018-05-30 20:27 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

>-----Original Message-----
>From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>owner@vger.kernel.org] On Behalf Of Long Li
>Sent: Wednesday, May 30, 2018 3:48 PM
>To: Steve French <sfrench@samba.org>; linux-cifs@vger.kernel.org; samba-
>technical@lists.samba.org; linux-kernel@vger.kernel.org; linux-
>rdma@vger.kernel.org
>Cc: Long Li <longli@microsoft.com>
>Subject: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
>
>From: Long Li <longli@microsoft.com>
>
>Add a function to allocate rdata without allocating pages for data
>transfer. This gives the caller an option to pass a number of pages
>that point to the data buffer.
>
>rdata is still reponsible for free those pages after it's done.
>
>Signed-off-by: Long Li <longli@microsoft.com>
>---
> fs/cifs/cifsglob.h |  2 +-
> fs/cifs/file.c     | 23 ++++++++++++++++++++---
> 2 files changed, 21 insertions(+), 4 deletions(-)
>
>diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
>index 8d16c3e..56864a87 100644
>--- a/fs/cifs/cifsglob.h
>+++ b/fs/cifs/cifsglob.h
>@@ -1179,7 +1179,7 @@ struct cifs_readdata {
> 	unsigned int			tailsz;
> 	unsigned int			credits;
> 	unsigned int			nr_pages;
>-	struct page			*pages[];
>+	struct page			**pages;
> };
>
> struct cifs_writedata;
>diff --git a/fs/cifs/file.c b/fs/cifs/file.c
>index 23fd430..1c98293 100644
>--- a/fs/cifs/file.c
>+++ b/fs/cifs/file.c
>@@ -2880,13 +2880,13 @@ cifs_strict_writev(struct kiocb *iocb, struct
>iov_iter *from)
> }
>
> static struct cifs_readdata *
>-cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
>+cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
> {
> 	struct cifs_readdata *rdata;
>
>-	rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
>-			GFP_KERNEL);
>+	rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
> 	if (rdata != NULL) {
>+		rdata->pages = pages;
> 		kref_init(&rdata->refcount);
> 		INIT_LIST_HEAD(&rdata->list);
> 		init_completion(&rdata->done);
>@@ -2896,6 +2896,22 @@ cifs_readdata_alloc(unsigned int nr_pages,
>work_func_t complete)
> 	return rdata;
> }
>
>+static struct cifs_readdata *
>+cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
>+{
>+	struct page **pages =
>+		kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
>+	struct cifs_readdata *ret = NULL;
>+
>+	if (pages) {
>+		ret = cifs_readdata_direct_alloc(pages, complete);
>+		if (!ret)
>+			kfree(pages);
>+	}
>+
>+	return ret;
>+}
>+
> void
> cifs_readdata_release(struct kref *refcount)
> {
>@@ -2910,6 +2926,7 @@ cifs_readdata_release(struct kref *refcount)
> 	if (rdata->cfile)
> 		cifsFileInfo_put(rdata->cfile);
>
>+	kvfree(rdata->pages);

Is the kvfree() correct?

You use kzalloc() and kfree in cifs_readdata_alloc().

Mike

> 	kfree(rdata);
> }
>
>--
>2.7.4
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
  2018-05-30 20:27   ` Ruhl, Michael J
@ 2018-05-30 20:57     ` Long Li
  0 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-05-30 20:57 UTC (permalink / raw)
  To: Ruhl, Michael J, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: RE: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
> 
> >-----Original Message-----
> >From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> >owner@vger.kernel.org] On Behalf Of Long Li
> >Sent: Wednesday, May 30, 2018 3:48 PM
> >To: Steve French <sfrench@samba.org>; linux-cifs@vger.kernel.org;
> >samba- technical@lists.samba.org; linux-kernel@vger.kernel.org; linux-
> >rdma@vger.kernel.org
> >Cc: Long Li <longli@microsoft.com>
> >Subject: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
> >
> >From: Long Li <longli@microsoft.com>
> >
> >Add a function to allocate rdata without allocating pages for data
> >transfer. This gives the caller an option to pass a number of pages
> >that point to the data buffer.
> >
> >rdata is still reponsible for free those pages after it's done.
> >
> >Signed-off-by: Long Li <longli@microsoft.com>
> >---
> > fs/cifs/cifsglob.h |  2 +-
> > fs/cifs/file.c     | 23 ++++++++++++++++++++---
> > 2 files changed, 21 insertions(+), 4 deletions(-)
> >
> >diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index
> >8d16c3e..56864a87 100644
> >--- a/fs/cifs/cifsglob.h
> >+++ b/fs/cifs/cifsglob.h
> >@@ -1179,7 +1179,7 @@ struct cifs_readdata {
> > 	unsigned int			tailsz;
> > 	unsigned int			credits;
> > 	unsigned int			nr_pages;
> >-	struct page			*pages[];
> >+	struct page			**pages;
> > };
> >
> > struct cifs_writedata;
> >diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 23fd430..1c98293
> >100644
> >--- a/fs/cifs/file.c
> >+++ b/fs/cifs/file.c
> >@@ -2880,13 +2880,13 @@ cifs_strict_writev(struct kiocb *iocb, struct
> >iov_iter *from)  }
> >
> > static struct cifs_readdata *
> >-cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
> >+cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
> > {
> > 	struct cifs_readdata *rdata;
> >
> >-	rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
> >-			GFP_KERNEL);
> >+	rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
> > 	if (rdata != NULL) {
> >+		rdata->pages = pages;
> > 		kref_init(&rdata->refcount);
> > 		INIT_LIST_HEAD(&rdata->list);
> > 		init_completion(&rdata->done);
> >@@ -2896,6 +2896,22 @@ cifs_readdata_alloc(unsigned int nr_pages,
> >work_func_t complete)
> > 	return rdata;
> > }
> >
> >+static struct cifs_readdata *
> >+cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete) {
> >+	struct page **pages =
> >+		kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
> >+	struct cifs_readdata *ret = NULL;
> >+
> >+	if (pages) {
> >+		ret = cifs_readdata_direct_alloc(pages, complete);
> >+		if (!ret)
> >+			kfree(pages);
> >+	}
> >+
> >+	return ret;
> >+}
> >+
> > void
> > cifs_readdata_release(struct kref *refcount)  { @@ -2910,6 +2926,7 @@
> >cifs_readdata_release(struct kref *refcount)
> > 	if (rdata->cfile)
> > 		cifsFileInfo_put(rdata->cfile);
> >
> >+	kvfree(rdata->pages);
> 
> Is the kvfree() correct?
> 
> You use kzalloc() and kfree in cifs_readdata_alloc().

This function is shared by both non-direct I/O and direct I/O code paths.

Direct I/O uses kvmalloc to allocate pages, so kvfree is used here to handle both cases.

> 
> Mike
> 
> > 	kfree(rdata);
> > }
> >
> >--
> >2.7.4
> >
> >--
> >To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> >in the body of a message to majordomo@vger.kernel.org More majordomo
> >info at
> >https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvger.k
> e
> >rnel.org%2Fmajordomo-
> info.html&data=02%7C01%7Clongli%40microsoft.com%7C
> >e810388a534643737ab108d5c66bd6df%7C72f988bf86f141af91ab2d7cd011db
> 47%7C1
> >%7C0%7C636633088833938755&sdata=iHKiji2rUhLHpbH5x13SJBWCvHExSr4a
> rz9Xiv3
> >1rMQ%3D&reserved=0
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvger.ke
> rnel.org%2Fmajordomo-
> info.html&data=02%7C01%7Clongli%40microsoft.com%7Ce810388a53464373
> 7ab108d5c66bd6df%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63
> 6633088833938755&sdata=iHKiji2rUhLHpbH5x13SJBWCvHExSr4arz9Xiv31rMQ
> %3D&reserved=0

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 13/15] CIFS: Add support for direct I/O read
  2018-05-30 19:48 ` [Patch v2 13/15] CIFS: Add support for direct I/O read Long Li
@ 2018-06-02  5:51   ` kbuild test robot
  2018-06-02  7:15   ` kbuild test robot
  2018-06-24  2:39   ` Tom Talpey
  2 siblings, 0 replies; 50+ messages in thread
From: kbuild test robot @ 2018-06-02  5:51 UTC (permalink / raw)
  To: Long Li
  Cc: kbuild-all, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma, Long Li

[-- Attachment #1: Type: text/plain, Size: 4607 bytes --]

Hi Long,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v4.17-rc6]
[cannot apply to cifs/for-next next-20180601]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Long-Li/CIFS-Add-direct-I-O-support/20180602-130240
config: i386-randconfig-x000-201821 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs/cifs/file.c:24:
   fs/cifs/file.c: In function 'cifs_direct_readv':
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:342:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
>> fs/cifs/cifs_debug.h:82:3: note: in expansion of macro 'pr_debug'
      pr_debug(fmt, ##__VA_ARGS__);    \
      ^~~~~~~~
   fs/cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs/cifs/file.c:3348:20: note: format string is defined here
        " iov_offset %lu count %lu\n",
                     ~~^
                     %u
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs/cifs/file.c:24:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:342:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
>> fs/cifs/cifs_debug.h:82:3: note: in expansion of macro 'pr_debug'
      pr_debug(fmt, ##__VA_ARGS__);    \
      ^~~~~~~~
   fs/cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs/cifs/file.c:3348:30: note: format string is defined here
        " iov_offset %lu count %lu\n",
                               ~~^
                               %u

vim +/pr_debug +82 fs/cifs/cifs_debug.h

f2f176b4 Aurelien Aptel  2018-04-10  73  
^1da177e Linus Torvalds  2005-04-16  74  /*
^1da177e Linus Torvalds  2005-04-16  75   *	debug OFF
^1da177e Linus Torvalds  2005-04-16  76   *	---------
^1da177e Linus Torvalds  2005-04-16  77   */
^1da177e Linus Torvalds  2005-04-16  78  #else		/* _CIFS_DEBUG */
f96637be Joe Perches     2013-05-04  79  #define cifs_dbg(type, fmt, ...)					\
bde98197 Joe Perches     2012-12-05  80  do {									\
bde98197 Joe Perches     2012-12-05  81  	if (0)								\
0b456f04 Andy Shevchenko 2014-08-27 @82  		pr_debug(fmt, ##__VA_ARGS__);				\
bde98197 Joe Perches     2012-12-05  83  } while (0)
f96637be Joe Perches     2013-05-04  84  #endif
^1da177e Linus Torvalds  2005-04-16  85  

:::::: The code at line 82 was first introduced by commit
:::::: 0b456f04bcdf5b1151136adaada158bfc26f1be7 cifs: convert printk(LEVEL...) to pr_<level>

:::::: TO: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
:::::: CC: Steve French <steve.french@primarydata.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25237 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 13/15] CIFS: Add support for direct I/O read
  2018-05-30 19:48 ` [Patch v2 13/15] CIFS: Add support for direct I/O read Long Li
  2018-06-02  5:51   ` kbuild test robot
@ 2018-06-02  7:15   ` kbuild test robot
  2018-06-24  2:39   ` Tom Talpey
  2 siblings, 0 replies; 50+ messages in thread
From: kbuild test robot @ 2018-06-02  7:15 UTC (permalink / raw)
  To: Long Li
  Cc: kbuild-all, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma, Long Li

[-- Attachment #1: Type: text/plain, Size: 23102 bytes --]

Hi Long,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v4.17-rc6]
[cannot apply to cifs/for-next next-20180601]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Long-Li/CIFS-Add-direct-I-O-support/20180602-130240
config: i386-randconfig-x008-201821 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs//cifs/file.c:24:
   fs//cifs/file.c: In function 'cifs_direct_readv':
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:410:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs//cifs/cifs_debug.h:54:3: note: in expansion of macro 'pr_debug_once'
      pr_debug_ ## ratefunc("%s: "   \
      ^~~~~~~~~
   fs//cifs/cifs_debug.h:67:3: note: in expansion of macro 'cifs_dbg_func'
      cifs_dbg_func(once,   \
      ^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs//cifs/file.c:3348:20: note: format string is defined here
        " iov_offset %lu count %lu\n",
                     ~~^
                     %u
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs//cifs/file.c:24:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:410:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs//cifs/cifs_debug.h:54:3: note: in expansion of macro 'pr_debug_once'
      pr_debug_ ## ratefunc("%s: "   \
      ^~~~~~~~~
   fs//cifs/cifs_debug.h:67:3: note: in expansion of macro 'cifs_dbg_func'
      cifs_dbg_func(once,   \
      ^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs//cifs/file.c:3348:30: note: format string is defined here
        " iov_offset %lu count %lu\n",
                               ~~^
                               %u
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs//cifs/file.c:24:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:357:10: note: in definition of macro 'printk_once'
      printk(fmt, ##__VA_ARGS__);   \
             ^~~
   include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:386:14: note: in expansion of macro 'KERN_ERR'
     printk_once(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
                 ^~~~~~~~
   fs//cifs/cifs_debug.h:57:3: note: in expansion of macro 'pr_err_once'
      pr_err_ ## ratefunc("CIFS VFS: "  \
      ^~~~~~~
   fs//cifs/cifs_debug.h:67:3: note: in expansion of macro 'cifs_dbg_func'
      cifs_dbg_func(once,   \
      ^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs//cifs/file.c:3348:20: note: format string is defined here
        " iov_offset %lu count %lu\n",
                     ~~^
                     %u
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs//cifs/file.c:24:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:357:10: note: in definition of macro 'printk_once'
      printk(fmt, ##__VA_ARGS__);   \
             ^~~
   include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:386:14: note: in expansion of macro 'KERN_ERR'
     printk_once(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
                 ^~~~~~~~
   fs//cifs/cifs_debug.h:57:3: note: in expansion of macro 'pr_err_once'
      pr_err_ ## ratefunc("CIFS VFS: "  \
      ^~~~~~~
   fs//cifs/cifs_debug.h:67:3: note: in expansion of macro 'cifs_dbg_func'
      cifs_dbg_func(once,   \
      ^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs//cifs/file.c:3348:30: note: format string is defined here
        " iov_offset %lu count %lu\n",
                               ~~^
                               %u
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs//cifs/file.c:24:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:410:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs//cifs/cifs_debug.h:60:3: note: in expansion of macro 'pr_debug_once'
      pr_debug_ ## ratefunc(fmt, ##__VA_ARGS__); \
      ^~~~~~~~~
   fs//cifs/cifs_debug.h:67:3: note: in expansion of macro 'cifs_dbg_func'
      cifs_dbg_func(once,   \
      ^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs//cifs/file.c:3348:20: note: format string is defined here
        " iov_offset %lu count %lu\n",
                     ~~^
                     %u
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs//cifs/file.c:24:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:410:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs//cifs/cifs_debug.h:60:3: note: in expansion of macro 'pr_debug_once'
      pr_debug_ ## ratefunc(fmt, ##__VA_ARGS__); \
      ^~~~~~~~~
   fs//cifs/cifs_debug.h:67:3: note: in expansion of macro 'cifs_dbg_func'
      cifs_dbg_func(once,   \
      ^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs//cifs/file.c:3348:30: note: format string is defined here
        " iov_offset %lu count %lu\n",
                               ~~^
                               %u
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs//cifs/file.c:24:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:474:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs//cifs/cifs_debug.h:54:3: note: in expansion of macro 'pr_debug_ratelimited'
      pr_debug_ ## ratefunc("%s: "   \
      ^~~~~~~~~
   fs//cifs/cifs_debug.h:70:3: note: in expansion of macro 'cifs_dbg_func'
      cifs_dbg_func(ratelimited,  \
      ^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs//cifs/file.c:3348:20: note: format string is defined here
        " iov_offset %lu count %lu\n",
                     ~~^
                     %u
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs//cifs/file.c:24:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:474:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs//cifs/cifs_debug.h:54:3: note: in expansion of macro 'pr_debug_ratelimited'
      pr_debug_ ## ratefunc("%s: "   \
      ^~~~~~~~~
   fs//cifs/cifs_debug.h:70:3: note: in expansion of macro 'cifs_dbg_func'
      cifs_dbg_func(ratelimited,  \
      ^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs//cifs/file.c:3348:30: note: format string is defined here
        " iov_offset %lu count %lu\n",
                               ~~^
                               %u
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs//cifs/file.c:24:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:425:10: note: in definition of macro 'printk_ratelimited'
      printk(fmt, ##__VA_ARGS__);    \
             ^~~
   include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:439:21: note: in expansion of macro 'KERN_ERR'
     printk_ratelimited(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
                        ^~~~~~~~
   fs//cifs/cifs_debug.h:57:3: note: in expansion of macro 'pr_err_ratelimited'
      pr_err_ ## ratefunc("CIFS VFS: "  \
      ^~~~~~~
   fs//cifs/cifs_debug.h:70:3: note: in expansion of macro 'cifs_dbg_func'
      cifs_dbg_func(ratelimited,  \
      ^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs//cifs/file.c:3348:20: note: format string is defined here
        " iov_offset %lu count %lu\n",
                     ~~^
                     %u
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs//cifs/file.c:24:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:425:10: note: in definition of macro 'printk_ratelimited'
      printk(fmt, ##__VA_ARGS__);    \
             ^~~
   include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
    #define KERN_ERR KERN_SOH "3" /* error conditions */
                     ^~~~~~~~
   include/linux/printk.h:439:21: note: in expansion of macro 'KERN_ERR'
     printk_ratelimited(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
                        ^~~~~~~~
   fs//cifs/cifs_debug.h:57:3: note: in expansion of macro 'pr_err_ratelimited'
      pr_err_ ## ratefunc("CIFS VFS: "  \
      ^~~~~~~
   fs//cifs/cifs_debug.h:70:3: note: in expansion of macro 'cifs_dbg_func'
      cifs_dbg_func(ratelimited,  \
      ^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs//cifs/file.c:3348:30: note: format string is defined here
        " iov_offset %lu count %lu\n",
                               ~~^
                               %u
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs//cifs/file.c:24:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:474:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs//cifs/cifs_debug.h:60:3: note: in expansion of macro 'pr_debug_ratelimited'
      pr_debug_ ## ratefunc(fmt, ##__VA_ARGS__); \
      ^~~~~~~~~
   fs//cifs/cifs_debug.h:70:3: note: in expansion of macro 'cifs_dbg_func'
      cifs_dbg_func(ratelimited,  \
      ^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs//cifs/file.c:3348:20: note: format string is defined here
        " iov_offset %lu count %lu\n",
                     ~~^
                     %u
   In file included from include/linux/kernel.h:14:0,
                    from include/linux/list.h:9,
                    from include/linux/wait.h:7,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs//cifs/file.c:24:
   include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Wformat=]
    #define KERN_SOH "\001"  /* ASCII Start Of Header */
                     ^
   include/linux/printk.h:136:10: note: in definition of macro 'no_printk'
      printk(fmt, ##__VA_ARGS__);  \
             ^~~
   include/linux/kern_levels.h:15:20: note: in expansion of macro 'KERN_SOH'
    #define KERN_DEBUG KERN_SOH "7" /* debug-level messages */
                       ^~~~~~~~
   include/linux/printk.h:474:12: note: in expansion of macro 'KERN_DEBUG'
     no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
               ^~~~~~~~~~
   fs//cifs/cifs_debug.h:60:3: note: in expansion of macro 'pr_debug_ratelimited'
      pr_debug_ ## ratefunc(fmt, ##__VA_ARGS__); \
      ^~~~~~~~~
   fs//cifs/cifs_debug.h:70:3: note: in expansion of macro 'cifs_dbg_func'
      cifs_dbg_func(ratelimited,  \
      ^~~~~~~~~~~~~
>> fs//cifs/file.c:3346:4: note: in expansion of macro 'cifs_dbg'
       cifs_dbg(VFS,
       ^~~~~~~~
   fs//cifs/file.c:3348:30: note: format string is defined here
        " iov_offset %lu count %lu\n",
                               ~~^
                               %u

vim +/cifs_dbg +3346 fs//cifs/file.c

  3290	
  3291	ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
  3292	{
  3293		size_t len, cur_len, start;
  3294		unsigned int npages, rsize, credits;
  3295		struct file *file;
  3296		struct cifs_sb_info *cifs_sb;
  3297		struct cifsFileInfo *cfile;
  3298		struct cifs_tcon *tcon;
  3299		struct page **pagevec;
  3300		ssize_t rc, total_read = 0;
  3301		struct TCP_Server_Info *server;
  3302		loff_t offset = iocb->ki_pos;
  3303		pid_t pid;
  3304		struct cifs_readdata *rdata;
  3305	
  3306		/*
  3307		 * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
  3308		 * fall back to data copy read path
  3309		 */
  3310		if (to->type & ITER_KVEC) {
  3311			cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
  3312			return cifs_user_readv(iocb, to);
  3313		}
  3314	
  3315		len = iov_iter_count(to);
  3316		if (!len)
  3317			return 0;
  3318	
  3319		file = iocb->ki_filp;
  3320		cifs_sb = CIFS_FILE_SB(file);
  3321		cfile = file->private_data;
  3322		tcon = tlink_tcon(cfile->tlink);
  3323		server = tcon->ses->server;
  3324	
  3325		if (!server->ops->async_readv)
  3326			return -ENOSYS;
  3327	
  3328		if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
  3329			pid = cfile->pid;
  3330		else
  3331			pid = current->tgid;
  3332	
  3333		if ((file->f_flags & O_ACCMODE) == O_WRONLY)
  3334			cifs_dbg(FYI, "attempting read on write only file instance\n");
  3335	
  3336		do {
  3337			rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
  3338						&rsize, &credits);
  3339			if (rc)
  3340				break;
  3341	
  3342			cur_len = min_t(const size_t, len, rsize);
  3343	
  3344			rc = iov_iter_get_pages_alloc(to, &pagevec, cur_len, &start);
  3345			if (rc < 0) {
> 3346				cifs_dbg(VFS,
  3347					"couldn't get user pages (rc=%zd) iter type %d"
  3348					" iov_offset %lu count %lu\n",
  3349					rc, to->type, to->iov_offset, to->count);
  3350				dump_stack();
  3351				break;
  3352			}
  3353	
  3354			rdata = cifs_readdata_direct_alloc(
  3355					pagevec, cifs_direct_readv_complete);
  3356			if (!rdata) {
  3357				add_credits_and_wake_if(server, credits, 0);
  3358				rc = -ENOMEM;
  3359				break;
  3360			}
  3361	
  3362			npages = (rc + start + PAGE_SIZE-1) / PAGE_SIZE;
  3363			rdata->nr_pages = npages;
  3364			rdata->page_offset = start;
  3365			rdata->pagesz = PAGE_SIZE;
  3366			rdata->tailsz = npages > 1 ?
  3367					rc-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
  3368					rc;
  3369			cur_len = rc;
  3370	
  3371			rdata->cfile = cifsFileInfo_get(cfile);
  3372			rdata->offset = offset;
  3373			rdata->bytes = rc;
  3374			rdata->pid = pid;
  3375			rdata->read_into_pages = cifs_uncached_read_into_pages;
  3376			rdata->copy_into_pages = cifs_uncached_copy_into_pages;
  3377			rdata->credits = credits;
  3378	
  3379			rc = 0;
  3380			if (rdata->cfile->invalidHandle)
  3381				rc = cifs_reopen_file(rdata->cfile, true);
  3382	
  3383			if (!rc)
  3384				rc = server->ops->async_readv(rdata);
  3385	
  3386			if (rc) {
  3387				add_credits_and_wake_if(server, rdata->credits, 0);
  3388				kref_put(&rdata->refcount,
  3389					 cifs_direct_readdata_release);
  3390				if (rc == -EAGAIN)
  3391					continue;
  3392				break;
  3393			}
  3394	
  3395			wait_for_completion(&rdata->done);
  3396			rc = rdata->result;
  3397			if (rc) {
  3398				kref_put(
  3399					&rdata->refcount,
  3400					cifs_direct_readdata_release);
  3401				if (rc == -EAGAIN)
  3402					continue;
  3403				break;
  3404			}
  3405	
  3406			total_read += rdata->got_bytes;
  3407			kref_put(&rdata->refcount, cifs_direct_readdata_release);
  3408	
  3409			iov_iter_advance(to, cur_len);
  3410			len -= cur_len;
  3411			offset += cur_len;
  3412		} while (len);
  3413	
  3414		iocb->ki_pos += total_read;
  3415	
  3416		return total_read;
  3417	}
  3418	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28707 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 15/15] CIFS: Add direct I/O functions to file_operations
  2018-05-30 19:48 ` [Patch v2 15/15] CIFS: Add direct I/O functions to file_operations Long Li
@ 2018-06-07 11:17   ` Pavel Shilovsky
  0 siblings, 0 replies; 50+ messages in thread
From: Pavel Shilovsky @ 2018-06-07 11:17 UTC (permalink / raw)
  To: Long Li
  Cc: Steve French, linux-cifs, samba-technical, Kernel Mailing List,
	linux-rdma

2018-05-30 12:48 GMT-07:00 Long Li <longli@linuxonhyperv.com>:
> From: Long Li <longli@microsoft.com>
>
> With direct read/write functions implemented, add them to file_operations.
> When mounting with "cache=none", CIFS uses direct I/O.
>
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>  fs/cifs/cifsfs.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 62f1662..e84f8c2 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -1113,9 +1113,8 @@ const struct file_operations cifs_file_strict_ops = {
>  };
>
>  const struct file_operations cifs_file_direct_ops = {
> -       /* BB reevaluate whether they can be done with directio, no cache */
> -       .read_iter = cifs_user_readv,
> -       .write_iter = cifs_user_writev,
> +       .read_iter = cifs_direct_readv,
> +       .write_iter = cifs_direct_writev,

I would postpone making this change until we have asynchronous I/O
support for direct mode.

--
Best regards,
Pavel Shilovsky

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
  2018-05-30 19:47 ` [Patch v2 02/15] CIFS: Add support for direct pages in rdata Long Li
  2018-05-30 20:27   ` Ruhl, Michael J
@ 2018-06-24  1:50   ` Tom Talpey
  2018-06-25 20:25     ` Long Li
  2018-06-25 21:01     ` Jason Gunthorpe
  1 sibling, 2 replies; 50+ messages in thread
From: Tom Talpey @ 2018-06-24  1:50 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/30/2018 3:47 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> Add a function to allocate rdata without allocating pages for data
> transfer. This gives the caller an option to pass a number of pages
> that point to the data buffer.
> 
> rdata is still reponsible for free those pages after it's done.

"Caller" is still responsible? Or is the rdata somehow freeing itself
via another mechanism?

> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/cifsglob.h |  2 +-
>   fs/cifs/file.c     | 23 ++++++++++++++++++++---
>   2 files changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 8d16c3e..56864a87 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1179,7 +1179,7 @@ struct cifs_readdata {
>   	unsigned int			tailsz;
>   	unsigned int			credits;
>   	unsigned int			nr_pages;
> -	struct page			*pages[];
> +	struct page			**pages;

Technically speaking, these are syntactically equivalent. It may not
be worth changing this historic definition.

Tom.


>   };
>   
>   struct cifs_writedata;
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 23fd430..1c98293 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2880,13 +2880,13 @@ cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from)
>   }
>   
>   static struct cifs_readdata *
> -cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
> +cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
>   {
>   	struct cifs_readdata *rdata;
>   
> -	rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
> -			GFP_KERNEL);
> +	rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
>   	if (rdata != NULL) {
> +		rdata->pages = pages;
>   		kref_init(&rdata->refcount);
>   		INIT_LIST_HEAD(&rdata->list);
>   		init_completion(&rdata->done);
> @@ -2896,6 +2896,22 @@ cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
>   	return rdata;
>   }
>   
> +static struct cifs_readdata *
> +cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
> +{
> +	struct page **pages =
> +		kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
> +	struct cifs_readdata *ret = NULL;
> +
> +	if (pages) {
> +		ret = cifs_readdata_direct_alloc(pages, complete);
> +		if (!ret)
> +			kfree(pages);
> +	}
> +
> +	return ret;
> +}
> +
>   void
>   cifs_readdata_release(struct kref *refcount)
>   {
> @@ -2910,6 +2926,7 @@ cifs_readdata_release(struct kref *refcount)
>   	if (rdata->cfile)
>   		cifsFileInfo_put(rdata->cfile);
>   
> +	kvfree(rdata->pages);
>   	kfree(rdata);
>   }
>   
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 03/15] CIFS: Use offset when reading pages
  2018-05-30 19:47 ` [Patch v2 03/15] CIFS: Use offset when reading pages Long Li
@ 2018-06-24  1:58   ` Tom Talpey
  2018-06-25 20:27     ` Long Li
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Talpey @ 2018-06-24  1:58 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/30/2018 3:47 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> With offset defined in rdata, transport functions need to look at this
> offset when reading data into the correct places in pages.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/cifsproto.h |  4 +++-
>   fs/cifs/connect.c   |  5 +++--
>   fs/cifs/file.c      | 52 +++++++++++++++++++++++++++++++++++++---------------
>   fs/cifs/smb2ops.c   |  2 +-
>   fs/cifs/smb2pdu.c   |  1 +
>   5 files changed, 45 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> index dc80f84..1f27d8e 100644
> --- a/fs/cifs/cifsproto.h
> +++ b/fs/cifs/cifsproto.h
> @@ -203,7 +203,9 @@ extern void dequeue_mid(struct mid_q_entry *mid, bool malformed);
>   extern int cifs_read_from_socket(struct TCP_Server_Info *server, char *buf,
>   			         unsigned int to_read);
>   extern int cifs_read_page_from_socket(struct TCP_Server_Info *server,
> -				      struct page *page, unsigned int to_read);
> +					struct page *page,
> +					unsigned int page_offset,
> +					unsigned int to_read);
>   extern int cifs_setup_cifs_sb(struct smb_vol *pvolume_info,
>   			       struct cifs_sb_info *cifs_sb);
>   extern int cifs_match_super(struct super_block *, void *);
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 83b0234..8501da0 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -594,10 +594,11 @@ cifs_read_from_socket(struct TCP_Server_Info *server, char *buf,
>   
>   int
>   cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page,
> -		      unsigned int to_read)
> +	unsigned int page_offset, unsigned int to_read)
>   {
>   	struct msghdr smb_msg;
> -	struct bio_vec bv = {.bv_page = page, .bv_len = to_read};
> +	struct bio_vec bv = {
> +		.bv_page = page, .bv_len = to_read, .bv_offset = page_offset};
>   	iov_iter_bvec(&smb_msg.msg_iter, READ | ITER_BVEC, &bv, 1, to_read);
>   	return cifs_readv_from_socket(server, &smb_msg);
>   }
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 1c98293..87eece6 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -3026,12 +3026,20 @@ uncached_fill_pages(struct TCP_Server_Info *server,
>   	int result = 0;
>   	unsigned int i;
>   	unsigned int nr_pages = rdata->nr_pages;
> +	unsigned int page_offset = rdata->page_offset;
>   
>   	rdata->got_bytes = 0;
>   	rdata->tailsz = PAGE_SIZE;
>   	for (i = 0; i < nr_pages; i++) {
>   		struct page *page = rdata->pages[i];
>   		size_t n;
> +		unsigned int segment_size = rdata->pagesz;
> +
> +		if (i == 0)
> +			segment_size -= page_offset;
> +		else
> +			page_offset = 0;
> +
>   
>   		if (len <= 0) {
>   			/* no need to hold page hostage */
> @@ -3040,24 +3048,25 @@ uncached_fill_pages(struct TCP_Server_Info *server,
>   			put_page(page);
>   			continue;
>   		}
> +
>   		n = len;
> -		if (len >= PAGE_SIZE) {
> +		if (len >= segment_size)
>   			/* enough data to fill the page */
> -			n = PAGE_SIZE;
> -			len -= n;
> -		} else {
> -			zero_user(page, len, PAGE_SIZE - len);
> +			n = segment_size;
> +		else
>   			rdata->tailsz = len;
> -			len = 0;
> -		}
> +		len -= n;
> +
>   		if (iter)
> -			result = copy_page_from_iter(page, 0, n, iter);
> +			result = copy_page_from_iter(
> +					page, page_offset, n, iter);
>   #ifdef CONFIG_CIFS_SMB_DIRECT
>   		else if (rdata->mr)
>   			result = n;
>   #endif

I hadn't noticed this type of xonditional code before. It's kind of ugly
to modify the data handling code like this. Do you plan to break this
out into an smbdirect-specific hander, like the cases above and below?

>   		else
> -			result = cifs_read_page_from_socket(server, page, n);
> +			result = cifs_read_page_from_socket(
> +					server, page, page_offset, n);
>   		if (result < 0)
>   			break;
>   
> @@ -3130,6 +3139,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
>   		rdata->bytes = cur_len;
>   		rdata->pid = pid;
>   		rdata->pagesz = PAGE_SIZE;
> +		rdata->tailsz = PAGE_SIZE;
>   		rdata->read_into_pages = cifs_uncached_read_into_pages;
>   		rdata->copy_into_pages = cifs_uncached_copy_into_pages;
>   		rdata->credits = credits;
> @@ -3574,6 +3584,7 @@ readpages_fill_pages(struct TCP_Server_Info *server,
>   	u64 eof;
>   	pgoff_t eof_index;
>   	unsigned int nr_pages = rdata->nr_pages;
> +	unsigned int page_offset = rdata->page_offset;
>   
>   	/* determine the eof that the server (probably) has */
>   	eof = CIFS_I(rdata->mapping->host)->server_eof;
> @@ -3584,13 +3595,21 @@ readpages_fill_pages(struct TCP_Server_Info *server,
>   	rdata->tailsz = PAGE_SIZE;
>   	for (i = 0; i < nr_pages; i++) {
>   		struct page *page = rdata->pages[i];
> -		size_t n = PAGE_SIZE;
> +		unsigned int to_read = rdata->pagesz;
> +		size_t n;
> +
> +		if (i == 0)
> +			to_read -= page_offset;
> +		else
> +			page_offset = 0;
> +
> +		n = to_read;
>   
> -		if (len >= PAGE_SIZE) {
> -			len -= PAGE_SIZE;
> +		if (len >= to_read) {
> +			len -= to_read;
>   		} else if (len > 0) {
>   			/* enough for partial page, fill and zero the rest */
> -			zero_user(page, len, PAGE_SIZE - len);
> +			zero_user(page, len + page_offset, to_read - len);
>   			n = rdata->tailsz = len;
>   			len = 0;
>   		} else if (page->index > eof_index) {
> @@ -3622,13 +3641,15 @@ readpages_fill_pages(struct TCP_Server_Info *server,
>   		}
>   
>   		if (iter)
> -			result = copy_page_from_iter(page, 0, n, iter);
> +			result = copy_page_from_iter(
> +					page, page_offset, n, iter);
>   #ifdef CONFIG_CIFS_SMB_DIRECT
>   		else if (rdata->mr)
>   			result = n;
>   #endif

Ditto previous comment/question.

>   		else
> -			result = cifs_read_page_from_socket(server, page, n);
> +			result = cifs_read_page_from_socket(
> +					server, page, page_offset, n);
>   		if (result < 0)
>   			break;
>   
> @@ -3807,6 +3828,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
>   		rdata->bytes = bytes;
>   		rdata->pid = pid;
>   		rdata->pagesz = PAGE_SIZE;
> +		rdata->tailsz = PAGE_SIZE;
>   		rdata->read_into_pages = cifs_readpages_read_into_pages;
>   		rdata->copy_into_pages = cifs_readpages_copy_into_pages;
>   		rdata->credits = credits;
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index 7c0edd2..1fa1c29 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -2467,7 +2467,7 @@ read_data_into_pages(struct TCP_Server_Info *server, struct page **pages,
>   			zero_user(page, len, PAGE_SIZE - len);
>   			len = 0;
>   		}
> -		length = cifs_read_page_from_socket(server, page, n);
> +		length = cifs_read_page_from_socket(server, page, 0, n);
>   		if (length < 0)
>   			return length;
>   		server->total_read += length;
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> index a02f6b6..f603fbe 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -2683,6 +2683,7 @@ smb2_readv_callback(struct mid_q_entry *mid)
>   	struct smb_rqst rqst = { .rq_iov = rdata->iov,
>   				 .rq_nvec = 2,
>   				 .rq_pages = rdata->pages,
> +				 .rq_offset = rdata->page_offset,
>   				 .rq_npages = rdata->nr_pages,
>   				 .rq_pagesz = rdata->pagesz,
>   				 .rq_tailsz = rdata->tailsz };
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 04/15] CIFS: Add support for direct pages in wdata
  2018-05-30 19:47 ` [Patch v2 04/15] CIFS: Add support for direct pages in wdata Long Li
@ 2018-06-24  2:01   ` Tom Talpey
  2018-06-25 20:34     ` Long Li
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Talpey @ 2018-06-24  2:01 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/30/2018 3:47 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> Add a function to allocate wdata without allocating pages for data
> transfer. This gives the caller an option to pass a number of pages that
> point to the data buffer to write to.
> 
> wdata is reponsible for free those pages after it's done.

Same comment as for the earlier patch. "Caller" is responsible for
"freeing" those pages? Confusing, as worded.

> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/cifsglob.h  |  2 +-
>   fs/cifs/cifsproto.h |  2 ++
>   fs/cifs/cifssmb.c   | 17 ++++++++++++++---
>   3 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 56864a87..7f62c98 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1205,7 +1205,7 @@ struct cifs_writedata {
>   	unsigned int			tailsz;
>   	unsigned int			credits;
>   	unsigned int			nr_pages;
> -	struct page			*pages[];
> +	struct page			**pages;

Also same comment as for earlier patch, these are syntactically
equivalent and maybe not needed to change.

>   };
>   
>   /*
> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> index 1f27d8e..7933c5f 100644
> --- a/fs/cifs/cifsproto.h
> +++ b/fs/cifs/cifsproto.h
> @@ -533,6 +533,8 @@ int cifs_async_writev(struct cifs_writedata *wdata,
>   void cifs_writev_complete(struct work_struct *work);
>   struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
>   						work_func_t complete);
> +struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages,
> +						work_func_t complete);
>   void cifs_writedata_release(struct kref *refcount);
>   int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
>   			  struct cifs_sb_info *cifs_sb,
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index c8e4278..5aca336 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -1952,6 +1952,7 @@ cifs_writedata_release(struct kref *refcount)
>   	if (wdata->cfile)
>   		cifsFileInfo_put(wdata->cfile);
>   
> +	kvfree(wdata->pages);
>   	kfree(wdata);
>   }
>   
> @@ -2075,12 +2076,22 @@ cifs_writev_complete(struct work_struct *work)
>   struct cifs_writedata *
>   cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
>   {
> +	struct page **pages =
> +		kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);

Why do you do a GFP_NOFS here but GFP_KERNEL in the earlier patch?

> +	if (pages)
> +		return cifs_writedata_direct_alloc(pages, complete);
> +
> +	return NULL;
> +}
> +
> +struct cifs_writedata *
> +cifs_writedata_direct_alloc(struct page **pages, work_func_t complete)
> +{
>   	struct cifs_writedata *wdata;
>   
> -	/* writedata + number of page pointers */
> -	wdata = kzalloc(sizeof(*wdata) +
> -			sizeof(struct page *) * nr_pages, GFP_NOFS);
> +	wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
>   	if (wdata != NULL) {
> +		wdata->pages = pages;
>   		kref_init(&wdata->refcount);
>   		INIT_LIST_HEAD(&wdata->list);
>   		init_completion(&wdata->done);
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 05/15] CIFS: Calculate the correct request length based on page offset and tail size
  2018-05-30 19:47 ` [Patch v2 05/15] CIFS: Calculate the correct request length based on page offset and tail size Long Li
@ 2018-06-24  2:07   ` Tom Talpey
  2018-06-25 21:07     ` Long Li
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Talpey @ 2018-06-24  2:07 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/30/2018 3:47 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> It's possible that the page offset is non-zero in the pages in a request,
> change the function to calculate the correct data buffer length.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/transport.c | 20 +++++++++++++++++---
>   1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 927226a..d6b5523 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -212,10 +212,24 @@ rqst_len(struct smb_rqst *rqst)
>   	for (i = 0; i < rqst->rq_nvec; i++)
>   		buflen += iov[i].iov_len;
>   
> -	/* add in the page array if there is one */
> +	/*
> +	 * Add in the page array if there is one. The caller needs to make
> +	 * sure rq_offset and rq_tailsz are set correctly. If a buffer of
> +	 * multiple pages ends at page boundary, rq_tailsz needs to be set to
> +	 * PAGE_SIZE.
> +	 */
>   	if (rqst->rq_npages) {
> -		buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> -		buflen += rqst->rq_tailsz;
> +		if (rqst->rq_npages == 1)
> +			buflen += rqst->rq_tailsz;
> +		else {
> +			/*
> +			 * If there is more than one page, calculate the
> +			 * buffer length based on rq_offset and rq_tailsz
> +			 */
> +			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> +					rqst->rq_offset;
> +			buflen += rqst->rq_tailsz;
> +		}

Wouldn't it be simpler to keep the original code, but then just subtract
the rq_offset?

	buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
	buflen += rqst->rq_tailsz;
	buflen -= rqst->rq_offset;

It's kind of confusing as written. Also, what if it's just one page, but
has a non-zero offset? Is that somehow not possible? My suggested code
would take that into account, yours doesn't.

Tom.

>   	}
>   
>   	return buflen;
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst
  2018-05-30 19:47 ` [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst Long Li
@ 2018-06-24  2:09   ` Tom Talpey
  2018-06-25 21:14     ` Long Li
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Talpey @ 2018-06-24  2:09 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/30/2018 3:47 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> Introduce a function rqst_page_get_length to return the page offset and
> length for a given page in smb_rqst. This function is to be used by
> following patches.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/cifsproto.h |  3 +++
>   fs/cifs/misc.c      | 17 +++++++++++++++++
>   2 files changed, 20 insertions(+)
> 
> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> index 7933c5f..89dda14 100644
> --- a/fs/cifs/cifsproto.h
> +++ b/fs/cifs/cifsproto.h
> @@ -557,4 +557,7 @@ int cifs_alloc_hash(const char *name, struct crypto_shash **shash,
>   		    struct sdesc **sdesc);
>   void cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc);
>   
> +extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
> +				unsigned int *len, unsigned int *offset);
> +
>   #endif			/* _CIFSPROTO_H */
> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
> index 96849b5..e951417 100644
> --- a/fs/cifs/misc.c
> +++ b/fs/cifs/misc.c
> @@ -905,3 +905,20 @@ cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc)
>   		crypto_free_shash(*shash);
>   	*shash = NULL;
>   }
> +
> +/**
> + * rqst_page_get_length - obtain the length and offset for a page in smb_rqst
> + * Input: rqst - a smb_rqst, page - a page index for rqst
> + * Output: *len - the length for this page, *offset - the offset for this page
> + */
> +void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
> +				unsigned int *len, unsigned int *offset)
> +{
> +	*len = rqst->rq_pagesz;
> +	*offset = (page == 0) ? rqst->rq_offset : 0;

Really? Page 0 always has a zero offset??

> +
> +	if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
> +		*len = rqst->rq_tailsz;
> +	else if (page == 0)
> +		*len = rqst->rq_pagesz - rqst->rq_offset;
> +}
> 

This subroutine does what patch 5 does inline. Why not push this patch
up in the sequence and use the helper?

Tom.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 08/15] CIFS: SMBD: Support page offset in RDMA send
  2018-05-30 19:48 ` [Patch v2 08/15] CIFS: SMBD: Support page offset in RDMA send Long Li
@ 2018-06-24  2:11   ` Tom Talpey
  2018-06-25 21:23     ` Long Li
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Talpey @ 2018-06-24  2:11 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> The RDMA send function needs to look at offset in the request pages, and
> send data starting from there.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/smbdirect.c | 27 +++++++++++++++++++--------
>   1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
> index c62f7c9..6141e3c 100644
> --- a/fs/cifs/smbdirect.c
> +++ b/fs/cifs/smbdirect.c
> @@ -17,6 +17,7 @@
>   #include <linux/highmem.h>
>   #include "smbdirect.h"
>   #include "cifs_debug.h"
> +#include "cifsproto.h"
>   
>   static struct smbd_response *get_empty_queue_buffer(
>   		struct smbd_connection *info);
> @@ -2082,7 +2083,7 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
>   	struct kvec vec;
>   	int nvecs;
>   	int size;
> -	int buflen = 0, remaining_data_length;
> +	unsigned int buflen = 0, remaining_data_length;
>   	int start, i, j;
>   	int max_iov_size =
>   		info->max_send_size - sizeof(struct smbd_data_transfer);
> @@ -2113,10 +2114,17 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
>   		buflen += iov[i].iov_len;
>   	}
>   
> -	/* add in the page array if there is one */
> +	/*
> +	 * Add in the page array if there is one. The caller needs to set
> +	 * rq_tailsz to PAGE_SIZE when the buffer has multiple pages and
> +	 * ends at page boundary
> +	 */
>   	if (rqst->rq_npages) {
> -		buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> -		buflen += rqst->rq_tailsz;
> +		if (rqst->rq_npages == 1)
> +			buflen += rqst->rq_tailsz;
> +		else
> +			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> +					rqst->rq_offset + rqst->rq_tailsz;
>   	}

This code is really confusing and redundant. It tests npages > 0,
then tests npages == 1, then does an else. Why not call the helper
like the following smbd_send()?

Tom.

>   
>   	if (buflen + sizeof(struct smbd_data_transfer) >
> @@ -2213,8 +2221,9 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
>   
>   	/* now sending pages if there are any */
>   	for (i = 0; i < rqst->rq_npages; i++) {
> -		buflen = (i == rqst->rq_npages-1) ?
> -			rqst->rq_tailsz : rqst->rq_pagesz;
> +		unsigned int offset;
> +
> +		rqst_page_get_length(rqst, i, &buflen, &offset);
>   		nvecs = (buflen + max_iov_size - 1) / max_iov_size;
>   		log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
>   			buflen, nvecs);
> @@ -2225,9 +2234,11 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
>   			remaining_data_length -= size;
>   			log_write(INFO, "sending pages i=%d offset=%d size=%d"
>   				" remaining_data_length=%d\n",
> -				i, j*max_iov_size, size, remaining_data_length);
> +				i, j*max_iov_size+offset, size,
> +				remaining_data_length);
>   			rc = smbd_post_send_page(
> -				info, rqst->rq_pages[i], j*max_iov_size,
> +				info, rqst->rq_pages[i],
> +				j*max_iov_size + offset,
>   				size, remaining_data_length);
>   			if (rc)
>   				goto done;
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 09/15] CIFS: SMBD: Support page offset in RDMA recv
  2018-05-30 19:48 ` [Patch v2 09/15] CIFS: SMBD: Support page offset in RDMA recv Long Li
@ 2018-06-24  2:16   ` Tom Talpey
  2018-06-25 21:29     ` Long Li
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Talpey @ 2018-06-24  2:16 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> RDMA recv function needs to place data to the correct place starting at
> page offset.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/smbdirect.c | 18 +++++++++++-------
>   1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
> index 6141e3c..ba53c52 100644
> --- a/fs/cifs/smbdirect.c
> +++ b/fs/cifs/smbdirect.c
> @@ -2004,10 +2004,12 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>    * return value: actual data read
>    */
>   static int smbd_recv_page(struct smbd_connection *info,
> -		struct page *page, unsigned int to_read)
> +		struct page *page, unsigned int page_offset,
> +		unsigned int to_read)
>   {
>   	int ret;
>   	char *to_address;
> +	void *page_address;
>   
>   	/* make sure we have the page ready for read */
>   	ret = wait_event_interruptible(
> @@ -2015,16 +2017,17 @@ static int smbd_recv_page(struct smbd_connection *info,
>   		info->reassembly_data_length >= to_read ||
>   			info->transport_status != SMBD_CONNECTED);
>   	if (ret)
> -		return 0;
> +		return ret;
>   
>   	/* now we can read from reassembly queue and not sleep */
> -	to_address = kmap_atomic(page);
> +	page_address = kmap_atomic(page);
> +	to_address = (char *) page_address + page_offset;
>   
>   	log_read(INFO, "reading from page=%p address=%p to_read=%d\n",
>   		page, to_address, to_read);
>   
>   	ret = smbd_recv_buf(info, to_address, to_read);
> -	kunmap_atomic(to_address);
> +	kunmap_atomic(page_address);

Is "page" truly not mapped? This kmap/kunmap for each received 4KB is
very expensive. Is there not a way to keep a kva for the reassembly
queue segments?

>   
>   	return ret;
>   }
> @@ -2038,7 +2041,7 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
>   {
>   	char *buf;
>   	struct page *page;
> -	unsigned int to_read;
> +	unsigned int to_read, page_offset;
>   	int rc;
>   
>   	info->smbd_recv_pending++;
> @@ -2052,15 +2055,16 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
>   
>   	case READ | ITER_BVEC:
>   		page = msg->msg_iter.bvec->bv_page;
> +		page_offset = msg->msg_iter.bvec->bv_offset;
>   		to_read = msg->msg_iter.bvec->bv_len;
> -		rc = smbd_recv_page(info, page, to_read);
> +		rc = smbd_recv_page(info, page, page_offset, to_read);
>   		break;
>   
>   	default:
>   		/* It's a bug in upper layer to get there */
>   		cifs_dbg(VFS, "CIFS: invalid msg type %d\n",
>   			msg->msg_iter.type);
> -		rc = -EIO;
> +		rc = -EINVAL;
>   	}
>   
>   	info->smbd_recv_pending--;
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 10/15] CIFS: SMBD: Support page offset in memory registration
  2018-05-30 19:48 ` [Patch v2 10/15] CIFS: SMBD: Support page offset in memory registration Long Li
@ 2018-06-24  2:24   ` Tom Talpey
  0 siblings, 0 replies; 50+ messages in thread
From: Tom Talpey @ 2018-06-24  2:24 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> Change code to pass the correct page offset during memory registration for
> RDMA read/write.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/smb2pdu.c   | 18 ++++++++-----
>   fs/cifs/smbdirect.c | 76 +++++++++++++++++++++++++++++++----------------------
>   fs/cifs/smbdirect.h |  2 +-
>   3 files changed, 58 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
> index f603fbe..fc30774 100644
> --- a/fs/cifs/smb2pdu.c
> +++ b/fs/cifs/smb2pdu.c
> @@ -2623,8 +2623,8 @@ smb2_new_read_req(void **buf, unsigned int *total_len,
>   
>   		rdata->mr = smbd_register_mr(
>   				server->smbd_conn, rdata->pages,
> -				rdata->nr_pages, rdata->tailsz,
> -				true, need_invalidate);
> +				rdata->nr_pages, rdata->page_offset,
> +				rdata->tailsz, true, need_invalidate);
>   		if (!rdata->mr)
>   			return -ENOBUFS;
>   
> @@ -3013,16 +3013,22 @@ smb2_async_writev(struct cifs_writedata *wdata,
>   
>   		wdata->mr = smbd_register_mr(
>   				server->smbd_conn, wdata->pages,
> -				wdata->nr_pages, wdata->tailsz,
> -				false, need_invalidate);
> +				wdata->nr_pages, wdata->page_offset,
> +				wdata->tailsz, false, need_invalidate);
>   		if (!wdata->mr) {
>   			rc = -ENOBUFS;
>   			goto async_writev_out;
>   		}
>   		req->Length = 0;
>   		req->DataOffset = 0;
> -		req->RemainingBytes =
> -			cpu_to_le32((wdata->nr_pages-1)*PAGE_SIZE + wdata->tailsz);
> +		if (wdata->nr_pages > 1)
> +			req->RemainingBytes =
> +				cpu_to_le32(
> +					(wdata->nr_pages - 1) * wdata->pagesz -
> +					wdata->page_offset + wdata->tailsz
> +				);
> +		else
> +			req->RemainingBytes = cpu_to_le32(wdata->tailsz);

Again, I think a helper that computed and returned this size would be
much clearer and compact. And I still am incredulous that a single page
io always has an offset of zero. :-)

>   		req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE;
>   		if (need_invalidate)
>   			req->Channel = SMB2_CHANNEL_RDMA_V1;
> diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
> index ba53c52..e459c97 100644
> --- a/fs/cifs/smbdirect.c
> +++ b/fs/cifs/smbdirect.c
> @@ -2299,37 +2299,37 @@ static void smbd_mr_recovery_work(struct work_struct *work)
>   		if (smbdirect_mr->state == MR_INVALIDATED ||
>   			smbdirect_mr->state == MR_ERROR) {
>   
> -			if (smbdirect_mr->state == MR_INVALIDATED) {
> +			/* recover this MR entry */
> +			rc = ib_dereg_mr(smbdirect_mr->mr);
> +			if (rc) {
> +				log_rdma_mr(ERR,
> +					"ib_dereg_mr failed rc=%x\n",
> +					rc);
> +				smbd_disconnect_rdma_connection(info);
> +				continue;
> +			}

Ok, we discussed this ib_dereg_mr() call at the plugfest last week.
It's unnecessary - the MR is reusable and does not need to be destroyed
after each use.

> +
> +			smbdirect_mr->mr = ib_alloc_mr(
> +				info->pd, info->mr_type,
> +				info->max_frmr_depth);
> +			if (IS_ERR(smbdirect_mr->mr)) {
> +				log_rdma_mr(ERR,
> +					"ib_alloc_mr failed mr_type=%x "
> +					"max_frmr_depth=%x\n",
> +					info->mr_type,
> +					info->max_frmr_depth);
> +				smbd_disconnect_rdma_connection(info);
> +				continue;
> +			}
> +

Not needed, for the same reason above.

> +			if (smbdirect_mr->state == MR_INVALIDATED)
>   				ib_dma_unmap_sg(
>   					info->id->device, smbdirect_mr->sgl,
>   					smbdirect_mr->sgl_count,
>   					smbdirect_mr->dir);
> -				smbdirect_mr->state = MR_READY;

As we observed, the smbdirect_mr is not protected by a lock, therefore
this MR_READY state transition needs a memory barrier in front of it!

> -			} else if (smbdirect_mr->state == MR_ERROR) {
> -
> -				/* recover this MR entry */
> -				rc = ib_dereg_mr(smbdirect_mr->mr);
> -				if (rc) {
> -					log_rdma_mr(ERR,
> -						"ib_dereg_mr failed rc=%x\n",
> -						rc);
> -					smbd_disconnect_rdma_connection(info);
> -				}
Why are you deleting the MR_ERROR handling? It seems this is precisely
the place where the MR needs to be destroyed, to prevent any later RDMA
operations from potentially reaching the original memory.

>   
> -				smbdirect_mr->mr = ib_alloc_mr(
> -					info->pd, info->mr_type,
> -					info->max_frmr_depth);
> -				if (IS_ERR(smbdirect_mr->mr)) {
> -					log_rdma_mr(ERR,
> -						"ib_alloc_mr failed mr_type=%x "
> -						"max_frmr_depth=%x\n",
> -						info->mr_type,
> -						info->max_frmr_depth);
> -					smbd_disconnect_rdma_connection(info);
> -				}
> +			smbdirect_mr->state = MR_READY;
>   
> -				smbdirect_mr->state = MR_READY;
> -			}
>   			/* smbdirect_mr->state is updated by this function
>   			 * and is read and updated by I/O issuing CPUs trying
>   			 * to get a MR, the call to atomic_inc_return
> @@ -2475,7 +2475,7 @@ static struct smbd_mr *get_mr(struct smbd_connection *info)
>    */
>   struct smbd_mr *smbd_register_mr(
>   	struct smbd_connection *info, struct page *pages[], int num_pages,
> -	int tailsz, bool writing, bool need_invalidate)
> +	int offset, int tailsz, bool writing, bool need_invalidate)
>   {
>   	struct smbd_mr *smbdirect_mr;
>   	int rc, i;
> @@ -2498,17 +2498,31 @@ struct smbd_mr *smbd_register_mr(
>   	smbdirect_mr->sgl_count = num_pages;
>   	sg_init_table(smbdirect_mr->sgl, num_pages);
>   
> -	for (i = 0; i < num_pages - 1; i++)
> -		sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
> +	log_rdma_mr(INFO, "num_pages=0x%x offset=0x%x tailsz=0x%x\n",
> +			num_pages, offset, tailsz);
>   
> +	if (num_pages == 1) {
> +		sg_set_page(&smbdirect_mr->sgl[0], pages[0], tailsz, offset);
> +		goto skip_multiple_pages;

A simple "else" would be much preferable to this "goto".

> +	}
> +
> +	/* We have at least two pages to register */
> +	sg_set_page(
> +		&smbdirect_mr->sgl[0], pages[0], PAGE_SIZE - offset, offset);
> +	i = 1;
> +	while (i < num_pages - 1) {
> +		sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
> +		i++;
> +	}
>   	sg_set_page(&smbdirect_mr->sgl[i], pages[i],
>   		tailsz ? tailsz : PAGE_SIZE, 0);
>   
> +skip_multiple_pages:
>   	dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
>   	smbdirect_mr->dir = dir;
>   	rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgl, num_pages, dir);
>   	if (!rc) {
> -		log_rdma_mr(INFO, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
> +		log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
>   			num_pages, dir, rc);
>   		goto dma_map_error;
>   	}
> @@ -2516,8 +2530,8 @@ struct smbd_mr *smbd_register_mr(
>   	rc = ib_map_mr_sg(smbdirect_mr->mr, smbdirect_mr->sgl, num_pages,
>   		NULL, PAGE_SIZE);
>   	if (rc != num_pages) {
> -		log_rdma_mr(INFO,
> -			"ib_map_mr_sg failed rc = %x num_pages = %x\n",
> +		log_rdma_mr(ERR,
> +			"ib_map_mr_sg failed rc = %d num_pages = %x\n",
>   			rc, num_pages);
>   		goto map_mr_error;
>   	}
> diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h
> index f9038da..1e419c2 100644
> --- a/fs/cifs/smbdirect.h
> +++ b/fs/cifs/smbdirect.h
> @@ -321,7 +321,7 @@ struct smbd_mr {
>   /* Interfaces to register and deregister MR for RDMA read/write */
>   struct smbd_mr *smbd_register_mr(
>   	struct smbd_connection *info, struct page *pages[], int num_pages,
> -	int tailsz, bool writing, bool need_invalidate);
> +	int offset, int tailsz, bool writing, bool need_invalidate);
>   int smbd_deregister_mr(struct smbd_mr *mr);
>   
>   #else
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 11/15] CIFS: Pass page offset for calculating signature
  2018-05-30 19:48 ` [Patch v2 11/15] CIFS: Pass page offset for calculating signature Long Li
@ 2018-06-24  2:27   ` Tom Talpey
  2018-06-26  4:15     ` Long Li
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Talpey @ 2018-06-24  2:27 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> When calculating signature for the packet, it needs to read into the
> correct page offset for the data.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/cifsencrypt.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
> index a6ef088..e88303c 100644
> --- a/fs/cifs/cifsencrypt.c
> +++ b/fs/cifs/cifsencrypt.c
> @@ -68,11 +68,12 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
>   
>   	/* now hash over the rq_pages array */
>   	for (i = 0; i < rqst->rq_npages; i++) {
> -		void *kaddr = kmap(rqst->rq_pages[i]);
> -		size_t len = rqst->rq_pagesz;
> +		void *kaddr;
> +		unsigned int len, offset;
>   
> -		if (i == rqst->rq_npages - 1)
> -			len = rqst->rq_tailsz;
> +		rqst_page_get_length(rqst, i, &len, &offset);
> +
> +		kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;

I suppose it's more robust to map a page at a time, but it's pretty
expensive. Is this the only way to iterate over a potentially very
large block of data? For example, a 1MB segment means 256 kmap/kunmaps.

Tom.

>   
>   		crypto_shash_update(shash, kaddr, len);
>   
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 12/15] CIFS: Pass page offset for encrypting
  2018-05-30 19:48 ` [Patch v2 12/15] CIFS: Pass page offset for encrypting Long Li
@ 2018-06-24  2:28   ` Tom Talpey
  0 siblings, 0 replies; 50+ messages in thread
From: Tom Talpey @ 2018-06-24  2:28 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> Encryption function needs to read data starting page offset from input
> buffer.
> 
> This doesn't affect decryption path since it allocates its own page
> buffers.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/smb2ops.c | 20 +++++++++++++-------
>   1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index 1fa1c29..38d19b6 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -2189,9 +2189,10 @@ init_sg(struct smb_rqst *rqst, u8 *sign)
>   		smb2_sg_set_buf(&sg[i], rqst->rq_iov[i].iov_base,
>   						rqst->rq_iov[i].iov_len);
>   	for (j = 0; i < sg_len - 1; i++, j++) {
> -		unsigned int len = (j < rqst->rq_npages - 1) ? rqst->rq_pagesz
> -							: rqst->rq_tailsz;
> -		sg_set_page(&sg[i], rqst->rq_pages[j], len, 0);
> +		unsigned int len, offset;
> +
> +		rqst_page_get_length(rqst, j, &len, &offset);
> +		sg_set_page(&sg[i], rqst->rq_pages[j], len, offset);
>   	}
>   	smb2_sg_set_buf(&sg[sg_len - 1], sign, SMB2_SIGNATURE_SIZE);
>   	return sg;
> @@ -2332,6 +2333,7 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, struct smb_rqst *new_rq,
>   		return rc;
>   
>   	new_rq->rq_pages = pages;
> +	new_rq->rq_offset = old_rq->rq_offset;
>   	new_rq->rq_npages = old_rq->rq_npages;
>   	new_rq->rq_pagesz = old_rq->rq_pagesz;
>   	new_rq->rq_tailsz = old_rq->rq_tailsz;
> @@ -2363,10 +2365,14 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, struct smb_rqst *new_rq,
>   
>   	/* copy pages form the old */
>   	for (i = 0; i < npages; i++) {
> -		char *dst = kmap(new_rq->rq_pages[i]);
> -		char *src = kmap(old_rq->rq_pages[i]);
> -		unsigned int len = (i < npages - 1) ? new_rq->rq_pagesz :
> -							new_rq->rq_tailsz;
> +		char *dst, *src;
> +		unsigned int offset, len;
> +
> +		rqst_page_get_length(new_rq, i, &len, &offset);
> +
> +		dst = (char *) kmap(new_rq->rq_pages[i]) + offset;
> +		src = (char *) kmap(old_rq->rq_pages[i]) + offset;

Ouch! TWO kmap/kunmaps per page of data? Is there not already a kva,
at least for the destination (message)?

Tom.


> +
>   		memcpy(dst, src, len);
>   		kunmap(new_rq->rq_pages[i]);
>   		kunmap(old_rq->rq_pages[i]);
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 13/15] CIFS: Add support for direct I/O read
  2018-05-30 19:48 ` [Patch v2 13/15] CIFS: Add support for direct I/O read Long Li
  2018-06-02  5:51   ` kbuild test robot
  2018-06-02  7:15   ` kbuild test robot
@ 2018-06-24  2:39   ` Tom Talpey
  2018-06-26  4:34     ` Long Li
  2 siblings, 1 reply; 50+ messages in thread
From: Tom Talpey @ 2018-06-24  2:39 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma



On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> Implement the function for direct I/O read. It doesn't support AIO, which
> will be implemented in a follow up patch.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/cifsfs.h |   1 +
>   fs/cifs/file.c   | 149 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 150 insertions(+)
> 
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index 5f02318..7fba9aa 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -102,6 +102,7 @@ extern int cifs_open(struct inode *inode, struct file *file);
>   extern int cifs_close(struct inode *inode, struct file *file);
>   extern int cifs_closedir(struct inode *inode, struct file *file);
>   extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
> +extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
>   extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
>   extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
>   extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 87eece6..e6e6f24 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2955,6 +2955,18 @@ cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages)
>   	return rc;
>   }
>   
> +static void cifs_direct_readdata_release(struct kref *refcount)
> +{
> +	struct cifs_readdata *rdata = container_of(refcount,
> +					struct cifs_readdata, refcount);
> +	unsigned int i;
> +
> +	for (i = 0; i < rdata->nr_pages; i++)
> +		put_page(rdata->pages[i]);
> +
> +	cifs_readdata_release(refcount);
> +}
> +
>   static void
>   cifs_uncached_readdata_release(struct kref *refcount)
>   {
> @@ -3267,6 +3279,143 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
>   		complete(&ctx->done);
>   }
>   
> +static void cifs_direct_readv_complete(struct work_struct *work)
> +{
> +	struct cifs_readdata *rdata =
> +		container_of(work, struct cifs_readdata, work);
> +
> +	complete(&rdata->done);
> +	kref_put(&rdata->refcount, cifs_direct_readdata_release);
> +}
> +
> +ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
> +{
> +	size_t len, cur_len, start;
> +	unsigned int npages, rsize, credits;
> +	struct file *file;
> +	struct cifs_sb_info *cifs_sb;
> +	struct cifsFileInfo *cfile;
> +	struct cifs_tcon *tcon;
> +	struct page **pagevec;
> +	ssize_t rc, total_read = 0;
> +	struct TCP_Server_Info *server;
> +	loff_t offset = iocb->ki_pos;
> +	pid_t pid;
> +	struct cifs_readdata *rdata;
> +
> +	/*
> +	 * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> +	 * fall back to data copy read path
> +	 */
> +	if (to->type & ITER_KVEC) {
> +		cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
> +		return cifs_user_readv(iocb, to);
> +	}
> +
> +	len = iov_iter_count(to);
> +	if (!len)
> +		return 0;
> +
> +	file = iocb->ki_filp;
> +	cifs_sb = CIFS_FILE_SB(file);
> +	cfile = file->private_data;
> +	tcon = tlink_tcon(cfile->tlink);
> +	server = tcon->ses->server;
> +
> +	if (!server->ops->async_readv)
> +		return -ENOSYS;
> +
> +	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> +		pid = cfile->pid;
> +	else
> +		pid = current->tgid;
> +
> +	if ((file->f_flags & O_ACCMODE) == O_WRONLY)
> +		cifs_dbg(FYI, "attempting read on write only file instance\n");

Confusing. Maybe "attempting read on write-only filehandle"?

> +
> +	do {
> +		rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
> +					&rsize, &credits);
> +		if (rc)
> +			break;
> +
> +		cur_len = min_t(const size_t, len, rsize);
> +
> +		rc = iov_iter_get_pages_alloc(to, &pagevec, cur_len, &start);
> +		if (rc < 0) {
> +			cifs_dbg(VFS,
> +				"couldn't get user pages (rc=%zd) iter type %d"
> +				" iov_offset %lu count %lu\n",
> +				rc, to->type, to->iov_offset, to->count);
> +			dump_stack();
> +			break;
> +		}
> +
> +		rdata = cifs_readdata_direct_alloc(
> +				pagevec, cifs_direct_readv_complete);
> +		if (!rdata) {
> +			add_credits_and_wake_if(server, credits, 0);
> +			rc = -ENOMEM;
> +			break;
> +		}
> +
> +		npages = (rc + start + PAGE_SIZE-1) / PAGE_SIZE;
> +		rdata->nr_pages = npages;
> +		rdata->page_offset = start;
> +		rdata->pagesz = PAGE_SIZE;
> +		rdata->tailsz = npages > 1 ?
> +				rc-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
> +				rc;

This expression makes my head hurt. Surely it can be simplified, or
expressed in a clearer way.

> +		cur_len = rc;
> +
> +		rdata->cfile = cifsFileInfo_get(cfile);
> +		rdata->offset = offset;
> +		rdata->bytes = rc;
> +		rdata->pid = pid;
> +		rdata->read_into_pages = cifs_uncached_read_into_pages;
> +		rdata->copy_into_pages = cifs_uncached_copy_into_pages;
> +		rdata->credits = credits;
> +
> +		rc = 0;
> +		if (rdata->cfile->invalidHandle)
> +			rc = cifs_reopen_file(rdata->cfile, true);
> +
> +		if (!rc)
> +			rc = server->ops->async_readv(rdata);
> +
> +		if (rc) {

This whole rc thing is messy. Initializing to zero, setting only in
one case, then testing the result, then setting it again, is twisted.
I actually think a goto or two would read much more clearly.

> +			add_credits_and_wake_if(server, rdata->credits, 0);
> +			kref_put(&rdata->refcount,
> +				 cifs_direct_readdata_release);
> +			if (rc == -EAGAIN)
> +				continue;
> +			break;

It's worth a comment here that this either breaks or continues the
entire do {} while (); and btw when it breaks it does *not* return "rc".
Again, maybe a goto instead of a break?

> +		}
> +
> +		wait_for_completion(&rdata->done);
> +		rc = rdata->result;
> +		if (rc) {
> +			kref_put(
> +				&rdata->refcount,
> +				cifs_direct_readdata_release);
> +			if (rc == -EAGAIN)
> +				continue;
> +			break;

Ditto.

> +		}
> +
> +		total_read += rdata->got_bytes;
> +		kref_put(&rdata->refcount, cifs_direct_readdata_release);
> +
> +		iov_iter_advance(to, cur_len);
> +		len -= cur_len;
> +		offset += cur_len;
> +	} while (len);
> +
> +	iocb->ki_pos += total_read;
> +
> +	return total_read;
> +}
> +
>   ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
>   {
>   	struct file *file = iocb->ki_filp;
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 14/15] CIFS: Add support for direct I/O write
  2018-05-30 19:48 ` [Patch v2 14/15] CIFS: Add support for direct I/O write Long Li
@ 2018-06-24  2:48   ` Tom Talpey
  2018-06-26  4:39     ` Long Li
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Talpey @ 2018-06-24  2:48 UTC (permalink / raw)
  To: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 5/30/2018 3:48 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> Implement the function for direct I/O write. It doesn't support AIO, which
> will be implemented in a follow up patch.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   fs/cifs/cifsfs.h |   1 +
>   fs/cifs/file.c   | 165 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 166 insertions(+)
> 
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index 7fba9aa..e9c5103 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -105,6 +105,7 @@ extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
>   extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
>   extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
>   extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
> +extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
>   extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
>   extern int cifs_lock(struct file *, int, struct file_lock *);
>   extern int cifs_fsync(struct file *, loff_t, loff_t, int);
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index e6e6f24..8c385b1 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2461,6 +2461,35 @@ cifs_uncached_writedata_release(struct kref *refcount)
>   
>   static void collect_uncached_write_data(struct cifs_aio_ctx *ctx);
>   
> +static void cifs_direct_writedata_release(struct kref *refcount)
> +{
> +	int i;
> +	struct cifs_writedata *wdata = container_of(refcount,
> +					struct cifs_writedata, refcount);
> +
> +	for (i = 0; i < wdata->nr_pages; i++)
> +		put_page(wdata->pages[i]);
> +
> +	cifs_writedata_release(refcount);
> +}
> +
> +static void cifs_direct_writev_complete(struct work_struct *work)
> +{
> +	struct cifs_writedata *wdata = container_of(work,
> +					struct cifs_writedata, work);
> +	struct inode *inode = d_inode(wdata->cfile->dentry);
> +	struct cifsInodeInfo *cifsi = CIFS_I(inode);
> +
> +	spin_lock(&inode->i_lock);
> +	cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
> +	if (cifsi->server_eof > inode->i_size)
> +		i_size_write(inode, cifsi->server_eof);
> +	spin_unlock(&inode->i_lock);
> +
> +	complete(&wdata->done);
> +	kref_put(&wdata->refcount, cifs_direct_writedata_release);
> +}
> +
>   static void
>   cifs_uncached_writev_complete(struct work_struct *work)
>   {
> @@ -2703,6 +2732,142 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx)
>   		complete(&ctx->done);
>   }
>   
> +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
> +{
> +	struct file *file = iocb->ki_filp;
> +	ssize_t total_written = 0;
> +	struct cifsFileInfo *cfile;
> +	struct cifs_tcon *tcon;
> +	struct cifs_sb_info *cifs_sb;
> +	struct TCP_Server_Info *server;
> +	pid_t pid;
> +	unsigned long nr_pages;
> +	loff_t offset = iocb->ki_pos;
> +	size_t len = iov_iter_count(from);
> +	int rc;
> +	struct cifs_writedata *wdata;
> +
> +	/*
> +	 * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
> +	 * In this case, fall back to non-direct write function.
> +	 */
> +	if (from->type & ITER_KVEC) {
> +		cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec I/O\n");
> +		return cifs_user_writev(iocb, from);
> +	}
> +
> +	rc = generic_write_checks(iocb, from);
> +	if (rc <= 0)
> +		return rc;
> +
> +	cifs_sb = CIFS_FILE_SB(file);
> +	cfile = file->private_data;
> +	tcon = tlink_tcon(cfile->tlink);
> +	server = tcon->ses->server;
> +
> +	if (!server->ops->async_writev)
> +		return -ENOSYS;
> +
> +	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> +		pid = cfile->pid;
> +	else
> +		pid = current->tgid;
> +
> +	do {
> +		unsigned int wsize, credits;
> +		struct page **pagevec;
> +		size_t start;
> +		ssize_t cur_len;
> +
> +		rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
> +						   &wsize, &credits);
> +		if (rc)
> +			break;
> +
> +		cur_len = iov_iter_get_pages_alloc(
> +				from, &pagevec, wsize, &start);
> +		if (cur_len < 0) {
> +			cifs_dbg(VFS,
> +				"direct_writev couldn't get user pages "
> +				"(rc=%zd) iter type %d iov_offset %lu count"
> +				" %lu\n",
> +				cur_len, from->type,
> +				from->iov_offset, from->count);
> +			dump_stack();
> +			break;
> +		}
> +		if (cur_len < 0)
> +			break;

This cur_len < 0 test is redundant with the prior if(), delete.
> +
> +		nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;

Am I misreading, or will this return be one more page than needed? If
start (the first byte offset) is > 0, nr_pages will already be one.
And if cur_len is 4KB, even if start is 0, nr_pages will be two.

> +
> +		wdata = cifs_writedata_direct_alloc(pagevec,
> +					     cifs_direct_writev_complete);
> +		if (!wdata) {
> +			rc = -ENOMEM;
> +			add_credits_and_wake_if(server, credits, 0);
> +			break;
> +		}
> +
> +		wdata->nr_pages = nr_pages;
> +		wdata->page_offset = start;
> +		wdata->pagesz = PAGE_SIZE;
> +		wdata->tailsz =
> +			nr_pages > 1 ?
> +			cur_len - (PAGE_SIZE - start) -
> +				(nr_pages - 2) * PAGE_SIZE :
> +			cur_len;
> +
> +		wdata->sync_mode = WB_SYNC_ALL;
> +		wdata->offset = (__u64)offset;
> +		wdata->cfile = cifsFileInfo_get(cfile);
> +		wdata->pid = pid;
> +		wdata->bytes = cur_len;
> +		wdata->credits = credits;
> +
> +		rc = 0;
> +		if (wdata->cfile->invalidHandle)
> +			rc = cifs_reopen_file(wdata->cfile, false);
> +
> +		if (!rc)
> +			rc = server->ops->async_writev(wdata,
> +					cifs_direct_writedata_release);
> +
> +		if (rc) {
> +			add_credits_and_wake_if(server, wdata->credits, 0);
> +			kref_put(&wdata->refcount,
> +				 cifs_direct_writedata_release);
> +			if (rc == -EAGAIN)
> +				continue;
> +			break;
> +		}

Same comments as for previous patch re the if (rc) ladder, and
the break/continues both being better expressed as careful goto's.

> +
> +		wait_for_completion(&wdata->done);
> +		if (wdata->result) {
> +			rc = wdata->result;
> +			kref_put(&wdata->refcount,
> +					cifs_direct_writedata_release);
> +			if (rc == -EAGAIN)
> +				continue;
> +			break;
> +		}
> +
> +		kref_put(&wdata->refcount, cifs_direct_writedata_release);
> +
> +		iov_iter_advance(from, cur_len);
> +		total_written += cur_len;
> +		offset += cur_len;
> +		len -= cur_len;
> +	} while (len);
> +
> +	if (unlikely(!total_written))
> +		return rc;
> +
> +	iocb->ki_pos += total_written;
> +	return total_written;
> +
> +}
> +
>   ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
>   {
>   	struct file *file = iocb->ki_filp;
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
  2018-06-24  1:50   ` Tom Talpey
@ 2018-06-25 20:25     ` Long Li
  2018-06-25 21:01     ` Jason Gunthorpe
  1 sibling, 0 replies; 50+ messages in thread
From: Long Li @ 2018-06-25 20:25 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> From: Tom Talpey <tom@talpey.com>
> Sent: Saturday, June 23, 2018 6:50 PM
> To: Long Li <longli@microsoft.com>; Steve French <sfrench@samba.org>;
> linux-cifs@vger.kernel.org; samba-technical@lists.samba.org; linux-
> kernel@vger.kernel.org; linux-rdma@vger.kernel.org
> Subject: Re: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
> 
> On 5/30/2018 3:47 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > Add a function to allocate rdata without allocating pages for data
> > transfer. This gives the caller an option to pass a number of pages
> > that point to the data buffer.
> >
> > rdata is still reponsible for free those pages after it's done.
> 
> "Caller" is still responsible? Or is the rdata somehow freeing itself via another
> mechanism?

I meant to say free the array "struct page **pages", not the pages themselves.

Before the patch, the page array is allocated at the end of the struct cifs_readdata:

      rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
                       GFP_KERNEL);

If this is the case, they are automatically freed when cifs_readdata is freed.

With the direct I/O patch, the page array is handled separately. So they need to be freed after being used.

> 
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >   fs/cifs/cifsglob.h |  2 +-
> >   fs/cifs/file.c     | 23 ++++++++++++++++++++---
> >   2 files changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index
> > 8d16c3e..56864a87 100644
> > --- a/fs/cifs/cifsglob.h
> > +++ b/fs/cifs/cifsglob.h
> > @@ -1179,7 +1179,7 @@ struct cifs_readdata {
> >   	unsigned int			tailsz;
> >   	unsigned int			credits;
> >   	unsigned int			nr_pages;
> > -	struct page			*pages[];
> > +	struct page			**pages;
> 
> Technically speaking, these are syntactically equivalent. It may not be worth
> changing this historic definition.
> 
> Tom.
> 
> 
> >   };
> >
> >   struct cifs_writedata;
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 23fd430..1c98293
> > 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -2880,13 +2880,13 @@ cifs_strict_writev(struct kiocb *iocb, struct
> iov_iter *from)
> >   }
> >
> >   static struct cifs_readdata *
> > -cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
> > +cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
> >   {
> >   	struct cifs_readdata *rdata;
> >
> > -	rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
> > -			GFP_KERNEL);
> > +	rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
> >   	if (rdata != NULL) {
> > +		rdata->pages = pages;
> >   		kref_init(&rdata->refcount);
> >   		INIT_LIST_HEAD(&rdata->list);
> >   		init_completion(&rdata->done);
> > @@ -2896,6 +2896,22 @@ cifs_readdata_alloc(unsigned int nr_pages,
> work_func_t complete)
> >   	return rdata;
> >   }
> >
> > +static struct cifs_readdata *
> > +cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete) {
> > +	struct page **pages =
> > +		kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
> > +	struct cifs_readdata *ret = NULL;
> > +
> > +	if (pages) {
> > +		ret = cifs_readdata_direct_alloc(pages, complete);
> > +		if (!ret)
> > +			kfree(pages);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> >   void
> >   cifs_readdata_release(struct kref *refcount)
> >   {
> > @@ -2910,6 +2926,7 @@ cifs_readdata_release(struct kref *refcount)
> >   	if (rdata->cfile)
> >   		cifsFileInfo_put(rdata->cfile);
> >
> > +	kvfree(rdata->pages);
> >   	kfree(rdata);
> >   }
> >
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 03/15] CIFS: Use offset when reading pages
  2018-06-24  1:58   ` Tom Talpey
@ 2018-06-25 20:27     ` Long Li
  0 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-06-25 20:27 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> From: Tom Talpey <tom@talpey.com>
> Sent: Saturday, June 23, 2018 6:59 PM
> To: Long Li <longli@microsoft.com>; Steve French <sfrench@samba.org>;
> linux-cifs@vger.kernel.org; samba-technical@lists.samba.org; linux-
> kernel@vger.kernel.org; linux-rdma@vger.kernel.org
> Subject: Re: [Patch v2 03/15] CIFS: Use offset when reading pages
> 
> On 5/30/2018 3:47 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > With offset defined in rdata, transport functions need to look at this
> > offset when reading data into the correct places in pages.
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >   fs/cifs/cifsproto.h |  4 +++-
> >   fs/cifs/connect.c   |  5 +++--
> >   fs/cifs/file.c      | 52 +++++++++++++++++++++++++++++++++++++--------
> -------
> >   fs/cifs/smb2ops.c   |  2 +-
> >   fs/cifs/smb2pdu.c   |  1 +
> >   5 files changed, 45 insertions(+), 19 deletions(-)
> >
> > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index
> > dc80f84..1f27d8e 100644
> > --- a/fs/cifs/cifsproto.h
> > +++ b/fs/cifs/cifsproto.h
> > @@ -203,7 +203,9 @@ extern void dequeue_mid(struct mid_q_entry *mid,
> bool malformed);
> >   extern int cifs_read_from_socket(struct TCP_Server_Info *server, char
> *buf,
> >   			         unsigned int to_read);
> >   extern int cifs_read_page_from_socket(struct TCP_Server_Info *server,
> > -				      struct page *page, unsigned int to_read);
> > +					struct page *page,
> > +					unsigned int page_offset,
> > +					unsigned int to_read);
> >   extern int cifs_setup_cifs_sb(struct smb_vol *pvolume_info,
> >   			       struct cifs_sb_info *cifs_sb);
> >   extern int cifs_match_super(struct super_block *, void *); diff
> > --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 83b0234..8501da0
> > 100644
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -594,10 +594,11 @@ cifs_read_from_socket(struct TCP_Server_Info
> > *server, char *buf,
> >
> >   int
> >   cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page
> *page,
> > -		      unsigned int to_read)
> > +	unsigned int page_offset, unsigned int to_read)
> >   {
> >   	struct msghdr smb_msg;
> > -	struct bio_vec bv = {.bv_page = page, .bv_len = to_read};
> > +	struct bio_vec bv = {
> > +		.bv_page = page, .bv_len = to_read, .bv_offset =
> page_offset};
> >   	iov_iter_bvec(&smb_msg.msg_iter, READ | ITER_BVEC, &bv, 1,
> to_read);
> >   	return cifs_readv_from_socket(server, &smb_msg);
> >   }
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 1c98293..87eece6
> > 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -3026,12 +3026,20 @@ uncached_fill_pages(struct TCP_Server_Info
> *server,
> >   	int result = 0;
> >   	unsigned int i;
> >   	unsigned int nr_pages = rdata->nr_pages;
> > +	unsigned int page_offset = rdata->page_offset;
> >
> >   	rdata->got_bytes = 0;
> >   	rdata->tailsz = PAGE_SIZE;
> >   	for (i = 0; i < nr_pages; i++) {
> >   		struct page *page = rdata->pages[i];
> >   		size_t n;
> > +		unsigned int segment_size = rdata->pagesz;
> > +
> > +		if (i == 0)
> > +			segment_size -= page_offset;
> > +		else
> > +			page_offset = 0;
> > +
> >
> >   		if (len <= 0) {
> >   			/* no need to hold page hostage */ @@ -3040,24
> +3048,25 @@
> > uncached_fill_pages(struct TCP_Server_Info *server,
> >   			put_page(page);
> >   			continue;
> >   		}
> > +
> >   		n = len;
> > -		if (len >= PAGE_SIZE) {
> > +		if (len >= segment_size)
> >   			/* enough data to fill the page */
> > -			n = PAGE_SIZE;
> > -			len -= n;
> > -		} else {
> > -			zero_user(page, len, PAGE_SIZE - len);
> > +			n = segment_size;
> > +		else
> >   			rdata->tailsz = len;
> > -			len = 0;
> > -		}
> > +		len -= n;
> > +
> >   		if (iter)
> > -			result = copy_page_from_iter(page, 0, n, iter);
> > +			result = copy_page_from_iter(
> > +					page, page_offset, n, iter);
> >   #ifdef CONFIG_CIFS_SMB_DIRECT
> >   		else if (rdata->mr)
> >   			result = n;
> >   #endif
> 
> I hadn't noticed this type of xonditional code before. It's kind of ugly to
> modify the data handling code like this. Do you plan to break this out into an
> smbdirect-specific hander, like the cases above and below?

This is a good point. I will look for ways to improve this.

> 
> >   		else
> > -			result = cifs_read_page_from_socket(server, page,
> n);
> > +			result = cifs_read_page_from_socket(
> > +					server, page, page_offset, n);
> >   		if (result < 0)
> >   			break;
> >
> > @@ -3130,6 +3139,7 @@ cifs_send_async_read(loff_t offset, size_t len,
> struct cifsFileInfo *open_file,
> >   		rdata->bytes = cur_len;
> >   		rdata->pid = pid;
> >   		rdata->pagesz = PAGE_SIZE;
> > +		rdata->tailsz = PAGE_SIZE;
> >   		rdata->read_into_pages = cifs_uncached_read_into_pages;
> >   		rdata->copy_into_pages = cifs_uncached_copy_into_pages;
> >   		rdata->credits = credits;
> > @@ -3574,6 +3584,7 @@ readpages_fill_pages(struct TCP_Server_Info
> *server,
> >   	u64 eof;
> >   	pgoff_t eof_index;
> >   	unsigned int nr_pages = rdata->nr_pages;
> > +	unsigned int page_offset = rdata->page_offset;
> >
> >   	/* determine the eof that the server (probably) has */
> >   	eof = CIFS_I(rdata->mapping->host)->server_eof;
> > @@ -3584,13 +3595,21 @@ readpages_fill_pages(struct TCP_Server_Info
> *server,
> >   	rdata->tailsz = PAGE_SIZE;
> >   	for (i = 0; i < nr_pages; i++) {
> >   		struct page *page = rdata->pages[i];
> > -		size_t n = PAGE_SIZE;
> > +		unsigned int to_read = rdata->pagesz;
> > +		size_t n;
> > +
> > +		if (i == 0)
> > +			to_read -= page_offset;
> > +		else
> > +			page_offset = 0;
> > +
> > +		n = to_read;
> >
> > -		if (len >= PAGE_SIZE) {
> > -			len -= PAGE_SIZE;
> > +		if (len >= to_read) {
> > +			len -= to_read;
> >   		} else if (len > 0) {
> >   			/* enough for partial page, fill and zero the rest */
> > -			zero_user(page, len, PAGE_SIZE - len);
> > +			zero_user(page, len + page_offset, to_read - len);
> >   			n = rdata->tailsz = len;
> >   			len = 0;
> >   		} else if (page->index > eof_index) { @@ -3622,13 +3641,15
> @@
> > readpages_fill_pages(struct TCP_Server_Info *server,
> >   		}
> >
> >   		if (iter)
> > -			result = copy_page_from_iter(page, 0, n, iter);
> > +			result = copy_page_from_iter(
> > +					page, page_offset, n, iter);
> >   #ifdef CONFIG_CIFS_SMB_DIRECT
> >   		else if (rdata->mr)
> >   			result = n;
> >   #endif
> 
> Ditto previous comment/question.
> 
> >   		else
> > -			result = cifs_read_page_from_socket(server, page,
> n);
> > +			result = cifs_read_page_from_socket(
> > +					server, page, page_offset, n);
> >   		if (result < 0)
> >   			break;
> >
> > @@ -3807,6 +3828,7 @@ static int cifs_readpages(struct file *file, struct
> address_space *mapping,
> >   		rdata->bytes = bytes;
> >   		rdata->pid = pid;
> >   		rdata->pagesz = PAGE_SIZE;
> > +		rdata->tailsz = PAGE_SIZE;
> >   		rdata->read_into_pages = cifs_readpages_read_into_pages;
> >   		rdata->copy_into_pages = cifs_readpages_copy_into_pages;
> >   		rdata->credits = credits;
> > diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index
> > 7c0edd2..1fa1c29 100644
> > --- a/fs/cifs/smb2ops.c
> > +++ b/fs/cifs/smb2ops.c
> > @@ -2467,7 +2467,7 @@ read_data_into_pages(struct TCP_Server_Info
> *server, struct page **pages,
> >   			zero_user(page, len, PAGE_SIZE - len);
> >   			len = 0;
> >   		}
> > -		length = cifs_read_page_from_socket(server, page, n);
> > +		length = cifs_read_page_from_socket(server, page, 0, n);
> >   		if (length < 0)
> >   			return length;
> >   		server->total_read += length;
> > diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index
> > a02f6b6..f603fbe 100644
> > --- a/fs/cifs/smb2pdu.c
> > +++ b/fs/cifs/smb2pdu.c
> > @@ -2683,6 +2683,7 @@ smb2_readv_callback(struct mid_q_entry *mid)
> >   	struct smb_rqst rqst = { .rq_iov = rdata->iov,
> >   				 .rq_nvec = 2,
> >   				 .rq_pages = rdata->pages,
> > +				 .rq_offset = rdata->page_offset,
> >   				 .rq_npages = rdata->nr_pages,
> >   				 .rq_pagesz = rdata->pagesz,
> >   				 .rq_tailsz = rdata->tailsz };
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 04/15] CIFS: Add support for direct pages in wdata
  2018-06-24  2:01   ` Tom Talpey
@ 2018-06-25 20:34     ` Long Li
  0 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-06-25 20:34 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [Patch v2 04/15] CIFS: Add support for direct pages in wdata
> 
> On 5/30/2018 3:47 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > Add a function to allocate wdata without allocating pages for data
> > transfer. This gives the caller an option to pass a number of pages
> > that point to the data buffer to write to.
> >
> > wdata is reponsible for free those pages after it's done.
> 
> Same comment as for the earlier patch. "Caller" is responsible for "freeing"
> those pages? Confusing, as worded.
> 
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >   fs/cifs/cifsglob.h  |  2 +-
> >   fs/cifs/cifsproto.h |  2 ++
> >   fs/cifs/cifssmb.c   | 17 ++++++++++++++---
> >   3 files changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index
> > 56864a87..7f62c98 100644
> > --- a/fs/cifs/cifsglob.h
> > +++ b/fs/cifs/cifsglob.h
> > @@ -1205,7 +1205,7 @@ struct cifs_writedata {
> >   	unsigned int			tailsz;
> >   	unsigned int			credits;
> >   	unsigned int			nr_pages;
> > -	struct page			*pages[];
> > +	struct page			**pages;
> 
> Also same comment as for earlier patch, these are syntactically equivalent
> and maybe not needed to change.
> 
> >   };
> >
> >   /*
> > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index
> > 1f27d8e..7933c5f 100644
> > --- a/fs/cifs/cifsproto.h
> > +++ b/fs/cifs/cifsproto.h
> > @@ -533,6 +533,8 @@ int cifs_async_writev(struct cifs_writedata *wdata,
> >   void cifs_writev_complete(struct work_struct *work);
> >   struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
> >   						work_func_t complete);
> > +struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages,
> > +						work_func_t complete);
> >   void cifs_writedata_release(struct kref *refcount);
> >   int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
> >   			  struct cifs_sb_info *cifs_sb,
> > diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index
> > c8e4278..5aca336 100644
> > --- a/fs/cifs/cifssmb.c
> > +++ b/fs/cifs/cifssmb.c
> > @@ -1952,6 +1952,7 @@ cifs_writedata_release(struct kref *refcount)
> >   	if (wdata->cfile)
> >   		cifsFileInfo_put(wdata->cfile);
> >
> > +	kvfree(wdata->pages);
> >   	kfree(wdata);
> >   }
> >
> > @@ -2075,12 +2076,22 @@ cifs_writev_complete(struct work_struct
> *work)
> >   struct cifs_writedata *
> >   cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
> >   {
> > +	struct page **pages =
> > +		kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);
> 
> Why do you do a GFP_NOFS here but GFP_KERNEL in the earlier patch?

This is for the I/O write path. The earlier patch is for I/O read path. This code preserves the behavior from the buffer I/O code.

The write path is possibly used when memory is under pressure, so it's better not to call FS (and CIFS) that may result in a loop.

> 
> > +	if (pages)
> > +		return cifs_writedata_direct_alloc(pages, complete);
> > +
> > +	return NULL;
> > +}
> > +
> > +struct cifs_writedata *
> > +cifs_writedata_direct_alloc(struct page **pages, work_func_t
> > +complete) {
> >   	struct cifs_writedata *wdata;
> >
> > -	/* writedata + number of page pointers */
> > -	wdata = kzalloc(sizeof(*wdata) +
> > -			sizeof(struct page *) * nr_pages, GFP_NOFS);
> > +	wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
> >   	if (wdata != NULL) {
> > +		wdata->pages = pages;
> >   		kref_init(&wdata->refcount);
> >   		INIT_LIST_HEAD(&wdata->list);
> >   		init_completion(&wdata->done);
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvger.ke
> rnel.org%2Fmajordomo-
> info.html&amp;data=02%7C01%7Clongli%40microsoft.com%7C20e0bcbdcce3
> 402133ee08d5d9767977%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%
> 7C636654025229749549&amp;sdata=p4P9AqS8cnOHErCFc1XALeaZxHuB7%2B
> VOF10SjaaycAI%3D&amp;reserved=0

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
  2018-06-24  1:50   ` Tom Talpey
  2018-06-25 20:25     ` Long Li
@ 2018-06-25 21:01     ` Jason Gunthorpe
  2018-06-26 15:13       ` Tom Talpey
  1 sibling, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2018-06-25 21:01 UTC (permalink / raw)
  To: Tom Talpey
  Cc: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On Sat, Jun 23, 2018 at 09:50:20PM -0400, Tom Talpey wrote:
> On 5/30/2018 3:47 PM, Long Li wrote:
> >From: Long Li <longli@microsoft.com>
> >
> >Add a function to allocate rdata without allocating pages for data
> >transfer. This gives the caller an option to pass a number of pages
> >that point to the data buffer.
> >
> >rdata is still reponsible for free those pages after it's done.
> 
> "Caller" is still responsible? Or is the rdata somehow freeing itself
> via another mechanism?
> 
> >
> >Signed-off-by: Long Li <longli@microsoft.com>
> >  fs/cifs/cifsglob.h |  2 +-
> >  fs/cifs/file.c     | 23 ++++++++++++++++++++---
> >  2 files changed, 21 insertions(+), 4 deletions(-)
> >
> >diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> >index 8d16c3e..56864a87 100644
> >+++ b/fs/cifs/cifsglob.h
> >@@ -1179,7 +1179,7 @@ struct cifs_readdata {
> >  	unsigned int			tailsz;
> >  	unsigned int			credits;
> >  	unsigned int			nr_pages;
> >-	struct page			*pages[];
> >+	struct page			**pages;
> 
> Technically speaking, these are syntactically equivalent. It may not
> be worth changing this historic definition.

[] is a C99 'flex array', it has a different allocation behavior than
** and is not interchangeable..

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 05/15] CIFS: Calculate the correct request length based on page offset and tail size
  2018-06-24  2:07   ` Tom Talpey
@ 2018-06-25 21:07     ` Long Li
  0 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-06-25 21:07 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [Patch v2 05/15] CIFS: Calculate the correct request length based
> on page offset and tail size
> 
> On 5/30/2018 3:47 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > It's possible that the page offset is non-zero in the pages in a
> > request, change the function to calculate the correct data buffer length.
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >   fs/cifs/transport.c | 20 +++++++++++++++++---
> >   1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c index
> > 927226a..d6b5523 100644
> > --- a/fs/cifs/transport.c
> > +++ b/fs/cifs/transport.c
> > @@ -212,10 +212,24 @@ rqst_len(struct smb_rqst *rqst)
> >   	for (i = 0; i < rqst->rq_nvec; i++)
> >   		buflen += iov[i].iov_len;
> >
> > -	/* add in the page array if there is one */
> > +	/*
> > +	 * Add in the page array if there is one. The caller needs to make
> > +	 * sure rq_offset and rq_tailsz are set correctly. If a buffer of
> > +	 * multiple pages ends at page boundary, rq_tailsz needs to be set to
> > +	 * PAGE_SIZE.
> > +	 */
> >   	if (rqst->rq_npages) {
> > -		buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> > -		buflen += rqst->rq_tailsz;
> > +		if (rqst->rq_npages == 1)
> > +			buflen += rqst->rq_tailsz;
> > +		else {
> > +			/*
> > +			 * If there is more than one page, calculate the
> > +			 * buffer length based on rq_offset and rq_tailsz
> > +			 */
> > +			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> > +					rqst->rq_offset;
> > +			buflen += rqst->rq_tailsz;
> > +		}
> 
> Wouldn't it be simpler to keep the original code, but then just subtract the
> rq_offset?
> 
> 	buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> 	buflen += rqst->rq_tailsz;
> 	buflen -= rqst->rq_offset;
> 
> It's kind of confusing as written. Also, what if it's just one page, but has a
> non-zero offset? Is that somehow not possible? My suggested code would
> take that into account, yours doesn't.

I think It can be changed to this:

	buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
 	buflen += rqst->rq_tailsz;
	if (rqst->rq_npages > 1)
 		buflen -= rqst->rq_offset;

This looks cleaner than before.

When there is only one page with a non-zero offset, rq_tailsz is the actual data length.

> 
> Tom.
> 
> >   	}
> >
> >   	return buflen;
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst
  2018-06-24  2:09   ` Tom Talpey
@ 2018-06-25 21:14     ` Long Li
  2018-06-26 13:16       ` Tom Talpey
  0 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-06-25 21:14 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [Patch v2 06/15] CIFS: Introduce helper function to get page
> offset and length in smb_rqst
> 
> On 5/30/2018 3:47 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > Introduce a function rqst_page_get_length to return the page offset
> > and length for a given page in smb_rqst. This function is to be used
> > by following patches.
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >   fs/cifs/cifsproto.h |  3 +++
> >   fs/cifs/misc.c      | 17 +++++++++++++++++
> >   2 files changed, 20 insertions(+)
> >
> > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index
> > 7933c5f..89dda14 100644
> > --- a/fs/cifs/cifsproto.h
> > +++ b/fs/cifs/cifsproto.h
> > @@ -557,4 +557,7 @@ int cifs_alloc_hash(const char *name, struct
> crypto_shash **shash,
> >   		    struct sdesc **sdesc);
> >   void cifs_free_hash(struct crypto_shash **shash, struct sdesc
> > **sdesc);
> >
> > +extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int
> page,
> > +				unsigned int *len, unsigned int *offset);
> > +
> >   #endif			/* _CIFSPROTO_H */
> > diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index 96849b5..e951417
> > 100644
> > --- a/fs/cifs/misc.c
> > +++ b/fs/cifs/misc.c
> > @@ -905,3 +905,20 @@ cifs_free_hash(struct crypto_shash **shash,
> struct sdesc **sdesc)
> >   		crypto_free_shash(*shash);
> >   	*shash = NULL;
> >   }
> > +
> > +/**
> > + * rqst_page_get_length - obtain the length and offset for a page in
> > +smb_rqst
> > + * Input: rqst - a smb_rqst, page - a page index for rqst
> > + * Output: *len - the length for this page, *offset - the offset for
> > +this page  */ void rqst_page_get_length(struct smb_rqst *rqst,
> > +unsigned int page,
> > +				unsigned int *len, unsigned int *offset) {
> > +	*len = rqst->rq_pagesz;
> > +	*offset = (page == 0) ? rqst->rq_offset : 0;
> 
> Really? Page 0 always has a zero offset??

I think you are misreading this line. The offset for page 0 is rq_offset, the offset for all subsequent pages are 0.

> 
> > +
> > +	if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
> > +		*len = rqst->rq_tailsz;
> > +	else if (page == 0)
> > +		*len = rqst->rq_pagesz - rqst->rq_offset; }
> >
> 
> This subroutine does what patch 5 does inline. Why not push this patch up in
> the sequence and use the helper?

This is actually a little different. This function returns the length and offset for a given page in the request. There might be multiple pages in a request. 

The other function calculates the total length of all the pages in a request.

> 
> Tom.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 08/15] CIFS: SMBD: Support page offset in RDMA send
  2018-06-24  2:11   ` Tom Talpey
@ 2018-06-25 21:23     ` Long Li
  0 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-06-25 21:23 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [Patch v2 08/15] CIFS: SMBD: Support page offset in RDMA send
> 
> On 5/30/2018 3:48 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > The RDMA send function needs to look at offset in the request pages,
> > and send data starting from there.
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >   fs/cifs/smbdirect.c | 27 +++++++++++++++++++--------
> >   1 file changed, 19 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index
> > c62f7c9..6141e3c 100644
> > --- a/fs/cifs/smbdirect.c
> > +++ b/fs/cifs/smbdirect.c
> > @@ -17,6 +17,7 @@
> >   #include <linux/highmem.h>
> >   #include "smbdirect.h"
> >   #include "cifs_debug.h"
> > +#include "cifsproto.h"
> >
> >   static struct smbd_response *get_empty_queue_buffer(
> >   		struct smbd_connection *info);
> > @@ -2082,7 +2083,7 @@ int smbd_send(struct smbd_connection *info,
> struct smb_rqst *rqst)
> >   	struct kvec vec;
> >   	int nvecs;
> >   	int size;
> > -	int buflen = 0, remaining_data_length;
> > +	unsigned int buflen = 0, remaining_data_length;
> >   	int start, i, j;
> >   	int max_iov_size =
> >   		info->max_send_size - sizeof(struct smbd_data_transfer);
> @@
> > -2113,10 +2114,17 @@ int smbd_send(struct smbd_connection *info,
> struct smb_rqst *rqst)
> >   		buflen += iov[i].iov_len;
> >   	}
> >
> > -	/* add in the page array if there is one */
> > +	/*
> > +	 * Add in the page array if there is one. The caller needs to set
> > +	 * rq_tailsz to PAGE_SIZE when the buffer has multiple pages and
> > +	 * ends at page boundary
> > +	 */
> >   	if (rqst->rq_npages) {
> > -		buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> > -		buflen += rqst->rq_tailsz;
> > +		if (rqst->rq_npages == 1)
> > +			buflen += rqst->rq_tailsz;
> > +		else
> > +			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> > +					rqst->rq_offset + rqst->rq_tailsz;
> >   	}
> 
> This code is really confusing and redundant. It tests npages > 0, then tests
> npages == 1, then does an else. Why not call the helper like the following
> smbd_send()?

This code needs to get the combined length of all the pages in the request (but excluding iov, this is different to the function rqst_len in patch 05), but the following is for getting a single page from the request. 

I can simplify the code a little bit by following your suggestion in patch 05.

> 
> Tom.
> 
> >
> >   	if (buflen + sizeof(struct smbd_data_transfer) > @@ -2213,8 +2221,9
> > @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
> >
> >   	/* now sending pages if there are any */
> >   	for (i = 0; i < rqst->rq_npages; i++) {
> > -		buflen = (i == rqst->rq_npages-1) ?
> > -			rqst->rq_tailsz : rqst->rq_pagesz;
> > +		unsigned int offset;
> > +
> > +		rqst_page_get_length(rqst, i, &buflen, &offset);
> >   		nvecs = (buflen + max_iov_size - 1) / max_iov_size;
> >   		log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
> >   			buflen, nvecs);
> > @@ -2225,9 +2234,11 @@ int smbd_send(struct smbd_connection *info,
> struct smb_rqst *rqst)
> >   			remaining_data_length -= size;
> >   			log_write(INFO, "sending pages i=%d offset=%d
> size=%d"
> >   				" remaining_data_length=%d\n",
> > -				i, j*max_iov_size, size,
> remaining_data_length);
> > +				i, j*max_iov_size+offset, size,
> > +				remaining_data_length);
> >   			rc = smbd_post_send_page(
> > -				info, rqst->rq_pages[i], j*max_iov_size,
> > +				info, rqst->rq_pages[i],
> > +				j*max_iov_size + offset,
> >   				size, remaining_data_length);
> >   			if (rc)
> >   				goto done;
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 09/15] CIFS: SMBD: Support page offset in RDMA recv
  2018-06-24  2:16   ` Tom Talpey
@ 2018-06-25 21:29     ` Long Li
  0 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-06-25 21:29 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [Patch v2 09/15] CIFS: SMBD: Support page offset in RDMA recv
> 
> On 5/30/2018 3:48 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > RDMA recv function needs to place data to the correct place starting
> > at page offset.
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >   fs/cifs/smbdirect.c | 18 +++++++++++-------
> >   1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index
> > 6141e3c..ba53c52 100644
> > --- a/fs/cifs/smbdirect.c
> > +++ b/fs/cifs/smbdirect.c
> > @@ -2004,10 +2004,12 @@ static int smbd_recv_buf(struct
> smbd_connection *info, char *buf,
> >    * return value: actual data read
> >    */
> >   static int smbd_recv_page(struct smbd_connection *info,
> > -		struct page *page, unsigned int to_read)
> > +		struct page *page, unsigned int page_offset,
> > +		unsigned int to_read)
> >   {
> >   	int ret;
> >   	char *to_address;
> > +	void *page_address;
> >
> >   	/* make sure we have the page ready for read */
> >   	ret = wait_event_interruptible(
> > @@ -2015,16 +2017,17 @@ static int smbd_recv_page(struct
> smbd_connection *info,
> >   		info->reassembly_data_length >= to_read ||
> >   			info->transport_status != SMBD_CONNECTED);
> >   	if (ret)
> > -		return 0;
> > +		return ret;
> >
> >   	/* now we can read from reassembly queue and not sleep */
> > -	to_address = kmap_atomic(page);
> > +	page_address = kmap_atomic(page);
> > +	to_address = (char *) page_address + page_offset;
> >
> >   	log_read(INFO, "reading from page=%p address=%p to_read=%d\n",
> >   		page, to_address, to_read);
> >
> >   	ret = smbd_recv_buf(info, to_address, to_read);
> > -	kunmap_atomic(to_address);
> > +	kunmap_atomic(page_address);
> 
> Is "page" truly not mapped? This kmap/kunmap for each received 4KB is very
> expensive. Is there not a way to keep a kva for the reassembly queue
> segments?

I will find a way to not map those upper layer pages when doing RDMA receive. The reason for doing this is to use a common layer (that will greatly simplifies the code) for transport. It's probably a waste of time when only pages are involved, and not need to be mapped.

I will address those in a separate patch.

> 
> >
> >   	return ret;
> >   }
> > @@ -2038,7 +2041,7 @@ int smbd_recv(struct smbd_connection *info,
> struct msghdr *msg)
> >   {
> >   	char *buf;
> >   	struct page *page;
> > -	unsigned int to_read;
> > +	unsigned int to_read, page_offset;
> >   	int rc;
> >
> >   	info->smbd_recv_pending++;
> > @@ -2052,15 +2055,16 @@ int smbd_recv(struct smbd_connection *info,
> > struct msghdr *msg)
> >
> >   	case READ | ITER_BVEC:
> >   		page = msg->msg_iter.bvec->bv_page;
> > +		page_offset = msg->msg_iter.bvec->bv_offset;
> >   		to_read = msg->msg_iter.bvec->bv_len;
> > -		rc = smbd_recv_page(info, page, to_read);
> > +		rc = smbd_recv_page(info, page, page_offset, to_read);
> >   		break;
> >
> >   	default:
> >   		/* It's a bug in upper layer to get there */
> >   		cifs_dbg(VFS, "CIFS: invalid msg type %d\n",
> >   			msg->msg_iter.type);
> > -		rc = -EIO;
> > +		rc = -EINVAL;
> >   	}
> >
> >   	info->smbd_recv_pending--;
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 11/15] CIFS: Pass page offset for calculating signature
  2018-06-24  2:27   ` Tom Talpey
@ 2018-06-26  4:15     ` Long Li
  0 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-06-26  4:15 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [Patch v2 11/15] CIFS: Pass page offset for calculating signature
> 
> On 5/30/2018 3:48 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > When calculating signature for the packet, it needs to read into the
> > correct page offset for the data.
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >   fs/cifs/cifsencrypt.c | 9 +++++----
> >   1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c index
> > a6ef088..e88303c 100644
> > --- a/fs/cifs/cifsencrypt.c
> > +++ b/fs/cifs/cifsencrypt.c
> > @@ -68,11 +68,12 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
> >
> >   	/* now hash over the rq_pages array */
> >   	for (i = 0; i < rqst->rq_npages; i++) {
> > -		void *kaddr = kmap(rqst->rq_pages[i]);
> > -		size_t len = rqst->rq_pagesz;
> > +		void *kaddr;
> > +		unsigned int len, offset;
> >
> > -		if (i == rqst->rq_npages - 1)
> > -			len = rqst->rq_tailsz;
> > +		rqst_page_get_length(rqst, i, &len, &offset);
> > +
> > +		kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;
> 
> I suppose it's more robust to map a page at a time, but it's pretty expensive.
> Is this the only way to iterate over a potentially very large block of data? For
> example, a 1MB segment means 256 kmap/kunmaps.

I will look into not mapping those pages while doing I/O.

This code path is for RDMA send/receive, and it's rarely used for transferring large amount of data.

> 
> Tom.
> 
> >
> >   		crypto_shash_update(shash, kaddr, len);
> >
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 13/15] CIFS: Add support for direct I/O read
  2018-06-24  2:39   ` Tom Talpey
@ 2018-06-26  4:34     ` Long Li
  0 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-06-26  4:34 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [Patch v2 13/15] CIFS: Add support for direct I/O read
> 
> 
> 
> On 5/30/2018 3:48 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > Implement the function for direct I/O read. It doesn't support AIO,
> > which will be implemented in a follow up patch.
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >   fs/cifs/cifsfs.h |   1 +
> >   fs/cifs/file.c   | 149
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 150 insertions(+)
> >
> > diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index
> > 5f02318..7fba9aa 100644
> > --- a/fs/cifs/cifsfs.h
> > +++ b/fs/cifs/cifsfs.h
> > @@ -102,6 +102,7 @@ extern int cifs_open(struct inode *inode, struct file
> *file);
> >   extern int cifs_close(struct inode *inode, struct file *file);
> >   extern int cifs_closedir(struct inode *inode, struct file *file);
> >   extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter
> > *to);
> > +extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter
> > +*to);
> >   extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
> >   extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
> >   extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct
> > iov_iter *from); diff --git a/fs/cifs/file.c b/fs/cifs/file.c index
> > 87eece6..e6e6f24 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -2955,6 +2955,18 @@ cifs_read_allocate_pages(struct cifs_readdata
> *rdata, unsigned int nr_pages)
> >   	return rc;
> >   }
> >
> > +static void cifs_direct_readdata_release(struct kref *refcount) {
> > +	struct cifs_readdata *rdata = container_of(refcount,
> > +					struct cifs_readdata, refcount);
> > +	unsigned int i;
> > +
> > +	for (i = 0; i < rdata->nr_pages; i++)
> > +		put_page(rdata->pages[i]);
> > +
> > +	cifs_readdata_release(refcount);
> > +}
> > +
> >   static void
> >   cifs_uncached_readdata_release(struct kref *refcount)
> >   {
> > @@ -3267,6 +3279,143 @@ collect_uncached_read_data(struct
> cifs_aio_ctx *ctx)
> >   		complete(&ctx->done);
> >   }
> >
> > +static void cifs_direct_readv_complete(struct work_struct *work) {
> > +	struct cifs_readdata *rdata =
> > +		container_of(work, struct cifs_readdata, work);
> > +
> > +	complete(&rdata->done);
> > +	kref_put(&rdata->refcount, cifs_direct_readdata_release); }
> > +
> > +ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to) {
> > +	size_t len, cur_len, start;
> > +	unsigned int npages, rsize, credits;
> > +	struct file *file;
> > +	struct cifs_sb_info *cifs_sb;
> > +	struct cifsFileInfo *cfile;
> > +	struct cifs_tcon *tcon;
> > +	struct page **pagevec;
> > +	ssize_t rc, total_read = 0;
> > +	struct TCP_Server_Info *server;
> > +	loff_t offset = iocb->ki_pos;
> > +	pid_t pid;
> > +	struct cifs_readdata *rdata;
> > +
> > +	/*
> > +	 * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
> > +	 * fall back to data copy read path
> > +	 */
> > +	if (to->type & ITER_KVEC) {
> > +		cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec
> I/O\n");
> > +		return cifs_user_readv(iocb, to);
> > +	}
> > +
> > +	len = iov_iter_count(to);
> > +	if (!len)
> > +		return 0;
> > +
> > +	file = iocb->ki_filp;
> > +	cifs_sb = CIFS_FILE_SB(file);
> > +	cfile = file->private_data;
> > +	tcon = tlink_tcon(cfile->tlink);
> > +	server = tcon->ses->server;
> > +
> > +	if (!server->ops->async_readv)
> > +		return -ENOSYS;
> > +
> > +	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> > +		pid = cfile->pid;
> > +	else
> > +		pid = current->tgid;
> > +
> > +	if ((file->f_flags & O_ACCMODE) == O_WRONLY)
> > +		cifs_dbg(FYI, "attempting read on write only file instance\n");
> 
> Confusing. Maybe "attempting read on write-only filehandle"?
> 
> > +
> > +	do {
> > +		rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
> > +					&rsize, &credits);
> > +		if (rc)
> > +			break;
> > +
> > +		cur_len = min_t(const size_t, len, rsize);
> > +
> > +		rc = iov_iter_get_pages_alloc(to, &pagevec, cur_len, &start);
> > +		if (rc < 0) {
> > +			cifs_dbg(VFS,
> > +				"couldn't get user pages (rc=%zd) iter
> type %d"
> > +				" iov_offset %lu count %lu\n",
> > +				rc, to->type, to->iov_offset, to->count);
> > +			dump_stack();
> > +			break;
> > +		}
> > +
> > +		rdata = cifs_readdata_direct_alloc(
> > +				pagevec, cifs_direct_readv_complete);
> > +		if (!rdata) {
> > +			add_credits_and_wake_if(server, credits, 0);
> > +			rc = -ENOMEM;
> > +			break;
> > +		}
> > +
> > +		npages = (rc + start + PAGE_SIZE-1) / PAGE_SIZE;
> > +		rdata->nr_pages = npages;
> > +		rdata->page_offset = start;
> > +		rdata->pagesz = PAGE_SIZE;
> > +		rdata->tailsz = npages > 1 ?
> > +				rc-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
> > +				rc;
> 
> This expression makes my head hurt. Surely it can be simplified, or expressed
> in a clearer way.
> 
> > +		cur_len = rc;
> > +
> > +		rdata->cfile = cifsFileInfo_get(cfile);
> > +		rdata->offset = offset;
> > +		rdata->bytes = rc;
> > +		rdata->pid = pid;
> > +		rdata->read_into_pages = cifs_uncached_read_into_pages;
> > +		rdata->copy_into_pages = cifs_uncached_copy_into_pages;
> > +		rdata->credits = credits;
> > +
> > +		rc = 0;
> > +		if (rdata->cfile->invalidHandle)
> > +			rc = cifs_reopen_file(rdata->cfile, true);
> > +
> > +		if (!rc)
> > +			rc = server->ops->async_readv(rdata);
> > +
> > +		if (rc) {
> 
> This whole rc thing is messy. Initializing to zero, setting only in one case, then
> testing the result, then setting it again, is twisted.
> I actually think a goto or two would read much more clearly.
> 
> > +			add_credits_and_wake_if(server, rdata->credits, 0);
> > +			kref_put(&rdata->refcount,
> > +				 cifs_direct_readdata_release);
> > +			if (rc == -EAGAIN)
> > +				continue;
> > +			break;
> 
> It's worth a comment here that this either breaks or continues the entire do {}
> while (); and btw when it breaks it does *not* return "rc".
> Again, maybe a goto instead of a break?
> 
> > +		}
> > +
> > +		wait_for_completion(&rdata->done);
> > +		rc = rdata->result;
> > +		if (rc) {
> > +			kref_put(
> > +				&rdata->refcount,
> > +				cifs_direct_readdata_release);
> > +			if (rc == -EAGAIN)
> > +				continue;
> > +			break;
> 
> Ditto.

I will re-work this patch.

> 
> > +		}
> > +
> > +		total_read += rdata->got_bytes;
> > +		kref_put(&rdata->refcount, cifs_direct_readdata_release);
> > +
> > +		iov_iter_advance(to, cur_len);
> > +		len -= cur_len;
> > +		offset += cur_len;
> > +	} while (len);
> > +
> > +	iocb->ki_pos += total_read;
> > +
> > +	return total_read;
> > +}
> > +
> >   ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> >   {
> >   	struct file *file = iocb->ki_filp;
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 14/15] CIFS: Add support for direct I/O write
  2018-06-24  2:48   ` Tom Talpey
@ 2018-06-26  4:39     ` Long Li
  2018-06-26 13:29       ` Tom Talpey
  0 siblings, 1 reply; 50+ messages in thread
From: Long Li @ 2018-06-26  4:39 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [Patch v2 14/15] CIFS: Add support for direct I/O write
> 
> On 5/30/2018 3:48 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > Implement the function for direct I/O write. It doesn't support AIO,
> > which will be implemented in a follow up patch.
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >   fs/cifs/cifsfs.h |   1 +
> >   fs/cifs/file.c   | 165
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 166 insertions(+)
> >
> > diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index
> > 7fba9aa..e9c5103 100644
> > --- a/fs/cifs/cifsfs.h
> > +++ b/fs/cifs/cifsfs.h
> > @@ -105,6 +105,7 @@ extern ssize_t cifs_user_readv(struct kiocb *iocb,
> struct iov_iter *to);
> >   extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
> >   extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
> >   extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter
> > *from);
> > +extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter
> > +*from);
> >   extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
> >   extern int cifs_lock(struct file *, int, struct file_lock *);
> >   extern int cifs_fsync(struct file *, loff_t, loff_t, int); diff
> > --git a/fs/cifs/file.c b/fs/cifs/file.c index e6e6f24..8c385b1 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -2461,6 +2461,35 @@ cifs_uncached_writedata_release(struct kref
> > *refcount)
> >
> >   static void collect_uncached_write_data(struct cifs_aio_ctx *ctx);
> >
> > +static void cifs_direct_writedata_release(struct kref *refcount) {
> > +	int i;
> > +	struct cifs_writedata *wdata = container_of(refcount,
> > +					struct cifs_writedata, refcount);
> > +
> > +	for (i = 0; i < wdata->nr_pages; i++)
> > +		put_page(wdata->pages[i]);
> > +
> > +	cifs_writedata_release(refcount);
> > +}
> > +
> > +static void cifs_direct_writev_complete(struct work_struct *work) {
> > +	struct cifs_writedata *wdata = container_of(work,
> > +					struct cifs_writedata, work);
> > +	struct inode *inode = d_inode(wdata->cfile->dentry);
> > +	struct cifsInodeInfo *cifsi = CIFS_I(inode);
> > +
> > +	spin_lock(&inode->i_lock);
> > +	cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
> > +	if (cifsi->server_eof > inode->i_size)
> > +		i_size_write(inode, cifsi->server_eof);
> > +	spin_unlock(&inode->i_lock);
> > +
> > +	complete(&wdata->done);
> > +	kref_put(&wdata->refcount, cifs_direct_writedata_release); }
> > +
> >   static void
> >   cifs_uncached_writev_complete(struct work_struct *work)
> >   {
> > @@ -2703,6 +2732,142 @@ static void collect_uncached_write_data(struct
> cifs_aio_ctx *ctx)
> >   		complete(&ctx->done);
> >   }
> >
> > +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
> > +{
> > +	struct file *file = iocb->ki_filp;
> > +	ssize_t total_written = 0;
> > +	struct cifsFileInfo *cfile;
> > +	struct cifs_tcon *tcon;
> > +	struct cifs_sb_info *cifs_sb;
> > +	struct TCP_Server_Info *server;
> > +	pid_t pid;
> > +	unsigned long nr_pages;
> > +	loff_t offset = iocb->ki_pos;
> > +	size_t len = iov_iter_count(from);
> > +	int rc;
> > +	struct cifs_writedata *wdata;
> > +
> > +	/*
> > +	 * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
> > +	 * In this case, fall back to non-direct write function.
> > +	 */
> > +	if (from->type & ITER_KVEC) {
> > +		cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec
> I/O\n");
> > +		return cifs_user_writev(iocb, from);
> > +	}
> > +
> > +	rc = generic_write_checks(iocb, from);
> > +	if (rc <= 0)
> > +		return rc;
> > +
> > +	cifs_sb = CIFS_FILE_SB(file);
> > +	cfile = file->private_data;
> > +	tcon = tlink_tcon(cfile->tlink);
> > +	server = tcon->ses->server;
> > +
> > +	if (!server->ops->async_writev)
> > +		return -ENOSYS;
> > +
> > +	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> > +		pid = cfile->pid;
> > +	else
> > +		pid = current->tgid;
> > +
> > +	do {
> > +		unsigned int wsize, credits;
> > +		struct page **pagevec;
> > +		size_t start;
> > +		ssize_t cur_len;
> > +
> > +		rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
> > +						   &wsize, &credits);
> > +		if (rc)
> > +			break;
> > +
> > +		cur_len = iov_iter_get_pages_alloc(
> > +				from, &pagevec, wsize, &start);
> > +		if (cur_len < 0) {
> > +			cifs_dbg(VFS,
> > +				"direct_writev couldn't get user pages "
> > +				"(rc=%zd) iter type %d iov_offset %lu count"
> > +				" %lu\n",
> > +				cur_len, from->type,
> > +				from->iov_offset, from->count);
> > +			dump_stack();
> > +			break;
> > +		}
> > +		if (cur_len < 0)
> > +			break;
> 
> This cur_len < 0 test is redundant with the prior if(), delete.
> > +
> > +		nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
> 
> Am I misreading, or will this return be one more page than needed? If start
> (the first byte offset) is > 0, nr_pages will already be one.
> And if cur_len is 4KB, even if start is 0, nr_pages will be two.

I think the calculation is correct, assuming cur_len > 0. (which should be the case if we reach here)

If cur_len is 4kb and start is 0, nr_pages will be 1.

> 
> > +
> > +		wdata = cifs_writedata_direct_alloc(pagevec,
> > +					     cifs_direct_writev_complete);
> > +		if (!wdata) {
> > +			rc = -ENOMEM;
> > +			add_credits_and_wake_if(server, credits, 0);
> > +			break;
> > +		}
> > +
> > +		wdata->nr_pages = nr_pages;
> > +		wdata->page_offset = start;
> > +		wdata->pagesz = PAGE_SIZE;
> > +		wdata->tailsz =
> > +			nr_pages > 1 ?
> > +			cur_len - (PAGE_SIZE - start) -
> > +				(nr_pages - 2) * PAGE_SIZE :
> > +			cur_len;
> > +
> > +		wdata->sync_mode = WB_SYNC_ALL;
> > +		wdata->offset = (__u64)offset;
> > +		wdata->cfile = cifsFileInfo_get(cfile);
> > +		wdata->pid = pid;
> > +		wdata->bytes = cur_len;
> > +		wdata->credits = credits;
> > +
> > +		rc = 0;
> > +		if (wdata->cfile->invalidHandle)
> > +			rc = cifs_reopen_file(wdata->cfile, false);
> > +
> > +		if (!rc)
> > +			rc = server->ops->async_writev(wdata,
> > +					cifs_direct_writedata_release);
> > +
> > +		if (rc) {
> > +			add_credits_and_wake_if(server, wdata->credits, 0);
> > +			kref_put(&wdata->refcount,
> > +				 cifs_direct_writedata_release);
> > +			if (rc == -EAGAIN)
> > +				continue;
> > +			break;
> > +		}
> 
> Same comments as for previous patch re the if (rc) ladder, and the
> break/continues both being better expressed as careful goto's.
> 
> > +
> > +		wait_for_completion(&wdata->done);
> > +		if (wdata->result) {
> > +			rc = wdata->result;
> > +			kref_put(&wdata->refcount,
> > +					cifs_direct_writedata_release);
> > +			if (rc == -EAGAIN)
> > +				continue;
> > +			break;
> > +		}
> > +
> > +		kref_put(&wdata->refcount, cifs_direct_writedata_release);
> > +
> > +		iov_iter_advance(from, cur_len);
> > +		total_written += cur_len;
> > +		offset += cur_len;
> > +		len -= cur_len;
> > +	} while (len);
> > +
> > +	if (unlikely(!total_written))
> > +		return rc;
> > +
> > +	iocb->ki_pos += total_written;
> > +	return total_written;
> > +
> > +}
> > +
> >   ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
> >   {
> >   	struct file *file = iocb->ki_filp;
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst
  2018-06-25 21:14     ` Long Li
@ 2018-06-26 13:16       ` Tom Talpey
  2018-06-27  3:24         ` Long Li
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Talpey @ 2018-06-26 13:16 UTC (permalink / raw)
  To: Long Li, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 6/25/2018 5:14 PM, Long Li wrote:
>> Subject: Re: [Patch v2 06/15] CIFS: Introduce helper function to get page
>> offset and length in smb_rqst
>>
>> On 5/30/2018 3:47 PM, Long Li wrote:
>>> From: Long Li <longli@microsoft.com>
>>>
>>> Introduce a function rqst_page_get_length to return the page offset
>>> and length for a given page in smb_rqst. This function is to be used
>>> by following patches.
>>>
>>> Signed-off-by: Long Li <longli@microsoft.com>
>>> ---
>>>    fs/cifs/cifsproto.h |  3 +++
>>>    fs/cifs/misc.c      | 17 +++++++++++++++++
>>>    2 files changed, 20 insertions(+)
>>>
>>> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index
>>> 7933c5f..89dda14 100644
>>> --- a/fs/cifs/cifsproto.h
>>> +++ b/fs/cifs/cifsproto.h
>>> @@ -557,4 +557,7 @@ int cifs_alloc_hash(const char *name, struct
>> crypto_shash **shash,
>>>    		    struct sdesc **sdesc);
>>>    void cifs_free_hash(struct crypto_shash **shash, struct sdesc
>>> **sdesc);
>>>
>>> +extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int
>> page,
>>> +				unsigned int *len, unsigned int *offset);
>>> +
>>>    #endif			/* _CIFSPROTO_H */
>>> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index 96849b5..e951417
>>> 100644
>>> --- a/fs/cifs/misc.c
>>> +++ b/fs/cifs/misc.c
>>> @@ -905,3 +905,20 @@ cifs_free_hash(struct crypto_shash **shash,
>> struct sdesc **sdesc)
>>>    		crypto_free_shash(*shash);
>>>    	*shash = NULL;
>>>    }
>>> +
>>> +/**
>>> + * rqst_page_get_length - obtain the length and offset for a page in
>>> +smb_rqst
>>> + * Input: rqst - a smb_rqst, page - a page index for rqst
>>> + * Output: *len - the length for this page, *offset - the offset for
>>> +this page  */ void rqst_page_get_length(struct smb_rqst *rqst,
>>> +unsigned int page,
>>> +				unsigned int *len, unsigned int *offset) {
>>> +	*len = rqst->rq_pagesz;
>>> +	*offset = (page == 0) ? rqst->rq_offset : 0;
>>
>> Really? Page 0 always has a zero offset??
> 
> I think you are misreading this line. The offset for page 0 is rq_offset, the offset for all subsequent pages are 0.

Ah, yes, sorry I did read it incorrectly.
>>> +
>>> +	if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
>>> +		*len = rqst->rq_tailsz;
>>> +	else if (page == 0)
>>> +		*len = rqst->rq_pagesz - rqst->rq_offset; }
>>>
>>
>> This subroutine does what patch 5 does inline. Why not push this patch up in
>> the sequence and use the helper?
> 
> This is actually a little different. This function returns the length and offset for a given page in the request. There might be multiple pages in a request.

Ok, but I still think there is quite a bit of inline computation of
this stuff that would more clearly and more robustly be done in a set
of common functions. If someone ever touches the code to support a
new upper layer, or even integrate with more complex compounding,
things will get ugly fast.

> The other function calculates the total length of all the pages in a request.

Again, a great candidate for a common set of subroutines, IMO.

Tom.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 14/15] CIFS: Add support for direct I/O write
  2018-06-26  4:39     ` Long Li
@ 2018-06-26 13:29       ` Tom Talpey
  2018-06-27  3:44         ` Long Li
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Talpey @ 2018-06-26 13:29 UTC (permalink / raw)
  To: Long Li, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 6/26/2018 12:39 AM, Long Li wrote:
>> Subject: Re: [Patch v2 14/15] CIFS: Add support for direct I/O write
>>
>> On 5/30/2018 3:48 PM, Long Li wrote:
>>> From: Long Li <longli@microsoft.com>
>>>
>>> Implement the function for direct I/O write. It doesn't support AIO,
>>> which will be implemented in a follow up patch.
>>>
>>> Signed-off-by: Long Li <longli@microsoft.com>
>>> ---
>>>    fs/cifs/cifsfs.h |   1 +
>>>    fs/cifs/file.c   | 165
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>    2 files changed, 166 insertions(+)
>>>
>>> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index
>>> 7fba9aa..e9c5103 100644
>>> --- a/fs/cifs/cifsfs.h
>>> +++ b/fs/cifs/cifsfs.h
>>> @@ -105,6 +105,7 @@ extern ssize_t cifs_user_readv(struct kiocb *iocb,
>> struct iov_iter *to);
>>>    extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
>>>    extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
>>>    extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter
>>> *from);
>>> +extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter
>>> +*from);
>>>    extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
>>>    extern int cifs_lock(struct file *, int, struct file_lock *);
>>>    extern int cifs_fsync(struct file *, loff_t, loff_t, int); diff
>>> --git a/fs/cifs/file.c b/fs/cifs/file.c index e6e6f24..8c385b1 100644
>>> --- a/fs/cifs/file.c
>>> +++ b/fs/cifs/file.c
>>> @@ -2461,6 +2461,35 @@ cifs_uncached_writedata_release(struct kref
>>> *refcount)
>>>
>>>    static void collect_uncached_write_data(struct cifs_aio_ctx *ctx);
>>>
>>> +static void cifs_direct_writedata_release(struct kref *refcount) {
>>> +	int i;
>>> +	struct cifs_writedata *wdata = container_of(refcount,
>>> +					struct cifs_writedata, refcount);
>>> +
>>> +	for (i = 0; i < wdata->nr_pages; i++)
>>> +		put_page(wdata->pages[i]);
>>> +
>>> +	cifs_writedata_release(refcount);
>>> +}
>>> +
>>> +static void cifs_direct_writev_complete(struct work_struct *work) {
>>> +	struct cifs_writedata *wdata = container_of(work,
>>> +					struct cifs_writedata, work);
>>> +	struct inode *inode = d_inode(wdata->cfile->dentry);
>>> +	struct cifsInodeInfo *cifsi = CIFS_I(inode);
>>> +
>>> +	spin_lock(&inode->i_lock);
>>> +	cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
>>> +	if (cifsi->server_eof > inode->i_size)
>>> +		i_size_write(inode, cifsi->server_eof);
>>> +	spin_unlock(&inode->i_lock);
>>> +
>>> +	complete(&wdata->done);
>>> +	kref_put(&wdata->refcount, cifs_direct_writedata_release); }
>>> +
>>>    static void
>>>    cifs_uncached_writev_complete(struct work_struct *work)
>>>    {
>>> @@ -2703,6 +2732,142 @@ static void collect_uncached_write_data(struct
>> cifs_aio_ctx *ctx)
>>>    		complete(&ctx->done);
>>>    }
>>>
>>> +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
>>> +{
>>> +	struct file *file = iocb->ki_filp;
>>> +	ssize_t total_written = 0;
>>> +	struct cifsFileInfo *cfile;
>>> +	struct cifs_tcon *tcon;
>>> +	struct cifs_sb_info *cifs_sb;
>>> +	struct TCP_Server_Info *server;
>>> +	pid_t pid;
>>> +	unsigned long nr_pages;
>>> +	loff_t offset = iocb->ki_pos;
>>> +	size_t len = iov_iter_count(from);
>>> +	int rc;
>>> +	struct cifs_writedata *wdata;
>>> +
>>> +	/*
>>> +	 * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
>>> +	 * In this case, fall back to non-direct write function.
>>> +	 */
>>> +	if (from->type & ITER_KVEC) {
>>> +		cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec
>> I/O\n");
>>> +		return cifs_user_writev(iocb, from);
>>> +	}
>>> +
>>> +	rc = generic_write_checks(iocb, from);
>>> +	if (rc <= 0)
>>> +		return rc;
>>> +
>>> +	cifs_sb = CIFS_FILE_SB(file);
>>> +	cfile = file->private_data;
>>> +	tcon = tlink_tcon(cfile->tlink);
>>> +	server = tcon->ses->server;
>>> +
>>> +	if (!server->ops->async_writev)
>>> +		return -ENOSYS;
>>> +
>>> +	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
>>> +		pid = cfile->pid;
>>> +	else
>>> +		pid = current->tgid;
>>> +
>>> +	do {
>>> +		unsigned int wsize, credits;
>>> +		struct page **pagevec;
>>> +		size_t start;
>>> +		ssize_t cur_len;
>>> +
>>> +		rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
>>> +						   &wsize, &credits);
>>> +		if (rc)
>>> +			break;
>>> +
>>> +		cur_len = iov_iter_get_pages_alloc(
>>> +				from, &pagevec, wsize, &start);
>>> +		if (cur_len < 0) {
>>> +			cifs_dbg(VFS,
>>> +				"direct_writev couldn't get user pages "
>>> +				"(rc=%zd) iter type %d iov_offset %lu count"
>>> +				" %lu\n",
>>> +				cur_len, from->type,
>>> +				from->iov_offset, from->count);
>>> +			dump_stack();
>>> +			break;
>>> +		}
>>> +		if (cur_len < 0)
>>> +			break;
>>
>> This cur_len < 0 test is redundant with the prior if(), delete.
>>> +
>>> +		nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
>>
>> Am I misreading, or will this return be one more page than needed? If start
>> (the first byte offset) is > 0, nr_pages will already be one.
>> And if cur_len is 4KB, even if start is 0, nr_pages will be two.
> 
> I think the calculation is correct, assuming cur_len > 0. (which should be the case if we reach here)

Err, cur_len could possibly be zero, the prior line only breaks the
loop if cur_len < 0.

> If cur_len is 4kb and start is 0, nr_pages will be 1.

Hmm, I guess. But again, it seems as if this page handling is all
being coded inline, in subtly different ways. I'm just concerned
about clarity and robustness. To me, if it's hard to review, it's
even harder to maintain.

Tom.

> 
>>
>>> +
>>> +		wdata = cifs_writedata_direct_alloc(pagevec,
>>> +					     cifs_direct_writev_complete);
>>> +		if (!wdata) {
>>> +			rc = -ENOMEM;
>>> +			add_credits_and_wake_if(server, credits, 0);
>>> +			break;
>>> +		}
>>> +
>>> +		wdata->nr_pages = nr_pages;
>>> +		wdata->page_offset = start;
>>> +		wdata->pagesz = PAGE_SIZE;
>>> +		wdata->tailsz =
>>> +			nr_pages > 1 ?
>>> +			cur_len - (PAGE_SIZE - start) -
>>> +				(nr_pages - 2) * PAGE_SIZE :
>>> +			cur_len;
>>> +
>>> +		wdata->sync_mode = WB_SYNC_ALL;
>>> +		wdata->offset = (__u64)offset;
>>> +		wdata->cfile = cifsFileInfo_get(cfile);
>>> +		wdata->pid = pid;
>>> +		wdata->bytes = cur_len;
>>> +		wdata->credits = credits;
>>> +
>>> +		rc = 0;
>>> +		if (wdata->cfile->invalidHandle)
>>> +			rc = cifs_reopen_file(wdata->cfile, false);
>>> +
>>> +		if (!rc)
>>> +			rc = server->ops->async_writev(wdata,
>>> +					cifs_direct_writedata_release);
>>> +
>>> +		if (rc) {
>>> +			add_credits_and_wake_if(server, wdata->credits, 0);
>>> +			kref_put(&wdata->refcount,
>>> +				 cifs_direct_writedata_release);
>>> +			if (rc == -EAGAIN)
>>> +				continue;
>>> +			break;
>>> +		}
>>
>> Same comments as for previous patch re the if (rc) ladder, and the
>> break/continues both being better expressed as careful goto's.
>>
>>> +
>>> +		wait_for_completion(&wdata->done);
>>> +		if (wdata->result) {
>>> +			rc = wdata->result;
>>> +			kref_put(&wdata->refcount,
>>> +					cifs_direct_writedata_release);
>>> +			if (rc == -EAGAIN)
>>> +				continue;
>>> +			break;
>>> +		}
>>> +
>>> +		kref_put(&wdata->refcount, cifs_direct_writedata_release);
>>> +
>>> +		iov_iter_advance(from, cur_len);
>>> +		total_written += cur_len;
>>> +		offset += cur_len;
>>> +		len -= cur_len;
>>> +	} while (len);
>>> +
>>> +	if (unlikely(!total_written))
>>> +		return rc;
>>> +
>>> +	iocb->ki_pos += total_written;
>>> +	return total_written;
>>> +
>>> +}
>>> +
>>>    ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
>>>    {
>>>    	struct file *file = iocb->ki_filp;
>>>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
  2018-06-25 21:01     ` Jason Gunthorpe
@ 2018-06-26 15:13       ` Tom Talpey
  2018-06-27  3:21         ` Long Li
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Talpey @ 2018-06-26 15:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: longli, Steve French, linux-cifs, samba-technical, linux-kernel,
	linux-rdma

On 6/25/2018 5:01 PM, Jason Gunthorpe wrote:
> On Sat, Jun 23, 2018 at 09:50:20PM -0400, Tom Talpey wrote:
>> On 5/30/2018 3:47 PM, Long Li wrote:
>>> From: Long Li <longli@microsoft.com>
>>>
>>> Add a function to allocate rdata without allocating pages for data
>>> transfer. This gives the caller an option to pass a number of pages
>>> that point to the data buffer.
>>>
>>> rdata is still reponsible for free those pages after it's done.
>>
>> "Caller" is still responsible? Or is the rdata somehow freeing itself
>> via another mechanism?
>>
>>>
>>> Signed-off-by: Long Li <longli@microsoft.com>
>>>   fs/cifs/cifsglob.h |  2 +-
>>>   fs/cifs/file.c     | 23 ++++++++++++++++++++---
>>>   2 files changed, 21 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
>>> index 8d16c3e..56864a87 100644
>>> +++ b/fs/cifs/cifsglob.h
>>> @@ -1179,7 +1179,7 @@ struct cifs_readdata {
>>>   	unsigned int			tailsz;
>>>   	unsigned int			credits;
>>>   	unsigned int			nr_pages;
>>> -	struct page			*pages[];
>>> +	struct page			**pages;
>>
>> Technically speaking, these are syntactically equivalent. It may not
>> be worth changing this historic definition.
> 
> [] is a C99 'flex array', it has a different allocation behavior than
> ** and is not interchangeable..

In that case, it's an even better reason to not change the declaration.

Tom.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
  2018-06-26 15:13       ` Tom Talpey
@ 2018-06-27  3:21         ` Long Li
  0 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-06-27  3:21 UTC (permalink / raw)
  To: Tom Talpey, Jason Gunthorpe
  Cc: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma

> Subject: Re: [Patch v2 02/15] CIFS: Add support for direct pages in rdata
> 
> On 6/25/2018 5:01 PM, Jason Gunthorpe wrote:
> > On Sat, Jun 23, 2018 at 09:50:20PM -0400, Tom Talpey wrote:
> >> On 5/30/2018 3:47 PM, Long Li wrote:
> >>> From: Long Li <longli@microsoft.com>
> >>>
> >>> Add a function to allocate rdata without allocating pages for data
> >>> transfer. This gives the caller an option to pass a number of pages
> >>> that point to the data buffer.
> >>>
> >>> rdata is still reponsible for free those pages after it's done.
> >>
> >> "Caller" is still responsible? Or is the rdata somehow freeing itself
> >> via another mechanism?
> >>
> >>>
> >>> Signed-off-by: Long Li <longli@microsoft.com>
> >>>   fs/cifs/cifsglob.h |  2 +-
> >>>   fs/cifs/file.c     | 23 ++++++++++++++++++++---
> >>>   2 files changed, 21 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index
> >>> 8d16c3e..56864a87 100644
> >>> +++ b/fs/cifs/cifsglob.h
> >>> @@ -1179,7 +1179,7 @@ struct cifs_readdata {
> >>>   	unsigned int			tailsz;
> >>>   	unsigned int			credits;
> >>>   	unsigned int			nr_pages;
> >>> -	struct page			*pages[];
> >>> +	struct page			**pages;
> >>
> >> Technically speaking, these are syntactically equivalent. It may not
> >> be worth changing this historic definition.
> >
> > [] is a C99 'flex array', it has a different allocation behavior than
> > ** and is not interchangeable..
> 
> In that case, it's an even better reason to not change the declaration.

No, it needs to be declared separately.

With Direct I/O, **pages are allocated and returned from iov_iter_get_pages_alloc() when locking those user pages. They can't be allocated as part of struct cifs_readdata.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst
  2018-06-26 13:16       ` Tom Talpey
@ 2018-06-27  3:24         ` Long Li
  0 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-06-27  3:24 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [Patch v2 06/15] CIFS: Introduce helper function to get page
> offset and length in smb_rqst
> 
> On 6/25/2018 5:14 PM, Long Li wrote:
> >> Subject: Re: [Patch v2 06/15] CIFS: Introduce helper function to get
> >> page offset and length in smb_rqst
> >>
> >> On 5/30/2018 3:47 PM, Long Li wrote:
> >>> From: Long Li <longli@microsoft.com>
> >>>
> >>> Introduce a function rqst_page_get_length to return the page offset
> >>> and length for a given page in smb_rqst. This function is to be used
> >>> by following patches.
> >>>
> >>> Signed-off-by: Long Li <longli@microsoft.com>
> >>> ---
> >>>    fs/cifs/cifsproto.h |  3 +++
> >>>    fs/cifs/misc.c      | 17 +++++++++++++++++
> >>>    2 files changed, 20 insertions(+)
> >>>
> >>> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index
> >>> 7933c5f..89dda14 100644
> >>> --- a/fs/cifs/cifsproto.h
> >>> +++ b/fs/cifs/cifsproto.h
> >>> @@ -557,4 +557,7 @@ int cifs_alloc_hash(const char *name, struct
> >> crypto_shash **shash,
> >>>    		    struct sdesc **sdesc);
> >>>    void cifs_free_hash(struct crypto_shash **shash, struct sdesc
> >>> **sdesc);
> >>>
> >>> +extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned
> >>> +int
> >> page,
> >>> +				unsigned int *len, unsigned int *offset);
> >>> +
> >>>    #endif			/* _CIFSPROTO_H */
> >>> diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index 96849b5..e951417
> >>> 100644
> >>> --- a/fs/cifs/misc.c
> >>> +++ b/fs/cifs/misc.c
> >>> @@ -905,3 +905,20 @@ cifs_free_hash(struct crypto_shash **shash,
> >> struct sdesc **sdesc)
> >>>    		crypto_free_shash(*shash);
> >>>    	*shash = NULL;
> >>>    }
> >>> +
> >>> +/**
> >>> + * rqst_page_get_length - obtain the length and offset for a page
> >>> +in smb_rqst
> >>> + * Input: rqst - a smb_rqst, page - a page index for rqst
> >>> + * Output: *len - the length for this page, *offset - the offset
> >>> +for this page  */ void rqst_page_get_length(struct smb_rqst *rqst,
> >>> +unsigned int page,
> >>> +				unsigned int *len, unsigned int *offset) {
> >>> +	*len = rqst->rq_pagesz;
> >>> +	*offset = (page == 0) ? rqst->rq_offset : 0;
> >>
> >> Really? Page 0 always has a zero offset??
> >
> > I think you are misreading this line. The offset for page 0 is rq_offset, the
> offset for all subsequent pages are 0.
> 
> Ah, yes, sorry I did read it incorrectly.
> >>> +
> >>> +	if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
> >>> +		*len = rqst->rq_tailsz;
> >>> +	else if (page == 0)
> >>> +		*len = rqst->rq_pagesz - rqst->rq_offset; }
> >>>
> >>
> >> This subroutine does what patch 5 does inline. Why not push this
> >> patch up in the sequence and use the helper?
> >
> > This is actually a little different. This function returns the length and offset
> for a given page in the request. There might be multiple pages in a request.
> 
> Ok, but I still think there is quite a bit of inline computation of this stuff that
> would more clearly and more robustly be done in a set of common functions.
> If someone ever touches the code to support a new upper layer, or even
> integrate with more complex compounding, things will get ugly fast.
> 
> > The other function calculates the total length of all the pages in a request.
> 
> Again, a great candidate for a common set of subroutines, IMO.

I will look into this. The reason I didn't make a common function is that this is the only patch that will use it in this way.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [Patch v2 14/15] CIFS: Add support for direct I/O write
  2018-06-26 13:29       ` Tom Talpey
@ 2018-06-27  3:44         ` Long Li
  0 siblings, 0 replies; 50+ messages in thread
From: Long Li @ 2018-06-27  3:44 UTC (permalink / raw)
  To: Tom Talpey, Steve French, linux-cifs, samba-technical,
	linux-kernel, linux-rdma

> Subject: Re: [Patch v2 14/15] CIFS: Add support for direct I/O write
> 
> On 6/26/2018 12:39 AM, Long Li wrote:
> >> Subject: Re: [Patch v2 14/15] CIFS: Add support for direct I/O write
> >>
> >> On 5/30/2018 3:48 PM, Long Li wrote:
> >>> From: Long Li <longli@microsoft.com>
> >>>
> >>> Implement the function for direct I/O write. It doesn't support AIO,
> >>> which will be implemented in a follow up patch.
> >>>
> >>> Signed-off-by: Long Li <longli@microsoft.com>
> >>> ---
> >>>    fs/cifs/cifsfs.h |   1 +
> >>>    fs/cifs/file.c   | 165
> >> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>    2 files changed, 166 insertions(+)
> >>>
> >>> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h index
> >>> 7fba9aa..e9c5103 100644
> >>> --- a/fs/cifs/cifsfs.h
> >>> +++ b/fs/cifs/cifsfs.h
> >>> @@ -105,6 +105,7 @@ extern ssize_t cifs_user_readv(struct kiocb
> >>> *iocb,
> >> struct iov_iter *to);
> >>>    extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
> >>>    extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
> >>>    extern ssize_t cifs_user_writev(struct kiocb *iocb, struct
> >>> iov_iter *from);
> >>> +extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct
> >>> +iov_iter *from);
> >>>    extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter
> *from);
> >>>    extern int cifs_lock(struct file *, int, struct file_lock *);
> >>>    extern int cifs_fsync(struct file *, loff_t, loff_t, int); diff
> >>> --git a/fs/cifs/file.c b/fs/cifs/file.c index e6e6f24..8c385b1
> >>> 100644
> >>> --- a/fs/cifs/file.c
> >>> +++ b/fs/cifs/file.c
> >>> @@ -2461,6 +2461,35 @@ cifs_uncached_writedata_release(struct kref
> >>> *refcount)
> >>>
> >>>    static void collect_uncached_write_data(struct cifs_aio_ctx
> >>> *ctx);
> >>>
> >>> +static void cifs_direct_writedata_release(struct kref *refcount) {
> >>> +	int i;
> >>> +	struct cifs_writedata *wdata = container_of(refcount,
> >>> +					struct cifs_writedata, refcount);
> >>> +
> >>> +	for (i = 0; i < wdata->nr_pages; i++)
> >>> +		put_page(wdata->pages[i]);
> >>> +
> >>> +	cifs_writedata_release(refcount);
> >>> +}
> >>> +
> >>> +static void cifs_direct_writev_complete(struct work_struct *work) {
> >>> +	struct cifs_writedata *wdata = container_of(work,
> >>> +					struct cifs_writedata, work);
> >>> +	struct inode *inode = d_inode(wdata->cfile->dentry);
> >>> +	struct cifsInodeInfo *cifsi = CIFS_I(inode);
> >>> +
> >>> +	spin_lock(&inode->i_lock);
> >>> +	cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
> >>> +	if (cifsi->server_eof > inode->i_size)
> >>> +		i_size_write(inode, cifsi->server_eof);
> >>> +	spin_unlock(&inode->i_lock);
> >>> +
> >>> +	complete(&wdata->done);
> >>> +	kref_put(&wdata->refcount, cifs_direct_writedata_release); }
> >>> +
> >>>    static void
> >>>    cifs_uncached_writev_complete(struct work_struct *work)
> >>>    {
> >>> @@ -2703,6 +2732,142 @@ static void
> >>> collect_uncached_write_data(struct
> >> cifs_aio_ctx *ctx)
> >>>    		complete(&ctx->done);
> >>>    }
> >>>
> >>> +ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter
> >>> +*from) {
> >>> +	struct file *file = iocb->ki_filp;
> >>> +	ssize_t total_written = 0;
> >>> +	struct cifsFileInfo *cfile;
> >>> +	struct cifs_tcon *tcon;
> >>> +	struct cifs_sb_info *cifs_sb;
> >>> +	struct TCP_Server_Info *server;
> >>> +	pid_t pid;
> >>> +	unsigned long nr_pages;
> >>> +	loff_t offset = iocb->ki_pos;
> >>> +	size_t len = iov_iter_count(from);
> >>> +	int rc;
> >>> +	struct cifs_writedata *wdata;
> >>> +
> >>> +	/*
> >>> +	 * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
> >>> +	 * In this case, fall back to non-direct write function.
> >>> +	 */
> >>> +	if (from->type & ITER_KVEC) {
> >>> +		cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec
> >> I/O\n");
> >>> +		return cifs_user_writev(iocb, from);
> >>> +	}
> >>> +
> >>> +	rc = generic_write_checks(iocb, from);
> >>> +	if (rc <= 0)
> >>> +		return rc;
> >>> +
> >>> +	cifs_sb = CIFS_FILE_SB(file);
> >>> +	cfile = file->private_data;
> >>> +	tcon = tlink_tcon(cfile->tlink);
> >>> +	server = tcon->ses->server;
> >>> +
> >>> +	if (!server->ops->async_writev)
> >>> +		return -ENOSYS;
> >>> +
> >>> +	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
> >>> +		pid = cfile->pid;
> >>> +	else
> >>> +		pid = current->tgid;
> >>> +
> >>> +	do {
> >>> +		unsigned int wsize, credits;
> >>> +		struct page **pagevec;
> >>> +		size_t start;
> >>> +		ssize_t cur_len;
> >>> +
> >>> +		rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
> >>> +						   &wsize, &credits);
> >>> +		if (rc)
> >>> +			break;
> >>> +
> >>> +		cur_len = iov_iter_get_pages_alloc(
> >>> +				from, &pagevec, wsize, &start);
> >>> +		if (cur_len < 0) {
> >>> +			cifs_dbg(VFS,
> >>> +				"direct_writev couldn't get user pages "
> >>> +				"(rc=%zd) iter type %d iov_offset %lu count"
> >>> +				" %lu\n",
> >>> +				cur_len, from->type,
> >>> +				from->iov_offset, from->count);
> >>> +			dump_stack();
> >>> +			break;
> >>> +		}
> >>> +		if (cur_len < 0)
> >>> +			break;
> >>
> >> This cur_len < 0 test is redundant with the prior if(), delete.
> >>> +
> >>> +		nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
> >>
> >> Am I misreading, or will this return be one more page than needed? If
> >> start (the first byte offset) is > 0, nr_pages will already be one.
> >> And if cur_len is 4KB, even if start is 0, nr_pages will be two.
> >
> > I think the calculation is correct, assuming cur_len > 0. (which
> > should be the case if we reach here)
> 
> Err, cur_len could possibly be zero, the prior line only breaks the loop if
> cur_len < 0.

I will look into this. It's a good idea to return error on 0 since we have no way to proceed.

> 
> > If cur_len is 4kb and start is 0, nr_pages will be 1.
> 
> Hmm, I guess. But again, it seems as if this page handling is all being coded
> inline, in subtly different ways. I'm just concerned about clarity and
> robustness. To me, if it's hard to review, it's even harder to maintain.

I will address this in an updated patch.

> 
> Tom.
> 
> >
> >>
> >>> +
> >>> +		wdata = cifs_writedata_direct_alloc(pagevec,
> >>> +					     cifs_direct_writev_complete);
> >>> +		if (!wdata) {
> >>> +			rc = -ENOMEM;
> >>> +			add_credits_and_wake_if(server, credits, 0);
> >>> +			break;
> >>> +		}
> >>> +
> >>> +		wdata->nr_pages = nr_pages;
> >>> +		wdata->page_offset = start;
> >>> +		wdata->pagesz = PAGE_SIZE;
> >>> +		wdata->tailsz =
> >>> +			nr_pages > 1 ?
> >>> +			cur_len - (PAGE_SIZE - start) -
> >>> +				(nr_pages - 2) * PAGE_SIZE :
> >>> +			cur_len;
> >>> +
> >>> +		wdata->sync_mode = WB_SYNC_ALL;
> >>> +		wdata->offset = (__u64)offset;
> >>> +		wdata->cfile = cifsFileInfo_get(cfile);
> >>> +		wdata->pid = pid;
> >>> +		wdata->bytes = cur_len;
> >>> +		wdata->credits = credits;
> >>> +
> >>> +		rc = 0;
> >>> +		if (wdata->cfile->invalidHandle)
> >>> +			rc = cifs_reopen_file(wdata->cfile, false);
> >>> +
> >>> +		if (!rc)
> >>> +			rc = server->ops->async_writev(wdata,
> >>> +					cifs_direct_writedata_release);
> >>> +
> >>> +		if (rc) {
> >>> +			add_credits_and_wake_if(server, wdata->credits, 0);
> >>> +			kref_put(&wdata->refcount,
> >>> +				 cifs_direct_writedata_release);
> >>> +			if (rc == -EAGAIN)
> >>> +				continue;
> >>> +			break;
> >>> +		}
> >>
> >> Same comments as for previous patch re the if (rc) ladder, and the
> >> break/continues both being better expressed as careful goto's.
> >>
> >>> +
> >>> +		wait_for_completion(&wdata->done);
> >>> +		if (wdata->result) {
> >>> +			rc = wdata->result;
> >>> +			kref_put(&wdata->refcount,
> >>> +					cifs_direct_writedata_release);
> >>> +			if (rc == -EAGAIN)
> >>> +				continue;
> >>> +			break;
> >>> +		}
> >>> +
> >>> +		kref_put(&wdata->refcount, cifs_direct_writedata_release);
> >>> +
> >>> +		iov_iter_advance(from, cur_len);
> >>> +		total_written += cur_len;
> >>> +		offset += cur_len;
> >>> +		len -= cur_len;
> >>> +	} while (len);
> >>> +
> >>> +	if (unlikely(!total_written))
> >>> +		return rc;
> >>> +
> >>> +	iocb->ki_pos += total_written;
> >>> +	return total_written;
> >>> +
> >>> +}
> >>> +
> >>>    ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
> >>>    {
> >>>    	struct file *file = iocb->ki_filp;
> >>>


^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2018-06-27  3:44 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-30 19:47 [Patch v2 00/15] CIFS: Add direct I/O support Long Li
2018-05-30 19:47 ` [Patch v2 01/15] CIFS: Introduce offset for the 1st page in data transfer structures Long Li
2018-05-30 19:47 ` [Patch v2 02/15] CIFS: Add support for direct pages in rdata Long Li
2018-05-30 20:27   ` Ruhl, Michael J
2018-05-30 20:57     ` Long Li
2018-06-24  1:50   ` Tom Talpey
2018-06-25 20:25     ` Long Li
2018-06-25 21:01     ` Jason Gunthorpe
2018-06-26 15:13       ` Tom Talpey
2018-06-27  3:21         ` Long Li
2018-05-30 19:47 ` [Patch v2 03/15] CIFS: Use offset when reading pages Long Li
2018-06-24  1:58   ` Tom Talpey
2018-06-25 20:27     ` Long Li
2018-05-30 19:47 ` [Patch v2 04/15] CIFS: Add support for direct pages in wdata Long Li
2018-06-24  2:01   ` Tom Talpey
2018-06-25 20:34     ` Long Li
2018-05-30 19:47 ` [Patch v2 05/15] CIFS: Calculate the correct request length based on page offset and tail size Long Li
2018-06-24  2:07   ` Tom Talpey
2018-06-25 21:07     ` Long Li
2018-05-30 19:47 ` [Patch v2 06/15] CIFS: Introduce helper function to get page offset and length in smb_rqst Long Li
2018-06-24  2:09   ` Tom Talpey
2018-06-25 21:14     ` Long Li
2018-06-26 13:16       ` Tom Talpey
2018-06-27  3:24         ` Long Li
2018-05-30 19:47 ` [Patch v2 07/15] CIFS: When sending data on socket, pass the correct page offset Long Li
2018-05-30 19:48 ` [Patch v2 08/15] CIFS: SMBD: Support page offset in RDMA send Long Li
2018-06-24  2:11   ` Tom Talpey
2018-06-25 21:23     ` Long Li
2018-05-30 19:48 ` [Patch v2 09/15] CIFS: SMBD: Support page offset in RDMA recv Long Li
2018-06-24  2:16   ` Tom Talpey
2018-06-25 21:29     ` Long Li
2018-05-30 19:48 ` [Patch v2 10/15] CIFS: SMBD: Support page offset in memory registration Long Li
2018-06-24  2:24   ` Tom Talpey
2018-05-30 19:48 ` [Patch v2 11/15] CIFS: Pass page offset for calculating signature Long Li
2018-06-24  2:27   ` Tom Talpey
2018-06-26  4:15     ` Long Li
2018-05-30 19:48 ` [Patch v2 12/15] CIFS: Pass page offset for encrypting Long Li
2018-06-24  2:28   ` Tom Talpey
2018-05-30 19:48 ` [Patch v2 13/15] CIFS: Add support for direct I/O read Long Li
2018-06-02  5:51   ` kbuild test robot
2018-06-02  7:15   ` kbuild test robot
2018-06-24  2:39   ` Tom Talpey
2018-06-26  4:34     ` Long Li
2018-05-30 19:48 ` [Patch v2 14/15] CIFS: Add support for direct I/O write Long Li
2018-06-24  2:48   ` Tom Talpey
2018-06-26  4:39     ` Long Li
2018-06-26 13:29       ` Tom Talpey
2018-06-27  3:44         ` Long Li
2018-05-30 19:48 ` [Patch v2 15/15] CIFS: Add direct I/O functions to file_operations Long Li
2018-06-07 11:17   ` Pavel Shilovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).