All of lore.kernel.org
 help / color / mirror / Atom feed
* [Patch v3 00/16] CIFS: add support for direct I/O
@ 2018-09-08  2:13 Long Li
  2018-09-08  2:13 ` [Patch v3 01/16] CIFS: Add support for direct pages in rdata Long Li
                   ` (16 more replies)
  0 siblings, 17 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

This patch set implements direct I/O.

In normal code path (even with cache=none), CIFS copies I/O data from
user-space to kernel-space for security reasons of possible protocol
required signing and encryption on user data.

With this patch set, CIFS passes the I/O data directly from user-space
buffer to the transport layer, when file system is mounted with
"cache-none".

Patch v2 addressed comments from Christoph Hellwig <hch@lst.de> and
Tom Talpey <ttalpey@microsoft.com> to implement direct I/O for both
socket and RDMA.

Patch v3 added support for kernel AIO.


Long Li (16):
  CIFS: Add support for direct pages in rdata
  CIFS: Use offset when reading pages
  CIFS: Add support for direct pages in wdata
  CIFS: pass page offset when issuing SMB write
  CIFS: Calculate the correct request length based on page offset and
    tail size
  CIFS: Introduce helper function to get page offset and length in
    smb_rqst
  CIFS: When sending data on socket, pass the correct page offset
  CIFS: SMBD: Support page offset in RDMA send
  CIFS: SMBD: Support page offset in RDMA recv
  CIFS: SMBD: Do not call ib_dereg_mr on invalidated memory registration
  CIFS: SMBD: Support page offset in memory registration
  CIFS: Pass page offset for calculating signature
  CIFS: Pass page offset for encrypting
  CIFS: Add support for direct I/O read
  CIFS: Add support for direct I/O write
  CIFS: Add direct I/O functions to file_operations

 fs/cifs/cifsencrypt.c |   9 +-
 fs/cifs/cifsfs.c      |  10 +-
 fs/cifs/cifsfs.h      |   2 +
 fs/cifs/cifsglob.h    |  11 +-
 fs/cifs/cifsproto.h   |   9 +-
 fs/cifs/cifssmb.c     |  19 +-
 fs/cifs/connect.c     |   5 +-
 fs/cifs/file.c        | 477 ++++++++++++++++++++++++++++++++++++++++++--------
 fs/cifs/misc.c        |  17 ++
 fs/cifs/smb2ops.c     |  22 ++-
 fs/cifs/smb2pdu.c     |  20 ++-
 fs/cifs/smbdirect.c   | 156 ++++++++++-------
 fs/cifs/smbdirect.h   |   2 +-
 fs/cifs/transport.c   |  34 ++--
 14 files changed, 606 insertions(+), 187 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Patch v3 01/16] CIFS: Add support for direct pages in rdata
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 02/16] CIFS: Use offset when reading pages Long Li
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Add a function to allocate rdata without allocating pages for data
transfer. This gives the caller an option to pass a number of pages
that point to the data buffer.

rdata is reponsible for free those pages after it's done.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsglob.h |  3 +--
 fs/cifs/file.c     | 23 ++++++++++++++++++++---
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 80a34ce..166e140 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1173,14 +1173,13 @@ struct cifs_readdata {
 	struct kvec			iov[2];
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	struct smbd_mr			*mr;
-	struct page			**direct_pages;
 #endif
 	unsigned int			pagesz;
 	unsigned int			page_offset;
 	unsigned int			tailsz;
 	unsigned int			credits;
 	unsigned int			nr_pages;
-	struct page			*pages[];
+	struct page			**pages;
 };
 
 struct cifs_writedata;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 23fd430..1c98293 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2880,13 +2880,13 @@ cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from)
 }
 
 static struct cifs_readdata *
-cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
+cifs_readdata_direct_alloc(struct page **pages, work_func_t complete)
 {
 	struct cifs_readdata *rdata;
 
-	rdata = kzalloc(sizeof(*rdata) + (sizeof(struct page *) * nr_pages),
-			GFP_KERNEL);
+	rdata = kzalloc(sizeof(*rdata), GFP_KERNEL);
 	if (rdata != NULL) {
+		rdata->pages = pages;
 		kref_init(&rdata->refcount);
 		INIT_LIST_HEAD(&rdata->list);
 		init_completion(&rdata->done);
@@ -2896,6 +2896,22 @@ cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
 	return rdata;
 }
 
+static struct cifs_readdata *
+cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete)
+{
+	struct page **pages =
+		kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
+	struct cifs_readdata *ret = NULL;
+
+	if (pages) {
+		ret = cifs_readdata_direct_alloc(pages, complete);
+		if (!ret)
+			kfree(pages);
+	}
+
+	return ret;
+}
+
 void
 cifs_readdata_release(struct kref *refcount)
 {
@@ -2910,6 +2926,7 @@ cifs_readdata_release(struct kref *refcount)
 	if (rdata->cfile)
 		cifsFileInfo_put(rdata->cfile);
 
+	kvfree(rdata->pages);
 	kfree(rdata);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 02/16] CIFS: Use offset when reading pages
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
  2018-09-08  2:13 ` [Patch v3 01/16] CIFS: Add support for direct pages in rdata Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 03/16] CIFS: Add support for direct pages in wdata Long Li
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

With offset defined in rdata, transport functions need to look at this
offset when reading data into the correct places in pages.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsproto.h |  4 +++-
 fs/cifs/cifssmb.c   |  1 +
 fs/cifs/connect.c   |  5 +++--
 fs/cifs/file.c      | 52 +++++++++++++++++++++++++++++++++++++---------------
 fs/cifs/smb2ops.c   |  2 +-
 fs/cifs/smb2pdu.c   |  1 +
 6 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index dc80f84..1f27d8e 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -203,7 +203,9 @@ extern void dequeue_mid(struct mid_q_entry *mid, bool malformed);
 extern int cifs_read_from_socket(struct TCP_Server_Info *server, char *buf,
 			         unsigned int to_read);
 extern int cifs_read_page_from_socket(struct TCP_Server_Info *server,
-				      struct page *page, unsigned int to_read);
+					struct page *page,
+					unsigned int page_offset,
+					unsigned int to_read);
 extern int cifs_setup_cifs_sb(struct smb_vol *pvolume_info,
 			       struct cifs_sb_info *cifs_sb);
 extern int cifs_match_super(struct super_block *, void *);
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index c8e4278..a1af258 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -1596,6 +1596,7 @@ cifs_readv_callback(struct mid_q_entry *mid)
 	struct smb_rqst rqst = { .rq_iov = rdata->iov,
 				 .rq_nvec = 2,
 				 .rq_pages = rdata->pages,
+				 .rq_offset = rdata->page_offset,
 				 .rq_npages = rdata->nr_pages,
 				 .rq_pagesz = rdata->pagesz,
 				 .rq_tailsz = rdata->tailsz };
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 83b0234..8501da0 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -594,10 +594,11 @@ cifs_read_from_socket(struct TCP_Server_Info *server, char *buf,
 
 int
 cifs_read_page_from_socket(struct TCP_Server_Info *server, struct page *page,
-		      unsigned int to_read)
+	unsigned int page_offset, unsigned int to_read)
 {
 	struct msghdr smb_msg;
-	struct bio_vec bv = {.bv_page = page, .bv_len = to_read};
+	struct bio_vec bv = {
+		.bv_page = page, .bv_len = to_read, .bv_offset = page_offset};
 	iov_iter_bvec(&smb_msg.msg_iter, READ | ITER_BVEC, &bv, 1, to_read);
 	return cifs_readv_from_socket(server, &smb_msg);
 }
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 1c98293..87eece6 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3026,12 +3026,20 @@ uncached_fill_pages(struct TCP_Server_Info *server,
 	int result = 0;
 	unsigned int i;
 	unsigned int nr_pages = rdata->nr_pages;
+	unsigned int page_offset = rdata->page_offset;
 
 	rdata->got_bytes = 0;
 	rdata->tailsz = PAGE_SIZE;
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page = rdata->pages[i];
 		size_t n;
+		unsigned int segment_size = rdata->pagesz;
+
+		if (i == 0)
+			segment_size -= page_offset;
+		else
+			page_offset = 0;
+
 
 		if (len <= 0) {
 			/* no need to hold page hostage */
@@ -3040,24 +3048,25 @@ uncached_fill_pages(struct TCP_Server_Info *server,
 			put_page(page);
 			continue;
 		}
+
 		n = len;
-		if (len >= PAGE_SIZE) {
+		if (len >= segment_size)
 			/* enough data to fill the page */
-			n = PAGE_SIZE;
-			len -= n;
-		} else {
-			zero_user(page, len, PAGE_SIZE - len);
+			n = segment_size;
+		else
 			rdata->tailsz = len;
-			len = 0;
-		}
+		len -= n;
+
 		if (iter)
-			result = copy_page_from_iter(page, 0, n, iter);
+			result = copy_page_from_iter(
+					page, page_offset, n, iter);
 #ifdef CONFIG_CIFS_SMB_DIRECT
 		else if (rdata->mr)
 			result = n;
 #endif
 		else
-			result = cifs_read_page_from_socket(server, page, n);
+			result = cifs_read_page_from_socket(
+					server, page, page_offset, n);
 		if (result < 0)
 			break;
 
@@ -3130,6 +3139,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 		rdata->bytes = cur_len;
 		rdata->pid = pid;
 		rdata->pagesz = PAGE_SIZE;
+		rdata->tailsz = PAGE_SIZE;
 		rdata->read_into_pages = cifs_uncached_read_into_pages;
 		rdata->copy_into_pages = cifs_uncached_copy_into_pages;
 		rdata->credits = credits;
@@ -3574,6 +3584,7 @@ readpages_fill_pages(struct TCP_Server_Info *server,
 	u64 eof;
 	pgoff_t eof_index;
 	unsigned int nr_pages = rdata->nr_pages;
+	unsigned int page_offset = rdata->page_offset;
 
 	/* determine the eof that the server (probably) has */
 	eof = CIFS_I(rdata->mapping->host)->server_eof;
@@ -3584,13 +3595,21 @@ readpages_fill_pages(struct TCP_Server_Info *server,
 	rdata->tailsz = PAGE_SIZE;
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page = rdata->pages[i];
-		size_t n = PAGE_SIZE;
+		unsigned int to_read = rdata->pagesz;
+		size_t n;
+
+		if (i == 0)
+			to_read -= page_offset;
+		else
+			page_offset = 0;
+
+		n = to_read;
 
-		if (len >= PAGE_SIZE) {
-			len -= PAGE_SIZE;
+		if (len >= to_read) {
+			len -= to_read;
 		} else if (len > 0) {
 			/* enough for partial page, fill and zero the rest */
-			zero_user(page, len, PAGE_SIZE - len);
+			zero_user(page, len + page_offset, to_read - len);
 			n = rdata->tailsz = len;
 			len = 0;
 		} else if (page->index > eof_index) {
@@ -3622,13 +3641,15 @@ readpages_fill_pages(struct TCP_Server_Info *server,
 		}
 
 		if (iter)
-			result = copy_page_from_iter(page, 0, n, iter);
+			result = copy_page_from_iter(
+					page, page_offset, n, iter);
 #ifdef CONFIG_CIFS_SMB_DIRECT
 		else if (rdata->mr)
 			result = n;
 #endif
 		else
-			result = cifs_read_page_from_socket(server, page, n);
+			result = cifs_read_page_from_socket(
+					server, page, page_offset, n);
 		if (result < 0)
 			break;
 
@@ -3807,6 +3828,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
 		rdata->bytes = bytes;
 		rdata->pid = pid;
 		rdata->pagesz = PAGE_SIZE;
+		rdata->tailsz = PAGE_SIZE;
 		rdata->read_into_pages = cifs_readpages_read_into_pages;
 		rdata->copy_into_pages = cifs_readpages_copy_into_pages;
 		rdata->credits = credits;
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 7c0edd2..1fa1c29 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2467,7 +2467,7 @@ read_data_into_pages(struct TCP_Server_Info *server, struct page **pages,
 			zero_user(page, len, PAGE_SIZE - len);
 			len = 0;
 		}
-		length = cifs_read_page_from_socket(server, page, n);
+		length = cifs_read_page_from_socket(server, page, 0, n);
 		if (length < 0)
 			return length;
 		server->total_read += length;
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 47d5331..6c22da8 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -2683,6 +2683,7 @@ smb2_readv_callback(struct mid_q_entry *mid)
 	struct smb_rqst rqst = { .rq_iov = rdata->iov,
 				 .rq_nvec = 2,
 				 .rq_pages = rdata->pages,
+				 .rq_offset = rdata->page_offset,
 				 .rq_npages = rdata->nr_pages,
 				 .rq_pagesz = rdata->pagesz,
 				 .rq_tailsz = rdata->tailsz };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 03/16] CIFS: Add support for direct pages in wdata
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
  2018-09-08  2:13 ` [Patch v3 01/16] CIFS: Add support for direct pages in rdata Long Li
  2018-09-08  2:13 ` [Patch v3 02/16] CIFS: Use offset when reading pages Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 04/16] CIFS: pass page offset when issuing SMB write Long Li
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Add a function to allocate wdata without allocating pages for data
transfer. This gives the caller an option to pass a number of pages that
point to the data buffer to write to.

wdata is reponsible for free those pages after it's done.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsglob.h  |  3 +--
 fs/cifs/cifsproto.h |  2 ++
 fs/cifs/cifssmb.c   | 17 ++++++++++++++---
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 166e140..7f62c98 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1199,14 +1199,13 @@ struct cifs_writedata {
 	int				result;
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	struct smbd_mr			*mr;
-	struct page			**direct_pages;
 #endif
 	unsigned int			pagesz;
 	unsigned int			page_offset;
 	unsigned int			tailsz;
 	unsigned int			credits;
 	unsigned int			nr_pages;
-	struct page			*pages[];
+	struct page			**pages;
 };
 
 /*
diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 1f27d8e..7933c5f 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -533,6 +533,8 @@ int cifs_async_writev(struct cifs_writedata *wdata,
 void cifs_writev_complete(struct work_struct *work);
 struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
 						work_func_t complete);
+struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages,
+						work_func_t complete);
 void cifs_writedata_release(struct kref *refcount);
 int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
 			  struct cifs_sb_info *cifs_sb,
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index a1af258..503e0ed 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -1953,6 +1953,7 @@ cifs_writedata_release(struct kref *refcount)
 	if (wdata->cfile)
 		cifsFileInfo_put(wdata->cfile);
 
+	kvfree(wdata->pages);
 	kfree(wdata);
 }
 
@@ -2076,12 +2077,22 @@ cifs_writev_complete(struct work_struct *work)
 struct cifs_writedata *
 cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
 {
+	struct page **pages =
+		kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);
+	if (pages)
+		return cifs_writedata_direct_alloc(pages, complete);
+
+	return NULL;
+}
+
+struct cifs_writedata *
+cifs_writedata_direct_alloc(struct page **pages, work_func_t complete)
+{
 	struct cifs_writedata *wdata;
 
-	/* writedata + number of page pointers */
-	wdata = kzalloc(sizeof(*wdata) +
-			sizeof(struct page *) * nr_pages, GFP_NOFS);
+	wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
 	if (wdata != NULL) {
+		wdata->pages = pages;
 		kref_init(&wdata->refcount);
 		INIT_LIST_HEAD(&wdata->list);
 		init_completion(&wdata->done);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 04/16] CIFS: pass page offset when issuing SMB write
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (2 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 03/16] CIFS: Add support for direct pages in wdata Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 05/16] CIFS: Calculate the correct request length based on page offset and tail size Long Li
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

When issuing SMB writes, pass along the write data page offset to transport.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifssmb.c | 1 +
 fs/cifs/smb2pdu.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 503e0ed..0a57c61 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -2200,6 +2200,7 @@ cifs_async_writev(struct cifs_writedata *wdata,
 	rqst.rq_iov = iov;
 	rqst.rq_nvec = 2;
 	rqst.rq_pages = wdata->pages;
+	rqst.rq_offset = wdata->page_offset;
 	rqst.rq_npages = wdata->nr_pages;
 	rqst.rq_pagesz = wdata->pagesz;
 	rqst.rq_tailsz = wdata->tailsz;
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 6c22da8..f603fbe 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -3046,6 +3046,7 @@ smb2_async_writev(struct cifs_writedata *wdata,
 	rqst.rq_iov = iov;
 	rqst.rq_nvec = 2;
 	rqst.rq_pages = wdata->pages;
+	rqst.rq_offset = wdata->page_offset;
 	rqst.rq_npages = wdata->nr_pages;
 	rqst.rq_pagesz = wdata->pagesz;
 	rqst.rq_tailsz = wdata->tailsz;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 05/16] CIFS: Calculate the correct request length based on page offset and tail size
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (3 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 04/16] CIFS: pass page offset when issuing SMB write Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 06/16] CIFS: Introduce helper function to get page offset and length in smb_rqst Long Li
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

It's possible that the page offset is non-zero in the pages in a request,
change the function to calculate the correct data buffer length.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/transport.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 927226a..d6b5523 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -212,10 +212,24 @@ rqst_len(struct smb_rqst *rqst)
 	for (i = 0; i < rqst->rq_nvec; i++)
 		buflen += iov[i].iov_len;
 
-	/* add in the page array if there is one */
+	/*
+	 * Add in the page array if there is one. The caller needs to make
+	 * sure rq_offset and rq_tailsz are set correctly. If a buffer of
+	 * multiple pages ends at page boundary, rq_tailsz needs to be set to
+	 * PAGE_SIZE.
+	 */
 	if (rqst->rq_npages) {
-		buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
-		buflen += rqst->rq_tailsz;
+		if (rqst->rq_npages == 1)
+			buflen += rqst->rq_tailsz;
+		else {
+			/*
+			 * If there is more than one page, calculate the
+			 * buffer length based on rq_offset and rq_tailsz
+			 */
+			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
+					rqst->rq_offset;
+			buflen += rqst->rq_tailsz;
+		}
 	}
 
 	return buflen;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 06/16] CIFS: Introduce helper function to get page offset and length in smb_rqst
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (4 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 05/16] CIFS: Calculate the correct request length based on page offset and tail size Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 07/16] CIFS: When sending data on socket, pass the correct page offset Long Li
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Introduce a function rqst_page_get_length to return the page offset and
length for a given page in smb_rqst. This function is to be used by
following patches.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsproto.h |  3 +++
 fs/cifs/misc.c      | 17 +++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 7933c5f..89dda14 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -557,4 +557,7 @@ int cifs_alloc_hash(const char *name, struct crypto_shash **shash,
 		    struct sdesc **sdesc);
 void cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc);
 
+extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
+				unsigned int *len, unsigned int *offset);
+
 #endif			/* _CIFSPROTO_H */
diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
index 96849b5..e951417 100644
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -905,3 +905,20 @@ cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc)
 		crypto_free_shash(*shash);
 	*shash = NULL;
 }
+
+/**
+ * rqst_page_get_length - obtain the length and offset for a page in smb_rqst
+ * Input: rqst - a smb_rqst, page - a page index for rqst
+ * Output: *len - the length for this page, *offset - the offset for this page
+ */
+void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page,
+				unsigned int *len, unsigned int *offset)
+{
+	*len = rqst->rq_pagesz;
+	*offset = (page == 0) ? rqst->rq_offset : 0;
+
+	if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
+		*len = rqst->rq_tailsz;
+	else if (page == 0)
+		*len = rqst->rq_pagesz - rqst->rq_offset;
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 07/16] CIFS: When sending data on socket, pass the correct page offset
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (5 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 06/16] CIFS: Introduce helper function to get page offset and length in smb_rqst Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 08/16] CIFS: SMBD: Support page offset in RDMA send Long Li
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

It's possible that the offset is non-zero in the page to send, change the
function to pass this offset to socket.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/transport.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index d6b5523..5c96ee8 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -288,15 +288,13 @@ __smb_send_rqst(struct TCP_Server_Info *server, struct smb_rqst *rqst)
 
 	/* now walk the page array and send each page in it */
 	for (i = 0; i < rqst->rq_npages; i++) {
-		size_t len = i == rqst->rq_npages - 1
-				? rqst->rq_tailsz
-				: rqst->rq_pagesz;
-		struct bio_vec bvec = {
-			.bv_page = rqst->rq_pages[i],
-			.bv_len = len
-		};
+		struct bio_vec bvec;
+
+		bvec.bv_page = rqst->rq_pages[i];
+		rqst_page_get_length(rqst, i, &bvec.bv_len, &bvec.bv_offset);
+
 		iov_iter_bvec(&smb_msg.msg_iter, WRITE | ITER_BVEC,
-			      &bvec, 1, len);
+			      &bvec, 1, bvec.bv_len);
 		rc = smb_send_kvec(server, &smb_msg, &sent);
 		if (rc < 0)
 			break;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 08/16] CIFS: SMBD: Support page offset in RDMA send
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (6 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 07/16] CIFS: When sending data on socket, pass the correct page offset Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 09/16] CIFS: SMBD: Support page offset in RDMA recv Long Li
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

The RDMA send function needs to look at offset in the request pages, and
send data starting from there.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smbdirect.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index c62f7c9..6141e3c 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -17,6 +17,7 @@
 #include <linux/highmem.h>
 #include "smbdirect.h"
 #include "cifs_debug.h"
+#include "cifsproto.h"
 
 static struct smbd_response *get_empty_queue_buffer(
 		struct smbd_connection *info);
@@ -2082,7 +2083,7 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
 	struct kvec vec;
 	int nvecs;
 	int size;
-	int buflen = 0, remaining_data_length;
+	unsigned int buflen = 0, remaining_data_length;
 	int start, i, j;
 	int max_iov_size =
 		info->max_send_size - sizeof(struct smbd_data_transfer);
@@ -2113,10 +2114,17 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
 		buflen += iov[i].iov_len;
 	}
 
-	/* add in the page array if there is one */
+	/*
+	 * Add in the page array if there is one. The caller needs to set
+	 * rq_tailsz to PAGE_SIZE when the buffer has multiple pages and
+	 * ends at page boundary
+	 */
 	if (rqst->rq_npages) {
-		buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
-		buflen += rqst->rq_tailsz;
+		if (rqst->rq_npages == 1)
+			buflen += rqst->rq_tailsz;
+		else
+			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
+					rqst->rq_offset + rqst->rq_tailsz;
 	}
 
 	if (buflen + sizeof(struct smbd_data_transfer) >
@@ -2213,8 +2221,9 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
 
 	/* now sending pages if there are any */
 	for (i = 0; i < rqst->rq_npages; i++) {
-		buflen = (i == rqst->rq_npages-1) ?
-			rqst->rq_tailsz : rqst->rq_pagesz;
+		unsigned int offset;
+
+		rqst_page_get_length(rqst, i, &buflen, &offset);
 		nvecs = (buflen + max_iov_size - 1) / max_iov_size;
 		log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
 			buflen, nvecs);
@@ -2225,9 +2234,11 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
 			remaining_data_length -= size;
 			log_write(INFO, "sending pages i=%d offset=%d size=%d"
 				" remaining_data_length=%d\n",
-				i, j*max_iov_size, size, remaining_data_length);
+				i, j*max_iov_size+offset, size,
+				remaining_data_length);
 			rc = smbd_post_send_page(
-				info, rqst->rq_pages[i], j*max_iov_size,
+				info, rqst->rq_pages[i],
+				j*max_iov_size + offset,
 				size, remaining_data_length);
 			if (rc)
 				goto done;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 09/16] CIFS: SMBD: Support page offset in RDMA recv
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (7 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 08/16] CIFS: SMBD: Support page offset in RDMA send Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 10/16] CIFS: SMBD: Do not call ib_dereg_mr on invalidated memory registration Long Li
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

RDMA recv function needs to place data to the correct place starting at
page offset.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smbdirect.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 6141e3c..ba53c52 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2004,10 +2004,12 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
  * return value: actual data read
  */
 static int smbd_recv_page(struct smbd_connection *info,
-		struct page *page, unsigned int to_read)
+		struct page *page, unsigned int page_offset,
+		unsigned int to_read)
 {
 	int ret;
 	char *to_address;
+	void *page_address;
 
 	/* make sure we have the page ready for read */
 	ret = wait_event_interruptible(
@@ -2015,16 +2017,17 @@ static int smbd_recv_page(struct smbd_connection *info,
 		info->reassembly_data_length >= to_read ||
 			info->transport_status != SMBD_CONNECTED);
 	if (ret)
-		return 0;
+		return ret;
 
 	/* now we can read from reassembly queue and not sleep */
-	to_address = kmap_atomic(page);
+	page_address = kmap_atomic(page);
+	to_address = (char *) page_address + page_offset;
 
 	log_read(INFO, "reading from page=%p address=%p to_read=%d\n",
 		page, to_address, to_read);
 
 	ret = smbd_recv_buf(info, to_address, to_read);
-	kunmap_atomic(to_address);
+	kunmap_atomic(page_address);
 
 	return ret;
 }
@@ -2038,7 +2041,7 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
 {
 	char *buf;
 	struct page *page;
-	unsigned int to_read;
+	unsigned int to_read, page_offset;
 	int rc;
 
 	info->smbd_recv_pending++;
@@ -2052,15 +2055,16 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
 
 	case READ | ITER_BVEC:
 		page = msg->msg_iter.bvec->bv_page;
+		page_offset = msg->msg_iter.bvec->bv_offset;
 		to_read = msg->msg_iter.bvec->bv_len;
-		rc = smbd_recv_page(info, page, to_read);
+		rc = smbd_recv_page(info, page, page_offset, to_read);
 		break;
 
 	default:
 		/* It's a bug in upper layer to get there */
 		cifs_dbg(VFS, "CIFS: invalid msg type %d\n",
 			msg->msg_iter.type);
-		rc = -EIO;
+		rc = -EINVAL;
 	}
 
 	info->smbd_recv_pending--;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 10/16] CIFS: SMBD: Do not call ib_dereg_mr on invalidated memory registration
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (8 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 09/16] CIFS: SMBD: Support page offset in RDMA recv Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 11/16] CIFS: SMBD: Support page offset in " Long Li
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

It is not necessary to deregister a memory registration after it has been
successfully invalidated.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smbdirect.c | 82 ++++++++++++++++++++++++++---------------------------
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index ba53c52..b470cd0 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2296,50 +2296,50 @@ static void smbd_mr_recovery_work(struct work_struct *work)
 	int rc;
 
 	list_for_each_entry(smbdirect_mr, &info->mr_list, list) {
-		if (smbdirect_mr->state == MR_INVALIDATED ||
-			smbdirect_mr->state == MR_ERROR) {
-
-			if (smbdirect_mr->state == MR_INVALIDATED) {
-				ib_dma_unmap_sg(
-					info->id->device, smbdirect_mr->sgl,
-					smbdirect_mr->sgl_count,
-					smbdirect_mr->dir);
-				smbdirect_mr->state = MR_READY;
-			} else if (smbdirect_mr->state == MR_ERROR) {
-
-				/* recover this MR entry */
-				rc = ib_dereg_mr(smbdirect_mr->mr);
-				if (rc) {
-					log_rdma_mr(ERR,
-						"ib_dereg_mr failed rc=%x\n",
-						rc);
-					smbd_disconnect_rdma_connection(info);
-				}
+		if (smbdirect_mr->state == MR_INVALIDATED)
+			ib_dma_unmap_sg(
+				info->id->device, smbdirect_mr->sgl,
+				smbdirect_mr->sgl_count,
+				smbdirect_mr->dir);
+		else if (smbdirect_mr->state == MR_ERROR) {
+
+			/* recover this MR entry */
+			rc = ib_dereg_mr(smbdirect_mr->mr);
+			if (rc) {
+				log_rdma_mr(ERR,
+					"ib_dereg_mr failed rc=%x\n",
+					rc);
+				smbd_disconnect_rdma_connection(info);
+				continue;
+			}
 
-				smbdirect_mr->mr = ib_alloc_mr(
-					info->pd, info->mr_type,
+			smbdirect_mr->mr = ib_alloc_mr(
+				info->pd, info->mr_type,
+				info->max_frmr_depth);
+			if (IS_ERR(smbdirect_mr->mr)) {
+				log_rdma_mr(ERR,
+					"ib_alloc_mr failed mr_type=%x "
+					"max_frmr_depth=%x\n",
+					info->mr_type,
 					info->max_frmr_depth);
-				if (IS_ERR(smbdirect_mr->mr)) {
-					log_rdma_mr(ERR,
-						"ib_alloc_mr failed mr_type=%x "
-						"max_frmr_depth=%x\n",
-						info->mr_type,
-						info->max_frmr_depth);
-					smbd_disconnect_rdma_connection(info);
-				}
-
-				smbdirect_mr->state = MR_READY;
+				smbd_disconnect_rdma_connection(info);
+				continue;
 			}
-			/* smbdirect_mr->state is updated by this function
-			 * and is read and updated by I/O issuing CPUs trying
-			 * to get a MR, the call to atomic_inc_return
-			 * implicates a memory barrier and guarantees this
-			 * value is updated before waking up any calls to
-			 * get_mr() from the I/O issuing CPUs
-			 */
-			if (atomic_inc_return(&info->mr_ready_count) == 1)
-				wake_up_interruptible(&info->wait_mr);
-		}
+		} else
+			/* This MR is being used, don't recover it */
+			continue;
+
+		smbdirect_mr->state = MR_READY;
+
+		/* smbdirect_mr->state is updated by this function
+		 * and is read and updated by I/O issuing CPUs trying
+		 * to get a MR, the call to atomic_inc_return
+		 * implicates a memory barrier and guarantees this
+		 * value is updated before waking up any calls to
+		 * get_mr() from the I/O issuing CPUs
+		 */
+		if (atomic_inc_return(&info->mr_ready_count) == 1)
+			wake_up_interruptible(&info->wait_mr);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 11/16] CIFS: SMBD: Support page offset in memory registration
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (9 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 10/16] CIFS: SMBD: Do not call ib_dereg_mr on invalidated memory registration Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 12/16] CIFS: Pass page offset for calculating signature Long Li
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Change code to pass the correct page offset during memory registration for
RDMA read/write.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smb2pdu.c   | 18 ++++++++++++------
 fs/cifs/smbdirect.c | 29 +++++++++++++++++++++--------
 fs/cifs/smbdirect.h |  2 +-
 3 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index f603fbe..fc30774 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -2623,8 +2623,8 @@ smb2_new_read_req(void **buf, unsigned int *total_len,
 
 		rdata->mr = smbd_register_mr(
 				server->smbd_conn, rdata->pages,
-				rdata->nr_pages, rdata->tailsz,
-				true, need_invalidate);
+				rdata->nr_pages, rdata->page_offset,
+				rdata->tailsz, true, need_invalidate);
 		if (!rdata->mr)
 			return -ENOBUFS;
 
@@ -3013,16 +3013,22 @@ smb2_async_writev(struct cifs_writedata *wdata,
 
 		wdata->mr = smbd_register_mr(
 				server->smbd_conn, wdata->pages,
-				wdata->nr_pages, wdata->tailsz,
-				false, need_invalidate);
+				wdata->nr_pages, wdata->page_offset,
+				wdata->tailsz, false, need_invalidate);
 		if (!wdata->mr) {
 			rc = -ENOBUFS;
 			goto async_writev_out;
 		}
 		req->Length = 0;
 		req->DataOffset = 0;
-		req->RemainingBytes =
-			cpu_to_le32((wdata->nr_pages-1)*PAGE_SIZE + wdata->tailsz);
+		if (wdata->nr_pages > 1)
+			req->RemainingBytes =
+				cpu_to_le32(
+					(wdata->nr_pages - 1) * wdata->pagesz -
+					wdata->page_offset + wdata->tailsz
+				);
+		else
+			req->RemainingBytes = cpu_to_le32(wdata->tailsz);
 		req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE;
 		if (need_invalidate)
 			req->Channel = SMB2_CHANNEL_RDMA_V1;
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index b470cd0..f61daa9 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2475,7 +2475,7 @@ static struct smbd_mr *get_mr(struct smbd_connection *info)
  */
 struct smbd_mr *smbd_register_mr(
 	struct smbd_connection *info, struct page *pages[], int num_pages,
-	int tailsz, bool writing, bool need_invalidate)
+	int offset, int tailsz, bool writing, bool need_invalidate)
 {
 	struct smbd_mr *smbdirect_mr;
 	int rc, i;
@@ -2498,17 +2498,30 @@ struct smbd_mr *smbd_register_mr(
 	smbdirect_mr->sgl_count = num_pages;
 	sg_init_table(smbdirect_mr->sgl, num_pages);
 
-	for (i = 0; i < num_pages - 1; i++)
-		sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
+	log_rdma_mr(INFO, "num_pages=0x%x offset=0x%x tailsz=0x%x\n",
+			num_pages, offset, tailsz);
+
+	if (num_pages == 1) {
+		sg_set_page(&smbdirect_mr->sgl[0], pages[0], tailsz, offset);
+		goto skip_multiple_pages;
+	}
 
-	sg_set_page(&smbdirect_mr->sgl[i], pages[i],
-		tailsz ? tailsz : PAGE_SIZE, 0);
+	/* We have at least two pages to register */
+	sg_set_page(
+		&smbdirect_mr->sgl[0], pages[0], PAGE_SIZE - offset, offset);
+	i = 1;
+	while (i < num_pages - 1) {
+		sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
+		i++;
+	}
+	sg_set_page(&smbdirect_mr->sgl[i], pages[i], tailsz, 0);
 
+skip_multiple_pages:
 	dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
 	smbdirect_mr->dir = dir;
 	rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgl, num_pages, dir);
 	if (!rc) {
-		log_rdma_mr(INFO, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
+		log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
 			num_pages, dir, rc);
 		goto dma_map_error;
 	}
@@ -2516,8 +2529,8 @@ struct smbd_mr *smbd_register_mr(
 	rc = ib_map_mr_sg(smbdirect_mr->mr, smbdirect_mr->sgl, num_pages,
 		NULL, PAGE_SIZE);
 	if (rc != num_pages) {
-		log_rdma_mr(INFO,
-			"ib_map_mr_sg failed rc = %x num_pages = %x\n",
+		log_rdma_mr(ERR,
+			"ib_map_mr_sg failed rc = %d num_pages = %x\n",
 			rc, num_pages);
 		goto map_mr_error;
 	}
diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h
index f9038da..1e419c2 100644
--- a/fs/cifs/smbdirect.h
+++ b/fs/cifs/smbdirect.h
@@ -321,7 +321,7 @@ struct smbd_mr {
 /* Interfaces to register and deregister MR for RDMA read/write */
 struct smbd_mr *smbd_register_mr(
 	struct smbd_connection *info, struct page *pages[], int num_pages,
-	int tailsz, bool writing, bool need_invalidate);
+	int offset, int tailsz, bool writing, bool need_invalidate);
 int smbd_deregister_mr(struct smbd_mr *mr);
 
 #else
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 12/16] CIFS: Pass page offset for calculating signature
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (10 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 11/16] CIFS: SMBD: Support page offset in " Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 13/16] CIFS: Pass page offset for encrypting Long Li
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

When calculating signature for the packet, it needs to read into the
correct page offset for the data.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsencrypt.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index a6ef088..e88303c 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -68,11 +68,12 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
 
 	/* now hash over the rq_pages array */
 	for (i = 0; i < rqst->rq_npages; i++) {
-		void *kaddr = kmap(rqst->rq_pages[i]);
-		size_t len = rqst->rq_pagesz;
+		void *kaddr;
+		unsigned int len, offset;
 
-		if (i == rqst->rq_npages - 1)
-			len = rqst->rq_tailsz;
+		rqst_page_get_length(rqst, i, &len, &offset);
+
+		kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;
 
 		crypto_shash_update(shash, kaddr, len);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 13/16] CIFS: Pass page offset for encrypting
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (11 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 12/16] CIFS: Pass page offset for calculating signature Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 14/16] CIFS: Add support for direct I/O read Long Li
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

Encryption function needs to read data starting page offset from input
buffer.

This doesn't affect decryption path since it allocates its own page
buffers.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/smb2ops.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 1fa1c29..38d19b6 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2189,9 +2189,10 @@ init_sg(struct smb_rqst *rqst, u8 *sign)
 		smb2_sg_set_buf(&sg[i], rqst->rq_iov[i].iov_base,
 						rqst->rq_iov[i].iov_len);
 	for (j = 0; i < sg_len - 1; i++, j++) {
-		unsigned int len = (j < rqst->rq_npages - 1) ? rqst->rq_pagesz
-							: rqst->rq_tailsz;
-		sg_set_page(&sg[i], rqst->rq_pages[j], len, 0);
+		unsigned int len, offset;
+
+		rqst_page_get_length(rqst, j, &len, &offset);
+		sg_set_page(&sg[i], rqst->rq_pages[j], len, offset);
 	}
 	smb2_sg_set_buf(&sg[sg_len - 1], sign, SMB2_SIGNATURE_SIZE);
 	return sg;
@@ -2332,6 +2333,7 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, struct smb_rqst *new_rq,
 		return rc;
 
 	new_rq->rq_pages = pages;
+	new_rq->rq_offset = old_rq->rq_offset;
 	new_rq->rq_npages = old_rq->rq_npages;
 	new_rq->rq_pagesz = old_rq->rq_pagesz;
 	new_rq->rq_tailsz = old_rq->rq_tailsz;
@@ -2363,10 +2365,14 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, struct smb_rqst *new_rq,
 
 	/* copy pages form the old */
 	for (i = 0; i < npages; i++) {
-		char *dst = kmap(new_rq->rq_pages[i]);
-		char *src = kmap(old_rq->rq_pages[i]);
-		unsigned int len = (i < npages - 1) ? new_rq->rq_pagesz :
-							new_rq->rq_tailsz;
+		char *dst, *src;
+		unsigned int offset, len;
+
+		rqst_page_get_length(new_rq, i, &len, &offset);
+
+		dst = (char *) kmap(new_rq->rq_pages[i]) + offset;
+		src = (char *) kmap(old_rq->rq_pages[i]) + offset;
+
 		memcpy(dst, src, len);
 		kunmap(new_rq->rq_pages[i]);
 		kunmap(old_rq->rq_pages[i]);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 14/16] CIFS: Add support for direct I/O read
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (12 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 13/16] CIFS: Pass page offset for encrypting Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 15/16] CIFS: Add support for direct I/O write Long Li
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

With direct I/O read, we transfer the data directly from transport layer to
the user data buffer.

Change in v3: added support for kernel AIO

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsfs.h   |   1 +
 fs/cifs/cifsglob.h |   5 ++
 fs/cifs/file.c     | 209 +++++++++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 186 insertions(+), 29 deletions(-)

diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 5f02318..7fba9aa 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -102,6 +102,7 @@ extern int cifs_open(struct inode *inode, struct file *file);
 extern int cifs_close(struct inode *inode, struct file *file);
 extern int cifs_closedir(struct inode *inode, struct file *file);
 extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
+extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 7f62c98..52248dd 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1146,6 +1146,11 @@ struct cifs_aio_ctx {
 	unsigned int		len;
 	unsigned int		total_len;
 	bool			should_dirty;
+	/*
+	 * Indicates if this aio_ctx is for direct_io,
+	 * If yes, iter is a copy of the user passed iov_iter
+	 */
+	bool			direct_io;
 };
 
 struct cifs_readdata;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 87eece6..476b2a1 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2965,7 +2965,6 @@ cifs_uncached_readdata_release(struct kref *refcount)
 	kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
 	for (i = 0; i < rdata->nr_pages; i++) {
 		put_page(rdata->pages[i]);
-		rdata->pages[i] = NULL;
 	}
 	cifs_readdata_release(refcount);
 }
@@ -3004,7 +3003,7 @@ cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter)
 	return remaining ? -EFAULT : 0;
 }
 
-static void collect_uncached_read_data(struct cifs_aio_ctx *ctx);
+static void collect_uncached_read_data(struct cifs_readdata *rdata, struct cifs_aio_ctx *ctx);
 
 static void
 cifs_uncached_readv_complete(struct work_struct *work)
@@ -3013,7 +3012,7 @@ cifs_uncached_readv_complete(struct work_struct *work)
 						struct cifs_readdata, work);
 
 	complete(&rdata->done);
-	collect_uncached_read_data(rdata->ctx);
+	collect_uncached_read_data(rdata, rdata->ctx);
 	/* the below call can possibly free the last ref to aio ctx */
 	kref_put(&rdata->refcount, cifs_uncached_readdata_release);
 }
@@ -3103,6 +3102,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 	int rc;
 	pid_t pid;
 	struct TCP_Server_Info *server;
+	struct page **pagevec;
+	size_t start;
+	struct iov_iter direct_iov = ctx->iter;
 
 	server = tlink_tcon(open_file->tlink)->ses->server;
 
@@ -3111,6 +3113,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 	else
 		pid = current->tgid;
 
+	if (ctx->direct_io)
+		iov_iter_advance(&direct_iov, offset - ctx->pos);
+
 	do {
 		rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
 						   &rsize, &credits);
@@ -3118,20 +3123,56 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 			break;
 
 		cur_len = min_t(const size_t, len, rsize);
-		npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
 
-		/* allocate a readdata struct */
-		rdata = cifs_readdata_alloc(npages,
+		if (ctx->direct_io) {
+
+			cur_len = iov_iter_get_pages_alloc(
+					&direct_iov, &pagevec,
+					cur_len, &start);
+			if (cur_len < 0) {
+				cifs_dbg(VFS,
+					"couldn't get user pages (cur_len=%zd)"
+					" iter type %d"
+					" iov_offset %lu count %lu\n",
+					cur_len, direct_iov.type, direct_iov.iov_offset,
+					direct_iov.count);
+				dump_stack();
+				break;
+			}
+			iov_iter_advance(&direct_iov, cur_len);
+
+			rdata = cifs_readdata_direct_alloc(
+					pagevec, cifs_uncached_readv_complete);
+			if (!rdata) {
+				add_credits_and_wake_if(server, credits, 0);
+				rc = -ENOMEM;
+				break;
+			}
+
+			npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE;
+			rdata->page_offset = start;
+			rdata->tailsz = npages > 1 ?
+				cur_len-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE :
+				cur_len;
+
+		} else {
+
+			npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);
+			/* allocate a readdata struct */
+			rdata = cifs_readdata_alloc(npages,
 					    cifs_uncached_readv_complete);
-		if (!rdata) {
-			add_credits_and_wake_if(server, credits, 0);
-			rc = -ENOMEM;
-			break;
-		}
+			if (!rdata) {
+				add_credits_and_wake_if(server, credits, 0);
+				rc = -ENOMEM;
+				break;
+			}
 
-		rc = cifs_read_allocate_pages(rdata, npages);
-		if (rc)
-			goto error;
+			rc = cifs_read_allocate_pages(rdata, npages);
+			if (rc)
+				goto error;
+
+			rdata->tailsz = PAGE_SIZE;
+		}
 
 		rdata->cfile = cifsFileInfo_get(open_file);
 		rdata->nr_pages = npages;
@@ -3139,7 +3180,6 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 		rdata->bytes = cur_len;
 		rdata->pid = pid;
 		rdata->pagesz = PAGE_SIZE;
-		rdata->tailsz = PAGE_SIZE;
 		rdata->read_into_pages = cifs_uncached_read_into_pages;
 		rdata->copy_into_pages = cifs_uncached_copy_into_pages;
 		rdata->credits = credits;
@@ -3153,13 +3193,17 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 		if (rc) {
 			add_credits_and_wake_if(server, rdata->credits, 0);
 			kref_put(&rdata->refcount,
-				 cifs_uncached_readdata_release);
-			if (rc == -EAGAIN)
+				cifs_uncached_readdata_release);
+			if (rc == -EAGAIN) {
+				iov_iter_revert(&direct_iov, cur_len);
 				continue;
+			}
 			break;
 		}
 
-		list_add_tail(&rdata->list, rdata_list);
+		/* Add to aio pending list if it's not there */
+		if (rdata_list)
+			list_add_tail(&rdata->list, rdata_list);
 		offset += cur_len;
 		len -= cur_len;
 	} while (len > 0);
@@ -3168,7 +3212,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 }
 
 static void
-collect_uncached_read_data(struct cifs_aio_ctx *ctx)
+collect_uncached_read_data(struct cifs_readdata *uncached_rdata, struct cifs_aio_ctx *ctx)
 {
 	struct cifs_readdata *rdata, *tmp;
 	struct iov_iter *to = &ctx->iter;
@@ -3211,10 +3255,12 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
 				 * reading.
 				 */
 				if (got_bytes && got_bytes < rdata->bytes) {
-					rc = cifs_readdata_to_iov(rdata, to);
+					rc = 0;
+					if (!ctx->direct_io)
+						rc = cifs_readdata_to_iov(rdata, to);
 					if (rc) {
 						kref_put(&rdata->refcount,
-						cifs_uncached_readdata_release);
+							cifs_uncached_readdata_release);
 						continue;
 					}
 				}
@@ -3228,28 +3274,32 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
 				list_splice(&tmp_list, &ctx->list);
 
 				kref_put(&rdata->refcount,
-					 cifs_uncached_readdata_release);
+					cifs_uncached_readdata_release);
 				goto again;
 			} else if (rdata->result)
 				rc = rdata->result;
-			else
+			else if (!ctx->direct_io)
 				rc = cifs_readdata_to_iov(rdata, to);
 
 			/* if there was a short read -- discard anything left */
 			if (rdata->got_bytes && rdata->got_bytes < rdata->bytes)
 				rc = -ENODATA;
+
+			ctx->total_len += rdata->got_bytes;
 		}
 		list_del_init(&rdata->list);
 		kref_put(&rdata->refcount, cifs_uncached_readdata_release);
 	}
 
-	for (i = 0; i < ctx->npages; i++) {
-		if (ctx->should_dirty)
-			set_page_dirty(ctx->bv[i].bv_page);
-		put_page(ctx->bv[i].bv_page);
-	}
+	if (!ctx->direct_io) {
+		for (i = 0; i < ctx->npages; i++) {
+			if (ctx->should_dirty)
+				set_page_dirty(ctx->bv[i].bv_page);
+			put_page(ctx->bv[i].bv_page);
+		}
 
-	ctx->total_len = ctx->len - iov_iter_count(to);
+		ctx->total_len = ctx->len - iov_iter_count(to);
+	}
 
 	cifs_stats_bytes_read(tcon, ctx->total_len);
 
@@ -3267,6 +3317,107 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx)
 		complete(&ctx->done);
 }
 
+ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to)
+{
+	size_t len;
+	struct file *file;
+	struct cifs_sb_info *cifs_sb;
+	struct cifsFileInfo *cfile;
+	struct cifs_tcon *tcon;
+	ssize_t rc, total_read = 0;
+	struct TCP_Server_Info *server;
+	loff_t offset = iocb->ki_pos;
+	pid_t pid;
+	struct cifs_aio_ctx *ctx;
+
+	/*
+	 * iov_iter_get_pages_alloc() doesn't work with ITER_KVEC,
+	 * fall back to data copy read path
+	 */
+	if (to->type & ITER_KVEC) {
+		cifs_dbg(FYI, "use non-direct cifs_user_readv for kvec I/O\n");
+		return cifs_user_readv(iocb, to);
+	}
+
+	len = iov_iter_count(to);
+	if (!len)
+		return 0;
+
+	file = iocb->ki_filp;
+	cifs_sb = CIFS_FILE_SB(file);
+	cfile = file->private_data;
+	tcon = tlink_tcon(cfile->tlink);
+	server = tcon->ses->server;
+
+	if (!server->ops->async_readv)
+		return -ENOSYS;
+
+	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
+		pid = cfile->pid;
+	else
+		pid = current->tgid;
+
+	if ((file->f_flags & O_ACCMODE) == O_WRONLY)
+		cifs_dbg(FYI, "attempting read on write only file instance\n");
+
+	ctx = cifs_aio_ctx_alloc();
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->cfile = cifsFileInfo_get(cfile);
+
+	if (!is_sync_kiocb(iocb))
+		ctx->iocb = iocb;
+
+	if (to->type == ITER_IOVEC)
+		ctx->should_dirty = true;
+
+	ctx->pos = offset;
+	ctx->direct_io = true;
+	ctx->iter = *to;
+	ctx->len = len;
+
+	/* grab a lock here due to read response handlers can access ctx */
+	mutex_lock(&ctx->aio_mutex);
+
+	rc = cifs_send_async_read(offset, len, cfile, cifs_sb, &ctx->list, ctx);
+
+	/* if at least one read request send succeeded, then reset rc */
+	if (!list_empty(&ctx->list))
+		rc = 0;
+
+	mutex_unlock(&ctx->aio_mutex);
+
+	if (rc) {
+		kref_put(&ctx->refcount, cifs_aio_ctx_release);
+		return rc;
+	}
+
+	if (!is_sync_kiocb(iocb)) {
+		kref_put(&ctx->refcount, cifs_aio_ctx_release);
+		return -EIOCBQUEUED;
+	}
+
+	rc = wait_for_completion_killable(&ctx->done);
+	if (rc) {
+		mutex_lock(&ctx->aio_mutex);
+		ctx->rc = rc = -EINTR;
+		total_read = ctx->total_len;
+		mutex_unlock(&ctx->aio_mutex);
+	} else {
+		rc = ctx->rc;
+		total_read = ctx->total_len;
+	}
+
+	kref_put(&ctx->refcount, cifs_aio_ctx_release);
+
+	if (total_read) {
+		iocb->ki_pos += total_read;
+		return total_read;
+	}
+	return rc;
+}
+
 ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
 {
 	struct file *file = iocb->ki_filp;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 15/16] CIFS: Add support for direct I/O write
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (13 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 14/16] CIFS: Add support for direct I/O read Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-08  2:13 ` [Patch v3 16/16] CIFS: Add direct I/O functions to file_operations Long Li
  2018-09-15  9:28 ` [Patch v3 00/16] CIFS: add support for direct I/O Steve French
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

With direct I/O write, user supplied buffers are pinned to the memory and data
are transferred directly from user buffers to the transport layer.

Change in v3: added support for kernel AIO

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsfs.h |   1 +
 fs/cifs/file.c   | 195 ++++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 165 insertions(+), 31 deletions(-)

diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 7fba9aa..e9c5103 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -105,6 +105,7 @@ extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
+extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
 extern int cifs_lock(struct file *, int, struct file_lock *);
 extern int cifs_fsync(struct file *, loff_t, loff_t, int);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 476b2a1..76e0266 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2537,6 +2537,8 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 	loff_t saved_offset = offset;
 	pid_t pid;
 	struct TCP_Server_Info *server;
+	struct page **pagevec;
+	size_t start;
 
 	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
 		pid = open_file->pid;
@@ -2553,38 +2555,74 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 		if (rc)
 			break;
 
-		nr_pages = get_numpages(wsize, len, &cur_len);
-		wdata = cifs_writedata_alloc(nr_pages,
+		if (ctx->direct_io) {
+			cur_len = iov_iter_get_pages_alloc(
+				from, &pagevec, wsize, &start);
+			if (cur_len < 0) {
+				cifs_dbg(VFS,
+					"direct_writev couldn't get user pages "
+					"(rc=%zd) iter type %d iov_offset %lu count"
+					" %lu\n",
+					cur_len, from->type,
+					from->iov_offset, from->count);
+				dump_stack();
+				break;
+			}
+			iov_iter_advance(from, cur_len);
+
+			nr_pages = (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
+
+			wdata = cifs_writedata_direct_alloc(pagevec,
 					     cifs_uncached_writev_complete);
-		if (!wdata) {
-			rc = -ENOMEM;
-			add_credits_and_wake_if(server, credits, 0);
-			break;
-		}
+			if (!wdata) {
+				rc = -ENOMEM;
+				add_credits_and_wake_if(server, credits, 0);
+				break;
+			}
 
-		rc = cifs_write_allocate_pages(wdata->pages, nr_pages);
-		if (rc) {
-			kfree(wdata);
-			add_credits_and_wake_if(server, credits, 0);
-			break;
-		}
 
-		num_pages = nr_pages;
-		rc = wdata_fill_from_iovec(wdata, from, &cur_len, &num_pages);
-		if (rc) {
-			for (i = 0; i < nr_pages; i++)
-				put_page(wdata->pages[i]);
-			kfree(wdata);
-			add_credits_and_wake_if(server, credits, 0);
-			break;
-		}
+			wdata->page_offset = start;
+			wdata->tailsz =
+				nr_pages > 1 ?
+					cur_len - (PAGE_SIZE - start) -
+					(nr_pages - 2) * PAGE_SIZE :
+					cur_len;
+		} else {
+			nr_pages = get_numpages(wsize, len, &cur_len);
+			wdata = cifs_writedata_alloc(nr_pages,
+					     cifs_uncached_writev_complete);
+			if (!wdata) {
+				rc = -ENOMEM;
+				add_credits_and_wake_if(server, credits, 0);
+				break;
+			}
 
-		/*
-		 * Bring nr_pages down to the number of pages we actually used,
-		 * and free any pages that we didn't use.
-		 */
-		for ( ; nr_pages > num_pages; nr_pages--)
-			put_page(wdata->pages[nr_pages - 1]);
+			rc = cifs_write_allocate_pages(wdata->pages, nr_pages);
+			if (rc) {
+				kfree(wdata);
+				add_credits_and_wake_if(server, credits, 0);
+				break;
+			}
+
+			num_pages = nr_pages;
+			rc = wdata_fill_from_iovec(wdata, from, &cur_len, &num_pages);
+			if (rc) {
+				for (i = 0; i < nr_pages; i++)
+					put_page(wdata->pages[i]);
+				kfree(wdata);
+				add_credits_and_wake_if(server, credits, 0);
+				break;
+			}
+
+			/*
+			 * Bring nr_pages down to the number of pages we actually used,
+			 * and free any pages that we didn't use.
+			 */
+			for ( ; nr_pages > num_pages; nr_pages--)
+				put_page(wdata->pages[nr_pages - 1]);
+
+			wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE);
+		}
 
 		wdata->sync_mode = WB_SYNC_ALL;
 		wdata->nr_pages = nr_pages;
@@ -2593,7 +2631,6 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 		wdata->pid = pid;
 		wdata->bytes = cur_len;
 		wdata->pagesz = PAGE_SIZE;
-		wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE);
 		wdata->credits = credits;
 		wdata->ctx = ctx;
 		kref_get(&ctx->refcount);
@@ -2687,8 +2724,9 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx)
 		kref_put(&wdata->refcount, cifs_uncached_writedata_release);
 	}
 
-	for (i = 0; i < ctx->npages; i++)
-		put_page(ctx->bv[i].bv_page);
+	if (!ctx->direct_io)
+		for (i = 0; i < ctx->npages; i++)
+			put_page(ctx->bv[i].bv_page);
 
 	cifs_stats_bytes_written(tcon, ctx->total_len);
 	set_bit(CIFS_INO_INVALID_MAPPING, &CIFS_I(dentry->d_inode)->flags);
@@ -2703,6 +2741,101 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx)
 		complete(&ctx->done);
 }
 
+ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	ssize_t total_written = 0;
+	struct cifsFileInfo *cfile;
+	struct cifs_tcon *tcon;
+	struct cifs_sb_info *cifs_sb;
+	struct TCP_Server_Info *server;
+	size_t len = iov_iter_count(from);
+	int rc;
+	struct cifs_aio_ctx *ctx;
+
+	/*
+	 * iov_iter_get_pages_alloc doesn't work with ITER_KVEC.
+	 * In this case, fall back to non-direct write function.
+	 */
+	if (from->type & ITER_KVEC) {
+		cifs_dbg(FYI, "use non-direct cifs_user_writev for kvec I/O\n");
+		return cifs_user_writev(iocb, from);
+	}
+
+	rc = generic_write_checks(iocb, from);
+	if (rc <= 0)
+		return rc;
+
+	cifs_sb = CIFS_FILE_SB(file);
+	cfile = file->private_data;
+	tcon = tlink_tcon(cfile->tlink);
+	server = tcon->ses->server;
+
+	if (!server->ops->async_writev)
+		return -ENOSYS;
+
+	ctx = cifs_aio_ctx_alloc();
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->cfile = cifsFileInfo_get(cfile);
+
+	if (!is_sync_kiocb(iocb))
+		ctx->iocb = iocb;
+
+	ctx->pos = iocb->ki_pos;
+
+	ctx->direct_io = true;
+	ctx->iter = *from;
+	ctx->len = len;
+
+	/* grab a lock here due to read response handlers can access ctx */
+	mutex_lock(&ctx->aio_mutex);
+
+	rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, from,
+				  cfile, cifs_sb, &ctx->list, ctx);
+
+	/*
+	 * If at least one write was successfully sent, then discard any rc
+	 * value from the later writes. If the other write succeeds, then
+	 * we'll end up returning whatever was written. If it fails, then
+	 * we'll get a new rc value from that.
+	 */
+	if (!list_empty(&ctx->list))
+		rc = 0;
+
+	mutex_unlock(&ctx->aio_mutex);
+
+	if (rc) {
+		kref_put(&ctx->refcount, cifs_aio_ctx_release);
+		return rc;
+	}
+
+	if (!is_sync_kiocb(iocb)) {
+		kref_put(&ctx->refcount, cifs_aio_ctx_release);
+		return -EIOCBQUEUED;
+	}
+
+	rc = wait_for_completion_killable(&ctx->done);
+	if (rc) {
+		mutex_lock(&ctx->aio_mutex);
+		ctx->rc = rc = -EINTR;
+		total_written = ctx->total_len;
+		mutex_unlock(&ctx->aio_mutex);
+	} else {
+		rc = ctx->rc;
+		total_written = ctx->total_len;
+	}
+
+	kref_put(&ctx->refcount, cifs_aio_ctx_release);
+
+	if (unlikely(!total_written))
+		return rc;
+
+	iocb->ki_pos += total_written;
+	return total_written;
+}
+
 ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Patch v3 16/16] CIFS: Add direct I/O functions to file_operations
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (14 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 15/16] CIFS: Add support for direct I/O write Long Li
@ 2018-09-08  2:13 ` Long Li
  2018-09-15  9:28 ` [Patch v3 00/16] CIFS: add support for direct I/O Steve French
  16 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-08  2:13 UTC (permalink / raw)
  To: Steve French, linux-cifs, samba-technical, linux-kernel, linux-rdma
  Cc: Long Li

From: Long Li <longli@microsoft.com>

With direct read/write functions implemented, add them to file_operations.

Dircet I/O is used under two conditions:
1. When mounting with "cache=none", CIFS uses direct I/O for all user file
data transfer.
2. When opening a file with O_DIRECT, CIFS uses direct I/O for all data
transfer on this file.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsfs.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 62f1662..f18091b 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1113,9 +1113,8 @@ const struct file_operations cifs_file_strict_ops = {
 };
 
 const struct file_operations cifs_file_direct_ops = {
-	/* BB reevaluate whether they can be done with directio, no cache */
-	.read_iter = cifs_user_readv,
-	.write_iter = cifs_user_writev,
+	.read_iter = cifs_direct_readv,
+	.write_iter = cifs_direct_writev,
 	.open = cifs_open,
 	.release = cifs_close,
 	.lock = cifs_lock,
@@ -1169,9 +1168,8 @@ const struct file_operations cifs_file_strict_nobrl_ops = {
 };
 
 const struct file_operations cifs_file_direct_nobrl_ops = {
-	/* BB reevaluate whether they can be done with directio, no cache */
-	.read_iter = cifs_user_readv,
-	.write_iter = cifs_user_writev,
+	.read_iter = cifs_direct_readv,
+	.write_iter = cifs_direct_writev,
 	.open = cifs_open,
 	.release = cifs_close,
 	.fsync = cifs_fsync,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Patch v3 00/16] CIFS: add support for direct I/O
  2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
                   ` (15 preceding siblings ...)
  2018-09-08  2:13 ` [Patch v3 16/16] CIFS: Add direct I/O functions to file_operations Long Li
@ 2018-09-15  9:28 ` Steve French
  2018-09-15 20:57   ` Long Li
  16 siblings, 1 reply; 19+ messages in thread
From: Steve French @ 2018-09-15  9:28 UTC (permalink / raw)
  To: Long Li; +Cc: Steve French, CIFS, samba-technical, LKML, linux-rdma

could you rebase these, patch 1 was merged quite a while ago, and
patch 2 etc. doesn't apply cleanly
On Fri, Sep 7, 2018 at 9:18 PM Long Li <longli@linuxonhyperv.com> wrote:
>
> From: Long Li <longli@microsoft.com>
>
> This patch set implements direct I/O.
>
> In normal code path (even with cache=none), CIFS copies I/O data from
> user-space to kernel-space for security reasons of possible protocol
> required signing and encryption on user data.
>
> With this patch set, CIFS passes the I/O data directly from user-space
> buffer to the transport layer, when file system is mounted with
> "cache-none".
>
> Patch v2 addressed comments from Christoph Hellwig <hch@lst.de> and
> Tom Talpey <ttalpey@microsoft.com> to implement direct I/O for both
> socket and RDMA.
>
> Patch v3 added support for kernel AIO.
>
>
> Long Li (16):
>   CIFS: Add support for direct pages in rdata
>   CIFS: Use offset when reading pages
>   CIFS: Add support for direct pages in wdata
>   CIFS: pass page offset when issuing SMB write
>   CIFS: Calculate the correct request length based on page offset and
>     tail size
>   CIFS: Introduce helper function to get page offset and length in
>     smb_rqst
>   CIFS: When sending data on socket, pass the correct page offset
>   CIFS: SMBD: Support page offset in RDMA send
>   CIFS: SMBD: Support page offset in RDMA recv
>   CIFS: SMBD: Do not call ib_dereg_mr on invalidated memory registration
>   CIFS: SMBD: Support page offset in memory registration
>   CIFS: Pass page offset for calculating signature
>   CIFS: Pass page offset for encrypting
>   CIFS: Add support for direct I/O read
>   CIFS: Add support for direct I/O write
>   CIFS: Add direct I/O functions to file_operations
>
>  fs/cifs/cifsencrypt.c |   9 +-
>  fs/cifs/cifsfs.c      |  10 +-
>  fs/cifs/cifsfs.h      |   2 +
>  fs/cifs/cifsglob.h    |  11 +-
>  fs/cifs/cifsproto.h   |   9 +-
>  fs/cifs/cifssmb.c     |  19 +-
>  fs/cifs/connect.c     |   5 +-
>  fs/cifs/file.c        | 477 ++++++++++++++++++++++++++++++++++++++++++--------
>  fs/cifs/misc.c        |  17 ++
>  fs/cifs/smb2ops.c     |  22 ++-
>  fs/cifs/smb2pdu.c     |  20 ++-
>  fs/cifs/smbdirect.c   | 156 ++++++++++-------
>  fs/cifs/smbdirect.h   |   2 +-
>  fs/cifs/transport.c   |  34 ++--
>  14 files changed, 606 insertions(+), 187 deletions(-)
>
> --
> 2.7.4
>


-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [Patch v3 00/16] CIFS: add support for direct I/O
  2018-09-15  9:28 ` [Patch v3 00/16] CIFS: add support for direct I/O Steve French
@ 2018-09-15 20:57   ` Long Li
  0 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2018-09-15 20:57 UTC (permalink / raw)
  To: Steve French; +Cc: Steve French, CIFS, samba-technical, LKML, linux-rdma

> From: Steve French <smfrench@gmail.com>
> Sent: Saturday, September 15, 2018 2:28 AM
> To: Long Li <longli@microsoft.com>
> Cc: Steve French <sfrench@samba.org>; CIFS <linux-cifs@vger.kernel.org>;
> samba-technical <samba-technical@lists.samba.org>; LKML <linux-
> kernel@vger.kernel.org>; linux-rdma@vger.kernel.org
> Subject: Re: [Patch v3 00/16] CIFS: add support for direct I/O
> 
> could you rebase these, patch 1 was merged quite a while ago, and patch 2
> etc. doesn't apply cleanly 

Sorry, I will rebase and resend.


On Fri, Sep 7, 2018 at 9:18 PM Long Li
> <longli@linuxonhyperv.com> wrote:
> >
> > From: Long Li <longli@microsoft.com>
> >
> > This patch set implements direct I/O.
> >
> > In normal code path (even with cache=none), CIFS copies I/O data from
> > user-space to kernel-space for security reasons of possible protocol
> > required signing and encryption on user data.
> >
> > With this patch set, CIFS passes the I/O data directly from user-space
> > buffer to the transport layer, when file system is mounted with
> > "cache-none".
> >
> > Patch v2 addressed comments from Christoph Hellwig <hch@lst.de> and
> > Tom Talpey <ttalpey@microsoft.com> to implement direct I/O for both
> > socket and RDMA.
> >
> > Patch v3 added support for kernel AIO.
> >
> >
> > Long Li (16):
> >   CIFS: Add support for direct pages in rdata
> >   CIFS: Use offset when reading pages
> >   CIFS: Add support for direct pages in wdata
> >   CIFS: pass page offset when issuing SMB write
> >   CIFS: Calculate the correct request length based on page offset and
> >     tail size
> >   CIFS: Introduce helper function to get page offset and length in
> >     smb_rqst
> >   CIFS: When sending data on socket, pass the correct page offset
> >   CIFS: SMBD: Support page offset in RDMA send
> >   CIFS: SMBD: Support page offset in RDMA recv
> >   CIFS: SMBD: Do not call ib_dereg_mr on invalidated memory registration
> >   CIFS: SMBD: Support page offset in memory registration
> >   CIFS: Pass page offset for calculating signature
> >   CIFS: Pass page offset for encrypting
> >   CIFS: Add support for direct I/O read
> >   CIFS: Add support for direct I/O write
> >   CIFS: Add direct I/O functions to file_operations
> >
> >  fs/cifs/cifsencrypt.c |   9 +-
> >  fs/cifs/cifsfs.c      |  10 +-
> >  fs/cifs/cifsfs.h      |   2 +
> >  fs/cifs/cifsglob.h    |  11 +-
> >  fs/cifs/cifsproto.h   |   9 +-
> >  fs/cifs/cifssmb.c     |  19 +-
> >  fs/cifs/connect.c     |   5 +-
> >  fs/cifs/file.c        | 477
> ++++++++++++++++++++++++++++++++++++++++++--------
> >  fs/cifs/misc.c        |  17 ++
> >  fs/cifs/smb2ops.c     |  22 ++-
> >  fs/cifs/smb2pdu.c     |  20 ++-
> >  fs/cifs/smbdirect.c   | 156 ++++++++++-------
> >  fs/cifs/smbdirect.h   |   2 +-
> >  fs/cifs/transport.c   |  34 ++--
> >  14 files changed, 606 insertions(+), 187 deletions(-)
> >
> > --
> > 2.7.4
> >
> 
> 
> --
> Thanks,
> 
> Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-09-15 20:57 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-08  2:13 [Patch v3 00/16] CIFS: add support for direct I/O Long Li
2018-09-08  2:13 ` [Patch v3 01/16] CIFS: Add support for direct pages in rdata Long Li
2018-09-08  2:13 ` [Patch v3 02/16] CIFS: Use offset when reading pages Long Li
2018-09-08  2:13 ` [Patch v3 03/16] CIFS: Add support for direct pages in wdata Long Li
2018-09-08  2:13 ` [Patch v3 04/16] CIFS: pass page offset when issuing SMB write Long Li
2018-09-08  2:13 ` [Patch v3 05/16] CIFS: Calculate the correct request length based on page offset and tail size Long Li
2018-09-08  2:13 ` [Patch v3 06/16] CIFS: Introduce helper function to get page offset and length in smb_rqst Long Li
2018-09-08  2:13 ` [Patch v3 07/16] CIFS: When sending data on socket, pass the correct page offset Long Li
2018-09-08  2:13 ` [Patch v3 08/16] CIFS: SMBD: Support page offset in RDMA send Long Li
2018-09-08  2:13 ` [Patch v3 09/16] CIFS: SMBD: Support page offset in RDMA recv Long Li
2018-09-08  2:13 ` [Patch v3 10/16] CIFS: SMBD: Do not call ib_dereg_mr on invalidated memory registration Long Li
2018-09-08  2:13 ` [Patch v3 11/16] CIFS: SMBD: Support page offset in " Long Li
2018-09-08  2:13 ` [Patch v3 12/16] CIFS: Pass page offset for calculating signature Long Li
2018-09-08  2:13 ` [Patch v3 13/16] CIFS: Pass page offset for encrypting Long Li
2018-09-08  2:13 ` [Patch v3 14/16] CIFS: Add support for direct I/O read Long Li
2018-09-08  2:13 ` [Patch v3 15/16] CIFS: Add support for direct I/O write Long Li
2018-09-08  2:13 ` [Patch v3 16/16] CIFS: Add direct I/O functions to file_operations Long Li
2018-09-15  9:28 ` [Patch v3 00/16] CIFS: add support for direct I/O Steve French
2018-09-15 20:57   ` Long Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.