All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] cifs: asynchronous writepages support (try #2)
@ 2011-04-13 11:43 Jeff Layton
       [not found] ` <1302694994-8303-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2011-04-13 11:43 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

This is the second version of this set. There are a number of changes
from the first one:

- it now respects the max_pending limit on writes.

- the cifs_writedata struct is now variable-sized so it can handle
  arrays of pages of more or less any size. This allows us to
  increase the wsize up to much larger values.

- and there's a patch that cleans up the wsize negotiation. The
  new default with this code will be 124k writes, but it can be
  set much larger if the server supports large POSIX writes

This patchset changes the CIFS code to do asynchronous writes in
writepages(). It uses the asynchronous call infrastructure that was
added to handle SMB echoes. The idea here is to have the kernel
issue a write request and then have it handle the reply asynchronously.

For now, this just changes the writepages codepath. Once the patchset
has had a bit more refinement and testing, I'll see about changing some
of the other codepaths (writepage(), for instance).

I'm not 100% thrilled with this approach overall -- I think we do need
to handle writes asynchronously, but the fact that we're using our own
writepages routine kind of hamstrings this code.

Another possible approach here would be to move to more page-based I/O
like NFS has. Have writepage() set up the pages for writeback and
coalesce them together as it goes, and then issue the write all at
once. That would also allow us to handle larger write sizes than 56k.

Obviously, that's a much larger project to cobble together however. Much
of this code would still be applicable if we did decide to go that route
eventually.

I'm targeting this patchset for 2.6.40. I've tested this and it seems
to work, but have not yet done any significant perf testing. Pavel
tested an earlier version of this set and said he saw about a 20%
increase in sequential write performance. Not sure if that will hold
now that we're respecting the max_pending value.

Review and test feedback would be welcome.

Jeff Layton (7):
  cifs: consolidate SendReceive response checks
  cifs: make cifs_send_async take a kvec array
  cifs: don't call mid_q_entry->callback under the Global_MidLock
  cifs: add ignore_pend flag to cifs_call_async
  cifs: add cifs_async_writev
  cifs: convert cifs_writepages to use async writes
  cifs: clean up wsize negotiation and allow for larger wsize

 fs/cifs/cifsglob.h      |    7 +-
 fs/cifs/cifsproto.h     |   30 ++++-
 fs/cifs/cifssmb.c       |  218 ++++++++++++++++++++++++++++++++++-
 fs/cifs/connect.c       |   94 +++++++++++-----
 fs/cifs/file.c          |  292 ++++++++++++++++++++++++----------------------
 fs/cifs/netmisc.c       |    2 +-
 fs/cifs/smb2transport.c |   14 +--
 fs/cifs/transport.c     |  197 ++++++++++----------------------
 8 files changed, 528 insertions(+), 326 deletions(-)

-- 
1.7.4.2

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/7] cifs: consolidate SendReceive response checks
       [not found] ` <1302694994-8303-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-04-13 11:43   ` Jeff Layton
  2011-04-13 11:43   ` [PATCH 2/7] cifs: make cifs_send_async take a kvec array Jeff Layton
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2011-04-13 11:43 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Further consolidate the SendReceive code by moving the checks run over
the packet into a separate function that all the SendReceive variants
can call.

We can also eliminate the check for a receive_len that's too big or too
small. cifs_demultiplex_thread already checks that and disconnects the
socket if that occurs, while setting the midStatus to MALFORMED. It'll
never call this code if that's the case.

Finally do a little cleanup. Use "goto out" on errors so that the flow
of code in the normal case is more evident. Also switch the logErr
variable in map_smb_to_linux_error to a bool.

Reviewed-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>
Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/cifsproto.h |    4 +-
 fs/cifs/netmisc.c   |    2 +-
 fs/cifs/transport.c |  158 ++++++++++++++-------------------------------------
 3 files changed, 46 insertions(+), 118 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 76c4dc7f..4af7db8 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -74,6 +74,8 @@ extern int SendReceive(const unsigned int /* xid */ , struct cifs_ses *,
 			int * /* bytes returned */ , const int long_op);
 extern int SendReceiveNoRsp(const unsigned int xid, struct cifs_ses *ses,
 			struct smb_hdr *in_buf, int flags);
+extern int cifs_check_receive(struct mid_q_entry *mid,
+			struct TCP_Server_Info *server, bool log_error);
 extern int SendReceive2(const unsigned int /* xid */ , struct cifs_ses *,
 			struct kvec *, int /* nvec to send */,
 			int * /* type of buf returned */ , const int flags);
@@ -97,7 +99,7 @@ extern int cifs_convert_address(struct sockaddr *dst, const char *src, int len);
 extern int cifs_set_port(struct sockaddr *addr, const unsigned short int port);
 extern int cifs_fill_sockaddr(struct sockaddr *dst, const char *src, int len,
 				const unsigned short int port);
-extern int map_smb_to_linux_error(struct smb_hdr *smb, int logErr);
+extern int map_smb_to_linux_error(struct smb_hdr *smb, bool logErr);
 extern void header_assemble(struct smb_hdr *, char /* command */ ,
 			    const struct cifs_tcon *, int /* length of
 			    fixed section (word count) in two byte units */);
diff --git a/fs/cifs/netmisc.c b/fs/cifs/netmisc.c
index 79b71c2..73e47e8 100644
--- a/fs/cifs/netmisc.c
+++ b/fs/cifs/netmisc.c
@@ -836,7 +836,7 @@ ntstatus_to_dos(__u32 ntstatus, __u8 *eclass, __u16 *ecode)
 }
 
 int
-map_smb_to_linux_error(struct smb_hdr *smb, int logErr)
+map_smb_to_linux_error(struct smb_hdr *smb, bool logErr)
 {
 	unsigned int i;
 	int rc = -EIO;	/* if transport error smb error may not be set */
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 85063de..d9833fc 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -505,13 +505,32 @@ send_nt_cancel(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
 }
 
 int
+cifs_check_receive(struct mid_q_entry *mid, struct TCP_Server_Info *server,
+		   bool log_error)
+{
+	dump_smb(mid->resp_buf,
+		 min_t(u32, 92, be32_to_cpu(mid->resp_buf->smb_buf_length)));
+
+	/* convert the length into a more usable form */
+	if (server->sec_mode & (SECMODE_SIGN_REQUIRED |
+				SECMODE_SIGN_ENABLED)) {
+		/* FIXME: add code to kill session */
+		if (cifs_verify_signature(mid->resp_buf, server,
+					  mid->sequence_number + 1) != 0)
+			cERROR(1, "Unexpected SMB signature");
+	}
+
+	/* BB special case reconnect tid and uid here? */
+	return map_smb_to_linux_error(mid->resp_buf, log_error);
+}
+
+int
 SendReceive2(const unsigned int xid, struct cifs_ses *ses,
 	     struct kvec *iov, int n_vec, int *pRespBufType /* ret */,
 	     const int flags)
 {
 	int rc = 0;
 	int long_op;
-	unsigned int receive_len;
 	struct mid_q_entry *midQ;
 	struct smb_hdr *in_buf = iov[0].iov_base;
 
@@ -604,54 +623,24 @@ SendReceive2(const unsigned int xid, struct cifs_ses *ses,
 		return rc;
 	}
 
-	receive_len = be32_to_cpu(midQ->resp_buf->smb_buf_length);
-
-	if (receive_len > CIFSMaxBufSize + MAX_CIFS_HDR_SIZE) {
-		cERROR(1, "Frame too large received.  Length: %d  Xid: %d",
-			receive_len, xid);
+	if (!midQ->resp_buf || midQ->midState != MID_RESPONSE_RECEIVED) {
 		rc = -EIO;
+		cFYI(1, "Bad MID state?");
 		goto out;
 	}
 
-	/* rcvd frame is ok */
-
-	if (midQ->resp_buf &&
-	    (midQ->midState == MID_RESPONSE_RECEIVED)) {
-
-		iov[0].iov_base = (char *)midQ->resp_buf;
-		if (midQ->largeBuf)
-			*pRespBufType = CIFS_LARGE_BUFFER;
-		else
-			*pRespBufType = CIFS_SMALL_BUFFER;
-		iov[0].iov_len = receive_len + 4;
-
-		dump_smb(midQ->resp_buf, 80);
-		/* convert the length into a more usable form */
-		if ((receive_len > 24) &&
-		    (ses->server->sec_mode & (SECMODE_SIGN_REQUIRED |
-					     SECMODE_SIGN_ENABLED))) {
-			rc = cifs_verify_signature(midQ->resp_buf,
-						ses->server,
-						midQ->sequence_number+1);
-			if (rc) {
-				cERROR(1, "Unexpected SMB signature");
-				/* BB FIXME add code to kill session */
-			}
-		}
-
-		/* BB special case reconnect tid and uid here? */
-		rc = map_smb_to_linux_error(midQ->resp_buf,
-					    flags & CIFS_LOG_ERROR);
+	iov[0].iov_base = (char *)midQ->resp_buf;
+	iov[0].iov_len = be32_to_cpu(midQ->resp_buf->smb_buf_length) + 4;
+	if (midQ->largeBuf)
+		*pRespBufType = CIFS_LARGE_BUFFER;
+	else
+		*pRespBufType = CIFS_SMALL_BUFFER;
 
-		if ((flags & CIFS_NO_RESP) == 0)
-			midQ->resp_buf = NULL;  /* mark it so buf will
-						   not be freed by
-						   delete_mid */
-	} else {
-		rc = -EIO;
-		cFYI(1, "Bad MID state?");
-	}
+	rc = cifs_check_receive(midQ, ses->server, flags & CIFS_LOG_ERROR);
 
+	/* mark it so buf will not be freed by delete_mid */
+	if ((flags & CIFS_NO_RESP) == 0)
+		midQ->resp_buf = NULL;
 out:
 	delete_mid(midQ);
 	atomic_dec(&ses->server->inFlight);
@@ -666,7 +655,6 @@ SendReceive(const unsigned int xid, struct cifs_ses *ses,
 	    int *pbytes_returned, const int long_op)
 {
 	int rc = 0;
-	unsigned int receive_len;
 	struct mid_q_entry *midQ;
 
 	if (ses == NULL) {
@@ -753,47 +741,16 @@ SendReceive(const unsigned int xid, struct cifs_ses *ses,
 		return rc;
 	}
 
-	receive_len = be32_to_cpu(midQ->resp_buf->smb_buf_length);
-
-	if (receive_len > CIFSMaxBufSize + MAX_CIFS_HDR_SIZE) {
-		cERROR(1, "Frame too large received.  Length: %d  Xid: %d",
-			receive_len, xid);
-		rc = -EIO;
-		goto out;
-	}
-
-	/* rcvd frame is ok */
-
-	if (midQ->resp_buf && out_buf
-	    && (midQ->midState == MID_RESPONSE_RECEIVED)) {
-		out_buf->smb_buf_length = cpu_to_be32(receive_len);
-		memcpy((char *)out_buf + 4,
-		       (char *)midQ->resp_buf + 4,
-		       receive_len);
-
-		dump_smb(out_buf, 92);
-		/* convert the length into a more usable form */
-		if ((receive_len > 24) &&
-		    (ses->server->sec_mode & (SECMODE_SIGN_REQUIRED |
-					     SECMODE_SIGN_ENABLED))) {
-			rc = cifs_verify_signature(out_buf,
-						ses->server,
-						midQ->sequence_number+1);
-			if (rc) {
-				cERROR(1, "Unexpected SMB signature");
-				/* BB FIXME add code to kill session */
-			}
-		}
-
-		*pbytes_returned = be32_to_cpu(out_buf->smb_buf_length);
-
-		/* BB special case reconnect tid and uid here? */
-		rc = map_smb_to_linux_error(out_buf, 0 /* no log */ );
-	} else {
+	if (!midQ->resp_buf || !out_buf ||
+	    midQ->midState != MID_RESPONSE_RECEIVED) {
 		rc = -EIO;
 		cERROR(1, "Bad MID state?");
+		goto out;
 	}
 
+	*pbytes_returned = be32_to_cpu(midQ->resp_buf->smb_buf_length);
+	memcpy(out_buf, midQ->resp_buf, *pbytes_returned + 4);
+	rc = cifs_check_receive(midQ, ses->server, 0);
 out:
 	delete_mid(midQ);
 	atomic_dec(&ses->server->inFlight);
@@ -834,7 +791,6 @@ SendReceiveBlockingLock(const unsigned int xid, struct cifs_tcon *tcon,
 {
 	int rc = 0;
 	int rstart = 0;
-	unsigned int receive_len;
 	struct mid_q_entry *midQ;
 	struct cifs_ses *ses;
 
@@ -953,46 +909,16 @@ SendReceiveBlockingLock(const unsigned int xid, struct cifs_tcon *tcon,
 	if (rc != 0)
 		return rc;
 
-	receive_len = be32_to_cpu(midQ->resp_buf->smb_buf_length);
-	if (receive_len > CIFSMaxBufSize + MAX_CIFS_HDR_SIZE) {
-		cERROR(1, "Frame too large received.  Length: %d  Xid: %d",
-			receive_len, xid);
-		rc = -EIO;
-		goto out;
-	}
-
 	/* rcvd frame is ok */
-
-	if ((out_buf == NULL) || (midQ->midState != MID_RESPONSE_RECEIVED)) {
+	if (out_buf == NULL || midQ->midState != MID_RESPONSE_RECEIVED) {
 		rc = -EIO;
 		cERROR(1, "Bad MID state?");
 		goto out;
 	}
 
-	out_buf->smb_buf_length = cpu_to_be32(receive_len);
-	memcpy((char *)out_buf + 4,
-	       (char *)midQ->resp_buf + 4,
-	       receive_len);
-
-	dump_smb(out_buf, 92);
-	/* convert the length into a more usable form */
-	if ((receive_len > 24) &&
-	    (ses->server->sec_mode & (SECMODE_SIGN_REQUIRED |
-				     SECMODE_SIGN_ENABLED))) {
-		rc = cifs_verify_signature(out_buf,
-					   ses->server,
-					   midQ->sequence_number+1);
-		if (rc) {
-			cERROR(1, "Unexpected SMB signature");
-			/* BB FIXME add code to kill session */
-		}
-	}
-
-	*pbytes_returned = be32_to_cpu(out_buf->smb_buf_length);
-
-	/* BB special case reconnect tid and uid here? */
-	rc = map_smb_to_linux_error(out_buf, 0 /* no log */ );
-
+	*pbytes_returned = be32_to_cpu(midQ->resp_buf->smb_buf_length);
+	memcpy(out_buf, midQ->resp_buf, *pbytes_returned + 4);
+	rc = cifs_check_receive(midQ, ses->server, 0);
 out:
 	delete_mid(midQ);
 	if (rstart && rc == -EACCES)
-- 
1.7.4.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/7] cifs: make cifs_send_async take a kvec array
       [not found] ` <1302694994-8303-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-04-13 11:43   ` [PATCH 1/7] cifs: consolidate SendReceive response checks Jeff Layton
@ 2011-04-13 11:43   ` Jeff Layton
  2011-04-13 11:43   ` [PATCH 3/7] cifs: don't call mid_q_entry->callback under the Global_MidLock Jeff Layton
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2011-04-13 11:43 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

We'll need this for async writes, so convert the call to take a kvec
array. CIFSSMBEcho is changed to put a kvec on the stack and pass
in the SMB buffer using that.

Reviewed-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>
Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/cifsproto.h |    4 ++--
 fs/cifs/cifssmb.c   |    6 ++++--
 fs/cifs/transport.c |   13 +++++++------
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 4af7db8..e255a2b 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -65,8 +65,8 @@ extern struct mid_q_entry *AllocMidQEntry(const struct smb_hdr *smb_buffer,
 					struct TCP_Server_Info *server);
 extern void DeleteMidQEntry(struct mid_q_entry *midEntry);
 extern int wait_for_free_request(struct TCP_Server_Info *sv, const int long_op);
-extern int cifs_call_async(struct TCP_Server_Info *server,
-			   struct smb_hdr *in_buf, mid_callback_t *callback,
+extern int cifs_call_async(struct TCP_Server_Info *server, struct kvec *iov,
+			   unsigned int nvec, mid_callback_t *callback,
 			   void *cbdata);
 extern int SendReceive(const unsigned int /* xid */ , struct cifs_ses *,
 			struct smb_hdr * /* input */ ,
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index ca58aea..79e4881 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -725,6 +725,7 @@ CIFSSMBEcho(struct TCP_Server_Info *server)
 {
 	ECHO_REQ *smb;
 	int rc = 0;
+	struct kvec iov;
 
 	cFYI(1, "In echo request");
 
@@ -739,9 +740,10 @@ CIFSSMBEcho(struct TCP_Server_Info *server)
 	put_bcc(1, &smb->hdr);
 	smb->Data[0] = 'a';
 	inc_rfc1001_len(smb, 3);
+	iov.iov_base = smb;
+	iov.iov_len = be32_to_cpu(smb->hdr.smb_buf_length) + 4;
 
-	rc = cifs_call_async(server, (struct smb_hdr *)smb,
-				cifs_echo_callback, server);
+	rc = cifs_call_async(server, &iov, 1, cifs_echo_callback, server);
 	if (rc)
 		cFYI(1, "Echo request failed: %d", rc);
 
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index d9833fc..7ccf292 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -349,11 +349,12 @@ wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
  * the result. Caller is responsible for dealing with timeouts.
  */
 int
-cifs_call_async(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
-		mid_callback_t *callback, void *cbdata)
+cifs_call_async(struct TCP_Server_Info *server, struct kvec *iov,
+		unsigned int nvec, mid_callback_t *callback, void *cbdata)
 {
 	int rc;
 	struct mid_q_entry *mid;
+	struct smb_hdr *hdr = (struct smb_hdr *)iov[0].iov_base;
 
 	rc = wait_for_free_request(server, CIFS_ASYNC_OP);
 	if (rc)
@@ -361,10 +362,10 @@ cifs_call_async(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
 
 	/* enable signing if server requires it */
 	if (server->sec_mode & (SECMODE_SIGN_REQUIRED | SECMODE_SIGN_ENABLED))
-		in_buf->Flags2 |= SMBFLG2_SECURITY_SIGNATURE;
+		hdr->Flags2 |= SMBFLG2_SECURITY_SIGNATURE;
 
 	mutex_lock(&server->srv_mutex);
-	mid = AllocMidQEntry(in_buf, server);
+	mid = AllocMidQEntry(hdr, server);
 	if (mid == NULL) {
 		mutex_unlock(&server->srv_mutex);
 		return -ENOMEM;
@@ -375,7 +376,7 @@ cifs_call_async(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
 	list_add_tail(&mid->qhead, &server->pending_mid_q);
 	spin_unlock(&GlobalMid_Lock);
 
-	rc = cifs_sign_smb(in_buf, server, &mid->sequence_number);
+	rc = cifs_sign_smb2(iov, nvec, server, &mid->sequence_number);
 	if (rc) {
 		mutex_unlock(&server->srv_mutex);
 		goto out_err;
@@ -385,7 +386,7 @@ cifs_call_async(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
 	mid->callback_data = cbdata;
 	mid->midState = MID_REQUEST_SUBMITTED;
 	cifs_in_send_inc(server);
-	rc = smb_send(server, in_buf, be32_to_cpu(in_buf->smb_buf_length));
+	rc = smb_sendv(server, iov, nvec);
 	cifs_in_send_dec(server);
 	cifs_save_when_sent(mid);
 	mutex_unlock(&server->srv_mutex);
-- 
1.7.4.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/7] cifs: don't call mid_q_entry->callback under the Global_MidLock
       [not found] ` <1302694994-8303-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-04-13 11:43   ` [PATCH 1/7] cifs: consolidate SendReceive response checks Jeff Layton
  2011-04-13 11:43   ` [PATCH 2/7] cifs: make cifs_send_async take a kvec array Jeff Layton
@ 2011-04-13 11:43   ` Jeff Layton
  2011-04-13 11:43   ` [PATCH 4/7] cifs: add ignore_pend flag to cifs_call_async Jeff Layton
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2011-04-13 11:43 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Holding the spinlock while we call this function means that it can't
sleep, which really limits what it can do. Taking it out from under
the spinlock also means less contention for this global lock.

Change the semantics such that the Global_MidLock is not held when
the callback is called. To do this requires that we take extra care
not to have sync_mid_result remove the mid from the list when the
mid is in a state where that has already happened. This prevents
list corruption when the mid is sitting on a private list for
reconnect or when cifsd is coming down.

Reviewed-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>
Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/cifsglob.h      |    7 +++----
 fs/cifs/connect.c       |   32 +++++++++++++++++++++++++-------
 fs/cifs/smb2transport.c |   14 ++++----------
 fs/cifs/transport.c     |   23 ++++++++---------------
 4 files changed, 40 insertions(+), 36 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index ccbac61..b3a6404 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -583,9 +583,8 @@ struct mid_q_entry;
  * This is the prototype for the mid callback function. When creating one,
  * take special care to avoid deadlocks. Things to bear in mind:
  *
- * - it will be called by cifsd
- * - the GlobalMid_Lock will be held
- * - the mid will be removed from the pending_mid_q list
+ * - it will be called by cifsd, with no locks held
+ * - the mid will be removed from any lists
  */
 typedef void (mid_callback_t)(struct mid_q_entry *mid);
 
@@ -739,7 +738,7 @@ static inline void free_dfs_info_array(struct dfs_info3_param *param,
 #define   MID_RESPONSE_RECEIVED 4
 #define   MID_RETRY_NEEDED      8 /* session closed while this request out */
 #define   MID_RESPONSE_MALFORMED 0x10
-#define   MID_NO_RESPONSE_NEEDED 0x20
+#define   MID_SHUTDOWN		 0x20
 
 /* Types of response buffer returned from SendReceive2 */
 #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index b0c56ca..8f737d2 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -141,6 +141,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
 	struct cifs_ses *ses;
 	struct cifs_tcon *tcon;
 	struct mid_q_entry *mid_entry;
+	struct list_head retry_list;
 
 	spin_lock(&GlobalMid_Lock);
 	if (server->tcpStatus == CifsExiting) {
@@ -192,16 +193,23 @@ cifs_reconnect(struct TCP_Server_Info *server)
 	mutex_unlock(&server->srv_mutex);
 
 	/* mark submitted MIDs for retry and issue callback */
-	cFYI(1, "%s: issuing mid callbacks", __func__);
+	INIT_LIST_HEAD(&retry_list);
+	cFYI(1, "%s: moving mids to private list", __func__);
 	spin_lock(&GlobalMid_Lock);
 	list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
 		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
 		if (mid_entry->midState == MID_REQUEST_SUBMITTED)
 			mid_entry->midState = MID_RETRY_NEEDED;
+		list_move(&mid_entry->qhead, &retry_list);
+	}
+	spin_unlock(&GlobalMid_Lock);
+
+	cFYI(1, "%s: moving mids to private list", __func__);
+	list_for_each_safe(tmp, tmp2, &retry_list) {
+		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
 		list_del_init(&mid_entry->qhead);
 		mid_entry->callback(mid_entry);
 	}
-	spin_unlock(&GlobalMid_Lock);
 
 	while (server->tcpStatus == CifsNeedReconnect) {
 		try_to_freeze();
@@ -655,11 +663,10 @@ multi_t2_fnd:
 				else
 					mid_entry->midState =
 							MID_RESPONSE_MALFORMED;
+				list_del_init(&mid_entry->qhead);
 #ifdef CONFIG_CIFS_STATS2
 				mid_entry->when_received = jiffies;
 #endif
-				list_del_init(&mid_entry->qhead);
-				mid_entry->callback(mid_entry);
 				break;
 			}
 			mid_entry = NULL;
@@ -667,6 +674,7 @@ multi_t2_fnd:
 		spin_unlock(&GlobalMid_Lock);
 
 		if (mid_entry != NULL) {
+			mid_entry->callback(mid_entry);
 			/* Was previous buf put in mpx struct for multi-rsp? */
 			if (!isMultiRsp) {
 				/* smb buffer will be freed by user thread */
@@ -730,15 +738,25 @@ multi_t2_fnd:
 		cifs_small_buf_release(smallbuf);
 
 	if (!list_empty(&server->pending_mid_q)) {
+		struct list_head dispose_list;
+
+		INIT_LIST_HEAD(&dispose_list);
 		spin_lock(&GlobalMid_Lock);
 		list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
 			mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
-			cFYI(1, "Clearing Mid 0x%x - issuing callback",
-					 mid_entry->mid);
+			cFYI(1, "Clearing mid 0x%x", mid_entry->mid);
+			mid_entry->midState = MID_SHUTDOWN;
+			list_move(&mid_entry->qhead, &dispose_list);
+		}
+		spin_unlock(&GlobalMid_Lock);
+
+		/* now walk dispose list and issue callbacks */
+		list_for_each_safe(tmp, tmp2, &dispose_list) {
+			mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
+			cFYI(1, "Callback mid 0x%x", mid_entry->mid);
 			list_del_init(&mid_entry->qhead);
 			mid_entry->callback(mid_entry);
 		}
-		spin_unlock(&GlobalMid_Lock);
 		/* 1/8th of sec is more than enough time for them to exit */
 		msleep(125);
 	}
diff --git a/fs/cifs/smb2transport.c b/fs/cifs/smb2transport.c
index a187587..e0fa75308 100644
--- a/fs/cifs/smb2transport.c
+++ b/fs/cifs/smb2transport.c
@@ -276,28 +276,22 @@ sync_smb2_mid_result(struct smb2_mid_entry *mid, struct TCP_Server_Info *server)
 
 	spin_lock(&GlobalMid_Lock);
 	/* ensure that it's no longer on the pending_mid_q */
-	list_del_init(&mid->qhead);
 
 	switch (mid->mid_state) {
 	case MID_RESPONSE_RECEIVED:
 		spin_unlock(&GlobalMid_Lock);
 		return rc;
-	case MID_REQUEST_SUBMITTED:
-		/* socket is going down, reject all calls */
-		if (server->tcpStatus == CifsExiting) {
-			cERROR(1, "%s: canceling mid=%lld cmd=0x%x state=%d",
-				__func__, mid->mid, le16_to_cpu(mid->command),
-				mid->mid_state);
-			rc = -EHOSTDOWN;
-			break;
-		}
 	case MID_RETRY_NEEDED:
 		rc = -EAGAIN;
 		break;
 	case MID_RESPONSE_MALFORMED:
 		rc = -EIO;
 		break;
+	case MID_SHUTDOWN:
+		rc = -EHOSTDOWN;
+		break;
 	default:
+		list_del_init(&mid->qhead);
 		cERROR(1, "%s: invalid mid state mid=%lld state=%d", __func__,
 			mid->mid, mid->mid_state);
 		rc = -EIO;
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 7ccf292..a31c279 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -428,7 +428,7 @@ SendReceiveNoRsp(const unsigned int xid, struct cifs_ses *ses,
 }
 
 static int
-sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
+cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
 {
 	int rc = 0;
 
@@ -436,28 +436,21 @@ sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
 		mid->mid, mid->midState);
 
 	spin_lock(&GlobalMid_Lock);
-	/* ensure that it's no longer on the pending_mid_q */
-	list_del_init(&mid->qhead);
-
 	switch (mid->midState) {
 	case MID_RESPONSE_RECEIVED:
 		spin_unlock(&GlobalMid_Lock);
 		return rc;
-	case MID_REQUEST_SUBMITTED:
-		/* socket is going down, reject all calls */
-		if (server->tcpStatus == CifsExiting) {
-			cERROR(1, "%s: canceling mid=%d cmd=0x%x state=%d",
-			       __func__, mid->mid, mid->command, mid->midState);
-			rc = -EHOSTDOWN;
-			break;
-		}
 	case MID_RETRY_NEEDED:
 		rc = -EAGAIN;
 		break;
 	case MID_RESPONSE_MALFORMED:
 		rc = -EIO;
 		break;
+	case MID_SHUTDOWN:
+		rc = -EHOSTDOWN;
+		break;
 	default:
+		list_del_init(&mid->qhead);
 		cERROR(1, "%s: invalid mid state mid=%d state=%d", __func__,
 			mid->mid, mid->midState);
 		rc = -EIO;
@@ -617,7 +610,7 @@ SendReceive2(const unsigned int xid, struct cifs_ses *ses,
 
 	cifs_small_buf_release(in_buf);
 
-	rc = sync_mid_result(midQ, ses->server);
+	rc = cifs_sync_mid_result(midQ, ses->server);
 	if (rc != 0) {
 		atomic_dec(&ses->server->inFlight);
 		wake_up(&ses->server->request_q);
@@ -735,7 +728,7 @@ SendReceive(const unsigned int xid, struct cifs_ses *ses,
 		spin_unlock(&GlobalMid_Lock);
 	}
 
-	rc = sync_mid_result(midQ, ses->server);
+	rc = cifs_sync_mid_result(midQ, ses->server);
 	if (rc != 0) {
 		atomic_dec(&ses->server->inFlight);
 		wake_up(&ses->server->request_q);
@@ -906,7 +899,7 @@ SendReceiveBlockingLock(const unsigned int xid, struct cifs_tcon *tcon,
 		rstart = 1;
 	}
 
-	rc = sync_mid_result(midQ, ses->server);
+	rc = cifs_sync_mid_result(midQ, ses->server);
 	if (rc != 0)
 		return rc;
 
-- 
1.7.4.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/7] cifs: add ignore_pend flag to cifs_call_async
       [not found] ` <1302694994-8303-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2011-04-13 11:43   ` [PATCH 3/7] cifs: don't call mid_q_entry->callback under the Global_MidLock Jeff Layton
@ 2011-04-13 11:43   ` Jeff Layton
       [not found]     ` <1302694994-8303-5-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-04-13 11:43   ` [PATCH 5/7] cifs: add cifs_async_writev Jeff Layton
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2011-04-13 11:43 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

The current code always ignores the max_pending limit. Have it instead
only optionally ignore the pending limit. For CIFSSMBEcho, we need to
ignore it to make sure they always can go out. For async reads, writes
and potentially other calls, we need to respect it.

Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/cifsproto.h |    2 +-
 fs/cifs/cifssmb.c   |    2 +-
 fs/cifs/transport.c |    5 +++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index e255a2b..c621b45 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -67,7 +67,7 @@ extern void DeleteMidQEntry(struct mid_q_entry *midEntry);
 extern int wait_for_free_request(struct TCP_Server_Info *sv, const int long_op);
 extern int cifs_call_async(struct TCP_Server_Info *server, struct kvec *iov,
 			   unsigned int nvec, mid_callback_t *callback,
-			   void *cbdata);
+			   void *cbdata, bool ignore_pend);
 extern int SendReceive(const unsigned int /* xid */ , struct cifs_ses *,
 			struct smb_hdr * /* input */ ,
 			struct smb_hdr * /* out */ ,
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 79e4881..33adc15 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -743,7 +743,7 @@ CIFSSMBEcho(struct TCP_Server_Info *server)
 	iov.iov_base = smb;
 	iov.iov_len = be32_to_cpu(smb->hdr.smb_buf_length) + 4;
 
-	rc = cifs_call_async(server, &iov, 1, cifs_echo_callback, server);
+	rc = cifs_call_async(server, &iov, 1, cifs_echo_callback, server, true);
 	if (rc)
 		cFYI(1, "Echo request failed: %d", rc);
 
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index a31c279..ad32b70 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -350,13 +350,14 @@ wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
  */
 int
 cifs_call_async(struct TCP_Server_Info *server, struct kvec *iov,
-		unsigned int nvec, mid_callback_t *callback, void *cbdata)
+		unsigned int nvec, mid_callback_t *callback, void *cbdata,
+		bool ignore_pend)
 {
 	int rc;
 	struct mid_q_entry *mid;
 	struct smb_hdr *hdr = (struct smb_hdr *)iov[0].iov_base;
 
-	rc = wait_for_free_request(server, CIFS_ASYNC_OP);
+	rc = wait_for_free_request(server, ignore_pend ? CIFS_ASYNC_OP : 0);
 	if (rc)
 		return rc;
 
-- 
1.7.4.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/7] cifs: add cifs_async_writev
       [not found] ` <1302694994-8303-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2011-04-13 11:43   ` [PATCH 4/7] cifs: add ignore_pend flag to cifs_call_async Jeff Layton
@ 2011-04-13 11:43   ` Jeff Layton
  2011-04-13 11:43   ` [PATCH 6/7] cifs: convert cifs_writepages to use async writes Jeff Layton
  2011-04-13 11:43   ` [PATCH 7/7] cifs: clean up wsize negotiation and allow for larger wsize Jeff Layton
  6 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2011-04-13 11:43 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Add the ability for CIFS to do an asynchronous write. The kernel will
set the frame up as it would for a "normal" SMBWrite2 request, and use
cifs_call_async to send it. The mid callback will then be configured to
handle the result.

Reviewed-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>
Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/cifsproto.h |   20 +++++
 fs/cifs/cifssmb.c   |  212 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 232 insertions(+), 0 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index c621b45..37e8c35 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -21,6 +21,7 @@
 #ifndef _CIFSPROTO_H
 #define _CIFSPROTO_H
 #include <linux/nls.h>
+#include <linux/pagevec.h>
 
 struct statfs;
 struct smb_vol;
@@ -433,4 +434,23 @@ extern int mdfour(unsigned char *, unsigned char *, int);
 extern int E_md4hash(const unsigned char *passwd, unsigned char *p16);
 extern int SMBencrypt(unsigned char *passwd, const unsigned char *c8,
 			unsigned char *p24);
+
+/* asynchronous write support */
+struct cifs_writedata {
+	struct list_head	pending;
+	struct kref		refcount;
+	struct completion	completion;
+	struct work_struct	work;
+	struct cifsFileInfo	*cfile;
+	__u64			offset;
+	unsigned int		bytes;
+	int			result;
+	unsigned int		nr_pages;
+	struct page 		*pages[1];
+};
+
+int cifs_async_writev(struct cifs_writedata *wdata);
+struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages);
+void cifs_writedata_release(struct kref *refcount);
+
 #endif			/* _CIFSPROTO_H */
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 33adc15..32034cf 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -32,6 +32,7 @@
 #include <linux/vfs.h>
 #include <linux/slab.h>
 #include <linux/posix_acl_xattr.h>
+#include <linux/pagemap.h>
 #include <asm/uaccess.h>
 #include "cifspdu.h"
 #include "cifsglob.h"
@@ -1604,6 +1605,217 @@ CIFSSMBWrite(const int xid, struct cifs_tcon *tcon,
 	return rc;
 }
 
+void
+cifs_writedata_release(struct kref *refcount)
+{
+	struct cifs_writedata *wdata = container_of(refcount,
+					struct cifs_writedata, refcount);
+	struct address_space *mapping;
+
+	if (wdata->cfile) {
+		mapping = wdata->cfile->dentry->d_inode->i_mapping;
+
+		spin_lock(&mapping->private_lock);
+		list_del(&wdata->pending);
+		spin_unlock(&mapping->private_lock);
+
+		cifsFileInfo_put(wdata->cfile);
+	}
+
+	kfree(wdata);
+}
+
+static void
+cifs_writev_complete(struct work_struct *work)
+{
+	struct cifs_writedata *wdata = container_of(work,
+						struct cifs_writedata, work);
+	struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
+	struct inode *inode = wdata->cfile->dentry->d_inode;
+	struct page *page;
+	int i = 0;
+
+	if (wdata->result == 0) {
+		cifs_update_eof(CIFS_I(inode), wdata->offset, wdata->bytes);
+		cifs_stats_bytes_written(tcon, wdata->bytes);
+	}
+
+	for (i = 0; i < wdata->nr_pages; i++) {
+		page = wdata->pages[i];
+		if (wdata->result == -EAGAIN)
+			__set_page_dirty_nobuffers(page);
+		else if (wdata->result < 0)
+			SetPageError(page);
+		end_page_writeback(page);
+		page_cache_release(page);
+	}
+	if (wdata->result != -EAGAIN)
+		mapping_set_error(inode->i_mapping, wdata->result);
+	complete(&wdata->completion);
+	kref_put(&wdata->refcount, cifs_writedata_release);
+}
+
+struct cifs_writedata *
+cifs_writedata_alloc(unsigned int nr_pages)
+{
+	struct cifs_writedata *wdata;
+
+	/* writedata + number of page pointers */
+	wdata = kzalloc(sizeof(*wdata) +
+			sizeof(struct page *) * (nr_pages - 1), GFP_NOFS);
+	if (wdata != NULL) {
+		INIT_LIST_HEAD(&wdata->pending);
+		INIT_WORK(&wdata->work, cifs_writev_complete);
+		kref_init(&wdata->refcount);
+		init_completion(&wdata->completion);
+	}
+	return wdata;
+}
+
+/*
+ * Check the midState and signature on received buffer (if any), and queue the
+ * workqueue completion task.
+ */
+static void
+cifs_writev_callback(struct mid_q_entry *mid)
+{
+	struct cifs_writedata *wdata = mid->callback_data;
+	struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
+	unsigned int written;
+	WRITE_RSP *smb = (WRITE_RSP *)mid->resp_buf;
+
+	switch(mid->midState) {
+	case MID_RESPONSE_RECEIVED:
+		wdata->result = cifs_check_receive(mid, tcon->ses->server, 0);
+		if (wdata->result != 0)
+			break;
+
+		written = le16_to_cpu(smb->CountHigh);
+		written <<= 16;
+		written += le16_to_cpu(smb->Count);
+		/*
+		 * Mask off high 16 bits when bytes written as returned
+		 * by the server is greater than bytes requested by the
+		 * client. OS/2 servers are known to set incorrect
+		 * CountHigh values.
+		 */
+		if (written > wdata->bytes)
+			written &= 0xFFFF;
+
+		if (written < wdata->bytes)
+			wdata->result = -ENOSPC;
+		else
+			wdata->bytes = written;
+		break;
+	case MID_REQUEST_SUBMITTED:
+	case MID_RETRY_NEEDED:
+		wdata->result = -EAGAIN;
+		break;
+	default:
+		wdata->result = -EIO;
+		break;
+	}
+
+	queue_work(system_nrt_wq, &wdata->work);
+	DeleteMidQEntry(mid);
+	atomic_dec(&tcon->ses->server->inFlight);
+	wake_up(&tcon->ses->server->request_q);
+}
+
+/* cifs_async_writev - send an async write, and set up mid to handle result */
+int
+cifs_async_writev(struct cifs_writedata *wdata)
+{
+	int i, rc = -EACCES;
+	WRITE_REQ *smb = NULL;
+	int wct;
+	struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
+	struct inode *inode = wdata->cfile->dentry->d_inode;
+	struct kvec *iov = NULL;
+
+	if (tcon->ses->capabilities & CAP_LARGE_FILES) {
+		wct = 14;
+	} else {
+		wct = 12;
+		if (wdata->offset >> 32 > 0) {
+			/* can not handle big offset for old srv */
+			return -EIO;
+		}
+	}
+
+	rc = small_smb_init(SMB_COM_WRITE_ANDX, wct, tcon, (void **)&smb);
+	if (rc)
+		goto async_writev_out;
+
+	/* 1 iov per page + 1 for header */
+	iov = kzalloc((wdata->nr_pages + 1) * sizeof(*iov), GFP_NOFS);
+	if (iov == NULL) {
+		rc = -ENOMEM;
+		goto async_writev_out;
+	}
+
+	smb->AndXCommand = 0xFF;	/* none */
+	smb->Fid = wdata->cfile->netfid;
+	smb->OffsetLow = cpu_to_le32(wdata->offset & 0xFFFFFFFF);
+	if (wct == 14)
+		smb->OffsetHigh = cpu_to_le32(wdata->offset >> 32);
+	smb->Reserved = 0xFFFFFFFF;
+	smb->WriteMode = 0;
+	smb->Remaining = 0;
+
+	smb->DataOffset =
+	    cpu_to_le16(offsetof(struct smb_com_write_req, Data) - 4);
+
+	/* 4 for RFC1001 length + 1 for BCC */
+	iov[0].iov_len = be32_to_cpu(smb->hdr.smb_buf_length) + 4 + 1;
+	iov[0].iov_base = smb;
+
+	/* marshal up the pages into iov array */
+	wdata->bytes = 0;
+	for (i = 0; i < wdata->nr_pages; i++) {
+		iov[i + 1].iov_len = min(inode->i_size -
+				      page_offset(wdata->pages[i]),
+					(loff_t)PAGE_CACHE_SIZE);
+		iov[i + 1].iov_base = kmap(wdata->pages[i]);
+		wdata->bytes += iov[i + 1].iov_len;
+	}
+
+	cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);
+
+	smb->DataLengthLow = cpu_to_le16(wdata->bytes & 0xFFFF);
+	smb->DataLengthHigh = cpu_to_le16(wdata->bytes >> 16);
+
+	if (wct == 14) {
+		inc_rfc1001_len(&smb->hdr, wdata->bytes + 1);
+		put_bcc(wdata->bytes + 1, &smb->hdr);
+	} else {
+		/* wct == 12 */
+		struct smb_com_writex_req *smbw =
+				(struct smb_com_writex_req *)smb;
+		inc_rfc1001_len(&smbw->hdr, wdata->bytes + 5);
+		put_bcc(wdata->bytes + 5, &smbw->hdr);
+		iov[0].iov_len += 4; /* pad bigger by four bytes */
+	}
+
+	kref_get(&wdata->refcount);
+	rc = cifs_call_async(tcon->ses->server, iov, wdata->nr_pages + 1,
+			     cifs_writev_callback, wdata, false);
+
+	if (rc == 0)
+		cifs_stats_inc(&tcon->stats.cifs_stats.num_writes);
+	else
+		kref_put(&wdata->refcount, cifs_writedata_release);
+
+	/* send is done, unmap pages */
+	for (i = 0; i < wdata->nr_pages; i++)
+		kunmap(wdata->pages[i]);
+
+async_writev_out:
+	cifs_small_buf_release(smb);
+	kfree(iov);
+	return rc;
+}
+
 int
 CIFSSMBWrite2(const int xid, struct cifs_tcon *tcon,
 	     const int netfid, const unsigned int count,
-- 
1.7.4.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/7] cifs: convert cifs_writepages to use async writes
       [not found] ` <1302694994-8303-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2011-04-13 11:43   ` [PATCH 5/7] cifs: add cifs_async_writev Jeff Layton
@ 2011-04-13 11:43   ` Jeff Layton
       [not found]     ` <1302694994-8303-7-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-04-13 11:43   ` [PATCH 7/7] cifs: clean up wsize negotiation and allow for larger wsize Jeff Layton
  6 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2011-04-13 11:43 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Have cifs_writepages issue asynchronous writes instead of waiting on
each write call to complete before issuing another. This also allows us
to return more quickly from writepages in the WB_SYNC_NONE case. It
can just send out all of the I/Os and not wait around for the replies.

In the WB_SYNC_ALL case, have it wait for all of the writes to complete
after issuing them and return any errors that turn up from it. If those
errors turn out to all be retryable (-EAGAIN), then retry the entire
operation again.

This also changes the page locking semantics a little bit. Instead of
holding the page lock until the response is received, release it after
doing the send. This will reduce contention for the page lock and should
prevent processes that have the file mmap'ed from being blocked
unnecessarily.

Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/file.c |  292 +++++++++++++++++++++++++++++---------------------------
 1 files changed, 152 insertions(+), 140 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index b3d2e3f..7ff138f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1089,32 +1089,65 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
 	return rc;
 }
 
+/*
+ * walk a list of in progress async writes and wait for them to complete.
+ * Check the result code on each one.
+ *
+ * There's a clear heirarchy: 0 < EAGAIN < other errors
+ *
+ * If we get an EAGAIN return on a WB_SYNC_ALL writeout, then we need to
+ * go back and do the whole operation again. If we get other errors then
+ * the mapping already likely has AS_EIO or AS_ENOSPC set, so we can
+ * give up and just return an error.
+ */
+static int
+cifs_wait_for_write_completion(struct address_space *mapping,
+				struct list_head *pending)
+{
+	int tmprc, rc = 0;
+	struct cifs_writedata *wdata;
+
+	/* wait for writes to complete */
+	for (;;) {
+		/* anything left on list? */
+		spin_lock(&mapping->private_lock);
+		if (list_empty(pending)) {
+			spin_unlock(&mapping->private_lock);
+			break;
+		}
+
+		/* dequeue an entry */
+		wdata = list_first_entry(pending, struct cifs_writedata,
+					 pending);
+		list_del_init(&wdata->pending);
+		spin_unlock(&mapping->private_lock);
+
+		/* wait for it to complete */
+		tmprc = wait_for_completion_killable(&wdata->completion);
+		if (tmprc)
+			rc = tmprc;
+
+		if (wdata->result == -EAGAIN)
+			rc = rc ? rc : wdata->result;
+		else if (wdata->result != 0)
+			rc = wdata->result;
+
+		kref_put(&wdata->refcount, cifs_writedata_release);
+	}
+
+	return rc;
+}
+
 static int cifs_writepages(struct address_space *mapping,
 			   struct writeback_control *wbc)
 {
-	unsigned int bytes_to_write;
-	unsigned int bytes_written;
-	struct cifs_sb_info *cifs_sb;
-	int done = 0;
-	pgoff_t end;
-	pgoff_t index;
-	int range_whole = 0;
-	struct kvec *iov;
-	int len;
-	int n_iov = 0;
-	pgoff_t next;
-	int nr_pages;
-	__u64 offset = 0;
-	struct cifsFileInfo *open_file;
-	struct cifs_tcon *tcon;
-	struct cifsInodeInfo *cifsi = CIFS_I(mapping->host);
+	struct cifs_sb_info *cifs_sb = CIFS_SB(mapping->host->i_sb);
+	bool done = false, scanned = false, range_whole = false;
+	pgoff_t end, index;
+	struct cifs_writedata *wdata;
 	struct page *page;
-	struct pagevec pvec;
+	struct list_head pending;
 	int rc = 0;
-	int scanned = 0;
-	int xid;
-
-	cifs_sb = CIFS_SB(mapping->host->i_sb);
 
 	/*
 	 * If wsize is smaller that the page cache size, default to writing
@@ -1123,27 +1156,8 @@ static int cifs_writepages(struct address_space *mapping,
 	if (cifs_sb->wsize < PAGE_CACHE_SIZE)
 		return generic_writepages(mapping, wbc);
 
-	iov = kmalloc(32 * sizeof(struct kvec), GFP_KERNEL);
-	if (iov == NULL)
-		return generic_writepages(mapping, wbc);
-
-	/*
-	 * if there's no open file, then this is likely to fail too,
-	 * but it'll at least handle the return. Maybe it should be
-	 * a BUG() instead?
-	 */
-	open_file = find_writable_file(CIFS_I(mapping->host), false);
-	if (!open_file) {
-		kfree(iov);
-		return generic_writepages(mapping, wbc);
-	}
-
-	tcon = tlink_tcon(open_file->tlink);
-	cifsFileInfo_put(open_file);
+	INIT_LIST_HEAD(&pending);
 
-	xid = GetXid();
-
-	pagevec_init(&pvec, 0);
 	if (wbc->range_cyclic) {
 		index = mapping->writeback_index; /* Start from prev offset */
 		end = -1;
@@ -1151,24 +1165,34 @@ static int cifs_writepages(struct address_space *mapping,
 		index = wbc->range_start >> PAGE_CACHE_SHIFT;
 		end = wbc->range_end >> PAGE_CACHE_SHIFT;
 		if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
-			range_whole = 1;
-		scanned = 1;
+			range_whole = true;
+		scanned = true;
 	}
 retry:
-	while (!done && (index <= end) &&
-	       (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
-			PAGECACHE_TAG_DIRTY,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1))) {
-		int first;
-		unsigned int i;
-
-		first = -1;
-		next = 0;
-		n_iov = 0;
-		bytes_to_write = 0;
-
-		for (i = 0; i < nr_pages; i++) {
-			page = pvec.pages[i];
+	while (!done && index <= end) {
+		unsigned int i, nr_pages, found_pages;
+		pgoff_t next = 0, tofind;
+
+		tofind = min((cifs_sb->wsize / PAGE_CACHE_SIZE) - 1,
+				end - index) + 1;
+
+		wdata = cifs_writedata_alloc((unsigned int)tofind);
+		if (!wdata) {
+			rc = -ENOMEM;
+			break;
+		}
+
+		found_pages = find_get_pages_tag(mapping, &index,
+						 PAGECACHE_TAG_DIRTY,
+						 tofind, wdata->pages);
+		if (found_pages == 0) {
+			kref_put(&wdata->refcount, cifs_writedata_release);
+			break;
+		}
+
+		nr_pages = 0;
+		for (i = 0; i < found_pages; i++) {
+			page = wdata->pages[i];
 			/*
 			 * At this point we hold neither mapping->tree_lock nor
 			 * lock on the page itself: the page may be truncated or
@@ -1177,7 +1201,7 @@ retry:
 			 * mapping
 			 */
 
-			if (first < 0)
+			if (nr_pages == 0)
 				lock_page(page);
 			else if (!trylock_page(page))
 				break;
@@ -1188,7 +1212,7 @@ retry:
 			}
 
 			if (!wbc->range_cyclic && page->index > end) {
-				done = 1;
+				done = true;
 				unlock_page(page);
 				break;
 			}
@@ -1215,119 +1239,107 @@ retry:
 			set_page_writeback(page);
 
 			if (page_offset(page) >= mapping->host->i_size) {
-				done = 1;
+				done = true;
 				unlock_page(page);
 				end_page_writeback(page);
 				break;
 			}
 
-			/*
-			 * BB can we get rid of this?  pages are held by pvec
-			 */
-			page_cache_get(page);
+			wdata->pages[i] = page;
+			next = page->index + 1;
+			++nr_pages;
+		}
 
-			len = min(mapping->host->i_size - page_offset(page),
-				  (loff_t)PAGE_CACHE_SIZE);
+		/* reset index to refind any pages skipped */
+		if (nr_pages == 0)
+			index = wdata->pages[0]->index + 1;
 
-			/* reserve iov[0] for the smb header */
-			n_iov++;
-			iov[n_iov].iov_base = kmap(page);
-			iov[n_iov].iov_len = len;
-			bytes_to_write += len;
+		/* put any pages we aren't going to use */
+		for (i = nr_pages; i < found_pages; i++) {
+			page_cache_release(wdata->pages[i]);
+			wdata->pages[i] = NULL;
+		}
 
-			if (first < 0) {
-				first = i;
-				offset = page_offset(page);
-			}
-			next = page->index + 1;
-			if (bytes_to_write + PAGE_CACHE_SIZE > cifs_sb->wsize)
-				break;
+		/* nothing to write? */
+		if (nr_pages == 0) {
+			kref_put(&wdata->refcount, cifs_writedata_release);
+			continue;
 		}
-		if (n_iov) {
-retry_write:
-			open_file = find_writable_file(CIFS_I(mapping->host),
-							false);
-			if (!open_file) {
+
+		wdata->nr_pages = nr_pages;
+		wdata->offset = page_offset(wdata->pages[0]);
+
+		do {
+			if (wdata->cfile != NULL)
+				cifsFileInfo_put(wdata->cfile);
+			wdata->cfile = find_writable_file(CIFS_I(mapping->host),
+							  false);
+			if (!wdata->cfile) {
 				cERROR(1, "No writable handles for inode");
 				rc = -EBADF;
-			} else {
-				rc = CIFSSMBWrite2(xid, tcon, open_file->netfid,
-						   bytes_to_write, offset,
-						   &bytes_written, iov, n_iov,
-						   0);
-				cifsFileInfo_put(open_file);
+				break;
 			}
-
-			cFYI(1, "Write2 rc=%d, wrote=%u", rc, bytes_written);
-
-			/*
-			 * For now, treat a short write as if nothing got
-			 * written. A zero length write however indicates
-			 * ENOSPC or EFBIG. We have no way to know which
-			 * though, so call it ENOSPC for now. EFBIG would
-			 * get translated to AS_EIO anyway.
-			 *
-			 * FIXME: make it take into account the data that did
-			 *	  get written
-			 */
-			if (rc == 0) {
-				if (bytes_written == 0)
-					rc = -ENOSPC;
-				else if (bytes_written < bytes_to_write)
-					rc = -EAGAIN;
+			if (wbc->sync_mode == WB_SYNC_ALL) {
+				spin_lock(&mapping->private_lock);
+				list_move(&wdata->pending, &pending);
+				spin_unlock(&mapping->private_lock);
 			}
+			rc = cifs_async_writev(wdata);
+		} while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
 
-			/* retry on data-integrity flush */
-			if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN)
-				goto retry_write;
+		for (i = 0; i < nr_pages; ++i)
+			unlock_page(wdata->pages[i]);
 
-			/* fix the stats and EOF */
-			if (bytes_written > 0) {
-				cifs_stats_bytes_written(tcon, bytes_written);
-				cifs_update_eof(cifsi, offset, bytes_written);
-			}
-
-			for (i = 0; i < n_iov; i++) {
-				page = pvec.pages[first + i];
-				/* on retryable write error, redirty page */
+		/* send failure -- clean up the mess */
+		if (rc != 0) {
+			for (i = 0; i < nr_pages; ++i) {
 				if (rc == -EAGAIN)
-					redirty_page_for_writepage(wbc, page);
-				else if (rc != 0)
-					SetPageError(page);
-				kunmap(page);
-				unlock_page(page);
-				end_page_writeback(page);
-				page_cache_release(page);
+					redirty_page_for_writepage(wbc,
+							   wdata->pages[i]);
+				else
+					SetPageError(wdata->pages[i]);
+				end_page_writeback(wdata->pages[i]);
+				page_cache_release(wdata->pages[i]);
 			}
-
 			if (rc != -EAGAIN)
 				mapping_set_error(mapping, rc);
-			else
-				rc = 0;
-
-			if ((wbc->nr_to_write -= n_iov) <= 0)
-				done = 1;
-			index = next;
-		} else
-			/* Need to re-find the pages we skipped */
-			index = pvec.pages[0]->index + 1;
+			kref_put(&wdata->refcount, cifs_writedata_release);
+		} else if (wbc->sync_mode == WB_SYNC_NONE) {
+			kref_put(&wdata->refcount, cifs_writedata_release);
+		}
 
-		pagevec_release(&pvec);
+		if ((wbc->nr_to_write -= nr_pages) <= 0)
+			done = true;
+		index = next;
 	}
+
 	if (!scanned && !done) {
 		/*
 		 * We hit the last page and there is more work to be done: wrap
 		 * back to the start of the file
 		 */
-		scanned = 1;
+		scanned = true;
 		index = 0;
 		goto retry;
 	}
+
+	/*
+	 * for WB_SYNC_ALL, we must wait until the writes complete. If any
+	 * return -EAGAIN however, we need to retry the whole thing again.
+	 * At that point nr_to_write will be all screwed up, but that
+	 * shouldn't really matter for WB_SYNC_ALL (right?)
+	 */
+	if (wbc->sync_mode == WB_SYNC_ALL) {
+		rc = cifs_wait_for_write_completion(mapping, &pending);
+		if (rc == -EAGAIN) {
+			index = wbc->range_start >> PAGE_CACHE_SHIFT;
+			goto retry;
+		}
+	}
+
 	if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
 		mapping->writeback_index = index;
 
-	FreeXid(xid);
-	kfree(iov);
 	return rc;
 }
 
-- 
1.7.4.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 7/7] cifs: clean up wsize negotiation and allow for larger wsize
       [not found] ` <1302694994-8303-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (5 preceding siblings ...)
  2011-04-13 11:43   ` [PATCH 6/7] cifs: convert cifs_writepages to use async writes Jeff Layton
@ 2011-04-13 11:43   ` Jeff Layton
       [not found]     ` <1302694994-8303-8-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  6 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2011-04-13 11:43 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Now that we can handle larger wsizes in writepages, fix up the
negotiation of the wsize to allow for that. Make the code default to a
wsize of (128k - PAGE_CACHE_SIZE). That gives us a size that goes up to
the max frame size specified in RFC1001.

If CAP_LARGE_WRITE_AND_X isn't set, then set that down to a the largest
size allowed by the protocol (2^16 - 1), or the specified wsize,
whichever is smaller.

If a larger wsize is specified, then check to make sure the server
supports the POSIX extension that allows it to ignore the 128k limit. If
it doesn't then set it to the default value.

Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/connect.c |   62 +++++++++++++++++++++++++++++++++++-----------------
 1 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 8f737d2..dc4de86 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2600,23 +2600,6 @@ static void setup_cifs_sb(struct smb_vol *pvolume_info,
 	else /* default */
 		cifs_sb->rsize = CIFSMaxBufSize;
 
-	if (pvolume_info->wsize > PAGEVEC_SIZE * PAGE_CACHE_SIZE) {
-		cERROR(1, "wsize %d too large, using 4096 instead",
-			  pvolume_info->wsize);
-		cifs_sb->wsize = 4096;
-	} else if (pvolume_info->wsize)
-		cifs_sb->wsize = pvolume_info->wsize;
-	else
-		cifs_sb->wsize = min_t(const int,
-					PAGEVEC_SIZE * PAGE_CACHE_SIZE,
-					127*1024);
-		/* old default of CIFSMaxBufSize was too small now
-		   that SMB Write2 can send multiple pages in kvec.
-		   RFC1001 does not describe what happens when frame
-		   bigger than 128K is sent so use that as max in
-		   conjunction with 52K kvec constraint on arch with 4K
-		   page size  */
-
 	if (cifs_sb->rsize < 2048) {
 		cifs_sb->rsize = 2048;
 		/* Windows ME may prefer this */
@@ -2694,6 +2677,46 @@ static void setup_cifs_sb(struct smb_vol *pvolume_info,
 			   "mount option supported");
 }
 
+/*
+ * default to a wsize of 128k - PAGE_CACHE_SIZE -- one page less than the
+ * largest frame size described in RFC1001. This allows space for the header
+ * without going over that by default.
+ */
+#define CIFS_MAX_RFC1001_WSIZE (128 * 1024 - PAGE_CACHE_SIZE)
+
+/*
+ * It's possible to increase the wsize by setting the wsize parm directly
+ * up to a hard limit of 2^24 - PAGE_CACHE_SIZE. That might make for
+ * "interesting" allocation* problems during write however (as we have to
+ * allocate an array of pointers for the pages).
+ */
+#define CIFS_MAX_WSIZE ((1<<24) - PAGE_CACHE_SIZE)
+
+/* If CAP_LARGE_WRITE_AND_X isn't set, then limit the wsize to 16 bits */
+#define CIFS_MAX_SMALL_WRITEX_WSIZE (0xffff)
+
+static unsigned int
+cifs_negotiate_wsize(struct cifs_tcon *tcon, struct smb_vol *pvolume_info)
+{
+	__u64 unix_cap = le64_to_cpu(tcon->fsUnixInfo.Capability);
+	struct TCP_Server_Info *server = tcon->ses->server;
+	unsigned int wsize = pvolume_info->wsize ? pvolume_info->wsize :
+				CIFS_MAX_RFC1001_WSIZE;
+
+	/* no CAP_LARGE_WRITE_X? Limit it to 16 bits */
+	if (!(server->capabilities & CAP_LARGE_WRITE_X))
+		wsize = min_t(unsigned int, wsize, CIFS_MAX_SMALL_WRITEX_WSIZE);
+
+	/* can server support 24-bit write sizes? (via UNIX extensions) */
+	if (tcon->unix_ext && !(unix_cap & CIFS_UNIX_LARGE_WRITE_CAP))
+		wsize = min_t(unsigned int, wsize, CIFS_MAX_RFC1001_WSIZE);
+
+	/* hard limit of CIFS_MAX_WSIZE */
+	wsize = min_t(unsigned int, wsize, CIFS_MAX_WSIZE);
+
+	return wsize;
+}
+
 static int
 is_path_accessible(int xid, struct cifs_tcon *tcon,
 		   struct cifs_sb_info *cifs_sb, const char *full_path)
@@ -2884,13 +2907,12 @@ try_mount_again:
 		cifs_sb->rsize = 1024 * 127;
 		cFYI(DBG2, "no very large read support, rsize now 127K");
 	}
-	if (!(tcon->ses->capabilities & CAP_LARGE_WRITE_X))
-		cifs_sb->wsize = min(cifs_sb->wsize,
-			       (tcon->ses->server->maxBuf - MAX_CIFS_HDR_SIZE));
 	if (!(tcon->ses->capabilities & CAP_LARGE_READ_X))
 		cifs_sb->rsize = min(cifs_sb->rsize,
 			       (tcon->ses->server->maxBuf - MAX_CIFS_HDR_SIZE));
 
+	cifs_sb->wsize = cifs_negotiate_wsize(tcon, volume_info);
+
 remote_path_check:
 	/* check if a whole path (including prepath) is not remote */
 	if (!rc && tcon) {
-- 
1.7.4.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 7/7] cifs: clean up wsize negotiation and allow for larger wsize
       [not found]     ` <1302694994-8303-8-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-04-13 12:13       ` Jeff Layton
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2011-04-13 12:13 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

On Wed, 13 Apr 2011 07:43:14 -0400
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> Now that we can handle larger wsizes in writepages, fix up the
> negotiation of the wsize to allow for that. Make the code default to a
> wsize of (128k - PAGE_CACHE_SIZE). That gives us a size that goes up to
> the max frame size specified in RFC1001.
> 
> If CAP_LARGE_WRITE_AND_X isn't set, then set that down to a the largest
> size allowed by the protocol (2^16 - 1), or the specified wsize,
> whichever is smaller.
> 
> If a larger wsize is specified, then check to make sure the server
> supports the POSIX extension that allows it to ignore the 128k limit. If
> it doesn't then set it to the default value.
> 
> Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/cifs/connect.c |   62 +++++++++++++++++++++++++++++++++++-----------------
>  1 files changed, 42 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 8f737d2..dc4de86 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -2600,23 +2600,6 @@ static void setup_cifs_sb(struct smb_vol *pvolume_info,
>  	else /* default */
>  		cifs_sb->rsize = CIFSMaxBufSize;
>  
> -	if (pvolume_info->wsize > PAGEVEC_SIZE * PAGE_CACHE_SIZE) {
> -		cERROR(1, "wsize %d too large, using 4096 instead",
> -			  pvolume_info->wsize);
> -		cifs_sb->wsize = 4096;
> -	} else if (pvolume_info->wsize)
> -		cifs_sb->wsize = pvolume_info->wsize;
> -	else
> -		cifs_sb->wsize = min_t(const int,
> -					PAGEVEC_SIZE * PAGE_CACHE_SIZE,
> -					127*1024);
> -		/* old default of CIFSMaxBufSize was too small now
> -		   that SMB Write2 can send multiple pages in kvec.
> -		   RFC1001 does not describe what happens when frame
> -		   bigger than 128K is sent so use that as max in
> -		   conjunction with 52K kvec constraint on arch with 4K
> -		   page size  */
> -
>  	if (cifs_sb->rsize < 2048) {
>  		cifs_sb->rsize = 2048;
>  		/* Windows ME may prefer this */
> @@ -2694,6 +2677,46 @@ static void setup_cifs_sb(struct smb_vol *pvolume_info,
>  			   "mount option supported");
>  }
>  
> +/*
> + * default to a wsize of 128k - PAGE_CACHE_SIZE -- one page less than the
> + * largest frame size described in RFC1001. This allows space for the header
> + * without going over that by default.
> + */
> +#define CIFS_MAX_RFC1001_WSIZE (128 * 1024 - PAGE_CACHE_SIZE)
> +
> +/*
> + * It's possible to increase the wsize by setting the wsize parm directly
> + * up to a hard limit of 2^24 - PAGE_CACHE_SIZE. That might make for
> + * "interesting" allocation* problems during write however (as we have to
> + * allocate an array of pointers for the pages).
> + */
> +#define CIFS_MAX_WSIZE ((1<<24) - PAGE_CACHE_SIZE)
> +
> +/* If CAP_LARGE_WRITE_AND_X isn't set, then limit the wsize to 16 bits */
> +#define CIFS_MAX_SMALL_WRITEX_WSIZE (0xffff)
> +
> +static unsigned int
> +cifs_negotiate_wsize(struct cifs_tcon *tcon, struct smb_vol *pvolume_info)
> +{
> +	__u64 unix_cap = le64_to_cpu(tcon->fsUnixInfo.Capability);
> +	struct TCP_Server_Info *server = tcon->ses->server;
> +	unsigned int wsize = pvolume_info->wsize ? pvolume_info->wsize :
> +				CIFS_MAX_RFC1001_WSIZE;
> +
> +	/* no CAP_LARGE_WRITE_X? Limit it to 16 bits */
> +	if (!(server->capabilities & CAP_LARGE_WRITE_X))
> +		wsize = min_t(unsigned int, wsize, CIFS_MAX_SMALL_WRITEX_WSIZE);
> +
> +	/* can server support 24-bit write sizes? (via UNIX extensions) */
> +	if (tcon->unix_ext && !(unix_cap & CIFS_UNIX_LARGE_WRITE_CAP))

	oops, logic bug here -- this should be:

	if (!tcon->unix_ext || !(unix_cap & CIFS_UNIX_LARGE_WRITE_CAP))

	...I'll fix it in next version...

> +		wsize = min_t(unsigned int, wsize, CIFS_MAX_RFC1001_WSIZE);
> +
> +	/* hard limit of CIFS_MAX_WSIZE */
> +	wsize = min_t(unsigned int, wsize, CIFS_MAX_WSIZE);
> +
> +	return wsize;
> +}
> +
>  static int
>  is_path_accessible(int xid, struct cifs_tcon *tcon,
>  		   struct cifs_sb_info *cifs_sb, const char *full_path)
> @@ -2884,13 +2907,12 @@ try_mount_again:
>  		cifs_sb->rsize = 1024 * 127;
>  		cFYI(DBG2, "no very large read support, rsize now 127K");
>  	}
> -	if (!(tcon->ses->capabilities & CAP_LARGE_WRITE_X))
> -		cifs_sb->wsize = min(cifs_sb->wsize,
> -			       (tcon->ses->server->maxBuf - MAX_CIFS_HDR_SIZE));
>  	if (!(tcon->ses->capabilities & CAP_LARGE_READ_X))
>  		cifs_sb->rsize = min(cifs_sb->rsize,
>  			       (tcon->ses->server->maxBuf - MAX_CIFS_HDR_SIZE));
>  
> +	cifs_sb->wsize = cifs_negotiate_wsize(tcon, volume_info);
> +
>  remote_path_check:
>  	/* check if a whole path (including prepath) is not remote */
>  	if (!rc && tcon) {


-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/7] cifs: add ignore_pend flag to cifs_call_async
       [not found]     ` <1302694994-8303-5-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-04-15  8:48       ` Pavel Shilovsky
  0 siblings, 0 replies; 13+ messages in thread
From: Pavel Shilovsky @ 2011-04-15  8:48 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

2011/4/13 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
> The current code always ignores the max_pending limit. Have it instead
> only optionally ignore the pending limit. For CIFSSMBEcho, we need to
> ignore it to make sure they always can go out. For async reads, writes
> and potentially other calls, we need to respect it.
>
> Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/cifs/cifsproto.h |    2 +-
>  fs/cifs/cifssmb.c   |    2 +-
>  fs/cifs/transport.c |    5 +++--
>  3 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> index e255a2b..c621b45 100644
> --- a/fs/cifs/cifsproto.h
> +++ b/fs/cifs/cifsproto.h
> @@ -67,7 +67,7 @@ extern void DeleteMidQEntry(struct mid_q_entry *midEntry);
>  extern int wait_for_free_request(struct TCP_Server_Info *sv, const int long_op);
>  extern int cifs_call_async(struct TCP_Server_Info *server, struct kvec *iov,
>                           unsigned int nvec, mid_callback_t *callback,
> -                          void *cbdata);
> +                          void *cbdata, bool ignore_pend);
>  extern int SendReceive(const unsigned int /* xid */ , struct cifs_ses *,
>                        struct smb_hdr * /* input */ ,
>                        struct smb_hdr * /* out */ ,
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index 79e4881..33adc15 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -743,7 +743,7 @@ CIFSSMBEcho(struct TCP_Server_Info *server)
>        iov.iov_base = smb;
>        iov.iov_len = be32_to_cpu(smb->hdr.smb_buf_length) + 4;
>
> -       rc = cifs_call_async(server, &iov, 1, cifs_echo_callback, server);
> +       rc = cifs_call_async(server, &iov, 1, cifs_echo_callback, server, true);
>        if (rc)
>                cFYI(1, "Echo request failed: %d", rc);
>
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index a31c279..ad32b70 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -350,13 +350,14 @@ wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
>  */
>  int
>  cifs_call_async(struct TCP_Server_Info *server, struct kvec *iov,
> -               unsigned int nvec, mid_callback_t *callback, void *cbdata)
> +               unsigned int nvec, mid_callback_t *callback, void *cbdata,
> +               bool ignore_pend)
>  {
>        int rc;
>        struct mid_q_entry *mid;
>        struct smb_hdr *hdr = (struct smb_hdr *)iov[0].iov_base;
>
> -       rc = wait_for_free_request(server, CIFS_ASYNC_OP);
> +       rc = wait_for_free_request(server, ignore_pend ? CIFS_ASYNC_OP : 0);
>        if (rc)
>                return rc;
>
> --
> 1.7.4.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Looks right. Reviewed-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>

-- 
Best regards,
Pavel Shilovsky.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/7] cifs: convert cifs_writepages to use async writes
       [not found]     ` <1302694994-8303-7-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-04-15  9:03       ` Pavel Shilovsky
       [not found]         ` <BANLkTintNfs=W+UFaC8SCUTt-hN5BXSFRQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Shilovsky @ 2011-04-15  9:03 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

2011/4/13 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
> Have cifs_writepages issue asynchronous writes instead of waiting on
> each write call to complete before issuing another. This also allows us
> to return more quickly from writepages in the WB_SYNC_NONE case. It
> can just send out all of the I/Os and not wait around for the replies.
>
> In the WB_SYNC_ALL case, have it wait for all of the writes to complete
> after issuing them and return any errors that turn up from it. If those
> errors turn out to all be retryable (-EAGAIN), then retry the entire
> operation again.
>
> This also changes the page locking semantics a little bit. Instead of
> holding the page lock until the response is received, release it after
> doing the send. This will reduce contention for the page lock and should
> prevent processes that have the file mmap'ed from being blocked
> unnecessarily.
>
> Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/cifs/file.c |  292 +++++++++++++++++++++++++++++---------------------------
>  1 files changed, 152 insertions(+), 140 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index b3d2e3f..7ff138f 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1089,32 +1089,65 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
>        return rc;
>  }
>
> +/*
> + * walk a list of in progress async writes and wait for them to complete.
> + * Check the result code on each one.
> + *
> + * There's a clear heirarchy: 0 < EAGAIN < other errors
> + *
> + * If we get an EAGAIN return on a WB_SYNC_ALL writeout, then we need to
> + * go back and do the whole operation again. If we get other errors then
> + * the mapping already likely has AS_EIO or AS_ENOSPC set, so we can
> + * give up and just return an error.
> + */
> +static int
> +cifs_wait_for_write_completion(struct address_space *mapping,
> +                               struct list_head *pending)
> +{
> +       int tmprc, rc = 0;
> +       struct cifs_writedata *wdata;
> +
> +       /* wait for writes to complete */
> +       for (;;) {
> +               /* anything left on list? */
> +               spin_lock(&mapping->private_lock);
> +               if (list_empty(pending)) {
> +                       spin_unlock(&mapping->private_lock);
> +                       break;
> +               }
> +
> +               /* dequeue an entry */
> +               wdata = list_first_entry(pending, struct cifs_writedata,
> +                                        pending);
> +               list_del_init(&wdata->pending);
> +               spin_unlock(&mapping->private_lock);
> +
> +               /* wait for it to complete */
> +               tmprc = wait_for_completion_killable(&wdata->completion);
> +               if (tmprc)
> +                       rc = tmprc;
> +
> +               if (wdata->result == -EAGAIN)
> +                       rc = rc ? rc : wdata->result;
> +               else if (wdata->result != 0)
> +                       rc = wdata->result;
> +
> +               kref_put(&wdata->refcount, cifs_writedata_release);
> +       }
> +
> +       return rc;
> +}
> +
>  static int cifs_writepages(struct address_space *mapping,
>                           struct writeback_control *wbc)
>  {
> -       unsigned int bytes_to_write;
> -       unsigned int bytes_written;
> -       struct cifs_sb_info *cifs_sb;
> -       int done = 0;
> -       pgoff_t end;
> -       pgoff_t index;
> -       int range_whole = 0;
> -       struct kvec *iov;
> -       int len;
> -       int n_iov = 0;
> -       pgoff_t next;
> -       int nr_pages;
> -       __u64 offset = 0;
> -       struct cifsFileInfo *open_file;
> -       struct cifs_tcon *tcon;
> -       struct cifsInodeInfo *cifsi = CIFS_I(mapping->host);
> +       struct cifs_sb_info *cifs_sb = CIFS_SB(mapping->host->i_sb);
> +       bool done = false, scanned = false, range_whole = false;
> +       pgoff_t end, index;
> +       struct cifs_writedata *wdata;
>        struct page *page;
> -       struct pagevec pvec;
> +       struct list_head pending;
>        int rc = 0;
> -       int scanned = 0;
> -       int xid;
> -
> -       cifs_sb = CIFS_SB(mapping->host->i_sb);
>
>        /*
>         * If wsize is smaller that the page cache size, default to writing
> @@ -1123,27 +1156,8 @@ static int cifs_writepages(struct address_space *mapping,
>        if (cifs_sb->wsize < PAGE_CACHE_SIZE)
>                return generic_writepages(mapping, wbc);
>
> -       iov = kmalloc(32 * sizeof(struct kvec), GFP_KERNEL);
> -       if (iov == NULL)
> -               return generic_writepages(mapping, wbc);
> -
> -       /*
> -        * if there's no open file, then this is likely to fail too,
> -        * but it'll at least handle the return. Maybe it should be
> -        * a BUG() instead?
> -        */
> -       open_file = find_writable_file(CIFS_I(mapping->host), false);
> -       if (!open_file) {
> -               kfree(iov);
> -               return generic_writepages(mapping, wbc);
> -       }
> -
> -       tcon = tlink_tcon(open_file->tlink);
> -       cifsFileInfo_put(open_file);
> +       INIT_LIST_HEAD(&pending);
>
> -       xid = GetXid();
> -
> -       pagevec_init(&pvec, 0);
>        if (wbc->range_cyclic) {
>                index = mapping->writeback_index; /* Start from prev offset */
>                end = -1;
> @@ -1151,24 +1165,34 @@ static int cifs_writepages(struct address_space *mapping,
>                index = wbc->range_start >> PAGE_CACHE_SHIFT;
>                end = wbc->range_end >> PAGE_CACHE_SHIFT;
>                if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
> -                       range_whole = 1;
> -               scanned = 1;
> +                       range_whole = true;
> +               scanned = true;
>        }
>  retry:
> -       while (!done && (index <= end) &&
> -              (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
> -                       PAGECACHE_TAG_DIRTY,
> -                       min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1))) {
> -               int first;
> -               unsigned int i;
> -
> -               first = -1;
> -               next = 0;
> -               n_iov = 0;
> -               bytes_to_write = 0;
> -
> -               for (i = 0; i < nr_pages; i++) {
> -                       page = pvec.pages[i];
> +       while (!done && index <= end) {
> +               unsigned int i, nr_pages, found_pages;
> +               pgoff_t next = 0, tofind;
> +
> +               tofind = min((cifs_sb->wsize / PAGE_CACHE_SIZE) - 1,
> +                               end - index) + 1;
> +
> +               wdata = cifs_writedata_alloc((unsigned int)tofind);
> +               if (!wdata) {
> +                       rc = -ENOMEM;
> +                       break;
> +               }
> +
> +               found_pages = find_get_pages_tag(mapping, &index,
> +                                                PAGECACHE_TAG_DIRTY,
> +                                                tofind, wdata->pages);
> +               if (found_pages == 0) {
> +                       kref_put(&wdata->refcount, cifs_writedata_release);
> +                       break;
> +               }
> +
> +               nr_pages = 0;
> +               for (i = 0; i < found_pages; i++) {
> +                       page = wdata->pages[i];
>                        /*
>                         * At this point we hold neither mapping->tree_lock nor
>                         * lock on the page itself: the page may be truncated or
> @@ -1177,7 +1201,7 @@ retry:
>                         * mapping
>                         */
>
> -                       if (first < 0)
> +                       if (nr_pages == 0)
>                                lock_page(page);
>                        else if (!trylock_page(page))
>                                break;
> @@ -1188,7 +1212,7 @@ retry:
>                        }
>
>                        if (!wbc->range_cyclic && page->index > end) {
> -                               done = 1;
> +                               done = true;
>                                unlock_page(page);
>                                break;
>                        }
> @@ -1215,119 +1239,107 @@ retry:
>                        set_page_writeback(page);
>
>                        if (page_offset(page) >= mapping->host->i_size) {
> -                               done = 1;
> +                               done = true;
>                                unlock_page(page);
>                                end_page_writeback(page);
>                                break;
>                        }
>
> -                       /*
> -                        * BB can we get rid of this?  pages are held by pvec
> -                        */
> -                       page_cache_get(page);
> +                       wdata->pages[i] = page;
> +                       next = page->index + 1;
> +                       ++nr_pages;
> +               }
>
> -                       len = min(mapping->host->i_size - page_offset(page),
> -                                 (loff_t)PAGE_CACHE_SIZE);
> +               /* reset index to refind any pages skipped */
> +               if (nr_pages == 0)
> +                       index = wdata->pages[0]->index + 1;
>
> -                       /* reserve iov[0] for the smb header */
> -                       n_iov++;
> -                       iov[n_iov].iov_base = kmap(page);
> -                       iov[n_iov].iov_len = len;
> -                       bytes_to_write += len;
> +               /* put any pages we aren't going to use */
> +               for (i = nr_pages; i < found_pages; i++) {
> +                       page_cache_release(wdata->pages[i]);
> +                       wdata->pages[i] = NULL;
> +               }
>
> -                       if (first < 0) {
> -                               first = i;
> -                               offset = page_offset(page);
> -                       }
> -                       next = page->index + 1;
> -                       if (bytes_to_write + PAGE_CACHE_SIZE > cifs_sb->wsize)
> -                               break;
> +               /* nothing to write? */
> +               if (nr_pages == 0) {
> +                       kref_put(&wdata->refcount, cifs_writedata_release);
> +                       continue;
>                }
> -               if (n_iov) {
> -retry_write:
> -                       open_file = find_writable_file(CIFS_I(mapping->host),
> -                                                       false);
> -                       if (!open_file) {
> +
> +               wdata->nr_pages = nr_pages;
> +               wdata->offset = page_offset(wdata->pages[0]);
> +
> +               do {
> +                       if (wdata->cfile != NULL)
> +                               cifsFileInfo_put(wdata->cfile);
> +                       wdata->cfile = find_writable_file(CIFS_I(mapping->host),
> +                                                         false);
> +                       if (!wdata->cfile) {
>                                cERROR(1, "No writable handles for inode");
>                                rc = -EBADF;
> -                       } else {
> -                               rc = CIFSSMBWrite2(xid, tcon, open_file->netfid,
> -                                                  bytes_to_write, offset,
> -                                                  &bytes_written, iov, n_iov,
> -                                                  0);
> -                               cifsFileInfo_put(open_file);
> +                               break;
>                        }
> -
> -                       cFYI(1, "Write2 rc=%d, wrote=%u", rc, bytes_written);
> -
> -                       /*
> -                        * For now, treat a short write as if nothing got
> -                        * written. A zero length write however indicates
> -                        * ENOSPC or EFBIG. We have no way to know which
> -                        * though, so call it ENOSPC for now. EFBIG would
> -                        * get translated to AS_EIO anyway.
> -                        *
> -                        * FIXME: make it take into account the data that did
> -                        *        get written
> -                        */
> -                       if (rc == 0) {
> -                               if (bytes_written == 0)
> -                                       rc = -ENOSPC;
> -                               else if (bytes_written < bytes_to_write)
> -                                       rc = -EAGAIN;
> +                       if (wbc->sync_mode == WB_SYNC_ALL) {
> +                               spin_lock(&mapping->private_lock);
> +                               list_move(&wdata->pending, &pending);
> +                               spin_unlock(&mapping->private_lock);
>                        }
> +                       rc = cifs_async_writev(wdata);
> +               } while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
>
> -                       /* retry on data-integrity flush */
> -                       if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN)
> -                               goto retry_write;
> +               for (i = 0; i < nr_pages; ++i)
> +                       unlock_page(wdata->pages[i]);
>
> -                       /* fix the stats and EOF */
> -                       if (bytes_written > 0) {
> -                               cifs_stats_bytes_written(tcon, bytes_written);
> -                               cifs_update_eof(cifsi, offset, bytes_written);
> -                       }
> -
> -                       for (i = 0; i < n_iov; i++) {
> -                               page = pvec.pages[first + i];
> -                               /* on retryable write error, redirty page */
> +               /* send failure -- clean up the mess */
> +               if (rc != 0) {
> +                       for (i = 0; i < nr_pages; ++i) {
>                                if (rc == -EAGAIN)
> -                                       redirty_page_for_writepage(wbc, page);
> -                               else if (rc != 0)
> -                                       SetPageError(page);
> -                               kunmap(page);
> -                               unlock_page(page);
> -                               end_page_writeback(page);
> -                               page_cache_release(page);
> +                                       redirty_page_for_writepage(wbc,
> +                                                          wdata->pages[i]);
> +                               else
> +                                       SetPageError(wdata->pages[i]);
> +                               end_page_writeback(wdata->pages[i]);
> +                               page_cache_release(wdata->pages[i]);
>                        }
> -
>                        if (rc != -EAGAIN)
>                                mapping_set_error(mapping, rc);
> -                       else
> -                               rc = 0;
> -
> -                       if ((wbc->nr_to_write -= n_iov) <= 0)
> -                               done = 1;
> -                       index = next;
> -               } else
> -                       /* Need to re-find the pages we skipped */
> -                       index = pvec.pages[0]->index + 1;
> +                       kref_put(&wdata->refcount, cifs_writedata_release);
> +               } else if (wbc->sync_mode == WB_SYNC_NONE) {
> +                       kref_put(&wdata->refcount, cifs_writedata_release);
> +               }
>
> -               pagevec_release(&pvec);
> +               if ((wbc->nr_to_write -= nr_pages) <= 0)
> +                       done = true;
> +               index = next;
>        }
> +
>        if (!scanned && !done) {
>                /*
>                 * We hit the last page and there is more work to be done: wrap
>                 * back to the start of the file
>                 */
> -               scanned = 1;
> +               scanned = true;
>                index = 0;
>                goto retry;
>        }
> +
> +       /*
> +        * for WB_SYNC_ALL, we must wait until the writes complete. If any
> +        * return -EAGAIN however, we need to retry the whole thing again.
> +        * At that point nr_to_write will be all screwed up, but that
> +        * shouldn't really matter for WB_SYNC_ALL (right?)
> +        */
> +       if (wbc->sync_mode == WB_SYNC_ALL) {
> +               rc = cifs_wait_for_write_completion(mapping, &pending);
> +               if (rc == -EAGAIN) {
> +                       index = wbc->range_start >> PAGE_CACHE_SHIFT;
> +                       goto retry;
> +               }
> +       }
> +
>        if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
>                mapping->writeback_index = index;
>
> -       FreeXid(xid);
> -       kfree(iov);
>        return rc;
>  }
>
> --
> 1.7.4.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

I tested it again and noticed that it is a little faster that the 1st
variant against Windows 7 on LAN (about 0.5%). Anyway, it seems like a
very good change - Reviewed-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>.

-- 
Best regards,
Pavel Shilovsky.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/7] cifs: convert cifs_writepages to use async writes
       [not found]         ` <BANLkTintNfs=W+UFaC8SCUTt-hN5BXSFRQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-04-15  9:05           ` Pavel Shilovsky
  0 siblings, 0 replies; 13+ messages in thread
From: Pavel Shilovsky @ 2011-04-15  9:05 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

2011/4/15 Pavel Shilovsky <piastryyy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> I tested it again and noticed that it is a little faster that the 1st
> variant against Windows 7 on LAN (about 0.5%). Anyway, it seems like a
> very good change - Reviewed-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>.
>

Just missed the info that I tested it with a patch #7 (with the fix
for if statement, you proposed later).

-- 
Best regards,
Pavel Shilovsky.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 5/7] cifs: add cifs_async_writev
       [not found] ` <1305836578-26333-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-05-19 20:22   ` Jeff Layton
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2011-05-19 20:22 UTC (permalink / raw)
  To: smfrench-Re5JQEeQqe8AvxtiuMwx3w; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Add the ability for CIFS to do an asynchronous write. The kernel will
set the frame up as it would for a "normal" SMBWrite2 request, and use
cifs_call_async to send it. The mid callback will then be configured to
handle the result.

Reviewed-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>
Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/cifsproto.h |   18 ++++
 fs/cifs/cifssmb.c   |  236 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 254 insertions(+), 0 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 6931631..41db348 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -438,4 +438,22 @@ extern int mdfour(unsigned char *, unsigned char *, int);
 extern int E_md4hash(const unsigned char *passwd, unsigned char *p16);
 extern int SMBencrypt(unsigned char *passwd, const unsigned char *c8,
 			unsigned char *p24);
+
+/* asynchronous write support */
+struct cifs_writedata {
+	struct kref			refcount;
+	enum writeback_sync_modes	sync_mode;
+	struct work_struct		work;
+	struct cifsFileInfo		*cfile;
+	__u64				offset;
+	unsigned int			bytes;
+	int				result;
+	unsigned int			nr_pages;
+	struct page			*pages[1];
+};
+
+int cifs_async_writev(struct cifs_writedata *wdata);
+struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages);
+void cifs_writedata_release(struct kref *refcount);
+
 #endif			/* _CIFSPROTO_H */
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 0aff051..690c8ca 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -32,6 +32,7 @@
 #include <linux/vfs.h>
 #include <linux/slab.h>
 #include <linux/posix_acl_xattr.h>
+#include <linux/pagemap.h>
 #include <asm/uaccess.h>
 #include "cifspdu.h"
 #include "cifsglob.h"
@@ -1604,6 +1605,241 @@ CIFSSMBWrite(const int xid, struct cifsTconInfo *tcon,
 	return rc;
 }
 
+void
+cifs_writedata_release(struct kref *refcount)
+{
+	struct cifs_writedata *wdata = container_of(refcount,
+					struct cifs_writedata, refcount);
+
+	if (wdata->cfile)
+		cifsFileInfo_put(wdata->cfile);
+
+	kfree(wdata);
+}
+
+/*
+ * Write failed with a retryable error. Resend the write request. It's also
+ * possible that the page was redirtied so re-clean the page.
+ */
+static void
+cifs_writev_requeue(struct cifs_writedata *wdata)
+{
+	int i, rc;
+	struct inode *inode = wdata->cfile->dentry->d_inode;
+
+	for (i = 0; i < wdata->nr_pages; i++) {
+		lock_page(wdata->pages[i]);
+		clear_page_dirty_for_io(wdata->pages[i]);
+	}
+
+	do {
+		rc = cifs_async_writev(wdata);
+	} while (rc == -EAGAIN);
+
+	for (i = 0; i < wdata->nr_pages; i++) {
+		if (rc != 0)
+			SetPageError(wdata->pages[i]);
+		unlock_page(wdata->pages[i]);
+	}
+
+	mapping_set_error(inode->i_mapping, rc);
+	kref_put(&wdata->refcount, cifs_writedata_release);
+}
+
+static void
+cifs_writev_complete(struct work_struct *work)
+{
+	struct cifs_writedata *wdata = container_of(work,
+						struct cifs_writedata, work);
+	struct inode *inode = wdata->cfile->dentry->d_inode;
+	int i = 0;
+
+	if (wdata->result == 0) {
+		cifs_update_eof(CIFS_I(inode), wdata->offset, wdata->bytes);
+		cifs_stats_bytes_written(tlink_tcon(wdata->cfile->tlink),
+					 wdata->bytes);
+	} else if (wdata->sync_mode == WB_SYNC_ALL && wdata->result == -EAGAIN)
+		return cifs_writev_requeue(wdata);
+
+	for (i = 0; i < wdata->nr_pages; i++) {
+		struct page *page = wdata->pages[i];
+		if (wdata->result == -EAGAIN)
+			__set_page_dirty_nobuffers(page);
+		else if (wdata->result < 0)
+			SetPageError(page);
+		end_page_writeback(page);
+		page_cache_release(page);
+	}
+	if (wdata->result != -EAGAIN)
+		mapping_set_error(inode->i_mapping, wdata->result);
+	kref_put(&wdata->refcount, cifs_writedata_release);
+}
+
+struct cifs_writedata *
+cifs_writedata_alloc(unsigned int nr_pages)
+{
+	struct cifs_writedata *wdata;
+
+	/* this would overflow */
+	if (nr_pages == 0) {
+		cERROR(1, "%s: called with nr_pages == 0!", __func__);
+		return NULL;
+	}
+
+	/* writedata + number of page pointers */
+	wdata = kzalloc(sizeof(*wdata) +
+			sizeof(struct page *) * (nr_pages - 1), GFP_NOFS);
+	if (wdata != NULL) {
+		INIT_WORK(&wdata->work, cifs_writev_complete);
+		kref_init(&wdata->refcount);
+	}
+	return wdata;
+}
+
+/*
+ * Check the midState and signature on received buffer (if any), and queue the
+ * workqueue completion task.
+ */
+static void
+cifs_writev_callback(struct mid_q_entry *mid)
+{
+	struct cifs_writedata *wdata = mid->callback_data;
+	struct cifsTconInfo *tcon = tlink_tcon(wdata->cfile->tlink);
+	unsigned int written;
+	WRITE_RSP *smb = (WRITE_RSP *)mid->resp_buf;
+
+	switch (mid->midState) {
+	case MID_RESPONSE_RECEIVED:
+		wdata->result = cifs_check_receive(mid, tcon->ses->server, 0);
+		if (wdata->result != 0)
+			break;
+
+		written = le16_to_cpu(smb->CountHigh);
+		written <<= 16;
+		written += le16_to_cpu(smb->Count);
+		/*
+		 * Mask off high 16 bits when bytes written as returned
+		 * by the server is greater than bytes requested by the
+		 * client. OS/2 servers are known to set incorrect
+		 * CountHigh values.
+		 */
+		if (written > wdata->bytes)
+			written &= 0xFFFF;
+
+		if (written < wdata->bytes)
+			wdata->result = -ENOSPC;
+		else
+			wdata->bytes = written;
+		break;
+	case MID_REQUEST_SUBMITTED:
+	case MID_RETRY_NEEDED:
+		wdata->result = -EAGAIN;
+		break;
+	default:
+		wdata->result = -EIO;
+		break;
+	}
+
+	queue_work(system_nrt_wq, &wdata->work);
+	DeleteMidQEntry(mid);
+	atomic_dec(&tcon->ses->server->inFlight);
+	wake_up(&tcon->ses->server->request_q);
+}
+
+/* cifs_async_writev - send an async write, and set up mid to handle result */
+int
+cifs_async_writev(struct cifs_writedata *wdata)
+{
+	int i, rc = -EACCES;
+	WRITE_REQ *smb = NULL;
+	int wct;
+	struct cifsTconInfo *tcon = tlink_tcon(wdata->cfile->tlink);
+	struct inode *inode = wdata->cfile->dentry->d_inode;
+	struct kvec *iov = NULL;
+
+	if (tcon->ses->capabilities & CAP_LARGE_FILES) {
+		wct = 14;
+	} else {
+		wct = 12;
+		if (wdata->offset >> 32 > 0) {
+			/* can not handle big offset for old srv */
+			return -EIO;
+		}
+	}
+
+	rc = small_smb_init(SMB_COM_WRITE_ANDX, wct, tcon, (void **)&smb);
+	if (rc)
+		goto async_writev_out;
+
+	/* 1 iov per page + 1 for header */
+	iov = kzalloc((wdata->nr_pages + 1) * sizeof(*iov), GFP_NOFS);
+	if (iov == NULL) {
+		rc = -ENOMEM;
+		goto async_writev_out;
+	}
+
+	smb->AndXCommand = 0xFF;	/* none */
+	smb->Fid = wdata->cfile->netfid;
+	smb->OffsetLow = cpu_to_le32(wdata->offset & 0xFFFFFFFF);
+	if (wct == 14)
+		smb->OffsetHigh = cpu_to_le32(wdata->offset >> 32);
+	smb->Reserved = 0xFFFFFFFF;
+	smb->WriteMode = 0;
+	smb->Remaining = 0;
+
+	smb->DataOffset =
+	    cpu_to_le16(offsetof(struct smb_com_write_req, Data) - 4);
+
+	/* 4 for RFC1001 length + 1 for BCC */
+	iov[0].iov_len = be32_to_cpu(smb->hdr.smb_buf_length) + 4 + 1;
+	iov[0].iov_base = smb;
+
+	/* marshal up the pages into iov array */
+	wdata->bytes = 0;
+	for (i = 0; i < wdata->nr_pages; i++) {
+		iov[i + 1].iov_len = min(inode->i_size -
+				      page_offset(wdata->pages[i]),
+					(loff_t)PAGE_CACHE_SIZE);
+		iov[i + 1].iov_base = kmap(wdata->pages[i]);
+		wdata->bytes += iov[i + 1].iov_len;
+	}
+
+	cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);
+
+	smb->DataLengthLow = cpu_to_le16(wdata->bytes & 0xFFFF);
+	smb->DataLengthHigh = cpu_to_le16(wdata->bytes >> 16);
+
+	if (wct == 14) {
+		inc_rfc1001_len(&smb->hdr, wdata->bytes + 1);
+		put_bcc(wdata->bytes + 1, &smb->hdr);
+	} else {
+		/* wct == 12 */
+		struct smb_com_writex_req *smbw =
+				(struct smb_com_writex_req *)smb;
+		inc_rfc1001_len(&smbw->hdr, wdata->bytes + 5);
+		put_bcc(wdata->bytes + 5, &smbw->hdr);
+		iov[0].iov_len += 4; /* pad bigger by four bytes */
+	}
+
+	kref_get(&wdata->refcount);
+	rc = cifs_call_async(tcon->ses->server, iov, wdata->nr_pages + 1,
+			     cifs_writev_callback, wdata, false);
+
+	if (rc == 0)
+		cifs_stats_inc(&tcon->num_writes);
+	else
+		kref_put(&wdata->refcount, cifs_writedata_release);
+
+	/* send is done, unmap pages */
+	for (i = 0; i < wdata->nr_pages; i++)
+		kunmap(wdata->pages[i]);
+
+async_writev_out:
+	cifs_small_buf_release(smb);
+	kfree(iov);
+	return rc;
+}
+
 int
 CIFSSMBWrite2(const int xid, struct cifsTconInfo *tcon,
 	     const int netfid, const unsigned int count,
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-05-19 20:22 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-13 11:43 [PATCH 0/7] cifs: asynchronous writepages support (try #2) Jeff Layton
     [not found] ` <1302694994-8303-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-04-13 11:43   ` [PATCH 1/7] cifs: consolidate SendReceive response checks Jeff Layton
2011-04-13 11:43   ` [PATCH 2/7] cifs: make cifs_send_async take a kvec array Jeff Layton
2011-04-13 11:43   ` [PATCH 3/7] cifs: don't call mid_q_entry->callback under the Global_MidLock Jeff Layton
2011-04-13 11:43   ` [PATCH 4/7] cifs: add ignore_pend flag to cifs_call_async Jeff Layton
     [not found]     ` <1302694994-8303-5-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-04-15  8:48       ` Pavel Shilovsky
2011-04-13 11:43   ` [PATCH 5/7] cifs: add cifs_async_writev Jeff Layton
2011-04-13 11:43   ` [PATCH 6/7] cifs: convert cifs_writepages to use async writes Jeff Layton
     [not found]     ` <1302694994-8303-7-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-04-15  9:03       ` Pavel Shilovsky
     [not found]         ` <BANLkTintNfs=W+UFaC8SCUTt-hN5BXSFRQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-04-15  9:05           ` Pavel Shilovsky
2011-04-13 11:43   ` [PATCH 7/7] cifs: clean up wsize negotiation and allow for larger wsize Jeff Layton
     [not found]     ` <1302694994-8303-8-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-04-13 12:13       ` Jeff Layton
2011-05-19 20:22 [PATCH 0/7] cifs: asynchronous writepages support (try #4) Jeff Layton
     [not found] ` <1305836578-26333-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-05-19 20:22   ` [PATCH 5/7] cifs: add cifs_async_writev Jeff Layton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.