* [PATCH 0/5] cifs: asynchronous writepages support (try #1)
@ 2011-04-02 12:23 Jeff Layton
       [not found] ` <1301747039-31411-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2011-04-02 12:23 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

This patchset changes the CIFS code to do asynchronous writes in
writepages(). It uses the asynchronous call infrastructure that was
added to handle SMB echoes. The idea here is to have the kernel
issue a write request and then have it handle the reply asynchronously.

For now, this just changes the writepages codepath. Once the patchset
has had a bit more refinement and testing, I'll see about changing some
of the other codepaths (writepage(), for instance).
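
For anyone who hasn't looked at the echo code, the rough shape of that
pattern is below. This is purely an illustrative sketch, not code from
the series: my_write_callback, my_write_ctx and its fields are made-up
names, and it assumes the kvec-based cifs_call_async prototype that
patch 2 introduces. The real per-write context and callback are added
in patch 4.

	/* sketch: submit a request, handle the reply from a mid callback */
	static void my_write_callback(struct mid_q_entry *mid)
	{
		struct my_write_ctx *ctx = mid->callback_data;

		/* runs in cifsd context when the reply (or an error) arrives */
		if (mid->midState == MID_RESPONSE_RECEIVED)
			ctx->result = 0;
		else
			ctx->result = -EAGAIN;

		/* keep cifsd responsive -- punt heavier work to a workqueue */
		queue_work(system_nrt_wq, &ctx->work);
		DeleteMidQEntry(mid);
	}

	/* submit side: returns as soon as the frame has been sent */
	rc = cifs_call_async(server, iov, nvec, my_write_callback, ctx);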

So far, I have *not* seen a large performance increase from this. That
may just be because my test rig sucks, though. I'm planning to reserve
some beefier machines for testing sometime next week. In the meantime,
if anyone wants to help test this out, please do so and report the
results.

I'm not 100% thrilled with this approach overall -- I think we do need
to handle writes asynchronously, but the fact that we're using our own
writepages routine kind of hamstrings this code.

Another possible approach here would be to move to more page-based I/O
like NFS has. Have writepage() set up the pages for writeback and
coalesce them together as it goes, and then issue the write all at
once. That would also allow us to handle write sizes larger than 56k.

Obviously, that's a much larger project to put together, however. Much
of this code would still be applicable if we eventually decide to go
that route.

This patchset is obviously not quite ready for merge in 2.6.39, but I
would like to have it ready to go when the 2.6.40 merge window opens.

Jeff Layton (5):
  cifs: consolidate SendReceive response checks
  cifs: make cifs_send_async take a kvec array
  cifs: don't call mid_q_entry->callback under the Global_MidLock
  cifs: add cifs_async_writev
  cifs: convert cifs_writepages to use async writes

 fs/cifs/cifsglob.h      |    7 +-
 fs/cifs/cifsproto.h     |   27 +++++-
 fs/cifs/cifssmb.c       |  225 +++++++++++++++++++++++++++++++++++++++++-
 fs/cifs/connect.c       |   32 +++++--
 fs/cifs/file.c          |  256 +++++++++++++++++++++++++----------------------
 fs/cifs/netmisc.c       |    2 +-
 fs/cifs/smb2transport.c |   14 +--
 fs/cifs/transport.c     |  194 +++++++++++-------------------------
 8 files changed, 473 insertions(+), 284 deletions(-)

-- 
1.7.4


* [PATCH 1/5] cifs: consolidate SendReceive response checks
       [not found] ` <1301747039-31411-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-04-02 12:23   ` Jeff Layton
       [not found]     ` <1301747039-31411-2-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-04-02 12:23   ` [PATCH 2/5] cifs: make cifs_send_async take a kvec array Jeff Layton
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2011-04-02 12:23 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Further consolidate the SendReceive code by moving the checks run over
the packet into a separate function that all the SendReceive variants
can call.

We can also eliminate the check for a receive_len that's too big or too
small. cifs_demultiplex_thread already checks that and disconnects the
socket if that occurs, while setting the midStatus to MALFORMED. It'll
never call this code if that's the case.

Finally do a little cleanup. Use "goto out" on errors so that the flow
of code in the normal case is more evident. Also switch the logErr
variable in map_smb_to_linux_error to a bool.
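
The common tail of the SendReceive variants then boils down to roughly
the following (a condensed sketch of the post-patch flow, not a literal
hunk; log_error stands in for the per-caller choice, CIFS_LOG_ERROR in
SendReceive2 and false elsewhere):

	/* sketch: shared post-receive handling */
	if (!midQ->resp_buf || midQ->midState != MID_RESPONSE_RECEIVED) {
		rc = -EIO;
		cFYI(1, "Bad MID state?");
		goto out;
	}

	/* signature check and SMB-status-to-errno mapping in one place */
	rc = cifs_check_receive(midQ, ses->server, log_error);
out:
	delete_mid(midQ);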

Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/cifsproto.h |    4 +-
 fs/cifs/netmisc.c   |    2 +-
 fs/cifs/transport.c |  158 ++++++++++++++-------------------------------------
 3 files changed, 46 insertions(+), 118 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 76c4dc7f..4af7db8 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -74,6 +74,8 @@ extern int SendReceive(const unsigned int /* xid */ , struct cifs_ses *,
 			int * /* bytes returned */ , const int long_op);
 extern int SendReceiveNoRsp(const unsigned int xid, struct cifs_ses *ses,
 			struct smb_hdr *in_buf, int flags);
+extern int cifs_check_receive(struct mid_q_entry *mid,
+			struct TCP_Server_Info *server, bool log_error);
 extern int SendReceive2(const unsigned int /* xid */ , struct cifs_ses *,
 			struct kvec *, int /* nvec to send */,
 			int * /* type of buf returned */ , const int flags);
@@ -97,7 +99,7 @@ extern int cifs_convert_address(struct sockaddr *dst, const char *src, int len);
 extern int cifs_set_port(struct sockaddr *addr, const unsigned short int port);
 extern int cifs_fill_sockaddr(struct sockaddr *dst, const char *src, int len,
 				const unsigned short int port);
-extern int map_smb_to_linux_error(struct smb_hdr *smb, int logErr);
+extern int map_smb_to_linux_error(struct smb_hdr *smb, bool logErr);
 extern void header_assemble(struct smb_hdr *, char /* command */ ,
 			    const struct cifs_tcon *, int /* length of
 			    fixed section (word count) in two byte units */);
diff --git a/fs/cifs/netmisc.c b/fs/cifs/netmisc.c
index 79b71c2..73e47e8 100644
--- a/fs/cifs/netmisc.c
+++ b/fs/cifs/netmisc.c
@@ -836,7 +836,7 @@ ntstatus_to_dos(__u32 ntstatus, __u8 *eclass, __u16 *ecode)
 }
 
 int
-map_smb_to_linux_error(struct smb_hdr *smb, int logErr)
+map_smb_to_linux_error(struct smb_hdr *smb, bool logErr)
 {
 	unsigned int i;
 	int rc = -EIO;	/* if transport error smb error may not be set */
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 85063de..d9833fc 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -505,13 +505,32 @@ send_nt_cancel(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
 }
 
 int
+cifs_check_receive(struct mid_q_entry *mid, struct TCP_Server_Info *server,
+		   bool log_error)
+{
+	dump_smb(mid->resp_buf,
+		 min_t(u32, 92, be32_to_cpu(mid->resp_buf->smb_buf_length)));
+
+	/* convert the length into a more usable form */
+	if (server->sec_mode & (SECMODE_SIGN_REQUIRED |
+				SECMODE_SIGN_ENABLED)) {
+		/* FIXME: add code to kill session */
+		if (cifs_verify_signature(mid->resp_buf, server,
+					  mid->sequence_number + 1) != 0)
+			cERROR(1, "Unexpected SMB signature");
+	}
+
+	/* BB special case reconnect tid and uid here? */
+	return map_smb_to_linux_error(mid->resp_buf, log_error);
+}
+
+int
 SendReceive2(const unsigned int xid, struct cifs_ses *ses,
 	     struct kvec *iov, int n_vec, int *pRespBufType /* ret */,
 	     const int flags)
 {
 	int rc = 0;
 	int long_op;
-	unsigned int receive_len;
 	struct mid_q_entry *midQ;
 	struct smb_hdr *in_buf = iov[0].iov_base;
 
@@ -604,54 +623,24 @@ SendReceive2(const unsigned int xid, struct cifs_ses *ses,
 		return rc;
 	}
 
-	receive_len = be32_to_cpu(midQ->resp_buf->smb_buf_length);
-
-	if (receive_len > CIFSMaxBufSize + MAX_CIFS_HDR_SIZE) {
-		cERROR(1, "Frame too large received.  Length: %d  Xid: %d",
-			receive_len, xid);
+	if (!midQ->resp_buf || midQ->midState != MID_RESPONSE_RECEIVED) {
 		rc = -EIO;
+		cFYI(1, "Bad MID state?");
 		goto out;
 	}
 
-	/* rcvd frame is ok */
-
-	if (midQ->resp_buf &&
-	    (midQ->midState == MID_RESPONSE_RECEIVED)) {
-
-		iov[0].iov_base = (char *)midQ->resp_buf;
-		if (midQ->largeBuf)
-			*pRespBufType = CIFS_LARGE_BUFFER;
-		else
-			*pRespBufType = CIFS_SMALL_BUFFER;
-		iov[0].iov_len = receive_len + 4;
-
-		dump_smb(midQ->resp_buf, 80);
-		/* convert the length into a more usable form */
-		if ((receive_len > 24) &&
-		    (ses->server->sec_mode & (SECMODE_SIGN_REQUIRED |
-					     SECMODE_SIGN_ENABLED))) {
-			rc = cifs_verify_signature(midQ->resp_buf,
-						ses->server,
-						midQ->sequence_number+1);
-			if (rc) {
-				cERROR(1, "Unexpected SMB signature");
-				/* BB FIXME add code to kill session */
-			}
-		}
-
-		/* BB special case reconnect tid and uid here? */
-		rc = map_smb_to_linux_error(midQ->resp_buf,
-					    flags & CIFS_LOG_ERROR);
+	iov[0].iov_base = (char *)midQ->resp_buf;
+	iov[0].iov_len = be32_to_cpu(midQ->resp_buf->smb_buf_length) + 4;
+	if (midQ->largeBuf)
+		*pRespBufType = CIFS_LARGE_BUFFER;
+	else
+		*pRespBufType = CIFS_SMALL_BUFFER;
 
-		if ((flags & CIFS_NO_RESP) == 0)
-			midQ->resp_buf = NULL;  /* mark it so buf will
-						   not be freed by
-						   delete_mid */
-	} else {
-		rc = -EIO;
-		cFYI(1, "Bad MID state?");
-	}
+	rc = cifs_check_receive(midQ, ses->server, flags & CIFS_LOG_ERROR);
 
+	/* mark it so buf will not be freed by delete_mid */
+	if ((flags & CIFS_NO_RESP) == 0)
+		midQ->resp_buf = NULL;
 out:
 	delete_mid(midQ);
 	atomic_dec(&ses->server->inFlight);
@@ -666,7 +655,6 @@ SendReceive(const unsigned int xid, struct cifs_ses *ses,
 	    int *pbytes_returned, const int long_op)
 {
 	int rc = 0;
-	unsigned int receive_len;
 	struct mid_q_entry *midQ;
 
 	if (ses == NULL) {
@@ -753,47 +741,16 @@ SendReceive(const unsigned int xid, struct cifs_ses *ses,
 		return rc;
 	}
 
-	receive_len = be32_to_cpu(midQ->resp_buf->smb_buf_length);
-
-	if (receive_len > CIFSMaxBufSize + MAX_CIFS_HDR_SIZE) {
-		cERROR(1, "Frame too large received.  Length: %d  Xid: %d",
-			receive_len, xid);
-		rc = -EIO;
-		goto out;
-	}
-
-	/* rcvd frame is ok */
-
-	if (midQ->resp_buf && out_buf
-	    && (midQ->midState == MID_RESPONSE_RECEIVED)) {
-		out_buf->smb_buf_length = cpu_to_be32(receive_len);
-		memcpy((char *)out_buf + 4,
-		       (char *)midQ->resp_buf + 4,
-		       receive_len);
-
-		dump_smb(out_buf, 92);
-		/* convert the length into a more usable form */
-		if ((receive_len > 24) &&
-		    (ses->server->sec_mode & (SECMODE_SIGN_REQUIRED |
-					     SECMODE_SIGN_ENABLED))) {
-			rc = cifs_verify_signature(out_buf,
-						ses->server,
-						midQ->sequence_number+1);
-			if (rc) {
-				cERROR(1, "Unexpected SMB signature");
-				/* BB FIXME add code to kill session */
-			}
-		}
-
-		*pbytes_returned = be32_to_cpu(out_buf->smb_buf_length);
-
-		/* BB special case reconnect tid and uid here? */
-		rc = map_smb_to_linux_error(out_buf, 0 /* no log */ );
-	} else {
+	if (!midQ->resp_buf || !out_buf ||
+	    midQ->midState != MID_RESPONSE_RECEIVED) {
 		rc = -EIO;
 		cERROR(1, "Bad MID state?");
+		goto out;
 	}
 
+	*pbytes_returned = be32_to_cpu(midQ->resp_buf->smb_buf_length);
+	memcpy(out_buf, midQ->resp_buf, *pbytes_returned + 4);
+	rc = cifs_check_receive(midQ, ses->server, 0);
 out:
 	delete_mid(midQ);
 	atomic_dec(&ses->server->inFlight);
@@ -834,7 +791,6 @@ SendReceiveBlockingLock(const unsigned int xid, struct cifs_tcon *tcon,
 {
 	int rc = 0;
 	int rstart = 0;
-	unsigned int receive_len;
 	struct mid_q_entry *midQ;
 	struct cifs_ses *ses;
 
@@ -953,46 +909,16 @@ SendReceiveBlockingLock(const unsigned int xid, struct cifs_tcon *tcon,
 	if (rc != 0)
 		return rc;
 
-	receive_len = be32_to_cpu(midQ->resp_buf->smb_buf_length);
-	if (receive_len > CIFSMaxBufSize + MAX_CIFS_HDR_SIZE) {
-		cERROR(1, "Frame too large received.  Length: %d  Xid: %d",
-			receive_len, xid);
-		rc = -EIO;
-		goto out;
-	}
-
 	/* rcvd frame is ok */
-
-	if ((out_buf == NULL) || (midQ->midState != MID_RESPONSE_RECEIVED)) {
+	if (out_buf == NULL || midQ->midState != MID_RESPONSE_RECEIVED) {
 		rc = -EIO;
 		cERROR(1, "Bad MID state?");
 		goto out;
 	}
 
-	out_buf->smb_buf_length = cpu_to_be32(receive_len);
-	memcpy((char *)out_buf + 4,
-	       (char *)midQ->resp_buf + 4,
-	       receive_len);
-
-	dump_smb(out_buf, 92);
-	/* convert the length into a more usable form */
-	if ((receive_len > 24) &&
-	    (ses->server->sec_mode & (SECMODE_SIGN_REQUIRED |
-				     SECMODE_SIGN_ENABLED))) {
-		rc = cifs_verify_signature(out_buf,
-					   ses->server,
-					   midQ->sequence_number+1);
-		if (rc) {
-			cERROR(1, "Unexpected SMB signature");
-			/* BB FIXME add code to kill session */
-		}
-	}
-
-	*pbytes_returned = be32_to_cpu(out_buf->smb_buf_length);
-
-	/* BB special case reconnect tid and uid here? */
-	rc = map_smb_to_linux_error(out_buf, 0 /* no log */ );
-
+	*pbytes_returned = be32_to_cpu(midQ->resp_buf->smb_buf_length);
+	memcpy(out_buf, midQ->resp_buf, *pbytes_returned + 4);
+	rc = cifs_check_receive(midQ, ses->server, 0);
 out:
 	delete_mid(midQ);
 	if (rstart && rc == -EACCES)
-- 
1.7.4


* [PATCH 2/5] cifs: make cifs_send_async take a kvec array
       [not found] ` <1301747039-31411-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-04-02 12:23   ` [PATCH 1/5] cifs: consolidate SendReceive response checks Jeff Layton
@ 2011-04-02 12:23   ` Jeff Layton
       [not found]     ` <1301747039-31411-3-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-04-02 12:23   ` [PATCH 3/5] cifs: don't call mid_q_entry->callback under the Global_MidLock Jeff Layton
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2011-04-02 12:23 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

We'll need this for async writes, so convert cifs_call_async to take a
kvec array. CIFSSMBEcho is changed to put a kvec on the stack and pass
in the SMB buffer using that.
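
The point is that a later caller can pass the header and its payload as
separate vectors. A rough sketch of such a caller (purely illustrative:
smb, page, my_callback and my_ctx are made-up here; the real write path
comes later in the series):

	/* sketch: header in iov[0], data in the following vectors */
	struct kvec iov[2];

	iov[0].iov_base = smb;		/* small SMB buf: header + parms */
	iov[0].iov_len  = be32_to_cpu(smb->hdr.smb_buf_length) + 4;

	iov[1].iov_base = kmap(page);	/* payload page */
	iov[1].iov_len  = PAGE_CACHE_SIZE;

	rc = cifs_call_async(server, iov, 2, my_callback, my_ctx);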

Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/cifsproto.h |    4 ++--
 fs/cifs/cifssmb.c   |    6 ++++--
 fs/cifs/transport.c |   13 +++++++------
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 4af7db8..e255a2b 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -65,8 +65,8 @@ extern struct mid_q_entry *AllocMidQEntry(const struct smb_hdr *smb_buffer,
 					struct TCP_Server_Info *server);
 extern void DeleteMidQEntry(struct mid_q_entry *midEntry);
 extern int wait_for_free_request(struct TCP_Server_Info *sv, const int long_op);
-extern int cifs_call_async(struct TCP_Server_Info *server,
-			   struct smb_hdr *in_buf, mid_callback_t *callback,
+extern int cifs_call_async(struct TCP_Server_Info *server, struct kvec *iov,
+			   unsigned int nvec, mid_callback_t *callback,
 			   void *cbdata);
 extern int SendReceive(const unsigned int /* xid */ , struct cifs_ses *,
 			struct smb_hdr * /* input */ ,
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 655f24c..8de7e79 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -725,6 +725,7 @@ CIFSSMBEcho(struct TCP_Server_Info *server)
 {
 	ECHO_REQ *smb;
 	int rc = 0;
+	struct kvec iov;
 
 	cFYI(1, "In echo request");
 
@@ -739,9 +740,10 @@ CIFSSMBEcho(struct TCP_Server_Info *server)
 	put_bcc(1, &smb->hdr);
 	smb->Data[0] = 'a';
 	inc_rfc1001_len(smb, 3);
+	iov.iov_base = smb;
+	iov.iov_len = be32_to_cpu(smb->hdr.smb_buf_length) + 4;
 
-	rc = cifs_call_async(server, (struct smb_hdr *)smb,
-				cifs_echo_callback, server);
+	rc = cifs_call_async(server, &iov, 1, cifs_echo_callback, server);
 	if (rc)
 		cFYI(1, "Echo request failed: %d", rc);
 
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index d9833fc..7ccf292 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -349,11 +349,12 @@ wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
  * the result. Caller is responsible for dealing with timeouts.
  */
 int
-cifs_call_async(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
-		mid_callback_t *callback, void *cbdata)
+cifs_call_async(struct TCP_Server_Info *server, struct kvec *iov,
+		unsigned int nvec, mid_callback_t *callback, void *cbdata)
 {
 	int rc;
 	struct mid_q_entry *mid;
+	struct smb_hdr *hdr = (struct smb_hdr *)iov[0].iov_base;
 
 	rc = wait_for_free_request(server, CIFS_ASYNC_OP);
 	if (rc)
@@ -361,10 +362,10 @@ cifs_call_async(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
 
 	/* enable signing if server requires it */
 	if (server->sec_mode & (SECMODE_SIGN_REQUIRED | SECMODE_SIGN_ENABLED))
-		in_buf->Flags2 |= SMBFLG2_SECURITY_SIGNATURE;
+		hdr->Flags2 |= SMBFLG2_SECURITY_SIGNATURE;
 
 	mutex_lock(&server->srv_mutex);
-	mid = AllocMidQEntry(in_buf, server);
+	mid = AllocMidQEntry(hdr, server);
 	if (mid == NULL) {
 		mutex_unlock(&server->srv_mutex);
 		return -ENOMEM;
@@ -375,7 +376,7 @@ cifs_call_async(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
 	list_add_tail(&mid->qhead, &server->pending_mid_q);
 	spin_unlock(&GlobalMid_Lock);
 
-	rc = cifs_sign_smb(in_buf, server, &mid->sequence_number);
+	rc = cifs_sign_smb2(iov, nvec, server, &mid->sequence_number);
 	if (rc) {
 		mutex_unlock(&server->srv_mutex);
 		goto out_err;
@@ -385,7 +386,7 @@ cifs_call_async(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
 	mid->callback_data = cbdata;
 	mid->midState = MID_REQUEST_SUBMITTED;
 	cifs_in_send_inc(server);
-	rc = smb_send(server, in_buf, be32_to_cpu(in_buf->smb_buf_length));
+	rc = smb_sendv(server, iov, nvec);
 	cifs_in_send_dec(server);
 	cifs_save_when_sent(mid);
 	mutex_unlock(&server->srv_mutex);
-- 
1.7.4


* [PATCH 3/5] cifs: don't call mid_q_entry->callback under the Global_MidLock
       [not found] ` <1301747039-31411-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-04-02 12:23   ` [PATCH 1/5] cifs: consolidate SendReceive response checks Jeff Layton
  2011-04-02 12:23   ` [PATCH 2/5] cifs: make cifs_send_async take a kvec array Jeff Layton
@ 2011-04-02 12:23   ` Jeff Layton
       [not found]     ` <1301747039-31411-4-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-04-02 12:23   ` [PATCH 4/5] cifs: add cifs_async_writev Jeff Layton
  2011-04-02 12:23   ` [PATCH 5/5] cifs: convert cifs_writepages to use async writes Jeff Layton
  4 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2011-04-02 12:23 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Holding the spinlock while we call the mid callback means that the
callback can't sleep, which really limits what it can do. Calling it
outside the spinlock also means less contention on this global lock.

Change the semantics so that GlobalMid_Lock is not held when the
callback is called. Doing this requires that we take extra care not to
have sync_mid_result remove the mid from the list when the mid is in a
state where that has already happened. This prevents list corruption
when the mid is sitting on a private list for reconnect or when cifsd
is coming down.
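
The reconnect and shutdown paths then follow the usual "splice to a
private list under the lock, run the callbacks unlocked" pattern. A
trimmed-down sketch of what the cifs_reconnect hunk below does:

	/* detach the mids under the lock... */
	INIT_LIST_HEAD(&retry_list);
	spin_lock(&GlobalMid_Lock);
	list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
		if (mid_entry->midState == MID_REQUEST_SUBMITTED)
			mid_entry->midState = MID_RETRY_NEEDED;
		list_move(&mid_entry->qhead, &retry_list);
	}
	spin_unlock(&GlobalMid_Lock);

	/* ...then issue the callbacks with no locks held */
	list_for_each_safe(tmp, tmp2, &retry_list) {
		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
		list_del_init(&mid_entry->qhead);
		mid_entry->callback(mid_entry);
	}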

Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/cifsglob.h      |    7 +++----
 fs/cifs/connect.c       |   32 +++++++++++++++++++++++++-------
 fs/cifs/smb2transport.c |   14 ++++----------
 fs/cifs/transport.c     |   23 ++++++++---------------
 4 files changed, 40 insertions(+), 36 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index ccbac61..b3a6404 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -583,9 +583,8 @@ struct mid_q_entry;
  * This is the prototype for the mid callback function. When creating one,
  * take special care to avoid deadlocks. Things to bear in mind:
  *
- * - it will be called by cifsd
- * - the GlobalMid_Lock will be held
- * - the mid will be removed from the pending_mid_q list
+ * - it will be called by cifsd, with no locks held
+ * - the mid will be removed from any lists
  */
 typedef void (mid_callback_t)(struct mid_q_entry *mid);
 
@@ -739,7 +738,7 @@ static inline void free_dfs_info_array(struct dfs_info3_param *param,
 #define   MID_RESPONSE_RECEIVED 4
 #define   MID_RETRY_NEEDED      8 /* session closed while this request out */
 #define   MID_RESPONSE_MALFORMED 0x10
-#define   MID_NO_RESPONSE_NEEDED 0x20
+#define   MID_SHUTDOWN		 0x20
 
 /* Types of response buffer returned from SendReceive2 */
 #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 94e60c5..2b5d248 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -141,6 +141,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
 	struct cifs_ses *ses;
 	struct cifs_tcon *tcon;
 	struct mid_q_entry *mid_entry;
+	struct list_head retry_list;
 
 	spin_lock(&GlobalMid_Lock);
 	if (server->tcpStatus == CifsExiting) {
@@ -192,16 +193,23 @@ cifs_reconnect(struct TCP_Server_Info *server)
 	mutex_unlock(&server->srv_mutex);
 
 	/* mark submitted MIDs for retry and issue callback */
-	cFYI(1, "%s: issuing mid callbacks", __func__);
+	INIT_LIST_HEAD(&retry_list);
+	cFYI(1, "%s: moving mids to private list", __func__);
 	spin_lock(&GlobalMid_Lock);
 	list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
 		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
 		if (mid_entry->midState == MID_REQUEST_SUBMITTED)
 			mid_entry->midState = MID_RETRY_NEEDED;
+		list_move(&mid_entry->qhead, &retry_list);
+	}
+	spin_unlock(&GlobalMid_Lock);
+
+	cFYI(1, "%s: issuing mid callbacks", __func__);
+	list_for_each_safe(tmp, tmp2, &retry_list) {
+		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
 		list_del_init(&mid_entry->qhead);
 		mid_entry->callback(mid_entry);
 	}
-	spin_unlock(&GlobalMid_Lock);
 
 	while (server->tcpStatus == CifsNeedReconnect) {
 		try_to_freeze();
@@ -655,11 +663,10 @@ multi_t2_fnd:
 				else
 					mid_entry->midState =
 							MID_RESPONSE_MALFORMED;
+				list_del_init(&mid_entry->qhead);
 #ifdef CONFIG_CIFS_STATS2
 				mid_entry->when_received = jiffies;
 #endif
-				list_del_init(&mid_entry->qhead);
-				mid_entry->callback(mid_entry);
 				break;
 			}
 			mid_entry = NULL;
@@ -667,6 +674,7 @@ multi_t2_fnd:
 		spin_unlock(&GlobalMid_Lock);
 
 		if (mid_entry != NULL) {
+			mid_entry->callback(mid_entry);
 			/* Was previous buf put in mpx struct for multi-rsp? */
 			if (!isMultiRsp) {
 				/* smb buffer will be freed by user thread */
@@ -730,15 +738,25 @@ multi_t2_fnd:
 		cifs_small_buf_release(smallbuf);
 
 	if (!list_empty(&server->pending_mid_q)) {
+		struct list_head dispose_list;
+
+		INIT_LIST_HEAD(&dispose_list);
 		spin_lock(&GlobalMid_Lock);
 		list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
 			mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
-			cFYI(1, "Clearing Mid 0x%x - issuing callback",
-					 mid_entry->mid);
+			cFYI(1, "Clearing mid 0x%x", mid_entry->mid);
+			mid_entry->midState = MID_SHUTDOWN;
+			list_move(&mid_entry->qhead, &dispose_list);
+		}
+		spin_unlock(&GlobalMid_Lock);
+
+		/* now walk dispose list and issue callbacks */
+		list_for_each_safe(tmp, tmp2, &dispose_list) {
+			mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
+			cFYI(1, "Callback mid 0x%x", mid_entry->mid);
 			list_del_init(&mid_entry->qhead);
 			mid_entry->callback(mid_entry);
 		}
-		spin_unlock(&GlobalMid_Lock);
 		/* 1/8th of sec is more than enough time for them to exit */
 		msleep(125);
 	}
diff --git a/fs/cifs/smb2transport.c b/fs/cifs/smb2transport.c
index a187587..e0fa75308 100644
--- a/fs/cifs/smb2transport.c
+++ b/fs/cifs/smb2transport.c
@@ -276,28 +276,22 @@ sync_smb2_mid_result(struct smb2_mid_entry *mid, struct TCP_Server_Info *server)
 
 	spin_lock(&GlobalMid_Lock);
 	/* ensure that it's no longer on the pending_mid_q */
-	list_del_init(&mid->qhead);
 
 	switch (mid->mid_state) {
 	case MID_RESPONSE_RECEIVED:
 		spin_unlock(&GlobalMid_Lock);
 		return rc;
-	case MID_REQUEST_SUBMITTED:
-		/* socket is going down, reject all calls */
-		if (server->tcpStatus == CifsExiting) {
-			cERROR(1, "%s: canceling mid=%lld cmd=0x%x state=%d",
-				__func__, mid->mid, le16_to_cpu(mid->command),
-				mid->mid_state);
-			rc = -EHOSTDOWN;
-			break;
-		}
 	case MID_RETRY_NEEDED:
 		rc = -EAGAIN;
 		break;
 	case MID_RESPONSE_MALFORMED:
 		rc = -EIO;
 		break;
+	case MID_SHUTDOWN:
+		rc = -EHOSTDOWN;
+		break;
 	default:
+		list_del_init(&mid->qhead);
 		cERROR(1, "%s: invalid mid state mid=%lld state=%d", __func__,
 			mid->mid, mid->mid_state);
 		rc = -EIO;
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 7ccf292..a31c279 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -428,7 +428,7 @@ SendReceiveNoRsp(const unsigned int xid, struct cifs_ses *ses,
 }
 
 static int
-sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
+cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
 {
 	int rc = 0;
 
@@ -436,28 +436,21 @@ sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
 		mid->mid, mid->midState);
 
 	spin_lock(&GlobalMid_Lock);
-	/* ensure that it's no longer on the pending_mid_q */
-	list_del_init(&mid->qhead);
-
 	switch (mid->midState) {
 	case MID_RESPONSE_RECEIVED:
 		spin_unlock(&GlobalMid_Lock);
 		return rc;
-	case MID_REQUEST_SUBMITTED:
-		/* socket is going down, reject all calls */
-		if (server->tcpStatus == CifsExiting) {
-			cERROR(1, "%s: canceling mid=%d cmd=0x%x state=%d",
-			       __func__, mid->mid, mid->command, mid->midState);
-			rc = -EHOSTDOWN;
-			break;
-		}
 	case MID_RETRY_NEEDED:
 		rc = -EAGAIN;
 		break;
 	case MID_RESPONSE_MALFORMED:
 		rc = -EIO;
 		break;
+	case MID_SHUTDOWN:
+		rc = -EHOSTDOWN;
+		break;
 	default:
+		list_del_init(&mid->qhead);
 		cERROR(1, "%s: invalid mid state mid=%d state=%d", __func__,
 			mid->mid, mid->midState);
 		rc = -EIO;
@@ -617,7 +610,7 @@ SendReceive2(const unsigned int xid, struct cifs_ses *ses,
 
 	cifs_small_buf_release(in_buf);
 
-	rc = sync_mid_result(midQ, ses->server);
+	rc = cifs_sync_mid_result(midQ, ses->server);
 	if (rc != 0) {
 		atomic_dec(&ses->server->inFlight);
 		wake_up(&ses->server->request_q);
@@ -735,7 +728,7 @@ SendReceive(const unsigned int xid, struct cifs_ses *ses,
 		spin_unlock(&GlobalMid_Lock);
 	}
 
-	rc = sync_mid_result(midQ, ses->server);
+	rc = cifs_sync_mid_result(midQ, ses->server);
 	if (rc != 0) {
 		atomic_dec(&ses->server->inFlight);
 		wake_up(&ses->server->request_q);
@@ -906,7 +899,7 @@ SendReceiveBlockingLock(const unsigned int xid, struct cifs_tcon *tcon,
 		rstart = 1;
 	}
 
-	rc = sync_mid_result(midQ, ses->server);
+	rc = cifs_sync_mid_result(midQ, ses->server);
 	if (rc != 0)
 		return rc;
 
-- 
1.7.4


* [PATCH 4/5] cifs: add cifs_async_writev
       [not found] ` <1301747039-31411-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2011-04-02 12:23   ` [PATCH 3/5] cifs: don't call mid_q_entry->callback under the Global_MidLock Jeff Layton
@ 2011-04-02 12:23   ` Jeff Layton
       [not found]     ` <1301747039-31411-5-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2011-04-02 12:23   ` [PATCH 5/5] cifs: convert cifs_writepages to use async writes Jeff Layton
  4 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2011-04-02 12:23 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Add the ability for CIFS to do an asynchronous write. The kernel will
set the frame up as it would for a "normal" SMBWrite2 request, and use
cifs_call_async to send it. The mid callback will then be configured to
handle the result.
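
From a caller's point of view the intended usage is roughly the
following (an illustrative sketch only: inode and page are placeholders,
error handling is trimmed, and gathering/locking the dirty pages is what
the next patch actually does):

	/* sketch: set up a writedata and fire off an async write */
	struct cifs_writedata *wdata = cifs_writedata_alloc();

	if (wdata == NULL)
		return -ENOMEM;

	wdata->cfile = find_writable_file(CIFS_I(inode), false);
	wdata->pages[0] = page;		/* up to PAGEVEC_SIZE pages */
	wdata->offset = page_offset(page);

	rc = cifs_async_writev(wdata);	/* returns once the send is done */
	if (rc == 0)
		wait_for_completion_killable(&wdata->completion);
					/* only if the caller must wait */

	/* drop the caller's reference; the send path holds its own */
	kref_put(&wdata->refcount, cifs_writedata_release);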

Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/cifsproto.h |   19 +++++
 fs/cifs/cifssmb.c   |  219 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 238 insertions(+), 0 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index e255a2b..7b964df 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -21,6 +21,7 @@
 #ifndef _CIFSPROTO_H
 #define _CIFSPROTO_H
 #include <linux/nls.h>
+#include <linux/pagevec.h>
 
 struct statfs;
 struct smb_vol;
@@ -433,4 +434,22 @@ extern int mdfour(unsigned char *, unsigned char *, int);
 extern int E_md4hash(const unsigned char *passwd, unsigned char *p16);
 extern int SMBencrypt(unsigned char *passwd, const unsigned char *c8,
 			unsigned char *p24);
+
+/* asynchronous write support */
+struct cifs_writedata {
+	struct list_head	pending;
+	struct kref		refcount;
+	struct completion	completion;
+	struct work_struct	work;
+	struct cifsFileInfo	*cfile;
+	__u64			offset;
+	unsigned int		bytes;
+	int			result;
+	struct page 		*pages[PAGEVEC_SIZE];
+};
+
+int cifs_async_writev(struct cifs_writedata *wdata);
+struct cifs_writedata *cifs_writedata_alloc(void);
+void cifs_writedata_release(struct kref *refcount);
+
 #endif			/* _CIFSPROTO_H */
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 8de7e79..b308605 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -32,6 +32,7 @@
 #include <linux/vfs.h>
 #include <linux/slab.h>
 #include <linux/posix_acl_xattr.h>
+#include <linux/pagemap.h>
 #include <asm/uaccess.h>
 #include "cifspdu.h"
 #include "cifsglob.h"
@@ -1604,6 +1605,224 @@ CIFSSMBWrite(const int xid, struct cifs_tcon *tcon,
 	return rc;
 }
 
+void
+cifs_writedata_release(struct kref *refcount)
+{
+	struct cifs_writedata *wdata = container_of(refcount,
+					struct cifs_writedata, refcount);
+	struct address_space *mapping;
+
+	if (wdata->cfile) {
+		mapping = wdata->cfile->dentry->d_inode->i_mapping;
+
+		spin_lock(&mapping->private_lock);
+		list_del(&wdata->pending);
+		spin_unlock(&mapping->private_lock);
+
+		cifsFileInfo_put(wdata->cfile);
+	}
+
+	kfree(wdata);
+}
+
+static void
+cifs_writev_complete(struct work_struct *work)
+{
+	struct cifs_writedata *wdata = container_of(work,
+						struct cifs_writedata, work);
+	struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
+	struct inode *inode = wdata->cfile->dentry->d_inode;
+	struct page *page;
+	int i = 0;
+
+	if (wdata->result == 0) {
+		cifs_update_eof(CIFS_I(inode), wdata->offset, wdata->bytes);
+		cifs_stats_bytes_written(tcon, wdata->bytes);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(wdata->pages) && (page = wdata->pages[i]);
+			i++) {
+		if (wdata->result == -EAGAIN)
+			__set_page_dirty_nobuffers(page);
+		else if (wdata->result < 0)
+			SetPageError(page);
+		end_page_writeback(page);
+		page_cache_release(page);
+	}
+	if (wdata->result != -EAGAIN)
+		mapping_set_error(inode->i_mapping, wdata->result);
+	complete(&wdata->completion);
+	kref_put(&wdata->refcount, cifs_writedata_release);
+}
+
+struct cifs_writedata *
+cifs_writedata_alloc(void)
+{
+	struct cifs_writedata *wdata;
+
+	wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
+	if (wdata != NULL) {
+		INIT_LIST_HEAD(&wdata->pending);
+		INIT_WORK(&wdata->work, cifs_writev_complete);
+		kref_init(&wdata->refcount);
+		init_completion(&wdata->completion);
+	}
+	return wdata;
+}
+
+/*
+ * Check the midState and signature on received buffer (if any), and queue the
+ * workqueue completion task.
+ */
+static void
+cifs_writev_callback(struct mid_q_entry *mid)
+{
+	struct cifs_writedata *wdata = mid->callback_data;
+	struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
+	unsigned int written;
+	WRITE_RSP *smb = (WRITE_RSP *)mid->resp_buf;
+
+	switch(mid->midState) {
+	case MID_RESPONSE_RECEIVED:
+		wdata->result = cifs_check_receive(mid, tcon->ses->server, 0);
+		if (wdata->result != 0)
+			break;
+
+		written = le16_to_cpu(smb->CountHigh);
+		written <<= 16;
+		written += le16_to_cpu(smb->Count);
+		/*
+		 * Mask off high 16 bits when bytes written as returned
+		 * by the server is greater than bytes requested by the
+		 * client. OS/2 servers are known to set incorrect
+		 * CountHigh values.
+		 */
+		if (written > wdata->bytes)
+			written &= 0xFFFF;
+
+		if (written < wdata->bytes)
+			wdata->result = -ENOSPC;
+		else
+			wdata->bytes = written;
+		break;
+	case MID_REQUEST_SUBMITTED:
+	case MID_RETRY_NEEDED:
+		wdata->result = -EAGAIN;
+		break;
+	default:
+		wdata->result = -EIO;
+		break;
+	}
+
+	queue_work(system_nrt_wq, &wdata->work);
+	DeleteMidQEntry(mid);
+	atomic_dec(&tcon->ses->server->inFlight);
+	wake_up(&tcon->ses->server->request_q);
+}
+
+/* cifs_async_writev - send an async write, and set up mid to handle result */
+int
+cifs_async_writev(struct cifs_writedata *wdata)
+{
+	int i, rc = -EACCES;
+	WRITE_REQ *smb = NULL;
+	int wct;
+	struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
+	struct inode *inode = wdata->cfile->dentry->d_inode;
+	unsigned int npages = ARRAY_SIZE(wdata->pages);
+	struct kvec *iov = NULL;
+
+	cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);
+
+	if (tcon->ses->capabilities & CAP_LARGE_FILES) {
+		wct = 14;
+	} else {
+		wct = 12;
+		if (wdata->offset >> 32 > 0) {
+			/* can not handle big offset for old srv */
+			return -EIO;
+		}
+	}
+
+	rc = small_smb_init(SMB_COM_WRITE_ANDX, wct, tcon, (void **)&smb);
+	if (rc)
+		goto async_writev_out;
+
+	/* FIXME: only allocate number of kvecs we need */
+	iov = kzalloc((npages + 1) * sizeof(*iov), GFP_NOFS);
+	if (iov == NULL) {
+		rc = -ENOMEM;
+		goto async_writev_out;
+	}
+
+	smb->AndXCommand = 0xFF;	/* none */
+	smb->Fid = wdata->cfile->netfid;
+	smb->OffsetLow = cpu_to_le32(wdata->offset & 0xFFFFFFFF);
+	if (wct == 14)
+		smb->OffsetHigh = cpu_to_le32(wdata->offset >> 32);
+	smb->Reserved = 0xFFFFFFFF;
+	smb->WriteMode = 0;
+	smb->Remaining = 0;
+
+	smb->DataOffset =
+	    cpu_to_le16(offsetof(struct smb_com_write_req, Data) - 4);
+
+	/* 4 for RFC1001 length + 1 for BCC */
+	iov[0].iov_len = be32_to_cpu(smb->hdr.smb_buf_length) + 4 + 1;
+	iov[0].iov_base = smb;
+
+	/* marshal up the pages into iov array */
+	npages = 0;
+	wdata->bytes = 0;
+	for (i = 0; i < ARRAY_SIZE(wdata->pages) && wdata->pages[i]; i++) {
+		if (wdata->bytes + PAGE_CACHE_SIZE <=
+		    CIFS_SB(inode->i_sb)->wsize) {
+			iov[i + 1].iov_len = min(inode->i_size -
+					      page_offset(wdata->pages[i]),
+						(loff_t)PAGE_CACHE_SIZE);
+			iov[i + 1].iov_base = kmap(wdata->pages[i]);
+			wdata->bytes += iov[i + 1].iov_len;
+			npages++;
+		} else {
+			/* if we're beyond wsize, then re-mark page dirty */
+			__set_page_dirty_nobuffers(wdata->pages[i]);
+		}
+	}
+
+	smb->DataLengthLow = cpu_to_le16(wdata->bytes & 0xFFFF);
+	smb->DataLengthHigh = cpu_to_le16(wdata->bytes >> 16);
+
+	if (wct == 14) {
+		inc_rfc1001_len(&smb->hdr, wdata->bytes + 1);
+		put_bcc(wdata->bytes + 1, &smb->hdr);
+	} else {
+		/* wct == 12 */
+		struct smb_com_writex_req *smbw =
+				(struct smb_com_writex_req *)smb;
+		inc_rfc1001_len(&smbw->hdr, wdata->bytes + 5);
+		put_bcc(wdata->bytes + 5, &smbw->hdr);
+		iov[0].iov_len += 4; /* pad bigger by four bytes */
+	}
+
+	kref_get(&wdata->refcount);
+	rc = cifs_call_async(tcon->ses->server, iov, npages + 1,
+			     cifs_writev_callback, wdata);
+
+	if (rc == 0)
+		cifs_stats_inc(&tcon->stats.cifs_stats.num_writes);
+	else
+		kref_put(&wdata->refcount, cifs_writedata_release);
+
+	/* send is done, unmap pages */
+	for (i = 0; i < npages; i++)
+		kunmap(wdata->pages[i]);
+
+async_writev_out:
+	cifs_small_buf_release(smb);
+	kfree(iov);
+	return rc;
+}
+
 int
 CIFSSMBWrite2(const int xid, struct cifs_tcon *tcon,
 	     const int netfid, const unsigned int count,
-- 
1.7.4


* [PATCH 5/5] cifs: convert cifs_writepages to use async writes
       [not found] ` <1301747039-31411-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2011-04-02 12:23   ` [PATCH 4/5] cifs: add cifs_async_writev Jeff Layton
@ 2011-04-02 12:23   ` Jeff Layton
       [not found]     ` <1301747039-31411-6-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  4 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2011-04-02 12:23 UTC (permalink / raw)
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Have cifs_writepages issue asynchronous writes instead of waiting on
each write call to complete before issuing another. This also allows us
to return more quickly from writepages in the WB_SYNC_NONE case. It
can just send out all of the I/Os and not wait around for the replies.

In the WB_SYNC_ALL case, have it wait for all of the writes to complete
after issuing them, and return any errors that turn up. If all of those
errors turn out to be retryable (-EAGAIN), then retry the entire
operation.

This also changes the page locking semantics a little bit. Instead of
holding the page lock until the response is received, release it after
doing the send. This will reduce contention for the page lock and should
prevent processes that have the file mmap'ed from being blocked
unnecessarily.
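
The resulting flow is roughly the following (a pseudocode-level sketch:
gather_dirty_pages is a made-up placeholder for the pagevec walk, and
the locking around the pending list and the reference drops for the
non-waiting case are omitted):

	/* sketch: fire off async writes, wait only for WB_SYNC_ALL */
	for (;;) {
		wdata = cifs_writedata_alloc();
		if (!wdata) {
			rc = -ENOMEM;
			break;
		}

		nr_pages = gather_dirty_pages(mapping, wbc, wdata);
		if (nr_pages == 0) {
			kref_put(&wdata->refcount, cifs_writedata_release);
			break;
		}

		if (wbc->sync_mode == WB_SYNC_ALL)
			list_move(&wdata->pending, &pending);

		rc = cifs_async_writev(wdata);

		/* page locks are released right after the send now */
		for (i = 0; i < nr_pages; i++)
			unlock_page(wdata->pages[i]);
	}

	if (wbc->sync_mode == WB_SYNC_ALL) {
		rc = cifs_wait_for_write_completion(mapping, &pending);
		if (rc == -EAGAIN)
			goto retry;	/* redo the whole range */
	}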

Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/file.c |  256 ++++++++++++++++++++++++++++++--------------------------
 1 files changed, 136 insertions(+), 120 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index da53246..dbdbe97 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1087,32 +1087,69 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
 	return rc;
 }
 
+/*
+ * walk a list of in progress async writes and wait for them to complete.
+ * Check the result code on each one.
+ *
+ * There's a clear hierarchy: 0 < EAGAIN < other errors
+ *
+ * If we get an EAGAIN return on a WB_SYNC_ALL writeout, then we need to
+ * go back and do the whole operation again. If we get other errors then
+ * the mapping already likely has AS_EIO or AS_ENOSPC set, so we can
+ * give up and just return an error.
+ */
+static int
+cifs_wait_for_write_completion(struct address_space *mapping,
+				struct list_head *pending)
+{
+	int tmprc, rc = 0;
+	struct cifs_writedata *wdata;
+
+	/* wait for writes to complete */
+	for (;;) {
+		/* anything left on list? */
+		spin_lock(&mapping->private_lock);
+		if (list_empty(pending)) {
+			spin_unlock(&mapping->private_lock);
+			break;
+		}
+
+		/* dequeue an entry */
+		wdata = list_first_entry(pending, struct cifs_writedata,
+					 pending);
+		list_del_init(&wdata->pending);
+		spin_unlock(&mapping->private_lock);
+
+		/* wait for it to complete */
+		tmprc = wait_for_completion_killable(&wdata->completion);
+		if (tmprc)
+			rc = tmprc;
+
+		if (wdata->result == -EAGAIN)
+			rc = rc ? rc : wdata->result;
+		else if (wdata->result != 0)
+			rc = wdata->result;
+
+		kref_put(&wdata->refcount, cifs_writedata_release);
+	}
+
+	return rc;
+}
+
 static int cifs_writepages(struct address_space *mapping,
 			   struct writeback_control *wbc)
 {
-	unsigned int bytes_to_write;
-	unsigned int bytes_written;
-	struct cifs_sb_info *cifs_sb;
-	int done = 0;
+	struct cifs_sb_info *cifs_sb = CIFS_SB(mapping->host->i_sb);
+	bool done = false, scanned = false, range_whole = false;
 	pgoff_t end;
 	pgoff_t index;
-	int range_whole = 0;
-	struct kvec *iov;
-	int len;
-	int n_iov = 0;
-	pgoff_t next;
-	int nr_pages;
-	__u64 offset = 0;
+	unsigned int pvec_pages;
 	struct cifsFileInfo *open_file;
-	struct cifs_tcon *tcon;
-	struct cifsInodeInfo *cifsi = CIFS_I(mapping->host);
+	struct cifs_writedata *wdata;
 	struct page *page;
 	struct pagevec pvec;
+	struct list_head pending;
 	int rc = 0;
-	int scanned = 0;
-	int xid;
-
-	cifs_sb = CIFS_SB(mapping->host->i_sb);
 
 	/*
 	 * If wsize is smaller that the page cache size, default to writing
@@ -1121,27 +1158,19 @@ static int cifs_writepages(struct address_space *mapping,
 	if (cifs_sb->wsize < PAGE_CACHE_SIZE)
 		return generic_writepages(mapping, wbc);
 
-	iov = kmalloc(32 * sizeof(struct kvec), GFP_KERNEL);
-	if (iov == NULL)
-		return generic_writepages(mapping, wbc);
-
 	/*
 	 * if there's no open file, then this is likely to fail too,
 	 * but it'll at least handle the return. Maybe it should be
 	 * a BUG() instead?
 	 */
 	open_file = find_writable_file(CIFS_I(mapping->host), false);
-	if (!open_file) {
-		kfree(iov);
+	if (!open_file)
 		return generic_writepages(mapping, wbc);
-	}
-
-	tcon = tlink_tcon(open_file->tlink);
 	cifsFileInfo_put(open_file);
 
-	xid = GetXid();
-
+	INIT_LIST_HEAD(&pending);
 	pagevec_init(&pvec, 0);
+
 	if (wbc->range_cyclic) {
 		index = mapping->writeback_index; /* Start from prev offset */
 		end = -1;
@@ -1149,23 +1178,26 @@ static int cifs_writepages(struct address_space *mapping,
 		index = wbc->range_start >> PAGE_CACHE_SHIFT;
 		end = wbc->range_end >> PAGE_CACHE_SHIFT;
 		if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
-			range_whole = 1;
-		scanned = 1;
+			range_whole = true;
+		scanned = true;
 	}
 retry:
 	while (!done && (index <= end) &&
-	       (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
+	       (pvec_pages = pagevec_lookup_tag(&pvec, mapping, &index,
 			PAGECACHE_TAG_DIRTY,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1))) {
-		int first;
 		unsigned int i;
+		unsigned int nr_pages = 0;
+		pgoff_t next = 0;
 
-		first = -1;
-		next = 0;
-		n_iov = 0;
-		bytes_to_write = 0;
+		wdata = cifs_writedata_alloc();
+		if (!wdata) {
+			pagevec_release(&pvec);
+			rc = -ENOMEM;
+			break;
+		}
 
-		for (i = 0; i < nr_pages; i++) {
+		for (i = 0; i < pvec_pages; i++) {
 			page = pvec.pages[i];
 			/*
 			 * At this point we hold neither mapping->tree_lock nor
@@ -1175,7 +1207,7 @@ retry:
 			 * mapping
 			 */
 
-			if (first < 0)
+			if (nr_pages == 0)
 				lock_page(page);
 			else if (!trylock_page(page))
 				break;
@@ -1186,7 +1218,7 @@ retry:
 			}
 
 			if (!wbc->range_cyclic && page->index > end) {
-				done = 1;
+				done = true;
 				unlock_page(page);
 				break;
 			}
@@ -1213,119 +1245,103 @@ retry:
 			set_page_writeback(page);
 
 			if (page_offset(page) >= mapping->host->i_size) {
-				done = 1;
+				done = true;
 				unlock_page(page);
 				end_page_writeback(page);
 				break;
 			}
 
-			/*
-			 * BB can we get rid of this?  pages are held by pvec
-			 */
 			page_cache_get(page);
-
-			len = min(mapping->host->i_size - page_offset(page),
-				  (loff_t)PAGE_CACHE_SIZE);
-
-			/* reserve iov[0] for the smb header */
-			n_iov++;
-			iov[n_iov].iov_base = kmap(page);
-			iov[n_iov].iov_len = len;
-			bytes_to_write += len;
-
-			if (first < 0) {
-				first = i;
-				offset = page_offset(page);
-			}
+			wdata->pages[i] = page;
 			next = page->index + 1;
-			if (bytes_to_write + PAGE_CACHE_SIZE > cifs_sb->wsize)
+			++nr_pages;
+
+			/* stop here if we have enough for a full wsize write */
+			if ((nr_pages + 1) * PAGE_CACHE_SIZE > cifs_sb->wsize)
 				break;
 		}
-		if (n_iov) {
-retry_write:
-			open_file = find_writable_file(CIFS_I(mapping->host),
-							false);
-			if (!open_file) {
+
+		if (wdata->pages[0] == NULL) {
+			/* Need to re-find the pages we skipped */
+			index = pvec.pages[0]->index + 1;
+			pagevec_release(&pvec);
+			continue;
+		}
+
+		pagevec_release(&pvec);
+		wdata->offset = page_offset(wdata->pages[0]);
+
+		do {
+			if (wdata->cfile != NULL)
+				cifsFileInfo_put(wdata->cfile);
+			wdata->cfile = find_writable_file(CIFS_I(mapping->host),
+							  false);
+			if (!wdata->cfile) {
 				cERROR(1, "No writable handles for inode");
 				rc = -EBADF;
-			} else {
-				rc = CIFSSMBWrite2(xid, tcon, open_file->netfid,
-						   bytes_to_write, offset,
-						   &bytes_written, iov, n_iov,
-						   0);
-				cifsFileInfo_put(open_file);
+				break;
 			}
-
-			cFYI(1, "Write2 rc=%d, wrote=%u", rc, bytes_written);
-
-			/*
-			 * For now, treat a short write as if nothing got
-			 * written. A zero length write however indicates
-			 * ENOSPC or EFBIG. We have no way to know which
-			 * though, so call it ENOSPC for now. EFBIG would
-			 * get translated to AS_EIO anyway.
-			 *
-			 * FIXME: make it take into account the data that did
-			 *	  get written
-			 */
-			if (rc == 0) {
-				if (bytes_written == 0)
-					rc = -ENOSPC;
-				else if (bytes_written < bytes_to_write)
-					rc = -EAGAIN;
+			if (wbc->sync_mode == WB_SYNC_ALL) {
+				spin_lock(&mapping->private_lock);
+				list_move(&wdata->pending, &pending);
+				spin_unlock(&mapping->private_lock);
 			}
+			rc = cifs_async_writev(wdata);
+		} while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
 
-			/* retry on data-integrity flush */
-			if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN)
-				goto retry_write;
-
-			/* fix the stats and EOF */
-			if (bytes_written > 0) {
-				cifs_stats_bytes_written(tcon, bytes_written);
-				cifs_update_eof(cifsi, offset, bytes_written);
-			}
+		for (i = 0; i < nr_pages; ++i)
+			unlock_page(wdata->pages[i]);
 
-			for (i = 0; i < n_iov; i++) {
-				page = pvec.pages[first + i];
-				/* on retryable write error, redirty page */
+		/* send failure -- clean up the mess */
+		if (rc != 0) {
+			for (i = 0; i < nr_pages; ++i) {
 				if (rc == -EAGAIN)
-					redirty_page_for_writepage(wbc, page);
-				else if (rc != 0)
-					SetPageError(page);
-				kunmap(page);
-				unlock_page(page);
-				end_page_writeback(page);
-				page_cache_release(page);
+					redirty_page_for_writepage(wbc,
+							   wdata->pages[i]);
+				else
+					SetPageError(wdata->pages[i]);
+				end_page_writeback(wdata->pages[i]);
+				page_cache_release(wdata->pages[i]);
 			}
-
 			if (rc != -EAGAIN)
 				mapping_set_error(mapping, rc);
-			else
-				rc = 0;
-
-			if ((wbc->nr_to_write -= n_iov) <= 0)
-				done = 1;
-			index = next;
-		} else
-			/* Need to re-find the pages we skipped */
-			index = pvec.pages[0]->index + 1;
+			kref_put(&wdata->refcount, cifs_writedata_release);
+		} else if (wbc->sync_mode == WB_SYNC_NONE) {
+			kref_put(&wdata->refcount, cifs_writedata_release);
+		}
 
-		pagevec_release(&pvec);
+		if ((wbc->nr_to_write -= nr_pages) <= 0)
+			done = true;
+		index = next;
 	}
+
 	if (!scanned && !done) {
 		/*
 		 * We hit the last page and there is more work to be done: wrap
 		 * back to the start of the file
 		 */
-		scanned = 1;
+		scanned = true;
 		index = 0;
 		goto retry;
 	}
+
+	/*
+	 * for WB_SYNC_ALL, we must wait until the writes complete. If any
+	 * return -EAGAIN however, we need to retry the whole thing again.
+	 * At that point nr_to_write will be all screwed up, but that
+	 * shouldn't really matter for WB_SYNC_ALL (right?)
+	 */
+	if (wbc->sync_mode == WB_SYNC_ALL) {
+		rc = cifs_wait_for_write_completion(mapping, &pending);
+		if (rc == -EAGAIN) {
+			index = wbc->range_start >> PAGE_CACHE_SHIFT;
+			goto retry;
+		}
+	}
+
 	if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
 		mapping->writeback_index = index;
 
-	FreeXid(xid);
-	kfree(iov);
 	return rc;
 }
 
-- 
1.7.4


* Re: [PATCH 1/5] cifs: consolidate SendReceive response checks
       [not found]     ` <1301747039-31411-2-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-04-12  9:38       ` Pavel Shilovsky
  0 siblings, 0 replies; 16+ messages in thread
From: Pavel Shilovsky @ 2011-04-12  9:38 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

2011/4/2 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
> Further consolidate the SendReceive code by moving the checks run over
> the packet into a separate function that all the SendReceive variants
> can call.
>
> We can also eliminate the check for a receive_len that's too big or too
> small. cifs_demultiplex_thread already checks that and disconnects the
> socket if that occurs, while setting the midStatus to MALFORMED. It'll
> never call this code if that's the case.
>
> Finally do a little cleanup. Use "goto out" on errors so that the flow
> of code in the normal case is more evident. Also switch the logErr
> variable in map_smb_to_linux_error to a bool.
>
> Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> [...]

Looks good.

Reviewed-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>

-- 
Best regards,
Pavel Shilovsky.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] cifs: make cifs_send_async take a kvec array
       [not found]     ` <1301747039-31411-3-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-04-12  9:39       ` Pavel Shilovsky
  0 siblings, 0 replies; 16+ messages in thread
From: Pavel Shilovsky @ 2011-04-12  9:39 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

2011/4/2 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
> We'll need this for async writes, so convert the call to take a kvec
> array. CIFSSMBEcho is changed to put a kvec on the stack and pass
> in the SMB buffer using that.
>
> Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/cifs/cifsproto.h |    4 ++--
>  fs/cifs/cifssmb.c   |    6 ++++--
>  fs/cifs/transport.c |   13 +++++++------
>  3 files changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> index 4af7db8..e255a2b 100644
> --- a/fs/cifs/cifsproto.h
> +++ b/fs/cifs/cifsproto.h
> @@ -65,8 +65,8 @@ extern struct mid_q_entry *AllocMidQEntry(const struct smb_hdr *smb_buffer,
>                                        struct TCP_Server_Info *server);
>  extern void DeleteMidQEntry(struct mid_q_entry *midEntry);
>  extern int wait_for_free_request(struct TCP_Server_Info *sv, const int long_op);
> -extern int cifs_call_async(struct TCP_Server_Info *server,
> -                          struct smb_hdr *in_buf, mid_callback_t *callback,
> +extern int cifs_call_async(struct TCP_Server_Info *server, struct kvec *iov,
> +                          unsigned int nvec, mid_callback_t *callback,
>                           void *cbdata);
>  extern int SendReceive(const unsigned int /* xid */ , struct cifs_ses *,
>                        struct smb_hdr * /* input */ ,
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index 655f24c..8de7e79 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -725,6 +725,7 @@ CIFSSMBEcho(struct TCP_Server_Info *server)
>  {
>        ECHO_REQ *smb;
>        int rc = 0;
> +       struct kvec iov;
>
>        cFYI(1, "In echo request");
>
> @@ -739,9 +740,10 @@ CIFSSMBEcho(struct TCP_Server_Info *server)
>        put_bcc(1, &smb->hdr);
>        smb->Data[0] = 'a';
>        inc_rfc1001_len(smb, 3);
> +       iov.iov_base = smb;
> +       iov.iov_len = be32_to_cpu(smb->hdr.smb_buf_length) + 4;
>
> -       rc = cifs_call_async(server, (struct smb_hdr *)smb,
> -                               cifs_echo_callback, server);
> +       rc = cifs_call_async(server, &iov, 1, cifs_echo_callback, server);
>        if (rc)
>                cFYI(1, "Echo request failed: %d", rc);
>
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index d9833fc..7ccf292 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -349,11 +349,12 @@ wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
>  * the result. Caller is responsible for dealing with timeouts.
>  */
>  int
> -cifs_call_async(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
> -               mid_callback_t *callback, void *cbdata)
> +cifs_call_async(struct TCP_Server_Info *server, struct kvec *iov,
> +               unsigned int nvec, mid_callback_t *callback, void *cbdata)
>  {
>        int rc;
>        struct mid_q_entry *mid;
> +       struct smb_hdr *hdr = (struct smb_hdr *)iov[0].iov_base;
>
>        rc = wait_for_free_request(server, CIFS_ASYNC_OP);
>        if (rc)
> @@ -361,10 +362,10 @@ cifs_call_async(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
>
>        /* enable signing if server requires it */
>        if (server->sec_mode & (SECMODE_SIGN_REQUIRED | SECMODE_SIGN_ENABLED))
> -               in_buf->Flags2 |= SMBFLG2_SECURITY_SIGNATURE;
> +               hdr->Flags2 |= SMBFLG2_SECURITY_SIGNATURE;
>
>        mutex_lock(&server->srv_mutex);
> -       mid = AllocMidQEntry(in_buf, server);
> +       mid = AllocMidQEntry(hdr, server);
>        if (mid == NULL) {
>                mutex_unlock(&server->srv_mutex);
>                return -ENOMEM;
> @@ -375,7 +376,7 @@ cifs_call_async(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
>        list_add_tail(&mid->qhead, &server->pending_mid_q);
>        spin_unlock(&GlobalMid_Lock);
>
> -       rc = cifs_sign_smb(in_buf, server, &mid->sequence_number);
> +       rc = cifs_sign_smb2(iov, nvec, server, &mid->sequence_number);
>        if (rc) {
>                mutex_unlock(&server->srv_mutex);
>                goto out_err;
> @@ -385,7 +386,7 @@ cifs_call_async(struct TCP_Server_Info *server, struct smb_hdr *in_buf,
>        mid->callback_data = cbdata;
>        mid->midState = MID_REQUEST_SUBMITTED;
>        cifs_in_send_inc(server);
> -       rc = smb_send(server, in_buf, be32_to_cpu(in_buf->smb_buf_length));
> +       rc = smb_sendv(server, iov, nvec);
>        cifs_in_send_dec(server);
>        cifs_save_when_sent(mid);
>        mutex_unlock(&server->srv_mutex);
> --
> 1.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Reviewed-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>

-- 
Best regards,
Pavel Shilovsky.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] cifs: don't call mid_q_entry->callback under the Global_MidLock
       [not found]     ` <1301747039-31411-4-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-04-12  9:39       ` Pavel Shilovsky
  0 siblings, 0 replies; 16+ messages in thread
From: Pavel Shilovsky @ 2011-04-12  9:39 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

2011/4/2 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
> Holding the spinlock while we call this function means that it can't
> sleep, which really limits what it can do. Taking it out from under
> the spinlock also means less contention for this global lock.
>
> Change the semantics such that the Global_MidLock is not held when
> the callback is called. To do this requires that we take extra care
> not to have sync_mid_result remove the mid from the list when the
> mid is in a state where that has already happened. This prevents
> list corruption when the mid is sitting on a private list for
> reconnect or when cifsd is coming down.
>
> Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/cifs/cifsglob.h      |    7 +++----
>  fs/cifs/connect.c       |   32 +++++++++++++++++++++++++-------
>  fs/cifs/smb2transport.c |   14 ++++----------
>  fs/cifs/transport.c     |   23 ++++++++---------------
>  4 files changed, 40 insertions(+), 36 deletions(-)
>
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index ccbac61..b3a6404 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -583,9 +583,8 @@ struct mid_q_entry;
>  * This is the prototype for the mid callback function. When creating one,
>  * take special care to avoid deadlocks. Things to bear in mind:
>  *
> - * - it will be called by cifsd
> - * - the GlobalMid_Lock will be held
> - * - the mid will be removed from the pending_mid_q list
> + * - it will be called by cifsd, with no locks held
> + * - the mid will be removed from any lists
>  */
>  typedef void (mid_callback_t)(struct mid_q_entry *mid);
>
> @@ -739,7 +738,7 @@ static inline void free_dfs_info_array(struct dfs_info3_param *param,
>  #define   MID_RESPONSE_RECEIVED 4
>  #define   MID_RETRY_NEEDED      8 /* session closed while this request out */
>  #define   MID_RESPONSE_MALFORMED 0x10
> -#define   MID_NO_RESPONSE_NEEDED 0x20
> +#define   MID_SHUTDOWN          0x20
>
>  /* Types of response buffer returned from SendReceive2 */
>  #define   CIFS_NO_BUFFER        0    /* Response buffer not returned */
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 94e60c5..2b5d248 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -141,6 +141,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
>        struct cifs_ses *ses;
>        struct cifs_tcon *tcon;
>        struct mid_q_entry *mid_entry;
> +       struct list_head retry_list;
>
>        spin_lock(&GlobalMid_Lock);
>        if (server->tcpStatus == CifsExiting) {
> @@ -192,16 +193,23 @@ cifs_reconnect(struct TCP_Server_Info *server)
>        mutex_unlock(&server->srv_mutex);
>
>        /* mark submitted MIDs for retry and issue callback */
> -       cFYI(1, "%s: issuing mid callbacks", __func__);
> +       INIT_LIST_HEAD(&retry_list);
> +       cFYI(1, "%s: moving mids to private list", __func__);
>        spin_lock(&GlobalMid_Lock);
>        list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
>                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>                if (mid_entry->midState == MID_REQUEST_SUBMITTED)
>                        mid_entry->midState = MID_RETRY_NEEDED;
> +               list_move(&mid_entry->qhead, &retry_list);
> +       }
> +       spin_unlock(&GlobalMid_Lock);
> +
> +       cFYI(1, "%s: moving mids to private list", __func__);
> +       list_for_each_safe(tmp, tmp2, &retry_list) {
> +               mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
>                list_del_init(&mid_entry->qhead);
>                mid_entry->callback(mid_entry);
>        }
> -       spin_unlock(&GlobalMid_Lock);
>
>        while (server->tcpStatus == CifsNeedReconnect) {
>                try_to_freeze();
> @@ -655,11 +663,10 @@ multi_t2_fnd:
>                                else
>                                        mid_entry->midState =
>                                                        MID_RESPONSE_MALFORMED;
> +                               list_del_init(&mid_entry->qhead);
>  #ifdef CONFIG_CIFS_STATS2
>                                mid_entry->when_received = jiffies;
>  #endif
> -                               list_del_init(&mid_entry->qhead);
> -                               mid_entry->callback(mid_entry);
>                                break;
>                        }
>                        mid_entry = NULL;
> @@ -667,6 +674,7 @@ multi_t2_fnd:
>                spin_unlock(&GlobalMid_Lock);
>
>                if (mid_entry != NULL) {
> +                       mid_entry->callback(mid_entry);
>                        /* Was previous buf put in mpx struct for multi-rsp? */
>                        if (!isMultiRsp) {
>                                /* smb buffer will be freed by user thread */
> @@ -730,15 +738,25 @@ multi_t2_fnd:
>                cifs_small_buf_release(smallbuf);
>
>        if (!list_empty(&server->pending_mid_q)) {
> +               struct list_head dispose_list;
> +
> +               INIT_LIST_HEAD(&dispose_list);
>                spin_lock(&GlobalMid_Lock);
>                list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
>                        mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> -                       cFYI(1, "Clearing Mid 0x%x - issuing callback",
> -                                        mid_entry->mid);
> +                       cFYI(1, "Clearing mid 0x%x", mid_entry->mid);
> +                       mid_entry->midState = MID_SHUTDOWN;
> +                       list_move(&mid_entry->qhead, &dispose_list);
> +               }
> +               spin_unlock(&GlobalMid_Lock);
> +
> +               /* now walk dispose list and issue callbacks */
> +               list_for_each_safe(tmp, tmp2, &dispose_list) {
> +                       mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
> +                       cFYI(1, "Callback mid 0x%x", mid_entry->mid);
>                        list_del_init(&mid_entry->qhead);
>                        mid_entry->callback(mid_entry);
>                }
> -               spin_unlock(&GlobalMid_Lock);
>                /* 1/8th of sec is more than enough time for them to exit */
>                msleep(125);
>        }
> diff --git a/fs/cifs/smb2transport.c b/fs/cifs/smb2transport.c
> index a187587..e0fa75308 100644
> --- a/fs/cifs/smb2transport.c
> +++ b/fs/cifs/smb2transport.c
> @@ -276,28 +276,22 @@ sync_smb2_mid_result(struct smb2_mid_entry *mid, struct TCP_Server_Info *server)
>
>        spin_lock(&GlobalMid_Lock);
>        /* ensure that it's no longer on the pending_mid_q */
> -       list_del_init(&mid->qhead);
>
>        switch (mid->mid_state) {
>        case MID_RESPONSE_RECEIVED:
>                spin_unlock(&GlobalMid_Lock);
>                return rc;
> -       case MID_REQUEST_SUBMITTED:
> -               /* socket is going down, reject all calls */
> -               if (server->tcpStatus == CifsExiting) {
> -                       cERROR(1, "%s: canceling mid=%lld cmd=0x%x state=%d",
> -                               __func__, mid->mid, le16_to_cpu(mid->command),
> -                               mid->mid_state);
> -                       rc = -EHOSTDOWN;
> -                       break;
> -               }
>        case MID_RETRY_NEEDED:
>                rc = -EAGAIN;
>                break;
>        case MID_RESPONSE_MALFORMED:
>                rc = -EIO;
>                break;
> +       case MID_SHUTDOWN:
> +               rc = -EHOSTDOWN;
> +               break;
>        default:
> +               list_del_init(&mid->qhead);
>                cERROR(1, "%s: invalid mid state mid=%lld state=%d", __func__,
>                        mid->mid, mid->mid_state);
>                rc = -EIO;
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index 7ccf292..a31c279 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -428,7 +428,7 @@ SendReceiveNoRsp(const unsigned int xid, struct cifs_ses *ses,
>  }
>
>  static int
> -sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
> +cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
>  {
>        int rc = 0;
>
> @@ -436,28 +436,21 @@ sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
>                mid->mid, mid->midState);
>
>        spin_lock(&GlobalMid_Lock);
> -       /* ensure that it's no longer on the pending_mid_q */
> -       list_del_init(&mid->qhead);
> -
>        switch (mid->midState) {
>        case MID_RESPONSE_RECEIVED:
>                spin_unlock(&GlobalMid_Lock);
>                return rc;
> -       case MID_REQUEST_SUBMITTED:
> -               /* socket is going down, reject all calls */
> -               if (server->tcpStatus == CifsExiting) {
> -                       cERROR(1, "%s: canceling mid=%d cmd=0x%x state=%d",
> -                              __func__, mid->mid, mid->command, mid->midState);
> -                       rc = -EHOSTDOWN;
> -                       break;
> -               }
>        case MID_RETRY_NEEDED:
>                rc = -EAGAIN;
>                break;
>        case MID_RESPONSE_MALFORMED:
>                rc = -EIO;
>                break;
> +       case MID_SHUTDOWN:
> +               rc = -EHOSTDOWN;
> +               break;
>        default:
> +               list_del_init(&mid->qhead);
>                cERROR(1, "%s: invalid mid state mid=%d state=%d", __func__,
>                        mid->mid, mid->midState);
>                rc = -EIO;
> @@ -617,7 +610,7 @@ SendReceive2(const unsigned int xid, struct cifs_ses *ses,
>
>        cifs_small_buf_release(in_buf);
>
> -       rc = sync_mid_result(midQ, ses->server);
> +       rc = cifs_sync_mid_result(midQ, ses->server);
>        if (rc != 0) {
>                atomic_dec(&ses->server->inFlight);
>                wake_up(&ses->server->request_q);
> @@ -735,7 +728,7 @@ SendReceive(const unsigned int xid, struct cifs_ses *ses,
>                spin_unlock(&GlobalMid_Lock);
>        }
>
> -       rc = sync_mid_result(midQ, ses->server);
> +       rc = cifs_sync_mid_result(midQ, ses->server);
>        if (rc != 0) {
>                atomic_dec(&ses->server->inFlight);
>                wake_up(&ses->server->request_q);
> @@ -906,7 +899,7 @@ SendReceiveBlockingLock(const unsigned int xid, struct cifs_tcon *tcon,
>                rstart = 1;
>        }
>
> -       rc = sync_mid_result(midQ, ses->server);
> +       rc = cifs_sync_mid_result(midQ, ses->server);
>        if (rc != 0)
>                return rc;
>
> --
> 1.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Reviewed-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>

-- 
Best regards,
Pavel Shilovsky.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] cifs: add cifs_async_writev
       [not found]     ` <1301747039-31411-5-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-04-12  9:43       ` Pavel Shilovsky
       [not found]         ` <BANLkTimHi_w55+bqFstXZZdjaMzw1mw_yA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Shilovsky @ 2011-04-12  9:43 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

2011/4/2 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
> Add the ability for CIFS to do an asynchronous write. The kernel will
> set the frame up as it would for a "normal" SMBWrite2 request, and use
> cifs_call_async to send it. The mid callback will then be configured to
> handle the result.
>
> Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/cifs/cifsproto.h |   19 +++++
>  fs/cifs/cifssmb.c   |  219 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 238 insertions(+), 0 deletions(-)

Looks good too, except for two small notes.

>
> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> index e255a2b..7b964df 100644
> --- a/fs/cifs/cifsproto.h
> +++ b/fs/cifs/cifsproto.h
> @@ -21,6 +21,7 @@
>  #ifndef _CIFSPROTO_H
>  #define _CIFSPROTO_H
>  #include <linux/nls.h>
> +#include <linux/pagevec.h>
>
>  struct statfs;
>  struct smb_vol;
> @@ -433,4 +434,22 @@ extern int mdfour(unsigned char *, unsigned char *, int);
>  extern int E_md4hash(const unsigned char *passwd, unsigned char *p16);
>  extern int SMBencrypt(unsigned char *passwd, const unsigned char *c8,
>                        unsigned char *p24);
> +
> +/* asynchronous write support */
> +struct cifs_writedata {
> +       struct list_head        pending;
> +       struct kref             refcount;
> +       struct completion       completion;
> +       struct work_struct      work;
> +       struct cifsFileInfo     *cfile;
> +       __u64                   offset;
> +       unsigned int            bytes;
> +       int                     result;
> +       struct page             *pages[PAGEVEC_SIZE];
> +};
> +
> +int cifs_async_writev(struct cifs_writedata *wdata);
> +struct cifs_writedata *cifs_writedata_alloc(void);
> +void cifs_writedata_release(struct kref *refcount);
> +
>  #endif                 /* _CIFSPROTO_H */
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index 8de7e79..b308605 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -32,6 +32,7 @@
>  #include <linux/vfs.h>
>  #include <linux/slab.h>
>  #include <linux/posix_acl_xattr.h>
> +#include <linux/pagemap.h>
>  #include <asm/uaccess.h>
>  #include "cifspdu.h"
>  #include "cifsglob.h"
> @@ -1604,6 +1605,224 @@ CIFSSMBWrite(const int xid, struct cifs_tcon *tcon,
>        return rc;
>  }
>
> +void
> +cifs_writedata_release(struct kref *refcount)
> +{
> +       struct cifs_writedata *wdata = container_of(refcount,
> +                                       struct cifs_writedata, refcount);
> +       struct address_space *mapping;
> +
> +       if (wdata->cfile) {
> +               mapping = wdata->cfile->dentry->d_inode->i_mapping;
> +
> +               spin_lock(&mapping->private_lock);
> +               list_del(&wdata->pending);
> +               spin_unlock(&mapping->private_lock);
> +
> +               cifsFileInfo_put(wdata->cfile);
> +       }
> +
> +       kfree(wdata);
> +}
> +
> +static void
> +cifs_writev_complete(struct work_struct *work)
> +{
> +       struct cifs_writedata *wdata = container_of(work,
> +                                               struct cifs_writedata, work);
> +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
                             ^^^ this variable is unused.

> +       struct inode *inode = wdata->cfile->dentry->d_inode;
> +       struct page *page;
> +       int i = 0;
> +
> +       if (wdata->result == 0) {
> +               cifs_update_eof(CIFS_I(inode), wdata->offset, wdata->bytes);
> +               cifs_stats_bytes_written(tcon, wdata->bytes);
> +       }
> +
> +       for (i = 0; i < ARRAY_SIZE(wdata->pages) && (page = wdata->pages[i]);
> +                       i++) {
> +               if (wdata->result == -EAGAIN)
> +                       __set_page_dirty_nobuffers(page);
> +               else if (wdata->result < 0)
> +                       SetPageError(page);
> +               end_page_writeback(page);
> +               page_cache_release(page);
> +       }
> +       if (wdata->result != -EAGAIN)
> +               mapping_set_error(inode->i_mapping, wdata->result);
> +       complete(&wdata->completion);
> +       kref_put(&wdata->refcount, cifs_writedata_release);
> +}
> +
> +struct cifs_writedata *
> +cifs_writedata_alloc(void)
> +{
> +       struct cifs_writedata *wdata;
> +
> +       wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
> +       if (wdata != NULL) {
> +               INIT_LIST_HEAD(&wdata->pending);
> +               INIT_WORK(&wdata->work, cifs_writev_complete);
> +               kref_init(&wdata->refcount);
> +               init_completion(&wdata->completion);
> +       }
> +       return wdata;
> +}
> +
> +/*
> + * Check the midState and signature on received buffer (if any), and queue the
> + * workqueue completion task.
> + */
> +static void
> +cifs_writev_callback(struct mid_q_entry *mid)
> +{
> +       struct cifs_writedata *wdata = mid->callback_data;
> +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
> +       unsigned int written;
> +       WRITE_RSP *smb = (WRITE_RSP *)mid->resp_buf;
> +
> +       switch(mid->midState) {
> +       case MID_RESPONSE_RECEIVED:
> +               wdata->result = cifs_check_receive(mid, tcon->ses->server, 0);
> +               if (wdata->result != 0)
> +                       break;
> +
> +               written = le16_to_cpu(smb->CountHigh);
> +               written <<= 16;
> +               written += le16_to_cpu(smb->Count);
> +               /*
> +                * Mask off high 16 bits when bytes written as returned
> +                * by the server is greater than bytes requested by the
> +                * client. OS/2 servers are known to set incorrect
> +                * CountHigh values.
> +                */
> +               if (written > wdata->bytes)
> +                       written &= 0xFFFF;
> +
> +               if (written < wdata->bytes)
> +                       wdata->result = -ENOSPC;
> +               else
> +                       wdata->bytes = written;
> +               break;
> +       case MID_REQUEST_SUBMITTED:
> +       case MID_RETRY_NEEDED:
> +               wdata->result = -EAGAIN;
> +               break;
> +       default:
> +               wdata->result = -EIO;
> +               break;
> +       }
> +
> +       queue_work(system_nrt_wq, &wdata->work);
> +       DeleteMidQEntry(mid);
> +       atomic_dec(&tcon->ses->server->inFlight);
> +       wake_up(&tcon->ses->server->request_q);
> +}
> +
> +/* cifs_async_writev - send an async write, and set up mid to handle result */
> +int
> +cifs_async_writev(struct cifs_writedata *wdata)
> +{
> +       int i, rc = -EACCES;
> +       WRITE_REQ *smb = NULL;
> +       int wct;
> +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
> +       struct inode *inode = wdata->cfile->dentry->d_inode;
> +       unsigned int npages = ARRAY_SIZE(wdata->pages);
> +       struct kvec *iov = NULL;
> +
> +       cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);
                                                       ^^^^ bytes is
zero here. You calculate it later in the function - I suggest moving this
info message to after the right bytes value has been calculated.

> +
> +       if (tcon->ses->capabilities & CAP_LARGE_FILES) {
> +               wct = 14;
> +       } else {
> +               wct = 12;
> +               if (wdata->offset >> 32 > 0) {
> +                       /* can not handle big offset for old srv */
> +                       return -EIO;
> +               }
> +       }
> +
> +       rc = small_smb_init(SMB_COM_WRITE_ANDX, wct, tcon, (void **)&smb);
> +       if (rc)
> +               goto async_writev_out;
> +
> +       /* FIXME: only allocate number of kvecs we need */
> +       iov = kzalloc((npages + 1) * sizeof(*iov), GFP_NOFS);
> +       if (iov == NULL) {
> +               rc = -ENOMEM;
> +               goto async_writev_out;
> +       }
> +
> +       smb->AndXCommand = 0xFF;        /* none */
> +       smb->Fid = wdata->cfile->netfid;
> +       smb->OffsetLow = cpu_to_le32(wdata->offset & 0xFFFFFFFF);
> +       if (wct == 14)
> +               smb->OffsetHigh = cpu_to_le32(wdata->offset >> 32);
> +       smb->Reserved = 0xFFFFFFFF;
> +       smb->WriteMode = 0;
> +       smb->Remaining = 0;
> +
> +       smb->DataOffset =
> +           cpu_to_le16(offsetof(struct smb_com_write_req, Data) - 4);
> +
> +       /* 4 for RFC1001 length + 1 for BCC */
> +       iov[0].iov_len = be32_to_cpu(smb->hdr.smb_buf_length) + 4 + 1;
> +       iov[0].iov_base = smb;
> +
> +       /* marshal up the pages into iov array */
> +       npages = 0;
> +       wdata->bytes = 0;
> +       for (i = 0; i < ARRAY_SIZE(wdata->pages) && wdata->pages[i]; i++) {
> +               if (wdata->bytes + PAGE_CACHE_SIZE <=
> +                   CIFS_SB(inode->i_sb)->wsize) {
> +                       iov[i + 1].iov_len = min(inode->i_size -
> +                                             page_offset(wdata->pages[i]),
> +                                               (loff_t)PAGE_CACHE_SIZE);
> +                       iov[i + 1].iov_base = kmap(wdata->pages[i]);
> +                       wdata->bytes += iov[i + 1].iov_len;
> +                       npages++;
> +               } else {
> +                       /* if we're beyond wsize, then re-mark page dirty */
> +                       __set_page_dirty_nobuffers(wdata->pages[i]);
> +               }
> +       }
> +
> +       smb->DataLengthLow = cpu_to_le16(wdata->bytes & 0xFFFF);
> +       smb->DataLengthHigh = cpu_to_le16(wdata->bytes >> 16);
> +
> +       if (wct == 14) {
> +               inc_rfc1001_len(&smb->hdr, wdata->bytes + 1);
> +               put_bcc(wdata->bytes + 1, &smb->hdr);
> +       } else {
> +               /* wct == 12 */
> +               struct smb_com_writex_req *smbw =
> +                               (struct smb_com_writex_req *)smb;
> +               inc_rfc1001_len(&smbw->hdr, wdata->bytes + 5);
> +               put_bcc(wdata->bytes + 5, &smbw->hdr);
> +               iov[0].iov_len += 4; /* pad bigger by four bytes */
> +       }
> +
> +       kref_get(&wdata->refcount);
> +       rc = cifs_call_async(tcon->ses->server, iov, npages + 1,
> +                            cifs_writev_callback, wdata);
> +
> +       if (rc == 0)
> +               cifs_stats_inc(&tcon->stats.cifs_stats.num_writes);
> +       else
> +               kref_put(&wdata->refcount, cifs_writedata_release);
> +
> +       /* send is done, unmap pages */
> +       for (i = 0; i < npages; i++)
> +               kunmap(wdata->pages[i]);
> +
> +async_writev_out:
> +       cifs_small_buf_release(smb);
> +       kfree(iov);
> +       return rc;
> +}
> +
>  int
>  CIFSSMBWrite2(const int xid, struct cifs_tcon *tcon,
>             const int netfid, const unsigned int count,
> --
> 1.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Anyway, it's good. You can add my "Reviewed-by: Pavel Shilovsky
<piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>" tag when you fix the notes above.
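
A minimal sketch of the reordering suggested in the second note (assuming
the marshalling loop from the patch is otherwise unchanged) would be to
emit the debug message only once wdata->bytes has been filled in:

  /* sketch only: wdata->bytes is computed by the kvec marshalling loop,
     so log the write size after that loop rather than before it */
  smb->DataLengthLow = cpu_to_le16(wdata->bytes & 0xFFFF);
  smb->DataLengthHigh = cpu_to_le16(wdata->bytes >> 16);

  cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);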

-- 
Best regards,
Pavel Shilovsky.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] cifs: convert cifs_writepages to use async writes
       [not found]     ` <1301747039-31411-6-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2011-04-12 10:07       ` Pavel Shilovsky
       [not found]         ` <BANLkTi=HPkAC3kY3g5=XYHzfTLcde9C7ww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Shilovsky @ 2011-04-12 10:07 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

2011/4/2 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
> Have cifs_writepages issue asynchronous writes instead of waiting on
> each write call to complete before issuing another. This also allows us
> to return more quickly from writepages in the WB_SYNC_NONE case. It
> can just send out all of the I/Os and not wait around for the replies.
>
> In the WB_SYNC_ALL case, have it wait for all of the writes to complete
> after issuing them and return any errors that turn up from it. If those
> errors turn out to all be retryable (-EAGAIN), then retry the entire
> operation again.
>
> This also changes the page locking semantics a little bit. Instead of
> holding the page lock until the response is received, release it after
> doing the send. This will reduce contention for the page lock and should
> prevent processes that have the file mmap'ed from being blocked
> unnecessarily.
>
> Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/cifs/file.c |  256 ++++++++++++++++++++++++++++++--------------------------
>  1 files changed, 136 insertions(+), 120 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index da53246..dbdbe97 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1087,32 +1087,69 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
>        return rc;
>  }
>
> +/*
> + * walk a list of in progress async writes and wait for them to complete.
> + * Check the result code on each one.
> + *
> + * There's a clear hierarchy: 0 < EAGAIN < other errors
> + *
> + * If we get an EAGAIN return on a WB_SYNC_ALL writeout, then we need to
> + * go back and do the whole operation again. If we get other errors then
> + * the mapping already likely has AS_EIO or AS_ENOSPC set, so we can
> + * give up and just return an error.
> + */
> +static int
> +cifs_wait_for_write_completion(struct address_space *mapping,
> +                               struct list_head *pending)
> +{
> +       int tmprc, rc = 0;
> +       struct cifs_writedata *wdata;
> +
> +       /* wait for writes to complete */
> +       for (;;) {
> +               /* anything left on list? */
> +               spin_lock(&mapping->private_lock);
> +               if (list_empty(pending)) {
> +                       spin_unlock(&mapping->private_lock);
> +                       break;
> +               }
> +
> +               /* dequeue an entry */
> +               wdata = list_first_entry(pending, struct cifs_writedata,
> +                                        pending);
> +               list_del_init(&wdata->pending);
> +               spin_unlock(&mapping->private_lock);
> +
> +               /* wait for it to complete */
> +               tmprc = wait_for_completion_killable(&wdata->completion);
> +               if (tmprc)
> +                       rc = tmprc;
> +
> +               if (wdata->result == -EAGAIN)
> +                       rc = rc ? rc : wdata->result;
> +               else if (wdata->result != 0)
> +                       rc = wdata->result;
> +
> +               kref_put(&wdata->refcount, cifs_writedata_release);
> +       }
> +
> +       return rc;
> +}
> +
>  static int cifs_writepages(struct address_space *mapping,
>                           struct writeback_control *wbc)
>  {
> -       unsigned int bytes_to_write;
> -       unsigned int bytes_written;
> -       struct cifs_sb_info *cifs_sb;
> -       int done = 0;
> +       struct cifs_sb_info *cifs_sb = CIFS_SB(mapping->host->i_sb);
> +       bool done = false, scanned = false, range_whole = false;
>        pgoff_t end;
>        pgoff_t index;
> -       int range_whole = 0;
> -       struct kvec *iov;
> -       int len;
> -       int n_iov = 0;
> -       pgoff_t next;
> -       int nr_pages;
> -       __u64 offset = 0;
> +       unsigned int pvec_pages;
>        struct cifsFileInfo *open_file;
> -       struct cifs_tcon *tcon;
> -       struct cifsInodeInfo *cifsi = CIFS_I(mapping->host);
> +       struct cifs_writedata *wdata;
>        struct page *page;
>        struct pagevec pvec;
> +       struct list_head pending;
>        int rc = 0;
> -       int scanned = 0;
> -       int xid;
> -
> -       cifs_sb = CIFS_SB(mapping->host->i_sb);
>
>        /*
>         * If wsize is smaller that the page cache size, default to writing
> @@ -1121,27 +1158,19 @@ static int cifs_writepages(struct address_space *mapping,
>        if (cifs_sb->wsize < PAGE_CACHE_SIZE)
>                return generic_writepages(mapping, wbc);
>
> -       iov = kmalloc(32 * sizeof(struct kvec), GFP_KERNEL);
> -       if (iov == NULL)
> -               return generic_writepages(mapping, wbc);
> -
>        /*
>         * if there's no open file, then this is likely to fail too,
>         * but it'll at least handle the return. Maybe it should be
>         * a BUG() instead?
>         */
>        open_file = find_writable_file(CIFS_I(mapping->host), false);
> -       if (!open_file) {
> -               kfree(iov);
> +       if (!open_file)
>                return generic_writepages(mapping, wbc);
> -       }
> -
> -       tcon = tlink_tcon(open_file->tlink);
>        cifsFileInfo_put(open_file);
>
> -       xid = GetXid();
> -
> +       INIT_LIST_HEAD(&pending);
>        pagevec_init(&pvec, 0);
> +
>        if (wbc->range_cyclic) {
>                index = mapping->writeback_index; /* Start from prev offset */
>                end = -1;
> @@ -1149,23 +1178,26 @@ static int cifs_writepages(struct address_space *mapping,
>                index = wbc->range_start >> PAGE_CACHE_SHIFT;
>                end = wbc->range_end >> PAGE_CACHE_SHIFT;
>                if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
> -                       range_whole = 1;
> -               scanned = 1;
> +                       range_whole = true;
> +               scanned = true;
>        }
>  retry:
>        while (!done && (index <= end) &&
> -              (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
> +              (pvec_pages = pagevec_lookup_tag(&pvec, mapping, &index,
>                        PAGECACHE_TAG_DIRTY,
>                        min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1))) {
> -               int first;
>                unsigned int i;
> +               unsigned int nr_pages = 0;
> +               pgoff_t next = 0;
>
> -               first = -1;
> -               next = 0;
> -               n_iov = 0;
> -               bytes_to_write = 0;
> +               wdata = cifs_writedata_alloc();
> +               if (!wdata) {
> +                       pagevec_release(&pvec);
> +                       rc = -ENOMEM;
> +                       break;
> +               }
>
> -               for (i = 0; i < nr_pages; i++) {
> +               for (i = 0; i < pvec_pages; i++) {
>                        page = pvec.pages[i];
>                        /*
>                         * At this point we hold neither mapping->tree_lock nor
> @@ -1175,7 +1207,7 @@ retry:
>                         * mapping
>                         */
>
> -                       if (first < 0)
> +                       if (nr_pages == 0)
>                                lock_page(page);
>                        else if (!trylock_page(page))
>                                break;
> @@ -1186,7 +1218,7 @@ retry:
>                        }
>
>                        if (!wbc->range_cyclic && page->index > end) {
> -                               done = 1;
> +                               done = true;
>                                unlock_page(page);
>                                break;
>                        }
> @@ -1213,119 +1245,103 @@ retry:
>                        set_page_writeback(page);
>
>                        if (page_offset(page) >= mapping->host->i_size) {
> -                               done = 1;
> +                               done = true;
>                                unlock_page(page);
>                                end_page_writeback(page);
>                                break;
>                        }
>
> -                       /*
> -                        * BB can we get rid of this?  pages are held by pvec
> -                        */
>                        page_cache_get(page);
> -
> -                       len = min(mapping->host->i_size - page_offset(page),
> -                                 (loff_t)PAGE_CACHE_SIZE);
> -
> -                       /* reserve iov[0] for the smb header */
> -                       n_iov++;
> -                       iov[n_iov].iov_base = kmap(page);
> -                       iov[n_iov].iov_len = len;
> -                       bytes_to_write += len;
> -
> -                       if (first < 0) {
> -                               first = i;
> -                               offset = page_offset(page);
> -                       }
> +                       wdata->pages[i] = page;
>                        next = page->index + 1;
> -                       if (bytes_to_write + PAGE_CACHE_SIZE > cifs_sb->wsize)
> +                       ++nr_pages;
> +
> +                       /* stop here if we have enough for a full wsize write */
> +                       if ((nr_pages + 1) * PAGE_CACHE_SIZE > cifs_sb->wsize)
>                                break;
>                }
> -               if (n_iov) {
> -retry_write:
> -                       open_file = find_writable_file(CIFS_I(mapping->host),
> -                                                       false);
> -                       if (!open_file) {
> +
> +               if (wdata->pages[0] == NULL) {
> +                       /* Need to re-find the pages we skipped */
> +                       index = pvec.pages[0]->index + 1;
> +                       pagevec_release(&pvec);
> +                       continue;
> +               }
> +
> +               pagevec_release(&pvec);
> +               wdata->offset = page_offset(wdata->pages[0]);
> +
> +               do {
> +                       if (wdata->cfile != NULL)
> +                               cifsFileInfo_put(wdata->cfile);
> +                       wdata->cfile = find_writable_file(CIFS_I(mapping->host),
> +                                                         false);
> +                       if (!wdata->cfile) {
>                                cERROR(1, "No writable handles for inode");
>                                rc = -EBADF;
> -                       } else {
> -                               rc = CIFSSMBWrite2(xid, tcon, open_file->netfid,
> -                                                  bytes_to_write, offset,
> -                                                  &bytes_written, iov, n_iov,
> -                                                  0);
> -                               cifsFileInfo_put(open_file);
> +                               break;
>                        }
> -
> -                       cFYI(1, "Write2 rc=%d, wrote=%u", rc, bytes_written);
> -
> -                       /*
> -                        * For now, treat a short write as if nothing got
> -                        * written. A zero length write however indicates
> -                        * ENOSPC or EFBIG. We have no way to know which
> -                        * though, so call it ENOSPC for now. EFBIG would
> -                        * get translated to AS_EIO anyway.
> -                        *
> -                        * FIXME: make it take into account the data that did
> -                        *        get written
> -                        */
> -                       if (rc == 0) {
> -                               if (bytes_written == 0)
> -                                       rc = -ENOSPC;
> -                               else if (bytes_written < bytes_to_write)
> -                                       rc = -EAGAIN;
> +                       if (wbc->sync_mode == WB_SYNC_ALL) {
> +                               spin_lock(&mapping->private_lock);
> +                               list_move(&wdata->pending, &pending);
> +                               spin_unlock(&mapping->private_lock);
>                        }
> +                       rc = cifs_async_writev(wdata);
> +               } while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
>
> -                       /* retry on data-integrity flush */
> -                       if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN)
> -                               goto retry_write;
> -
> -                       /* fix the stats and EOF */
> -                       if (bytes_written > 0) {
> -                               cifs_stats_bytes_written(tcon, bytes_written);
> -                               cifs_update_eof(cifsi, offset, bytes_written);
> -                       }
> +               for (i = 0; i < nr_pages; ++i)
> +                       unlock_page(wdata->pages[i]);
>
> -                       for (i = 0; i < n_iov; i++) {
> -                               page = pvec.pages[first + i];
> -                               /* on retryable write error, redirty page */
> +               /* send failure -- clean up the mess */
> +               if (rc != 0) {
> +                       for (i = 0; i < nr_pages; ++i) {
>                                if (rc == -EAGAIN)
> -                                       redirty_page_for_writepage(wbc, page);
> -                               else if (rc != 0)
> -                                       SetPageError(page);
> -                               kunmap(page);
> -                               unlock_page(page);
> -                               end_page_writeback(page);
> -                               page_cache_release(page);
> +                                       redirty_page_for_writepage(wbc,
> +                                                          wdata->pages[i]);
> +                               else
> +                                       SetPageError(wdata->pages[i]);
> +                               end_page_writeback(wdata->pages[i]);
> +                               page_cache_release(wdata->pages[i]);
>                        }
> -
>                        if (rc != -EAGAIN)
>                                mapping_set_error(mapping, rc);
> -                       else
> -                               rc = 0;
> -
> -                       if ((wbc->nr_to_write -= n_iov) <= 0)
> -                               done = 1;
> -                       index = next;
> -               } else
> -                       /* Need to re-find the pages we skipped */
> -                       index = pvec.pages[0]->index + 1;
> +                       kref_put(&wdata->refcount, cifs_writedata_release);
> +               } else if (wbc->sync_mode == WB_SYNC_NONE) {
> +                       kref_put(&wdata->refcount, cifs_writedata_release);
> +               }
>
> -               pagevec_release(&pvec);
> +               if ((wbc->nr_to_write -= nr_pages) <= 0)
> +                       done = true;
> +               index = next;
>        }
> +
>        if (!scanned && !done) {
>                /*
>                 * We hit the last page and there is more work to be done: wrap
>                 * back to the start of the file
>                 */
> -               scanned = 1;
> +               scanned = true;
>                index = 0;
>                goto retry;
>        }
> +
> +       /*
> +        * for WB_SYNC_ALL, we must wait until the writes complete. If any
> +        * return -EAGAIN however, we need to retry the whole thing again.
> +        * At that point nr_to_write will be all screwed up, but that
> +        * shouldn't really matter for WB_SYNC_ALL (right?)
> +        */
> +       if (wbc->sync_mode == WB_SYNC_ALL) {
> +               rc = cifs_wait_for_write_completion(mapping, &pending);
> +               if (rc == -EAGAIN) {
> +                       index = wbc->range_start >> PAGE_CACHE_SHIFT;
> +                       goto retry;
> +               }
> +       }
> +
>        if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
>                mapping->writeback_index = index;
>
> -       FreeXid(xid);
> -       kfree(iov);
>        return rc;
>  }
>
> --
> 1.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Good. I tested it and found that it is ~20% faster when I write a
~1 gigabyte file over the LAN against Windows 7 (with MaxWorkItems set to
4096; otherwise I get STATUS_INSUFF_SERVER_RESOURCES after a write
finishes).

As a test I do the following (python):
  import os
  f = os.open('file', os.O_RDWR)  # 'file' must already exist on the mount
  buf = ''.join('q' for _ in xrange(4096))
  for _ in xrange(250000):
      os.write(f, buf)
  os.close(f)

Reviewed-and-Tested-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>

-- 
Best regards,
Pavel Shilovsky.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] cifs: add cifs_async_writev
       [not found]         ` <BANLkTimHi_w55+bqFstXZZdjaMzw1mw_yA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-04-12 11:11           ` Jeff Layton
       [not found]             ` <20110412071120.75e45b38-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2011-04-12 11:11 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

On Tue, 12 Apr 2011 13:43:07 +0400
Pavel Shilovsky <piastryyy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> 2011/4/2 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
> > Add the ability for CIFS to do an asynchronous write. The kernel will
> > set the frame up as it would for a "normal" SMBWrite2 request, and use
> > cifs_call_async to send it. The mid callback will then be configured to
> > handle the result.
> >
> > Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >  fs/cifs/cifsproto.h |   19 +++++
> >  fs/cifs/cifssmb.c   |  219 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 238 insertions(+), 0 deletions(-)
> 
> Looks good too, except for two small notes.
> 

Thanks for looking. I respun this yesterday actually and the new patch
is a bit different. With the new patchset, we'll no longer be
constrained by the current wsize limit of 14 pages. It should be able
to handle an arbitrary wsize, which should also help performance.


> >
> > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> > index e255a2b..7b964df 100644
> > --- a/fs/cifs/cifsproto.h
> > +++ b/fs/cifs/cifsproto.h
> > @@ -21,6 +21,7 @@
> >  #ifndef _CIFSPROTO_H
> >  #define _CIFSPROTO_H
> >  #include <linux/nls.h>
> > +#include <linux/pagevec.h>
> >
> >  struct statfs;
> >  struct smb_vol;
> > @@ -433,4 +434,22 @@ extern int mdfour(unsigned char *, unsigned char *, int);
> >  extern int E_md4hash(const unsigned char *passwd, unsigned char *p16);
> >  extern int SMBencrypt(unsigned char *passwd, const unsigned char *c8,
> >                        unsigned char *p24);
> > +
> > +/* asynchronous write support */
> > +struct cifs_writedata {
> > +       struct list_head        pending;
> > +       struct kref             refcount;
> > +       struct completion       completion;
> > +       struct work_struct      work;
> > +       struct cifsFileInfo     *cfile;
> > +       __u64                   offset;
> > +       unsigned int            bytes;
> > +       int                     result;
> > +       struct page             *pages[PAGEVEC_SIZE];
> > +};
> > +
> > +int cifs_async_writev(struct cifs_writedata *wdata);
> > +struct cifs_writedata *cifs_writedata_alloc(void);
> > +void cifs_writedata_release(struct kref *refcount);
> > +
> >  #endif                 /* _CIFSPROTO_H */
> > diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> > index 8de7e79..b308605 100644
> > --- a/fs/cifs/cifssmb.c
> > +++ b/fs/cifs/cifssmb.c
> > @@ -32,6 +32,7 @@
> >  #include <linux/vfs.h>
> >  #include <linux/slab.h>
> >  #include <linux/posix_acl_xattr.h>
> > +#include <linux/pagemap.h>
> >  #include <asm/uaccess.h>
> >  #include "cifspdu.h"
> >  #include "cifsglob.h"
> > @@ -1604,6 +1605,224 @@ CIFSSMBWrite(const int xid, struct cifs_tcon *tcon,
> >        return rc;
> >  }
> >
> > +void
> > +cifs_writedata_release(struct kref *refcount)
> > +{
> > +       struct cifs_writedata *wdata = container_of(refcount,
> > +                                       struct cifs_writedata, refcount);
> > +       struct address_space *mapping;
> > +
> > +       if (wdata->cfile) {
> > +               mapping = wdata->cfile->dentry->d_inode->i_mapping;
> > +
> > +               spin_lock(&mapping->private_lock);
> > +               list_del(&wdata->pending);
> > +               spin_unlock(&mapping->private_lock);
> > +
> > +               cifsFileInfo_put(wdata->cfile);
> > +       }
> > +
> > +       kfree(wdata);
> > +}
> > +
> > +static void
> > +cifs_writev_complete(struct work_struct *work)
> > +{
> > +       struct cifs_writedata *wdata = container_of(work,
> > +                                               struct cifs_writedata, work);
> > +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
>                              ^^^ this variable is unused.
> 
				Not exactly.

> > +       struct inode *inode = wdata->cfile->dentry->d_inode;
> > +       struct page *page;
> > +       int i = 0;
> > +
> > +       if (wdata->result == 0) {
> > +               cifs_update_eof(CIFS_I(inode), wdata->offset, wdata->bytes);
> > +               cifs_stats_bytes_written(tcon, wdata->bytes);
					     ^^^^^	
					The tcon is used here
> > +       }
> > +
> > +       for (i = 0; i < ARRAY_SIZE(wdata->pages) && (page = wdata->pages[i]);
> > +                       i++) {
> > +               if (wdata->result == -EAGAIN)
> > +                       __set_page_dirty_nobuffers(page);
> > +               else if (wdata->result < 0)
> > +                       SetPageError(page);
> > +               end_page_writeback(page);
> > +               page_cache_release(page);
> > +       }
> > +       if (wdata->result != -EAGAIN)
> > +               mapping_set_error(inode->i_mapping, wdata->result);
> > +       complete(&wdata->completion);
> > +       kref_put(&wdata->refcount, cifs_writedata_release);
> > +}
> > +
> > +struct cifs_writedata *
> > +cifs_writedata_alloc(void)
> > +{
> > +       struct cifs_writedata *wdata;
> > +
> > +       wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
> > +       if (wdata != NULL) {
> > +               INIT_LIST_HEAD(&wdata->pending);
> > +               INIT_WORK(&wdata->work, cifs_writev_complete);
> > +               kref_init(&wdata->refcount);
> > +               init_completion(&wdata->completion);
> > +       }
> > +       return wdata;
> > +}
> > +
> > +/*
> > + * Check the midState and signature on received buffer (if any), and queue the
> > + * workqueue completion task.
> > + */
> > +static void
> > +cifs_writev_callback(struct mid_q_entry *mid)
> > +{
> > +       struct cifs_writedata *wdata = mid->callback_data;
> > +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
> > +       unsigned int written;
> > +       WRITE_RSP *smb = (WRITE_RSP *)mid->resp_buf;
> > +
> > +       switch(mid->midState) {
> > +       case MID_RESPONSE_RECEIVED:
> > +               wdata->result = cifs_check_receive(mid, tcon->ses->server, 0);
> > +               if (wdata->result != 0)
> > +                       break;
> > +
> > +               written = le16_to_cpu(smb->CountHigh);
> > +               written <<= 16;
> > +               written += le16_to_cpu(smb->Count);
> > +               /*
> > +                * Mask off high 16 bits when bytes written as returned
> > +                * by the server is greater than bytes requested by the
> > +                * client. OS/2 servers are known to set incorrect
> > +                * CountHigh values.
> > +                */
> > +               if (written > wdata->bytes)
> > +                       written &= 0xFFFF;
> > +
> > +               if (written < wdata->bytes)
> > +                       wdata->result = -ENOSPC;
> > +               else
> > +                       wdata->bytes = written;
> > +               break;
> > +       case MID_REQUEST_SUBMITTED:
> > +       case MID_RETRY_NEEDED:
> > +               wdata->result = -EAGAIN;
> > +               break;
> > +       default:
> > +               wdata->result = -EIO;
> > +               break;
> > +       }
> > +
> > +       queue_work(system_nrt_wq, &wdata->work);
> > +       DeleteMidQEntry(mid);
> > +       atomic_dec(&tcon->ses->server->inFlight);
> > +       wake_up(&tcon->ses->server->request_q);
> > +}
> > +
> > +/* cifs_async_writev - send an async write, and set up mid to handle result */
> > +int
> > +cifs_async_writev(struct cifs_writedata *wdata)
> > +{
> > +       int i, rc = -EACCES;
> > +       WRITE_REQ *smb = NULL;
> > +       int wct;
> > +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
> > +       struct inode *inode = wdata->cfile->dentry->d_inode;
> > +       unsigned int npages = ARRAY_SIZE(wdata->pages);
> > +       struct kvec *iov = NULL;
> > +
> > +       cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);
>                                                        ^^^^ bytes is
> zero here. You calculate it next - I suggest moving this info message
> after you get the right bytes value.
> 

Good catch. Will fix.
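
Something along these lines, probably (a rough, not compile-tested sketch
of the reordering in cifs_async_writev(); the per-page kvec setup is
elided since it's unchanged from the hunk above):

	/* marshal up the pages into iov array */
	npages = 0;
	wdata->bytes = 0;
	for (i = 0; i < ARRAY_SIZE(wdata->pages) && wdata->pages[i]; i++) {
		/* ...same per-page kvec setup as in the hunk above... */
	}

	/* wdata->bytes now holds the real value, so log it down here */
	cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);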

> > +
> > +       if (tcon->ses->capabilities & CAP_LARGE_FILES) {
> > +               wct = 14;
> > +       } else {
> > +               wct = 12;
> > +               if (wdata->offset >> 32 > 0) {
> > +                       /* can not handle big offset for old srv */
> > +                       return -EIO;
> > +               }
> > +       }
> > +
> > +       rc = small_smb_init(SMB_COM_WRITE_ANDX, wct, tcon, (void **)&smb);
> > +       if (rc)
> > +               goto async_writev_out;
> > +
> > +       /* FIXME: only allocate number of kvecs we need */
> > +       iov = kzalloc((npages + 1) * sizeof(*iov), GFP_NOFS);
> > +       if (iov == NULL) {
> > +               rc = -ENOMEM;
> > +               goto async_writev_out;
> > +       }
> > +
> > +       smb->AndXCommand = 0xFF;        /* none */
> > +       smb->Fid = wdata->cfile->netfid;
> > +       smb->OffsetLow = cpu_to_le32(wdata->offset & 0xFFFFFFFF);
> > +       if (wct == 14)
> > +               smb->OffsetHigh = cpu_to_le32(wdata->offset >> 32);
> > +       smb->Reserved = 0xFFFFFFFF;
> > +       smb->WriteMode = 0;
> > +       smb->Remaining = 0;
> > +
> > +       smb->DataOffset =
> > +           cpu_to_le16(offsetof(struct smb_com_write_req, Data) - 4);
> > +
> > +       /* 4 for RFC1001 length + 1 for BCC */
> > +       iov[0].iov_len = be32_to_cpu(smb->hdr.smb_buf_length) + 4 + 1;
> > +       iov[0].iov_base = smb;
> > +
> > +       /* marshal up the pages into iov array */
> > +       npages = 0;
> > +       wdata->bytes = 0;
> > +       for (i = 0; i < ARRAY_SIZE(wdata->pages) && wdata->pages[i]; i++) {
> > +               if (wdata->bytes + PAGE_CACHE_SIZE <=
> > +                   CIFS_SB(inode->i_sb)->wsize) {
> > +                       iov[i + 1].iov_len = min(inode->i_size -
> > +                                             page_offset(wdata->pages[i]),
> > +                                               (loff_t)PAGE_CACHE_SIZE);
> > +                       iov[i + 1].iov_base = kmap(wdata->pages[i]);
> > +                       wdata->bytes += iov[i + 1].iov_len;
> > +                       npages++;
> > +               } else {
> > +                       /* if we're beyond wsize, then re-mark page dirty */
> > +                       __set_page_dirty_nobuffers(wdata->pages[i]);
> > +               }
> > +       }
> > +
> > +       smb->DataLengthLow = cpu_to_le16(wdata->bytes & 0xFFFF);
> > +       smb->DataLengthHigh = cpu_to_le16(wdata->bytes >> 16);
> > +
> > +       if (wct == 14) {
> > +               inc_rfc1001_len(&smb->hdr, wdata->bytes + 1);
> > +               put_bcc(wdata->bytes + 1, &smb->hdr);
> > +       } else {
> > +               /* wct == 12 */
> > +               struct smb_com_writex_req *smbw =
> > +                               (struct smb_com_writex_req *)smb;
> > +               inc_rfc1001_len(&smbw->hdr, wdata->bytes + 5);
> > +               put_bcc(wdata->bytes + 5, &smbw->hdr);
> > +               iov[0].iov_len += 4; /* pad bigger by four bytes */
> > +       }
> > +
> > +       kref_get(&wdata->refcount);
> > +       rc = cifs_call_async(tcon->ses->server, iov, npages + 1,
> > +                            cifs_writev_callback, wdata);
> > +
> > +       if (rc == 0)
> > +               cifs_stats_inc(&tcon->stats.cifs_stats.num_writes);
> > +       else
> > +               kref_put(&wdata->refcount, cifs_writedata_release);
> > +
> > +       /* send is done, unmap pages */
> > +       for (i = 0; i < npages; i++)
> > +               kunmap(wdata->pages[i]);
> > +
> > +async_writev_out:
> > +       cifs_small_buf_release(smb);
> > +       kfree(iov);
> > +       return rc;
> > +}
> > +
> >  int
> >  CIFSSMBWrite2(const int xid, struct cifs_tcon *tcon,
> >             const int netfid, const unsigned int count,
> > --
> > 1.7.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> Anyway, it's good. You can add my "Reviewed-by: Pavel Shilovsky
> <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>" tag when you fix the notes above.
> 


-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] cifs: convert cifs_writepages to use async writes
       [not found]         ` <BANLkTi=HPkAC3kY3g5=XYHzfTLcde9C7ww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-04-12 11:15           ` Jeff Layton
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Layton @ 2011-04-12 11:15 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

On Tue, 12 Apr 2011 14:07:04 +0400
Pavel Shilovsky <piastryyy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> 2011/4/2 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
> > Have cifs_writepages issue asynchronous writes instead of waiting on
> > each write call to complete before issuing another. This also allows us
> > to return more quickly from writepages in the WB_SYNC_NONE case. It
> > can just send out all of the I/Os and not wait around for the replies.
> >
> > In the WB_SYNC_ALL case, have it wait for all of the writes to complete
> > after issuing them and return any errors that turn up from it. If those
> > errors turn out to all be retryable (-EAGAIN), then retry the entire
> > operation again.
> >
> > This also changes the page locking semantics a little bit. Instead of
> > holding the page lock until the response is received, release it after
> > doing the send. This will reduce contention for the page lock and should
> > prevent processes that have the file mmap'ed from being blocked
> > unnecessarily.
> >
> > Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >  fs/cifs/file.c |  256 ++++++++++++++++++++++++++++++--------------------------
> >  1 files changed, 136 insertions(+), 120 deletions(-)
> >
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > index da53246..dbdbe97 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -1087,32 +1087,69 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
> >        return rc;
> >  }
> >
> > +/*
> > + * walk a list of in progress async writes and wait for them to complete.
> > + * Check the result code on each one.
> > + *
> > + * There's a clear hierarchy: 0 < EAGAIN < other errors
> > + *
> > + * If we get an EAGAIN return on a WB_SYNC_ALL writeout, then we need to
> > + * go back and do the whole operation again. If we get other errors then
> > + * the mapping already likely has AS_EIO or AS_ENOSPC set, so we can
> > + * give up and just return an error.
> > + */
> > +static int
> > +cifs_wait_for_write_completion(struct address_space *mapping,
> > +                               struct list_head *pending)
> > +{
> > +       int tmprc, rc = 0;
> > +       struct cifs_writedata *wdata;
> > +
> > +       /* wait for writes to complete */
> > +       for (;;) {
> > +               /* anything left on list? */
> > +               spin_lock(&mapping->private_lock);
> > +               if (list_empty(pending)) {
> > +                       spin_unlock(&mapping->private_lock);
> > +                       break;
> > +               }
> > +
> > +               /* dequeue an entry */
> > +               wdata = list_first_entry(pending, struct cifs_writedata,
> > +                                        pending);
> > +               list_del_init(&wdata->pending);
> > +               spin_unlock(&mapping->private_lock);
> > +
> > +               /* wait for it to complete */
> > +               tmprc = wait_for_completion_killable(&wdata->completion);
> > +               if (tmprc)
> > +                       rc = tmprc;
> > +
> > +               if (wdata->result == -EAGAIN)
> > +                       rc = rc ? rc : wdata->result;
> > +               else if (wdata->result != 0)
> > +                       rc = wdata->result;
> > +
> > +               kref_put(&wdata->refcount, cifs_writedata_release);
> > +       }
> > +
> > +       return rc;
> > +}
> > +
> >  static int cifs_writepages(struct address_space *mapping,
> >                           struct writeback_control *wbc)
> >  {
> > -       unsigned int bytes_to_write;
> > -       unsigned int bytes_written;
> > -       struct cifs_sb_info *cifs_sb;
> > -       int done = 0;
> > +       struct cifs_sb_info *cifs_sb = CIFS_SB(mapping->host->i_sb);
> > +       bool done = false, scanned = false, range_whole = false;
> >        pgoff_t end;
> >        pgoff_t index;
> > -       int range_whole = 0;
> > -       struct kvec *iov;
> > -       int len;
> > -       int n_iov = 0;
> > -       pgoff_t next;
> > -       int nr_pages;
> > -       __u64 offset = 0;
> > +       unsigned int pvec_pages;
> >        struct cifsFileInfo *open_file;
> > -       struct cifs_tcon *tcon;
> > -       struct cifsInodeInfo *cifsi = CIFS_I(mapping->host);
> > +       struct cifs_writedata *wdata;
> >        struct page *page;
> >        struct pagevec pvec;
> > +       struct list_head pending;
> >        int rc = 0;
> > -       int scanned = 0;
> > -       int xid;
> > -
> > -       cifs_sb = CIFS_SB(mapping->host->i_sb);
> >
> >        /*
> >         * If wsize is smaller that the page cache size, default to writing
> > @@ -1121,27 +1158,19 @@ static int cifs_writepages(struct address_space *mapping,
> >        if (cifs_sb->wsize < PAGE_CACHE_SIZE)
> >                return generic_writepages(mapping, wbc);
> >
> > -       iov = kmalloc(32 * sizeof(struct kvec), GFP_KERNEL);
> > -       if (iov == NULL)
> > -               return generic_writepages(mapping, wbc);
> > -
> >        /*
> >         * if there's no open file, then this is likely to fail too,
> >         * but it'll at least handle the return. Maybe it should be
> >         * a BUG() instead?
> >         */
> >        open_file = find_writable_file(CIFS_I(mapping->host), false);
> > -       if (!open_file) {
> > -               kfree(iov);
> > +       if (!open_file)
> >                return generic_writepages(mapping, wbc);
> > -       }
> > -
> > -       tcon = tlink_tcon(open_file->tlink);
> >        cifsFileInfo_put(open_file);
> >
> > -       xid = GetXid();
> > -
> > +       INIT_LIST_HEAD(&pending);
> >        pagevec_init(&pvec, 0);
> > +
> >        if (wbc->range_cyclic) {
> >                index = mapping->writeback_index; /* Start from prev offset */
> >                end = -1;
> > @@ -1149,23 +1178,26 @@ static int cifs_writepages(struct address_space *mapping,
> >                index = wbc->range_start >> PAGE_CACHE_SHIFT;
> >                end = wbc->range_end >> PAGE_CACHE_SHIFT;
> >                if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
> > -                       range_whole = 1;
> > -               scanned = 1;
> > +                       range_whole = true;
> > +               scanned = true;
> >        }
> >  retry:
> >        while (!done && (index <= end) &&
> > -              (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
> > +              (pvec_pages = pagevec_lookup_tag(&pvec, mapping, &index,
> >                        PAGECACHE_TAG_DIRTY,
> >                        min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1))) {
> > -               int first;
> >                unsigned int i;
> > +               unsigned int nr_pages = 0;
> > +               pgoff_t next = 0;
> >
> > -               first = -1;
> > -               next = 0;
> > -               n_iov = 0;
> > -               bytes_to_write = 0;
> > +               wdata = cifs_writedata_alloc();
> > +               if (!wdata) {
> > +                       pagevec_release(&pvec);
> > +                       rc = -ENOMEM;
> > +                       break;
> > +               }
> >
> > -               for (i = 0; i < nr_pages; i++) {
> > +               for (i = 0; i < pvec_pages; i++) {
> >                        page = pvec.pages[i];
> >                        /*
> >                         * At this point we hold neither mapping->tree_lock nor
> > @@ -1175,7 +1207,7 @@ retry:
> >                         * mapping
> >                         */
> >
> > -                       if (first < 0)
> > +                       if (nr_pages == 0)
> >                                lock_page(page);
> >                        else if (!trylock_page(page))
> >                                break;
> > @@ -1186,7 +1218,7 @@ retry:
> >                        }
> >
> >                        if (!wbc->range_cyclic && page->index > end) {
> > -                               done = 1;
> > +                               done = true;
> >                                unlock_page(page);
> >                                break;
> >                        }
> > @@ -1213,119 +1245,103 @@ retry:
> >                        set_page_writeback(page);
> >
> >                        if (page_offset(page) >= mapping->host->i_size) {
> > -                               done = 1;
> > +                               done = true;
> >                                unlock_page(page);
> >                                end_page_writeback(page);
> >                                break;
> >                        }
> >
> > -                       /*
> > -                        * BB can we get rid of this?  pages are held by pvec
> > -                        */
> >                        page_cache_get(page);
> > -
> > -                       len = min(mapping->host->i_size - page_offset(page),
> > -                                 (loff_t)PAGE_CACHE_SIZE);
> > -
> > -                       /* reserve iov[0] for the smb header */
> > -                       n_iov++;
> > -                       iov[n_iov].iov_base = kmap(page);
> > -                       iov[n_iov].iov_len = len;
> > -                       bytes_to_write += len;
> > -
> > -                       if (first < 0) {
> > -                               first = i;
> > -                               offset = page_offset(page);
> > -                       }
> > +                       wdata->pages[i] = page;
> >                        next = page->index + 1;
> > -                       if (bytes_to_write + PAGE_CACHE_SIZE > cifs_sb->wsize)
> > +                       ++nr_pages;
> > +
> > +                       /* stop here if we have enough for a full wsize write */
> > +                       if ((nr_pages + 1) * PAGE_CACHE_SIZE > cifs_sb->wsize)
> >                                break;
> >                }
> > -               if (n_iov) {
> > -retry_write:
> > -                       open_file = find_writable_file(CIFS_I(mapping->host),
> > -                                                       false);
> > -                       if (!open_file) {
> > +
> > +               if (wdata->pages[0] == NULL) {
> > +                       /* Need to re-find the pages we skipped */
> > +                       index = pvec.pages[0]->index + 1;
> > +                       pagevec_release(&pvec);
> > +                       continue;
> > +               }
> > +
> > +               pagevec_release(&pvec);
> > +               wdata->offset = page_offset(wdata->pages[0]);
> > +
> > +               do {
> > +                       if (wdata->cfile != NULL)
> > +                               cifsFileInfo_put(wdata->cfile);
> > +                       wdata->cfile = find_writable_file(CIFS_I(mapping->host),
> > +                                                         false);
> > +                       if (!wdata->cfile) {
> >                                cERROR(1, "No writable handles for inode");
> >                                rc = -EBADF;
> > -                       } else {
> > -                               rc = CIFSSMBWrite2(xid, tcon, open_file->netfid,
> > -                                                  bytes_to_write, offset,
> > -                                                  &bytes_written, iov, n_iov,
> > -                                                  0);
> > -                               cifsFileInfo_put(open_file);
> > +                               break;
> >                        }
> > -
> > -                       cFYI(1, "Write2 rc=%d, wrote=%u", rc, bytes_written);
> > -
> > -                       /*
> > -                        * For now, treat a short write as if nothing got
> > -                        * written. A zero length write however indicates
> > -                        * ENOSPC or EFBIG. We have no way to know which
> > -                        * though, so call it ENOSPC for now. EFBIG would
> > -                        * get translated to AS_EIO anyway.
> > -                        *
> > -                        * FIXME: make it take into account the data that did
> > -                        *        get written
> > -                        */
> > -                       if (rc == 0) {
> > -                               if (bytes_written == 0)
> > -                                       rc = -ENOSPC;
> > -                               else if (bytes_written < bytes_to_write)
> > -                                       rc = -EAGAIN;
> > +                       if (wbc->sync_mode == WB_SYNC_ALL) {
> > +                               spin_lock(&mapping->private_lock);
> > +                               list_move(&wdata->pending, &pending);
> > +                               spin_unlock(&mapping->private_lock);
> >                        }
> > +                       rc = cifs_async_writev(wdata);
> > +               } while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
> >
> > -                       /* retry on data-integrity flush */
> > -                       if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN)
> > -                               goto retry_write;
> > -
> > -                       /* fix the stats and EOF */
> > -                       if (bytes_written > 0) {
> > -                               cifs_stats_bytes_written(tcon, bytes_written);
> > -                               cifs_update_eof(cifsi, offset, bytes_written);
> > -                       }
> > +               for (i = 0; i < nr_pages; ++i)
> > +                       unlock_page(wdata->pages[i]);
> >
> > -                       for (i = 0; i < n_iov; i++) {
> > -                               page = pvec.pages[first + i];
> > -                               /* on retryable write error, redirty page */
> > +               /* send failure -- clean up the mess */
> > +               if (rc != 0) {
> > +                       for (i = 0; i < nr_pages; ++i) {
> >                                if (rc == -EAGAIN)
> > -                                       redirty_page_for_writepage(wbc, page);
> > -                               else if (rc != 0)
> > -                                       SetPageError(page);
> > -                               kunmap(page);
> > -                               unlock_page(page);
> > -                               end_page_writeback(page);
> > -                               page_cache_release(page);
> > +                                       redirty_page_for_writepage(wbc,
> > +                                                          wdata->pages[i]);
> > +                               else
> > +                                       SetPageError(wdata->pages[i]);
> > +                               end_page_writeback(wdata->pages[i]);
> > +                               page_cache_release(wdata->pages[i]);
> >                        }
> > -
> >                        if (rc != -EAGAIN)
> >                                mapping_set_error(mapping, rc);
> > -                       else
> > -                               rc = 0;
> > -
> > -                       if ((wbc->nr_to_write -= n_iov) <= 0)
> > -                               done = 1;
> > -                       index = next;
> > -               } else
> > -                       /* Need to re-find the pages we skipped */
> > -                       index = pvec.pages[0]->index + 1;
> > +                       kref_put(&wdata->refcount, cifs_writedata_release);
> > +               } else if (wbc->sync_mode == WB_SYNC_NONE) {
> > +                       kref_put(&wdata->refcount, cifs_writedata_release);
> > +               }
> >
> > -               pagevec_release(&pvec);
> > +               if ((wbc->nr_to_write -= nr_pages) <= 0)
> > +                       done = true;
> > +               index = next;
> >        }
> > +
> >        if (!scanned && !done) {
> >                /*
> >                 * We hit the last page and there is more work to be done: wrap
> >                 * back to the start of the file
> >                 */
> > -               scanned = 1;
> > +               scanned = true;
> >                index = 0;
> >                goto retry;
> >        }
> > +
> > +       /*
> > +        * for WB_SYNC_ALL, we must wait until the writes complete. If any
> > +        * return -EAGAIN however, we need to retry the whole thing again.
> > +        * At that point nr_to_write will be all screwed up, but that
> > +        * shouldn't really matter for WB_SYNC_ALL (right?)
> > +        */
> > +       if (wbc->sync_mode == WB_SYNC_ALL) {
> > +               rc = cifs_wait_for_write_completion(mapping, &pending);
> > +               if (rc == -EAGAIN) {
> > +                       index = wbc->range_start >> PAGE_CACHE_SHIFT;
> > +                       goto retry;
> > +               }
> > +       }
> > +
> >        if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
> >                mapping->writeback_index = index;
> >
> > -       FreeXid(xid);
> > -       kfree(iov);
> >        return rc;
> >  }
> >
> > --
> > 1.7.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> Good. I tested it and found that it is ~20% faster when I write a
> ~gigabyte file over the LAN against Windows 7 (with MaxWorkItems set to
> 4096; otherwise I get STATUS_INSUFF_SERVER_RESOURCES after a write
> finishes).
> 
> As a test I do the following (python):
>   f = os.open('file', os.O_RDWR)
>   str = ''.join('q' for _ in xrange(4096))
>   for _ in xrange(250000):
>       os.write(f, str)
>   os.close(f)
> 
> Reviewed-and-Tested-by: Pavel Shilovsky <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>
> 

Thanks for testing it! Yes, I think I'll need to make this patch
respect the cifs_max_pending module parameter so we don't end up with
too many outstanding writes at a time. At the same time, we probably
ought to see about raising that above the hardcoded limit of 50, or
come up with a better heuristic for autonegotiating it.
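
Roughly, such a throttle could look like the following. This is only a
sketch, not part of the posted series: the helper name is made up, it
reuses the transport's existing inFlight counter and request_q waitqueue
(the async write callback in patch 4/5 already does the matching
atomic_dec()/wake_up()), and it leaves the in-flight accounting and
locking to the existing send path:

	/* hypothetical helper: block while too many requests are in flight */
	static void cifs_wait_for_async_slot(struct TCP_Server_Info *server)
	{
		wait_event(server->request_q,
			   atomic_read(&server->inFlight) < cifs_max_pending);
	}

cifs_async_writev() would call something like that before handing the
write off to cifs_call_async().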

I should have a new set in about a week or so, but I also need to do a
bit of research on how best to negotiate a larger wsize. That should
help performance even more I think.

-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] cifs: add cifs_async_writev
       [not found]             ` <20110412071120.75e45b38-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org>
@ 2011-04-12 11:24               ` Pavel Shilovsky
       [not found]                 ` <BANLkTi=G1U591vh4cc6XxT=FRe0puir2Ew-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Shilovsky @ 2011-04-12 11:24 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

2011/4/12 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
> On Tue, 12 Apr 2011 13:43:07 +0400
> Pavel Shilovsky <piastryyy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>> 2011/4/2 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
>> > Add the ability for CIFS to do an asynchronous write. The kernel will
>> > set the frame up as it would for a "normal" SMBWrite2 request, and use
>> > cifs_call_async to send it. The mid callback will then be configured to
>> > handle the result.
>> >
>> > Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> > ---
>> >  fs/cifs/cifsproto.h |   19 +++++
>> >  fs/cifs/cifssmb.c   |  219 +++++++++++++++++++++++++++++++++++++++++++++++++++
>> >  2 files changed, 238 insertions(+), 0 deletions(-)
>>
>> Looks good too, except two small notes.
>>
>
> Thanks for looking. I respun this yesterday actually and the new patch
> is a bit different. With the new patchset, we'll no longer be
> constrained by the current wsize limit of 14 pages. It should be able
> to handle an arbitrary wsize, which should also help performance.
>
>
>> >
>> > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
>> > index e255a2b..7b964df 100644
>> > --- a/fs/cifs/cifsproto.h
>> > +++ b/fs/cifs/cifsproto.h
>> > @@ -21,6 +21,7 @@
>> >  #ifndef _CIFSPROTO_H
>> >  #define _CIFSPROTO_H
>> >  #include <linux/nls.h>
>> > +#include <linux/pagevec.h>
>> >
>> >  struct statfs;
>> >  struct smb_vol;
>> > @@ -433,4 +434,22 @@ extern int mdfour(unsigned char *, unsigned char *, int);
>> >  extern int E_md4hash(const unsigned char *passwd, unsigned char *p16);
>> >  extern int SMBencrypt(unsigned char *passwd, const unsigned char *c8,
>> >                        unsigned char *p24);
>> > +
>> > +/* asynchronous write support */
>> > +struct cifs_writedata {
>> > +       struct list_head        pending;
>> > +       struct kref             refcount;
>> > +       struct completion       completion;
>> > +       struct work_struct      work;
>> > +       struct cifsFileInfo     *cfile;
>> > +       __u64                   offset;
>> > +       unsigned int            bytes;
>> > +       int                     result;
>> > +       struct page             *pages[PAGEVEC_SIZE];
>> > +};
>> > +
>> > +int cifs_async_writev(struct cifs_writedata *wdata);
>> > +struct cifs_writedata *cifs_writedata_alloc(void);
>> > +void cifs_writedata_release(struct kref *refcount);
>> > +
>> >  #endif                 /* _CIFSPROTO_H */
>> > diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
>> > index 8de7e79..b308605 100644
>> > --- a/fs/cifs/cifssmb.c
>> > +++ b/fs/cifs/cifssmb.c
>> > @@ -32,6 +32,7 @@
>> >  #include <linux/vfs.h>
>> >  #include <linux/slab.h>
>> >  #include <linux/posix_acl_xattr.h>
>> > +#include <linux/pagemap.h>
>> >  #include <asm/uaccess.h>
>> >  #include "cifspdu.h"
>> >  #include "cifsglob.h"
>> > @@ -1604,6 +1605,224 @@ CIFSSMBWrite(const int xid, struct cifs_tcon *tcon,
>> >        return rc;
>> >  }
>> >
>> > +void
>> > +cifs_writedata_release(struct kref *refcount)
>> > +{
>> > +       struct cifs_writedata *wdata = container_of(refcount,
>> > +                                       struct cifs_writedata, refcount);
>> > +       struct address_space *mapping;
>> > +
>> > +       if (wdata->cfile) {
>> > +               mapping = wdata->cfile->dentry->d_inode->i_mapping;
>> > +
>> > +               spin_lock(&mapping->private_lock);
>> > +               list_del(&wdata->pending);
>> > +               spin_unlock(&mapping->private_lock);
>> > +
>> > +               cifsFileInfo_put(wdata->cfile);
>> > +       }
>> > +
>> > +       kfree(wdata);
>> > +}
>> > +
>> > +static void
>> > +cifs_writev_complete(struct work_struct *work)
>> > +{
>> > +       struct cifs_writedata *wdata = container_of(work,
>> > +                                               struct cifs_writedata, work);
>> > +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
>>                              ^^^ this variable is unused.
>>
>                                Not exactly.
>
>> > +       struct inode *inode = wdata->cfile->dentry->d_inode;
>> > +       struct page *page;
>> > +       int i = 0;
>> > +
>> > +       if (wdata->result == 0) {
>> > +               cifs_update_eof(CIFS_I(inode), wdata->offset, wdata->bytes);
>> > +               cifs_stats_bytes_written(tcon, wdata->bytes);
>                                             ^^^^^
>                                        The tcon is used here

Strange, the compiler (gcc-4.4.4-1ubuntu2) reports it:
/home/piastry/git/cifs-2.6/fs/cifs/cifssmb.c: In function
‘cifs_writev_complete’:
/home/piastry/git/cifs-2.6/fs/cifs/cifssmb.c:1633: warning: unused
variable ‘tcon’

but you are right - it is used.

>> > +       }
>> > +
>> > +       for (i = 0; i < ARRAY_SIZE(wdata->pages) && (page = wdata->pages[i]);
>> > +                       i++) {
>> > +               if (wdata->result == -EAGAIN)
>> > +                       __set_page_dirty_nobuffers(page);
>> > +               else if (wdata->result < 0)
>> > +                       SetPageError(page);
>> > +               end_page_writeback(page);
>> > +               page_cache_release(page);
>> > +       }
>> > +       if (wdata->result != -EAGAIN)
>> > +               mapping_set_error(inode->i_mapping, wdata->result);
>> > +       complete(&wdata->completion);
>> > +       kref_put(&wdata->refcount, cifs_writedata_release);
>> > +}
>> > +
>> > +struct cifs_writedata *
>> > +cifs_writedata_alloc(void)
>> > +{
>> > +       struct cifs_writedata *wdata;
>> > +
>> > +       wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
>> > +       if (wdata != NULL) {
>> > +               INIT_LIST_HEAD(&wdata->pending);
>> > +               INIT_WORK(&wdata->work, cifs_writev_complete);
>> > +               kref_init(&wdata->refcount);
>> > +               init_completion(&wdata->completion);
>> > +       }
>> > +       return wdata;
>> > +}
>> > +
>> > +/*
>> > + * Check the midState and signature on received buffer (if any), and queue the
>> > + * workqueue completion task.
>> > + */
>> > +static void
>> > +cifs_writev_callback(struct mid_q_entry *mid)
>> > +{
>> > +       struct cifs_writedata *wdata = mid->callback_data;
>> > +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
>> > +       unsigned int written;
>> > +       WRITE_RSP *smb = (WRITE_RSP *)mid->resp_buf;
>> > +
>> > +       switch(mid->midState) {
>> > +       case MID_RESPONSE_RECEIVED:
>> > +               wdata->result = cifs_check_receive(mid, tcon->ses->server, 0);
>> > +               if (wdata->result != 0)
>> > +                       break;
>> > +
>> > +               written = le16_to_cpu(smb->CountHigh);
>> > +               written <<= 16;
>> > +               written += le16_to_cpu(smb->Count);
>> > +               /*
>> > +                * Mask off high 16 bits when bytes written as returned
>> > +                * by the server is greater than bytes requested by the
>> > +                * client. OS/2 servers are known to set incorrect
>> > +                * CountHigh values.
>> > +                */
>> > +               if (written > wdata->bytes)
>> > +                       written &= 0xFFFF;
>> > +
>> > +               if (written < wdata->bytes)
>> > +                       wdata->result = -ENOSPC;
>> > +               else
>> > +                       wdata->bytes = written;
>> > +               break;
>> > +       case MID_REQUEST_SUBMITTED:
>> > +       case MID_RETRY_NEEDED:
>> > +               wdata->result = -EAGAIN;
>> > +               break;
>> > +       default:
>> > +               wdata->result = -EIO;
>> > +               break;
>> > +       }
>> > +
>> > +       queue_work(system_nrt_wq, &wdata->work);
>> > +       DeleteMidQEntry(mid);
>> > +       atomic_dec(&tcon->ses->server->inFlight);
>> > +       wake_up(&tcon->ses->server->request_q);
>> > +}
>> > +
>> > +/* cifs_async_writev - send an async write, and set up mid to handle result */
>> > +int
>> > +cifs_async_writev(struct cifs_writedata *wdata)
>> > +{
>> > +       int i, rc = -EACCES;
>> > +       WRITE_REQ *smb = NULL;
>> > +       int wct;
>> > +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
>> > +       struct inode *inode = wdata->cfile->dentry->d_inode;
>> > +       unsigned int npages = ARRAY_SIZE(wdata->pages);
>> > +       struct kvec *iov = NULL;
>> > +
>> > +       cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);
>>                                                        ^^^^ bytes is
>> zero here. You calculate it next - I suggest moving this info message
>> after you get the right bytes value.
>>
>
> Good catch. Will fix.
>
>> > +
>> > +       if (tcon->ses->capabilities & CAP_LARGE_FILES) {
>> > +               wct = 14;
>> > +       } else {
>> > +               wct = 12;
>> > +               if (wdata->offset >> 32 > 0) {
>> > +                       /* can not handle big offset for old srv */
>> > +                       return -EIO;
>> > +               }
>> > +       }
>> > +
>> > +       rc = small_smb_init(SMB_COM_WRITE_ANDX, wct, tcon, (void **)&smb);
>> > +       if (rc)
>> > +               goto async_writev_out;
>> > +
>> > +       /* FIXME: only allocate number of kvecs we need */
>> > +       iov = kzalloc((npages + 1) * sizeof(*iov), GFP_NOFS);
>> > +       if (iov == NULL) {
>> > +               rc = -ENOMEM;
>> > +               goto async_writev_out;
>> > +       }
>> > +
>> > +       smb->AndXCommand = 0xFF;        /* none */
>> > +       smb->Fid = wdata->cfile->netfid;
>> > +       smb->OffsetLow = cpu_to_le32(wdata->offset & 0xFFFFFFFF);
>> > +       if (wct == 14)
>> > +               smb->OffsetHigh = cpu_to_le32(wdata->offset >> 32);
>> > +       smb->Reserved = 0xFFFFFFFF;
>> > +       smb->WriteMode = 0;
>> > +       smb->Remaining = 0;
>> > +
>> > +       smb->DataOffset =
>> > +           cpu_to_le16(offsetof(struct smb_com_write_req, Data) - 4);
>> > +
>> > +       /* 4 for RFC1001 length + 1 for BCC */
>> > +       iov[0].iov_len = be32_to_cpu(smb->hdr.smb_buf_length) + 4 + 1;
>> > +       iov[0].iov_base = smb;
>> > +
>> > +       /* marshal up the pages into iov array */
>> > +       npages = 0;
>> > +       wdata->bytes = 0;
>> > +       for (i = 0; i < ARRAY_SIZE(wdata->pages) && wdata->pages[i]; i++) {
>> > +               if (wdata->bytes + PAGE_CACHE_SIZE <=
>> > +                   CIFS_SB(inode->i_sb)->wsize) {
>> > +                       iov[i + 1].iov_len = min(inode->i_size -
>> > +                                             page_offset(wdata->pages[i]),
>> > +                                               (loff_t)PAGE_CACHE_SIZE);
>> > +                       iov[i + 1].iov_base = kmap(wdata->pages[i]);
>> > +                       wdata->bytes += iov[i + 1].iov_len;
>> > +                       npages++;
>> > +               } else {
>> > +                       /* if we're beyond wsize, then re-mark page dirty */
>> > +                       __set_page_dirty_nobuffers(wdata->pages[i]);
>> > +               }
>> > +       }
>> > +
>> > +       smb->DataLengthLow = cpu_to_le16(wdata->bytes & 0xFFFF);
>> > +       smb->DataLengthHigh = cpu_to_le16(wdata->bytes >> 16);
>> > +
>> > +       if (wct == 14) {
>> > +               inc_rfc1001_len(&smb->hdr, wdata->bytes + 1);
>> > +               put_bcc(wdata->bytes + 1, &smb->hdr);
>> > +       } else {
>> > +               /* wct == 12 */
>> > +               struct smb_com_writex_req *smbw =
>> > +                               (struct smb_com_writex_req *)smb;
>> > +               inc_rfc1001_len(&smbw->hdr, wdata->bytes + 5);
>> > +               put_bcc(wdata->bytes + 5, &smbw->hdr);
>> > +               iov[0].iov_len += 4; /* pad bigger by four bytes */
>> > +       }
>> > +
>> > +       kref_get(&wdata->refcount);
>> > +       rc = cifs_call_async(tcon->ses->server, iov, npages + 1,
>> > +                            cifs_writev_callback, wdata);
>> > +
>> > +       if (rc == 0)
>> > +               cifs_stats_inc(&tcon->stats.cifs_stats.num_writes);
>> > +       else
>> > +               kref_put(&wdata->refcount, cifs_writedata_release);
>> > +
>> > +       /* send is done, unmap pages */
>> > +       for (i = 0; i < npages; i++)
>> > +               kunmap(wdata->pages[i]);
>> > +
>> > +async_writev_out:
>> > +       cifs_small_buf_release(smb);
>> > +       kfree(iov);
>> > +       return rc;
>> > +}
>> > +
>> >  int
>> >  CIFSSMBWrite2(const int xid, struct cifs_tcon *tcon,
>> >             const int netfid, const unsigned int count,
>> > --
>> > 1.7.4
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
>> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>>
>> Anyway, it's good. You can add my "Reviewed-by: Pavel Shilovsky
>> <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>" tag when you fix the notes above.
>>
>
>
> --
> Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>



-- 
Best regards,
Pavel Shilovsky.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] cifs: add cifs_async_writev
       [not found]                 ` <BANLkTi=G1U591vh4cc6XxT=FRe0puir2Ew-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-04-15  7:52                   ` Pavel Shilovsky
       [not found]                     ` <BANLkTimhDSgsq=Obq-TRSpxQrJZpT6+5iA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Shilovsky @ 2011-04-15  7:52 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

2011/4/12 Pavel Shilovsky <piastryyy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> 2011/4/12 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
>> On Tue, 12 Apr 2011 13:43:07 +0400
>> Pavel Shilovsky <piastryyy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>
>>> 2011/4/2 Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>:
>>> > Add the ability for CIFS to do an asynchronous write. The kernel will
>>> > set the frame up as it would for a "normal" SMBWrite2 request, and use
>>> > cifs_call_async to send it. The mid callback will then be configured to
>>> > handle the result.
>>> >
>>> > Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>> > ---
>>> >  fs/cifs/cifsproto.h |   19 +++++
>>> >  fs/cifs/cifssmb.c   |  219 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>> >  2 files changed, 238 insertions(+), 0 deletions(-)
>>>
>>> Looks good too, except two small notes.
>>>
>>
>> Thanks for looking. I respun this yesterday actually and the new patch
>> is a bit different. With the new patchset, we'll no longer be
>> constrained by the current wsize limit of 14 pages. It should be able
>> to handle an arbitrary wsize, which should also help performance.
>>
>>
>>> >
>>> > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
>>> > index e255a2b..7b964df 100644
>>> > --- a/fs/cifs/cifsproto.h
>>> > +++ b/fs/cifs/cifsproto.h
>>> > @@ -21,6 +21,7 @@
>>> >  #ifndef _CIFSPROTO_H
>>> >  #define _CIFSPROTO_H
>>> >  #include <linux/nls.h>
>>> > +#include <linux/pagevec.h>
>>> >
>>> >  struct statfs;
>>> >  struct smb_vol;
>>> > @@ -433,4 +434,22 @@ extern int mdfour(unsigned char *, unsigned char *, int);
>>> >  extern int E_md4hash(const unsigned char *passwd, unsigned char *p16);
>>> >  extern int SMBencrypt(unsigned char *passwd, const unsigned char *c8,
>>> >                        unsigned char *p24);
>>> > +
>>> > +/* asynchronous write support */
>>> > +struct cifs_writedata {
>>> > +       struct list_head        pending;
>>> > +       struct kref             refcount;
>>> > +       struct completion       completion;
>>> > +       struct work_struct      work;
>>> > +       struct cifsFileInfo     *cfile;
>>> > +       __u64                   offset;
>>> > +       unsigned int            bytes;
>>> > +       int                     result;
>>> > +       struct page             *pages[PAGEVEC_SIZE];
>>> > +};
>>> > +
>>> > +int cifs_async_writev(struct cifs_writedata *wdata);
>>> > +struct cifs_writedata *cifs_writedata_alloc(void);
>>> > +void cifs_writedata_release(struct kref *refcount);
>>> > +
>>> >  #endif                 /* _CIFSPROTO_H */
>>> > diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
>>> > index 8de7e79..b308605 100644
>>> > --- a/fs/cifs/cifssmb.c
>>> > +++ b/fs/cifs/cifssmb.c
>>> > @@ -32,6 +32,7 @@
>>> >  #include <linux/vfs.h>
>>> >  #include <linux/slab.h>
>>> >  #include <linux/posix_acl_xattr.h>
>>> > +#include <linux/pagemap.h>
>>> >  #include <asm/uaccess.h>
>>> >  #include "cifspdu.h"
>>> >  #include "cifsglob.h"
>>> > @@ -1604,6 +1605,224 @@ CIFSSMBWrite(const int xid, struct cifs_tcon *tcon,
>>> >        return rc;
>>> >  }
>>> >
>>> > +void
>>> > +cifs_writedata_release(struct kref *refcount)
>>> > +{
>>> > +       struct cifs_writedata *wdata = container_of(refcount,
>>> > +                                       struct cifs_writedata, refcount);
>>> > +       struct address_space *mapping;
>>> > +
>>> > +       if (wdata->cfile) {
>>> > +               mapping = wdata->cfile->dentry->d_inode->i_mapping;
>>> > +
>>> > +               spin_lock(&mapping->private_lock);
>>> > +               list_del(&wdata->pending);
>>> > +               spin_unlock(&mapping->private_lock);
>>> > +
>>> > +               cifsFileInfo_put(wdata->cfile);
>>> > +       }
>>> > +
>>> > +       kfree(wdata);
>>> > +}
>>> > +
>>> > +static void
>>> > +cifs_writev_complete(struct work_struct *work)
>>> > +{
>>> > +       struct cifs_writedata *wdata = container_of(work,
>>> > +                                               struct cifs_writedata, work);
>>> > +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
>>>                              ^^^ this variable is unused.
>>>
>>                                Not exactly.
>>
>>> > +       struct inode *inode = wdata->cfile->dentry->d_inode;
>>> > +       struct page *page;
>>> > +       int i = 0;
>>> > +
>>> > +       if (wdata->result == 0) {
>>> > +               cifs_update_eof(CIFS_I(inode), wdata->offset, wdata->bytes);
>>> > +               cifs_stats_bytes_written(tcon, wdata->bytes);
>>                                             ^^^^^
>>                                        The tcon is used here
>
> Strange, the compiler (gcc-4.4.4-1ubuntu2) reports it:
> /home/piastry/git/cifs-2.6/fs/cifs/cifssmb.c: In function
> ‘cifs_writev_complete’:
> /home/piastry/git/cifs-2.6/fs/cifs/cifssmb.c:1633: warning: unused
> variable ‘tcon’
>
> but you are right - it is used.

Figured it out: I compiled the module without CONFIG_CIFS_STATS.

>
>>> > +       }
>>> > +
>>> > +       for (i = 0; i < ARRAY_SIZE(wdata->pages) && (page = wdata->pages[i]);
>>> > +                       i++) {
>>> > +               if (wdata->result == -EAGAIN)
>>> > +                       __set_page_dirty_nobuffers(page);
>>> > +               else if (wdata->result < 0)
>>> > +                       SetPageError(page);
>>> > +               end_page_writeback(page);
>>> > +               page_cache_release(page);
>>> > +       }
>>> > +       if (wdata->result != -EAGAIN)
>>> > +               mapping_set_error(inode->i_mapping, wdata->result);
>>> > +       complete(&wdata->completion);
>>> > +       kref_put(&wdata->refcount, cifs_writedata_release);
>>> > +}
>>> > +
>>> > +struct cifs_writedata *
>>> > +cifs_writedata_alloc(void)
>>> > +{
>>> > +       struct cifs_writedata *wdata;
>>> > +
>>> > +       wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
>>> > +       if (wdata != NULL) {
>>> > +               INIT_LIST_HEAD(&wdata->pending);
>>> > +               INIT_WORK(&wdata->work, cifs_writev_complete);
>>> > +               kref_init(&wdata->refcount);
>>> > +               init_completion(&wdata->completion);
>>> > +       }
>>> > +       return wdata;
>>> > +}
>>> > +
>>> > +/*
>>> > + * Check the midState and signature on received buffer (if any), and queue the
>>> > + * workqueue completion task.
>>> > + */
>>> > +static void
>>> > +cifs_writev_callback(struct mid_q_entry *mid)
>>> > +{
>>> > +       struct cifs_writedata *wdata = mid->callback_data;
>>> > +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
>>> > +       unsigned int written;
>>> > +       WRITE_RSP *smb = (WRITE_RSP *)mid->resp_buf;
>>> > +
>>> > +       switch(mid->midState) {
>>> > +       case MID_RESPONSE_RECEIVED:
>>> > +               wdata->result = cifs_check_receive(mid, tcon->ses->server, 0);
>>> > +               if (wdata->result != 0)
>>> > +                       break;
>>> > +
>>> > +               written = le16_to_cpu(smb->CountHigh);
>>> > +               written <<= 16;
>>> > +               written += le16_to_cpu(smb->Count);
>>> > +               /*
>>> > +                * Mask off high 16 bits when bytes written as returned
>>> > +                * by the server is greater than bytes requested by the
>>> > +                * client. OS/2 servers are known to set incorrect
>>> > +                * CountHigh values.
>>> > +                */
>>> > +               if (written > wdata->bytes)
>>> > +                       written &= 0xFFFF;
>>> > +
>>> > +               if (written < wdata->bytes)
>>> > +                       wdata->result = -ENOSPC;
>>> > +               else
>>> > +                       wdata->bytes = written;
>>> > +               break;
>>> > +       case MID_REQUEST_SUBMITTED:
>>> > +       case MID_RETRY_NEEDED:
>>> > +               wdata->result = -EAGAIN;
>>> > +               break;
>>> > +       default:
>>> > +               wdata->result = -EIO;
>>> > +               break;
>>> > +       }
>>> > +
>>> > +       queue_work(system_nrt_wq, &wdata->work);
>>> > +       DeleteMidQEntry(mid);
>>> > +       atomic_dec(&tcon->ses->server->inFlight);
>>> > +       wake_up(&tcon->ses->server->request_q);
>>> > +}
>>> > +
>>> > +/* cifs_async_writev - send an async write, and set up mid to handle result */
>>> > +int
>>> > +cifs_async_writev(struct cifs_writedata *wdata)
>>> > +{
>>> > +       int i, rc = -EACCES;
>>> > +       WRITE_REQ *smb = NULL;
>>> > +       int wct;
>>> > +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
>>> > +       struct inode *inode = wdata->cfile->dentry->d_inode;
>>> > +       unsigned int npages = ARRAY_SIZE(wdata->pages);
>>> > +       struct kvec *iov = NULL;
>>> > +
>>> > +       cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);
>>>                                                        ^^^^ bytes is
>>> zero here. You calculate it next - I suggest moving this info message
>>> after you get the right bytes value.
>>>
>>
>> Good catch. Will fix.
>>
>>> > +
>>> > +       if (tcon->ses->capabilities & CAP_LARGE_FILES) {
>>> > +               wct = 14;
>>> > +       } else {
>>> > +               wct = 12;
>>> > +               if (wdata->offset >> 32 > 0) {
>>> > +                       /* can not handle big offset for old srv */
>>> > +                       return -EIO;
>>> > +               }
>>> > +       }
>>> > +
>>> > +       rc = small_smb_init(SMB_COM_WRITE_ANDX, wct, tcon, (void **)&smb);
>>> > +       if (rc)
>>> > +               goto async_writev_out;
>>> > +
>>> > +       /* FIXME: only allocate number of kvecs we need */
>>> > +       iov = kzalloc((npages + 1) * sizeof(*iov), GFP_NOFS);
>>> > +       if (iov == NULL) {
>>> > +               rc = -ENOMEM;
>>> > +               goto async_writev_out;
>>> > +       }
>>> > +
>>> > +       smb->AndXCommand = 0xFF;        /* none */
>>> > +       smb->Fid = wdata->cfile->netfid;
>>> > +       smb->OffsetLow = cpu_to_le32(wdata->offset & 0xFFFFFFFF);
>>> > +       if (wct == 14)
>>> > +               smb->OffsetHigh = cpu_to_le32(wdata->offset >> 32);
>>> > +       smb->Reserved = 0xFFFFFFFF;
>>> > +       smb->WriteMode = 0;
>>> > +       smb->Remaining = 0;
>>> > +
>>> > +       smb->DataOffset =
>>> > +           cpu_to_le16(offsetof(struct smb_com_write_req, Data) - 4);
>>> > +
>>> > +       /* 4 for RFC1001 length + 1 for BCC */
>>> > +       iov[0].iov_len = be32_to_cpu(smb->hdr.smb_buf_length) + 4 + 1;
>>> > +       iov[0].iov_base = smb;
>>> > +
>>> > +       /* marshal up the pages into iov array */
>>> > +       npages = 0;
>>> > +       wdata->bytes = 0;
>>> > +       for (i = 0; i < ARRAY_SIZE(wdata->pages) && wdata->pages[i]; i++) {
>>> > +               if (wdata->bytes + PAGE_CACHE_SIZE <=
>>> > +                   CIFS_SB(inode->i_sb)->wsize) {
>>> > +                       iov[i + 1].iov_len = min(inode->i_size -
>>> > +                                             page_offset(wdata->pages[i]),
>>> > +                                               (loff_t)PAGE_CACHE_SIZE);
>>> > +                       iov[i + 1].iov_base = kmap(wdata->pages[i]);
>>> > +                       wdata->bytes += iov[i + 1].iov_len;
>>> > +                       npages++;
>>> > +               } else {
>>> > +                       /* if we're beyond wsize, then re-mark page dirty */
>>> > +                       __set_page_dirty_nobuffers(wdata->pages[i]);
>>> > +               }
>>> > +       }
>>> > +
>>> > +       smb->DataLengthLow = cpu_to_le16(wdata->bytes & 0xFFFF);
>>> > +       smb->DataLengthHigh = cpu_to_le16(wdata->bytes >> 16);
>>> > +
>>> > +       if (wct == 14) {
>>> > +               inc_rfc1001_len(&smb->hdr, wdata->bytes + 1);
>>> > +               put_bcc(wdata->bytes + 1, &smb->hdr);
>>> > +       } else {
>>> > +               /* wct == 12 */
>>> > +               struct smb_com_writex_req *smbw =
>>> > +                               (struct smb_com_writex_req *)smb;
>>> > +               inc_rfc1001_len(&smbw->hdr, wdata->bytes + 5);
>>> > +               put_bcc(wdata->bytes + 5, &smbw->hdr);
>>> > +               iov[0].iov_len += 4; /* pad bigger by four bytes */
>>> > +       }
>>> > +
>>> > +       kref_get(&wdata->refcount);
>>> > +       rc = cifs_call_async(tcon->ses->server, iov, npages + 1,
>>> > +                            cifs_writev_callback, wdata);
>>> > +
>>> > +       if (rc == 0)
>>> > +               cifs_stats_inc(&tcon->stats.cifs_stats.num_writes);
>>> > +       else
>>> > +               kref_put(&wdata->refcount, cifs_writedata_release);
>>> > +
>>> > +       /* send is done, unmap pages */
>>> > +       for (i = 0; i < npages; i++)
>>> > +               kunmap(wdata->pages[i]);
>>> > +
>>> > +async_writev_out:
>>> > +       cifs_small_buf_release(smb);
>>> > +       kfree(iov);
>>> > +       return rc;
>>> > +}
>>> > +
>>> >  int
>>> >  CIFSSMBWrite2(const int xid, struct cifs_tcon *tcon,
>>> >             const int netfid, const unsigned int count,
>>> > --
>>> > 1.7.4
>>> >
>>> > --
>>> > To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
>>> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> >
>>>
>>> Anyway, it's good. You can add my "Reviewed-by: Pavel Shilovsky
>>> <piastry-7qunaywFIewox3rIn2DAYQ@public.gmane.org>" tag when you fix the notes above.
>>>
>>
>>
>> --
>> Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>
>
>
>
> --
> Best regards,
> Pavel Shilovsky.
>



-- 
Best regards,
Pavel Shilovsky.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] cifs: add cifs_async_writev
       [not found]                     ` <BANLkTimhDSgsq=Obq-TRSpxQrJZpT6+5iA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-04-15 10:45                       ` Jeff Layton
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Layton @ 2011-04-15 10:45 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

On Fri, 15 Apr 2011 11:52:49 +0400
Pavel Shilovsky <piastryyy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:


> >>> > +static void
> >>> > +cifs_writev_complete(struct work_struct *work)
> >>> > +{
> >>> > +       struct cifs_writedata *wdata = container_of(work,
> >>> > +                                               struct cifs_writedata, work);
> >>> > +       struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
> >>>                              ^^^ this variable is unused.
> >>>
> >>                                Not exactly.
> >>
> >>> > +       struct inode *inode = wdata->cfile->dentry->d_inode;
> >>> > +       struct page *page;
> >>> > +       int i = 0;
> >>> > +
> >>> > +       if (wdata->result == 0) {
> >>> > +               cifs_update_eof(CIFS_I(inode), wdata->offset, wdata->bytes);
> >>> > +               cifs_stats_bytes_written(tcon, wdata->bytes);
> >>                                             ^^^^^
> >>                                        The tcon is used here
> >
> > Strange, the compiler (gcc-4.4.4-1ubuntu2) reports it:
> > /home/piastry/git/cifs-2.6/fs/cifs/cifssmb.c: In function
> > ‘cifs_writev_complete’:
> > /home/piastry/git/cifs-2.6/fs/cifs/cifssmb.c:1633: warning: unused
> > variable ‘tcon’
> >
> > but you are right - it is used.
> 
> Figured it out: I compiled the module without CONFIG_CIFS_STATS.
> 

Ahh yeah, that function gets compiled out to nothing when
CONFIG_CIFS_STATS isn't set. I guess I can get rid of the separate tcon
stack var and just inline the tlink_tcon call there. I'll fix it in my
next set.
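
I.e. something like this in cifs_writev_complete() (just a sketch):

	if (wdata->result == 0) {
		cifs_update_eof(CIFS_I(inode), wdata->offset, wdata->bytes);
		cifs_stats_bytes_written(tlink_tcon(wdata->cfile->tlink),
					 wdata->bytes);
	}

With no separate tcon local, there's nothing left for gcc to warn about
when CONFIG_CIFS_STATS is disabled.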

Thanks.

-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-04-15 10:45 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-02 12:23 [PATCH 0/5] cifs: asynchronous writepages support (try #1) Jeff Layton
     [not found] ` <1301747039-31411-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-04-02 12:23   ` [PATCH 1/5] cifs: consolidate SendReceive response checks Jeff Layton
     [not found]     ` <1301747039-31411-2-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-04-12  9:38       ` Pavel Shilovsky
2011-04-02 12:23   ` [PATCH 2/5] cifs: make cifs_send_async take a kvec array Jeff Layton
     [not found]     ` <1301747039-31411-3-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-04-12  9:39       ` Pavel Shilovsky
2011-04-02 12:23   ` [PATCH 3/5] cifs: don't call mid_q_entry->callback under the Global_MidLock Jeff Layton
     [not found]     ` <1301747039-31411-4-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-04-12  9:39       ` Pavel Shilovsky
2011-04-02 12:23   ` [PATCH 4/5] cifs: add cifs_async_writev Jeff Layton
     [not found]     ` <1301747039-31411-5-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-04-12  9:43       ` Pavel Shilovsky
     [not found]         ` <BANLkTimHi_w55+bqFstXZZdjaMzw1mw_yA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-04-12 11:11           ` Jeff Layton
     [not found]             ` <20110412071120.75e45b38-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org>
2011-04-12 11:24               ` Pavel Shilovsky
     [not found]                 ` <BANLkTi=G1U591vh4cc6XxT=FRe0puir2Ew-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-04-15  7:52                   ` Pavel Shilovsky
     [not found]                     ` <BANLkTimhDSgsq=Obq-TRSpxQrJZpT6+5iA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-04-15 10:45                       ` Jeff Layton
2011-04-02 12:23   ` [PATCH 5/5] cifs: convert cifs_writepages to use async writes Jeff Layton
     [not found]     ` <1301747039-31411-6-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2011-04-12 10:07       ` Pavel Shilovsky
     [not found]         ` <BANLkTi=HPkAC3kY3g5=XYHzfTLcde9C7ww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-04-12 11:15           ` Jeff Layton
