* [PATCH 0/5] RDS rds_iovec fixes
@ 2010-10-29  1:40 Andy Grover
  2010-10-29  1:40 ` [PATCH 1/5] net: fix rds_iovec page count overflow Andy Grover
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Andy Grover @ 2010-10-29  1:40 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

Hi David and everyone,

This patchset includes Linus' rds_iovec overflow fix, as well as changes that
copy the rds_iovec array into kernel memory so that userspace cannot modify
it after we've checked it.
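
As a quick illustration of the race being closed (a hypothetical sketch, not
code from this series; "rewrite_iovec" and the surrounding sendmsg() setup
are assumptions), imagine a second thread rewriting the iovec array while
the syscall is in flight:

    #include <stdint.h>
    #include <linux/rds.h>      /* struct rds_iovec { u64 addr; u64 bytes; } */

    static struct rds_iovec vec[1];     /* passed via RDS_CMSG_RDMA_ARGS'
                                         * local_vec_addr with nr_local = 1 */

    /* thread A: sendmsg() carrying the RDMA cmsg, referencing vec[] */

    /* thread B, racing with the syscall in thread A: */
    static void *rewrite_iovec(void *unused)
    {
        vec[0].bytes = 1ULL << 40;  /* grow it after the kernel checked it */
        return NULL;
    }

Without a kernel-side copy, the values the kernel validated and the values it
later walks can differ.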

Please especially review patch 5. We go from three iovec copies from
userspace down to two, but it's not trivial to get to one, due to some code
that wants to preallocate space for SGs. Until that code can be changed,
patch 5 attempts to handle that scenario safely.

Thanks -- Regards -- Andy



* [PATCH 1/5] net: fix rds_iovec page count overflow
  2010-10-29  1:40 [PATCH 0/5] RDS rds_iovec fixes Andy Grover
@ 2010-10-29  1:40 ` Andy Grover
  2010-10-29  1:40 ` [PATCH 2/5] RDS: Return -EINVAL if rds_rdma_pages returns an error Andy Grover
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Andy Grover @ 2010-10-29  1:40 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel, Linus Torvalds

From: Linus Torvalds <torvalds@linux-foundation.org>

As reported by Thomas Pollet, the rdma page counting can overflow.  We
get the rdma sizes in 64-bit unsigned entities, but then limit them to
UINT_MAX bytes and shift them down to pages (with a possible "+1" for an
unaligned address).

So each individual page count fits comfortably in an 'unsigned int' (not
even close to overflowing into signed), but as they are added up, they
might end up resulting in a signed return value. Which would be wrong.

Catch the case of tot_pages turning negative, and return the appropriate
error code.
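
A standalone sketch of the arithmetic (not part of the patch): with
PAGE_SHIFT == 12, one entry contributes at most (UINT_MAX >> 12) + 1 ==
1048576 pages, so roughly 2048 such entries push a 32-bit total past
INT_MAX, and the sum shows up as negative long before the unsigned counter
could wrap back around to zero.

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* mirrors the kernel's per-entry cap and its (int) cast check */
        unsigned int per_entry = (UINT_MAX >> 12) + 1;  /* 1048576 pages */
        unsigned int tot_pages = 0;
        int i;

        for (i = 0; i < 4096; i++) {
            tot_pages += per_entry;
            if ((int)tot_pages < 0) {
                printf("sum negative after %d entries\n", i + 1); /* 2048 */
                return 0;       /* the kernel returns -EINVAL at this point */
            }
        }
        return 1;
    }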

Reported-by: Thomas Pollet <thomas.pollet@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/rdma.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index 1a41deb..0df02c8 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -502,6 +502,13 @@ static int rds_rdma_pages(struct rds_rdma_args *args)
 			return -EINVAL;
 
 		tot_pages += nr_pages;
+
+		/*
+		 * nr_pages for one entry is limited to (UINT_MAX>>PAGE_SHIFT)+1,
+		 * so tot_pages cannot overflow without first going negative.
+		 */
+		if ((int)tot_pages < 0)
+			return -EINVAL;
 	}
 
 	return tot_pages;
-- 
1.7.1



* [PATCH 2/5] RDS: Return -EINVAL if rds_rdma_pages returns an error
  2010-10-29  1:40 [PATCH 0/5] RDS rds_iovec fixes Andy Grover
  2010-10-29  1:40 ` [PATCH 1/5] net: fix rds_iovec page count overflow Andy Grover
@ 2010-10-29  1:40 ` Andy Grover
  2010-10-29  1:40 ` [PATCH 3/5] RDS: Clean up error handling in rds_cmsg_rdma_args Andy Grover
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Andy Grover @ 2010-10-29  1:40 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

rds_cmsg_rdma_args would still return success even if rds_rdma_pages
returned an error (or overflowed).

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/rdma.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index 0df02c8..d0ba2ca 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -554,8 +554,10 @@ int rds_cmsg_rdma_args(struct rds_sock *rs, struct rds_message *rm,
 	}
 
 	nr_pages = rds_rdma_pages(args);
-	if (nr_pages < 0)
+	if (nr_pages < 0) {
+		ret = -EINVAL;
 		goto out;
+	}
 
 	pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);
 	if (!pages) {
-- 
1.7.1



* [PATCH 3/5] RDS: Clean up error handling in rds_cmsg_rdma_args
  2010-10-29  1:40 [PATCH 0/5] RDS rds_iovec fixes Andy Grover
  2010-10-29  1:40 ` [PATCH 1/5] net: fix rds_iovec page count overflow Andy Grover
  2010-10-29  1:40 ` [PATCH 2/5] RDS: Return -EINVAL if rds_rdma_pages returns an error Andy Grover
@ 2010-10-29  1:40 ` Andy Grover
  2010-10-29  1:40 ` [PATCH 4/5] RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace Andy Grover
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Andy Grover @ 2010-10-29  1:40 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

We don't need to set ret = 0 at the end -- it's initialized to 0.

Also, don't increment the s_send_rdma stat if we're exiting with an
error.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/rdma.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index d0ba2ca..334acdd 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -664,13 +664,12 @@ int rds_cmsg_rdma_args(struct rds_sock *rs, struct rds_message *rm,
 	}
 	op->op_bytes = nr_bytes;
 
-	ret = 0;
 out:
 	kfree(pages);
 	if (ret)
 		rds_rdma_free_op(op);
-
-	rds_stats_inc(s_send_rdma);
+	else
+		rds_stats_inc(s_send_rdma);
 
 	return ret;
 }
-- 
1.7.1



* [PATCH 4/5] RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace
  2010-10-29  1:40 [PATCH 0/5] RDS rds_iovec fixes Andy Grover
                   ` (2 preceding siblings ...)
  2010-10-29  1:40 ` [PATCH 3/5] RDS: Clean up error handling in rds_cmsg_rdma_args Andy Grover
@ 2010-10-29  1:40 ` Andy Grover
  2010-10-29  1:40 ` [PATCH 5/5] RDS: Let rds_message_alloc_sgs() return NULL Andy Grover
  2010-10-30 23:34 ` [PATCH 0/5] RDS rds_iovec fixes David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: Andy Grover @ 2010-10-29  1:40 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

Change rds_rdma_pages to take a passed-in rds_iovec array instead
of doing copy_from_user itself.

Change rds_cmsg_rdma_args to copy the rds_iovec array only once. This
eliminates the possibility of userspace changing it after our
sanity checks.

Implement stack-based storage for small numbers of iovecs, based
on net/socket.c, to save an alloc in the extremely common case.
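
In brief, the pattern borrowed from net/socket.c looks like this (a sketch;
the real hunk is in the diff below):

    /* Small vectors use on-stack storage; larger ones take a
     * socket-accounted heap allocation; the user copy happens once. */
    struct rds_iovec iovstack[UIO_FASTIOV], *iovs = iovstack;
    int iov_size = args->nr_local * sizeof(struct rds_iovec);

    if (args->nr_local > UIO_FASTIOV) {
        iovs = sock_kmalloc(rds_rs_to_sk(rs), iov_size, GFP_KERNEL);
        if (!iovs)
            return -ENOMEM;
    }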

Although this patch reduces iovec copies in cmsg_rdma_args to 1,
we still do another one in rds_rdma_extra_size. Getting rid of
that one will be trickier, so it'll be a separate patch.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/rdma.c |  104 +++++++++++++++++++++++++++++++++++---------------------
 1 files changed, 65 insertions(+), 39 deletions(-)

diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index 334acdd..caa4d98 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -479,13 +479,38 @@ void rds_atomic_free_op(struct rm_atomic_op *ao)
 
 
 /*
- * Count the number of pages needed to describe an incoming iovec.
+ * Count the number of pages needed to describe an incoming iovec array.
  */
-static int rds_rdma_pages(struct rds_rdma_args *args)
+static int rds_rdma_pages(struct rds_iovec iov[], int nr_iovecs)
+{
+	int tot_pages = 0;
+	unsigned int nr_pages;
+	unsigned int i;
+
+	/* figure out the number of pages in the vector */
+	for (i = 0; i < nr_iovecs; i++) {
+		nr_pages = rds_pages_in_vec(&iov[i]);
+		if (nr_pages == 0)
+			return -EINVAL;
+
+		tot_pages += nr_pages;
+
+		/*
+		 * nr_pages for one entry is limited to (UINT_MAX>>PAGE_SHIFT)+1,
+		 * so tot_pages cannot overflow without first going negative.
+		 */
+		if (tot_pages < 0)
+			return -EINVAL;
+	}
+
+	return tot_pages;
+}
+
+int rds_rdma_extra_size(struct rds_rdma_args *args)
 {
 	struct rds_iovec vec;
 	struct rds_iovec __user *local_vec;
-	unsigned int tot_pages = 0;
+	int tot_pages = 0;
 	unsigned int nr_pages;
 	unsigned int i;
 
@@ -507,16 +532,11 @@ static int rds_rdma_pages(struct rds_rdma_args *args)
 		 * nr_pages for one entry is limited to (UINT_MAX>>PAGE_SHIFT)+1,
 		 * so tot_pages cannot overflow without first going negative.
 		 */
-		if ((int)tot_pages < 0)
+		if (tot_pages < 0)
 			return -EINVAL;
 	}
 
-	return tot_pages;
-}
-
-int rds_rdma_extra_size(struct rds_rdma_args *args)
-{
-	return rds_rdma_pages(args) * sizeof(struct scatterlist);
+	return tot_pages * sizeof(struct scatterlist);
 }
 
 /*
@@ -527,13 +547,12 @@ int rds_cmsg_rdma_args(struct rds_sock *rs, struct rds_message *rm,
 			  struct cmsghdr *cmsg)
 {
 	struct rds_rdma_args *args;
-	struct rds_iovec vec;
 	struct rm_rdma_op *op = &rm->rdma;
 	int nr_pages;
 	unsigned int nr_bytes;
 	struct page **pages = NULL;
-	struct rds_iovec __user *local_vec;
-	unsigned int nr;
+	struct rds_iovec iovstack[UIO_FASTIOV], *iovs = iovstack;
+	int iov_size;
 	unsigned int i, j;
 	int ret = 0;
 
@@ -553,7 +572,22 @@ int rds_cmsg_rdma_args(struct rds_sock *rs, struct rds_message *rm,
 		goto out;
 	}
 
-	nr_pages = rds_rdma_pages(args);
+	/* Check whether to allocate the iovec area */
+	iov_size = args->nr_local * sizeof(struct rds_iovec);
+	if (args->nr_local > UIO_FASTIOV) {
+		iovs = sock_kmalloc(rds_rs_to_sk(rs), iov_size, GFP_KERNEL);
+		if (!iovs) {
+			ret = -ENOMEM;
+			goto out;
+		}
+	}
+
+	if (copy_from_user(iovs, (struct rds_iovec __user *)(unsigned long) args->local_vec_addr, iov_size)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	nr_pages = rds_rdma_pages(iovs, args->nr_local);
 	if (nr_pages < 0) {
 		ret = -EINVAL;
 		goto out;
@@ -606,50 +640,40 @@ int rds_cmsg_rdma_args(struct rds_sock *rs, struct rds_message *rm,
 	       (unsigned long long)args->remote_vec.addr,
 	       op->op_rkey);
 
-	local_vec = (struct rds_iovec __user *)(unsigned long) args->local_vec_addr;
-
 	for (i = 0; i < args->nr_local; i++) {
-		if (copy_from_user(&vec, &local_vec[i],
-				   sizeof(struct rds_iovec))) {
-			ret = -EFAULT;
-			goto out;
-		}
-
-		nr = rds_pages_in_vec(&vec);
-		if (nr == 0) {
-			ret = -EINVAL;
-			goto out;
-		}
+		struct rds_iovec *iov = &iovs[i];
+		/* don't need to check, rds_rdma_pages() verified nr will be +nonzero */
+		unsigned int nr = rds_pages_in_vec(iov);
 
-		rs->rs_user_addr = vec.addr;
-		rs->rs_user_bytes = vec.bytes;
+		rs->rs_user_addr = iov->addr;
+		rs->rs_user_bytes = iov->bytes;
 
 		/* If it's a WRITE operation, we want to pin the pages for reading.
 		 * If it's a READ operation, we need to pin the pages for writing.
 		 */
-		ret = rds_pin_pages(vec.addr, nr, pages, !op->op_write);
+		ret = rds_pin_pages(iov->addr, nr, pages, !op->op_write);
 		if (ret < 0)
 			goto out;
 
-		rdsdebug("RDS: nr_bytes %u nr %u vec.bytes %llu vec.addr %llx\n",
-		       nr_bytes, nr, vec.bytes, vec.addr);
+		rdsdebug("RDS: nr_bytes %u nr %u iov->bytes %llu iov->addr %llx\n",
+			 nr_bytes, nr, iov->bytes, iov->addr);
 
-		nr_bytes += vec.bytes;
+		nr_bytes += iov->bytes;
 
 		for (j = 0; j < nr; j++) {
-			unsigned int offset = vec.addr & ~PAGE_MASK;
+			unsigned int offset = iov->addr & ~PAGE_MASK;
 			struct scatterlist *sg;
 
 			sg = &op->op_sg[op->op_nents + j];
 			sg_set_page(sg, pages[j],
-					min_t(unsigned int, vec.bytes, PAGE_SIZE - offset),
+					min_t(unsigned int, iov->bytes, PAGE_SIZE - offset),
 					offset);
 
-			rdsdebug("RDS: sg->offset %x sg->len %x vec.addr %llx vec.bytes %llu\n",
-			       sg->offset, sg->length, vec.addr, vec.bytes);
+			rdsdebug("RDS: sg->offset %x sg->len %x iov->addr %llx iov->bytes %llu\n",
+			       sg->offset, sg->length, iov->addr, iov->bytes);
 
-			vec.addr += sg->length;
-			vec.bytes -= sg->length;
+			iov->addr += sg->length;
+			iov->bytes -= sg->length;
 		}
 
 		op->op_nents += nr;
@@ -665,6 +689,8 @@ int rds_cmsg_rdma_args(struct rds_sock *rs, struct rds_message *rm,
 	op->op_bytes = nr_bytes;
 
 out:
+	if (iovs != iovstack)
+		sock_kfree_s(rds_rs_to_sk(rs), iovs, iov_size);
 	kfree(pages);
 	if (ret)
 		rds_rdma_free_op(op);
-- 
1.7.1



* [PATCH 5/5] RDS: Let rds_message_alloc_sgs() return NULL
  2010-10-29  1:40 [PATCH 0/5] RDS rds_iovec fixes Andy Grover
                   ` (3 preceding siblings ...)
  2010-10-29  1:40 ` [PATCH 4/5] RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace Andy Grover
@ 2010-10-29  1:40 ` Andy Grover
  2010-10-30 23:34 ` [PATCH 0/5] RDS rds_iovec fixes David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: Andy Grover @ 2010-10-29  1:40 UTC (permalink / raw)
  To: netdev; +Cc: rds-devel

Even with the previous fix, we still read the iovecs once to determine
the number of SGs needed, and then again later on. Preallocating space
for sg lists as part of rds_message seemed like a good idea, but it might
be better not to do this. Until that code can be reworked, this patch
attempts to protect against userspace rewriting the rds_iovec array
between the first and second accesses.

The consequence of such a rewrite would be an sg list array that is
either too small or too large. Too large is not an issue; too small would
overrun the preallocated space. This patch changes all callers of
rds_message_alloc_sgs() to handle running out of preallocated sgs and
fail gracefully.
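
With concrete (illustrative) numbers: if rds_rdma_extra_size() saw iovecs
describing 4 pages, m_total_sgs == 4 was preallocated; if userspace then
grows an entry so that the second pass asks for 8, the allocator now
refuses instead of overrunning:

    /* sketch of the guard added in rds_message_alloc_sgs() */
    if (rm->m_used_sgs + nents > rm->m_total_sgs)   /* 0 + 8 > 4 */
        return NULL;    /* callers turn this into -ENOMEM */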

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 net/rds/message.c |    5 +++++
 net/rds/rdma.c    |    8 ++++++++
 net/rds/send.c    |    4 ++++
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/net/rds/message.c b/net/rds/message.c
index a84545d..848cff4 100644
--- a/net/rds/message.c
+++ b/net/rds/message.c
@@ -224,6 +224,9 @@ struct scatterlist *rds_message_alloc_sgs(struct rds_message *rm, int nents)
 	WARN_ON(rm->m_used_sgs + nents > rm->m_total_sgs);
 	WARN_ON(!nents);
 
+	if (rm->m_used_sgs + nents > rm->m_total_sgs)
+		return NULL;
+
 	sg_ret = &sg_first[rm->m_used_sgs];
 	sg_init_table(sg_ret, nents);
 	rm->m_used_sgs += nents;
@@ -246,6 +249,8 @@ struct rds_message *rds_message_map_pages(unsigned long *page_addrs, unsigned in
 	rm->m_inc.i_hdr.h_len = cpu_to_be32(total_len);
 	rm->data.op_nents = ceil(total_len, PAGE_SIZE);
 	rm->data.op_sg = rds_message_alloc_sgs(rm, num_sgs);
+	if (!rm->data.op_sg)
+		return ERR_PTR(-ENOMEM);
 
 	for (i = 0; i < rm->data.op_nents; ++i) {
 		sg_set_page(&rm->data.op_sg[i],
diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index caa4d98..8920f2a 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -607,6 +607,10 @@ int rds_cmsg_rdma_args(struct rds_sock *rs, struct rds_message *rm,
 	op->op_recverr = rs->rs_recverr;
 	WARN_ON(!nr_pages);
 	op->op_sg = rds_message_alloc_sgs(rm, nr_pages);
+	if (!op->op_sg) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	if (op->op_notify || op->op_recverr) {
 		/* We allocate an uninitialized notifier here, because
@@ -807,6 +811,10 @@ int rds_cmsg_atomic(struct rds_sock *rs, struct rds_message *rm,
 	rm->atomic.op_active = 1;
 	rm->atomic.op_recverr = rs->rs_recverr;
 	rm->atomic.op_sg = rds_message_alloc_sgs(rm, 1);
+	if (!rm->atomic.op_sg) {
+		ret = -ENOMEM;
+		goto err;
+	}
 
 	/* verify 8 byte-aligned */
 	if (args->local_addr & 0x7) {
diff --git a/net/rds/send.c b/net/rds/send.c
index 0bc9db1..35b9c2e 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -973,6 +973,10 @@ int rds_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	/* Attach data to the rm */
 	if (payload_len) {
 		rm->data.op_sg = rds_message_alloc_sgs(rm, ceil(payload_len, PAGE_SIZE));
+		if (!rm->data.op_sg) {
+			ret = -ENOMEM;
+			goto out;
+		}
 		ret = rds_message_copy_from_user(rm, msg->msg_iov, payload_len);
 		if (ret)
 			goto out;
-- 
1.7.1



* Re: [PATCH 0/5] RDS rds_iovec fixes
  2010-10-29  1:40 [PATCH 0/5] RDS rds_iovec fixes Andy Grover
                   ` (4 preceding siblings ...)
  2010-10-29  1:40 ` [PATCH 5/5] RDS: Let rds_message_alloc_sgs() return NULL Andy Grover
@ 2010-10-30 23:34 ` David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2010-10-30 23:34 UTC (permalink / raw)
  To: andy.grover; +Cc: netdev, rds-devel

From: Andy Grover <andy.grover@oracle.com>
Date: Thu, 28 Oct 2010 18:40:54 -0700

> This patchset includes Linus' rds_iovec overflow fix, as well as changes that
> copy the rds_iovec array into kernel memory so that userspace cannot modify
> it after we've checked it.
> 
> Please especially review patch 5. We go from three iovec copies from
> userspace down to two, but it's not trivial to get to one, due to some code
> that wants to preallocate space for SGs. Until that code can be changed,
> patch 5 attempts to handle that scenario safely.

I looked carefully over these and they look ok.

Applied, thanks Andy.
