All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
@ 2010-10-28 18:22 David Miller
  2010-10-28 18:33 ` Linus Torvalds
  2010-10-29  6:40 ` Linus Torvalds
  0 siblings, 2 replies; 16+ messages in thread
From: David Miller @ 2010-10-28 18:22 UTC (permalink / raw)
  To: torvalds; +Cc: netdev, drosenberg, jon.maloy, allan.stephens


This helps protect us from overflow issues down in the
individual protocol sendmsg/recvmsg handlers.  Once
we hit INT_MAX we truncate out the rest of the iovec
by setting the iov_len members to zero.

This works because:

1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
   writes are allowed and the application will just continue
   with another write to send the rest of the data.

2) For datagram oriented sockets, where there must be a
   one-to-one correspondance between write() calls and
   packets on the wire, INT_MAX is going to be far larger
   than the packet size limit the protocol is going to
   check for and signal with -EMSGSIZE.

Based upon a patch by Linus Torvalds.

Signed-off-by: David S. Miller <davem@davemloft.net>
---

Ok, this is the patch I am testing right now.  It ought to
plug the TIPC holes wrt. handling iovecs given by the
user.

I'll look at the recently discovered RDS crap next :-/

 include/linux/socket.h |    2 +-
 net/compat.c           |   12 +++++++-----
 net/core/iovec.c       |   19 +++++++++----------
 3 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 5146b50..86b652f 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -322,7 +322,7 @@ extern int csum_partial_copy_fromiovecend(unsigned char *kdata,
 					  int offset, 
 					  unsigned int len, __wsum *csump);
 
-extern long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
+extern int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
 extern int memcpy_toiovec(struct iovec *v, unsigned char *kdata, int len);
 extern int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata,
 			     int offset, int len);
diff --git a/net/compat.c b/net/compat.c
index 63d260e..71bfd8e 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -34,17 +34,19 @@ static inline int iov_from_user_compat_to_kern(struct iovec *kiov,
 					  struct compat_iovec __user *uiov32,
 					  int niov)
 {
-	int tot_len = 0;
+	size_t tot_len = 0;
 
 	while (niov > 0) {
 		compat_uptr_t buf;
 		compat_size_t len;
 
 		if (get_user(len, &uiov32->iov_len) ||
-		   get_user(buf, &uiov32->iov_base)) {
-			tot_len = -EFAULT;
-			break;
-		}
+		    get_user(buf, &uiov32->iov_base))
+			return -EFAULT;
+
+		if (len > INT_MAX - tot_len)
+			len = INT_MAX - tot_len;
+
 		tot_len += len;
 		kiov->iov_base = compat_ptr(buf);
 		kiov->iov_len = (__kernel_size_t) len;
diff --git a/net/core/iovec.c b/net/core/iovec.c
index 72aceb1..e7f5b29 100644
--- a/net/core/iovec.c
+++ b/net/core/iovec.c
@@ -35,10 +35,10 @@
  *	in any case.
  */
 
-long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
+int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
 {
 	int size, ct;
-	long err;
+	size_t err;
 
 	if (m->msg_namelen) {
 		if (mode == VERIFY_READ) {
@@ -62,14 +62,13 @@ long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address,
 	err = 0;
 
 	for (ct = 0; ct < m->msg_iovlen; ct++) {
-		err += iov[ct].iov_len;
-		/*
-		 * Goal is not to verify user data, but to prevent returning
-		 * negative value, which is interpreted as errno.
-		 * Overflow is still possible, but it is harmless.
-		 */
-		if (err < 0)
-			return -EMSGSIZE;
+		size_t len = iov[ct].iov_len;
+
+		if (len > INT_MAX - err) {
+			len = INT_MAX - err;
+			iov[ct].iov_len = len;
+		}
+		err += len;
 	}
 
 	return err;
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-28 18:22 [PATCH] net: Limit socket I/O iovec total length to INT_MAX David Miller
@ 2010-10-28 18:33 ` Linus Torvalds
  2010-10-28 18:37   ` David Miller
  2010-10-29  6:40 ` Linus Torvalds
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2010-10-28 18:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, drosenberg, jon.maloy, allan.stephens

On Thu, Oct 28, 2010 at 11:22 AM, David Miller <davem@davemloft.net> wrote:
>
> -       int tot_len = 0;
> +       size_t tot_len = 0;

I would actually keep "tot_len" as an "int".

The whole point of this:

> +               if (len > INT_MAX - tot_len)
> +                       len = INT_MAX - tot_len;
> +
>                tot_len += len;

Is that "tot_len" can _never_ become larger than INT_MAX, because we
never add a "len" that would make it bigger than that.

So "len" itself should be the correct unsigned size_t (so that the
"len > INT_MAX - tot_len" thing is done as an unsigned comparison),
but "tot_len" itself is very much designed to fit in "int".

> +int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
>  {
>        int size, ct;
> -       long err;
> +       size_t err;

Same thing here. Making "err" be an "int" is actually the right thing
to do, because then it matches the return type (iow, if it was any
other type, there would be an implicit cast, and if it didn't fit in
"int", that would be a bug anyway).

                     Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-28 18:33 ` Linus Torvalds
@ 2010-10-28 18:37   ` David Miller
  0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2010-10-28 18:37 UTC (permalink / raw)
  To: torvalds; +Cc: netdev, drosenberg, jon.maloy, allan.stephens

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 28 Oct 2010 11:33:56 -0700

> On Thu, Oct 28, 2010 at 11:22 AM, David Miller <davem@davemloft.net> wrote:
>>
>> -       int tot_len = 0;
>> +       size_t tot_len = 0;
> 
> I would actually keep "tot_len" as an "int".
 ...
>> +int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
>>  {
>>        int size, ct;
>> -       long err;
>> +       size_t err;
> 
> Same thing here. Making "err" be an "int" is actually the right thing
> to do, because then it matches the return type (iow, if it was any
> other type, there would be an implicit cast, and if it didn't fit in
> "int", that would be a bug anyway).

Yep, agreed on all counts, I'll make those changes.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-28 18:22 [PATCH] net: Limit socket I/O iovec total length to INT_MAX David Miller
  2010-10-28 18:33 ` Linus Torvalds
@ 2010-10-29  6:40 ` Linus Torvalds
  2010-10-29 14:00   ` Dan Rosenberg
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2010-10-29  6:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, drosenberg, jon.maloy, allan.stephens

Oh, btw, noticed another small detail..

I don't know if this matters, but the regular read/write routines
don't actually use INT_MAX as the limit, but instead a "maximally
page-aligned value that fits in an int":

   #define MAX_RW_COUNT (INT_MAX & PAGE_CACHE_MASK)

because the code does _not_ want to turn a nice set of huge
page-aligned big writes into a write of an odd number (2GB-1).

This may not make much of a difference to networking - you guys are
already used to working with odd sizes like 1500 bytes of data payload
per packet etc. Most regular filesystems are much more sensitive to
things like block (and particularly page-cache sized) boundaries
because of the vagaries of disk and cache granularities. But MAX_INT
is a _really_ odd size, and things like csum_and_copy still tends to
want to get things at least word-aligned, no? And if nothing else, the
memory copies tend to be better with cacheline boundaries.

It would be sad if a 4GB aligned write turns into
 - one 2GB-1 aligned write
 - one pessimally unaligned 2G-1 write where every read from user
space is unaligned
 - finally a single 2-byte write.

I suspect it would be better off using the same kind of (MAX_INT &
PAGE_CACHE_MASK) logic - that 4GB write would still get split into
three partial writes (and _lots_ of packets ;), but at least they'd
all be word-aligned.

Does it matter? I dunno. Once you do 2GB+ writes, these kinds of small
details may be totally hidden in all the noise.

                    Linus

On Thu, Oct 28, 2010 at 11:22 AM, David Miller <davem@davemloft.net> wrote:
>
> This helps protect us from overflow issues down in the
> individual protocol sendmsg/recvmsg handlers.  Once
> we hit INT_MAX we truncate out the rest of the iovec
> by setting the iov_len members to zero.
>
> This works because:
>
> 1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
>   writes are allowed and the application will just continue
>   with another write to send the rest of the data.
>
> 2) For datagram oriented sockets, where there must be a
>   one-to-one correspondance between write() calls and
>   packets on the wire, INT_MAX is going to be far larger
>   than the packet size limit the protocol is going to
>   check for and signal with -EMSGSIZE.
>
> Based upon a patch by Linus Torvalds.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>
> Ok, this is the patch I am testing right now.  It ought to
> plug the TIPC holes wrt. handling iovecs given by the
> user.
>
> I'll look at the recently discovered RDS crap next :-/
>
>  include/linux/socket.h |    2 +-
>  net/compat.c           |   12 +++++++-----
>  net/core/iovec.c       |   19 +++++++++----------
>  3 files changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/socket.h b/include/linux/socket.h
> index 5146b50..86b652f 100644
> --- a/include/linux/socket.h
> +++ b/include/linux/socket.h
> @@ -322,7 +322,7 @@ extern int csum_partial_copy_fromiovecend(unsigned char *kdata,
>                                          int offset,
>                                          unsigned int len, __wsum *csump);
>
> -extern long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
> +extern int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
>  extern int memcpy_toiovec(struct iovec *v, unsigned char *kdata, int len);
>  extern int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata,
>                             int offset, int len);
> diff --git a/net/compat.c b/net/compat.c
> index 63d260e..71bfd8e 100644
> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -34,17 +34,19 @@ static inline int iov_from_user_compat_to_kern(struct iovec *kiov,
>                                          struct compat_iovec __user *uiov32,
>                                          int niov)
>  {
> -       int tot_len = 0;
> +       size_t tot_len = 0;
>
>        while (niov > 0) {
>                compat_uptr_t buf;
>                compat_size_t len;
>
>                if (get_user(len, &uiov32->iov_len) ||
> -                  get_user(buf, &uiov32->iov_base)) {
> -                       tot_len = -EFAULT;
> -                       break;
> -               }
> +                   get_user(buf, &uiov32->iov_base))
> +                       return -EFAULT;
> +
> +               if (len > INT_MAX - tot_len)
> +                       len = INT_MAX - tot_len;
> +
>                tot_len += len;
>                kiov->iov_base = compat_ptr(buf);
>                kiov->iov_len = (__kernel_size_t) len;
> diff --git a/net/core/iovec.c b/net/core/iovec.c
> index 72aceb1..e7f5b29 100644
> --- a/net/core/iovec.c
> +++ b/net/core/iovec.c
> @@ -35,10 +35,10 @@
>  *     in any case.
>  */
>
> -long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
> +int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
>  {
>        int size, ct;
> -       long err;
> +       size_t err;
>
>        if (m->msg_namelen) {
>                if (mode == VERIFY_READ) {
> @@ -62,14 +62,13 @@ long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address,
>        err = 0;
>
>        for (ct = 0; ct < m->msg_iovlen; ct++) {
> -               err += iov[ct].iov_len;
> -               /*
> -                * Goal is not to verify user data, but to prevent returning
> -                * negative value, which is interpreted as errno.
> -                * Overflow is still possible, but it is harmless.
> -                */
> -               if (err < 0)
> -                       return -EMSGSIZE;
> +               size_t len = iov[ct].iov_len;
> +
> +               if (len > INT_MAX - err) {
> +                       len = INT_MAX - err;
> +                       iov[ct].iov_len = len;
> +               }
> +               err += len;
>        }
>
>        return err;
> --
> 1.7.3.2
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-29  6:40 ` Linus Torvalds
@ 2010-10-29 14:00   ` Dan Rosenberg
  2010-10-29 15:28     ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Rosenberg @ 2010-10-29 14:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Miller, netdev, jon.maloy, allan.stephens

While you guys are at it, you might consider preventing sendto(), etc.
calls from requesting >= 2GB data in one go.  Several families have no
restrictions on total size (or even worse, assign the size to a signed
int type and then do a signed comparison as a check).  This can result
in all kinds of ugliness when allocating sk_buffs based on that size,
some of which result in kernel panics (due to bad sk_buff tail position)
or heap corruption.

If you'd rather I dig up specific examples, I can do that as well, but I
think making changes to core code to protect individual modules from
their own inevitable stupid decisions is the best choice.

-Dan

On Thu, 2010-10-28 at 23:40 -0700, Linus Torvalds wrote:
> Oh, btw, noticed another small detail..
> 
> I don't know if this matters, but the regular read/write routines
> don't actually use INT_MAX as the limit, but instead a "maximally
> page-aligned value that fits in an int":
> 
>    #define MAX_RW_COUNT (INT_MAX & PAGE_CACHE_MASK)
> 
> because the code does _not_ want to turn a nice set of huge
> page-aligned big writes into a write of an odd number (2GB-1).
> 
> This may not make much of a difference to networking - you guys are
> already used to working with odd sizes like 1500 bytes of data payload
> per packet etc. Most regular filesystems are much more sensitive to
> things like block (and particularly page-cache sized) boundaries
> because of the vagaries of disk and cache granularities. But MAX_INT
> is a _really_ odd size, and things like csum_and_copy still tends to
> want to get things at least word-aligned, no? And if nothing else, the
> memory copies tend to be better with cacheline boundaries.
> 
> It would be sad if a 4GB aligned write turns into
>  - one 2GB-1 aligned write
>  - one pessimally unaligned 2G-1 write where every read from user
> space is unaligned
>  - finally a single 2-byte write.
> 
> I suspect it would be better off using the same kind of (MAX_INT &
> PAGE_CACHE_MASK) logic - that 4GB write would still get split into
> three partial writes (and _lots_ of packets ;), but at least they'd
> all be word-aligned.
> 
> Does it matter? I dunno. Once you do 2GB+ writes, these kinds of small
> details may be totally hidden in all the noise.
> 
>                     Linus
> 
> On Thu, Oct 28, 2010 at 11:22 AM, David Miller <davem@davemloft.net> wrote:
> >
> > This helps protect us from overflow issues down in the
> > individual protocol sendmsg/recvmsg handlers.  Once
> > we hit INT_MAX we truncate out the rest of the iovec
> > by setting the iov_len members to zero.
> >
> > This works because:
> >
> > 1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
> >   writes are allowed and the application will just continue
> >   with another write to send the rest of the data.
> >
> > 2) For datagram oriented sockets, where there must be a
> >   one-to-one correspondance between write() calls and
> >   packets on the wire, INT_MAX is going to be far larger
> >   than the packet size limit the protocol is going to
> >   check for and signal with -EMSGSIZE.
> >
> > Based upon a patch by Linus Torvalds.
> >
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> > ---
> >
> > Ok, this is the patch I am testing right now.  It ought to
> > plug the TIPC holes wrt. handling iovecs given by the
> > user.
> >
> > I'll look at the recently discovered RDS crap next :-/
> >
> >  include/linux/socket.h |    2 +-
> >  net/compat.c           |   12 +++++++-----
> >  net/core/iovec.c       |   19 +++++++++----------
> >  3 files changed, 17 insertions(+), 16 deletions(-)
> >
> > diff --git a/include/linux/socket.h b/include/linux/socket.h
> > index 5146b50..86b652f 100644
> > --- a/include/linux/socket.h
> > +++ b/include/linux/socket.h
> > @@ -322,7 +322,7 @@ extern int csum_partial_copy_fromiovecend(unsigned char *kdata,
> >                                          int offset,
> >                                          unsigned int len, __wsum *csump);
> >
> > -extern long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
> > +extern int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
> >  extern int memcpy_toiovec(struct iovec *v, unsigned char *kdata, int len);
> >  extern int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata,
> >                             int offset, int len);
> > diff --git a/net/compat.c b/net/compat.c
> > index 63d260e..71bfd8e 100644
> > --- a/net/compat.c
> > +++ b/net/compat.c
> > @@ -34,17 +34,19 @@ static inline int iov_from_user_compat_to_kern(struct iovec *kiov,
> >                                          struct compat_iovec __user *uiov32,
> >                                          int niov)
> >  {
> > -       int tot_len = 0;
> > +       size_t tot_len = 0;
> >
> >        while (niov > 0) {
> >                compat_uptr_t buf;
> >                compat_size_t len;
> >
> >                if (get_user(len, &uiov32->iov_len) ||
> > -                  get_user(buf, &uiov32->iov_base)) {
> > -                       tot_len = -EFAULT;
> > -                       break;
> > -               }
> > +                   get_user(buf, &uiov32->iov_base))
> > +                       return -EFAULT;
> > +
> > +               if (len > INT_MAX - tot_len)
> > +                       len = INT_MAX - tot_len;
> > +
> >                tot_len += len;
> >                kiov->iov_base = compat_ptr(buf);
> >                kiov->iov_len = (__kernel_size_t) len;
> > diff --git a/net/core/iovec.c b/net/core/iovec.c
> > index 72aceb1..e7f5b29 100644
> > --- a/net/core/iovec.c
> > +++ b/net/core/iovec.c
> > @@ -35,10 +35,10 @@
> >  *     in any case.
> >  */
> >
> > -long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
> > +int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
> >  {
> >        int size, ct;
> > -       long err;
> > +       size_t err;
> >
> >        if (m->msg_namelen) {
> >                if (mode == VERIFY_READ) {
> > @@ -62,14 +62,13 @@ long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address,
> >        err = 0;
> >
> >        for (ct = 0; ct < m->msg_iovlen; ct++) {
> > -               err += iov[ct].iov_len;
> > -               /*
> > -                * Goal is not to verify user data, but to prevent returning
> > -                * negative value, which is interpreted as errno.
> > -                * Overflow is still possible, but it is harmless.
> > -                */
> > -               if (err < 0)
> > -                       return -EMSGSIZE;
> > +               size_t len = iov[ct].iov_len;
> > +
> > +               if (len > INT_MAX - err) {
> > +                       len = INT_MAX - err;
> > +                       iov[ct].iov_len = len;
> > +               }
> > +               err += len;
> >        }
> >
> >        return err;
> > --
> > 1.7.3.2
> >
> >



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-29 14:00   ` Dan Rosenberg
@ 2010-10-29 15:28     ` Linus Torvalds
  2010-10-29 16:21       ` Linus Torvalds
  2010-10-29 18:51       ` Rick Jones
  0 siblings, 2 replies; 16+ messages in thread
From: Linus Torvalds @ 2010-10-29 15:28 UTC (permalink / raw)
  To: Dan Rosenberg; +Cc: David Miller, netdev, jon.maloy, allan.stephens, Al Viro

On Fri, Oct 29, 2010 at 7:00 AM, Dan Rosenberg <drosenberg@vsecurity.com> wrote:
>
> While you guys are at it, you might consider preventing sendto(), etc.
> calls from requesting >= 2GB data in one go.

Indeed. David - I think we have to, because that thing converts its
arguments to an iovec and then does a sendmsg, but since it's already
in kernel space it doesn't go through the verify_iovec() path.

So sendto/recvfrom (and possibly others that build their own msg
struct in kernel space) should be limited to MAX_INT too, so that
there's no back way to create a big iovec..

In fs/read_write.c, do_sync_read/write() do that iovec thing too, but
at least for the regular vfs_read()/vfs_write cases they will have
gone through rw_verify_area() first, which does the size limiting for
them.

We do need to fix the readv/writev path, though. It does the
rw_verify_area(), but it doesn't seem to limit the size to the
returned length, but still uses the original one. Hmm.

I think I'll take care of the readv/writev thing, and send it by Al to verify.

                              Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-29 15:28     ` Linus Torvalds
@ 2010-10-29 16:21       ` Linus Torvalds
  2010-10-29 16:45         ` Al Viro
  2010-10-29 19:32         ` David Miller
  2010-10-29 18:51       ` Rick Jones
  1 sibling, 2 replies; 16+ messages in thread
From: Linus Torvalds @ 2010-10-29 16:21 UTC (permalink / raw)
  To: Al Viro; +Cc: David Miller, netdev, jon.maloy, allan.stephens, Dan Rosenberg

[-- Attachment #1: Type: text/plain, Size: 1993 bytes --]

On Fri, Oct 29, 2010 at 8:28 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> We do need to fix the readv/writev path, though. It does the
> rw_verify_area(), but it doesn't seem to limit the size to the
> returned length, but still uses the original one. Hmm.
>
> I think I'll take care of the readv/writev thing, and send it by Al to verify.

Ok, something like the attached should do the same as the suggested
networking iovec verification (except the VFS code considers the iov
'len' to be a ssize_t, and a negative value to be an error, and claims
that that is what SuS says. I don't much care, since we ignore SuS wrt
the overflow anyway, and now limit IO to MAX_RW_COUNT the same way we
do for regular reads and writes, but it's possibly worth thinking
about unifying the networking verify_iovec and the VFS layer one)

Al? What do you think? You've not seen some of the background, but it
all boils down to simple "some protocols overflow in 'int'", the exact
same way we had filesystems that had int counters for access lengths
etc. So for the same reason we did the MAX_RW_COUNT thing for regular
read/write calls, the networking code is now going to do it for all
the sendmsg etc iovec things.

Even outside of any networking issues, this unifies the handling of
read/write vs readv/writev in the "cap IO size" area, so I think we
should have done this originally (the readv/writev cases actually
already did the capping by calling rw_verify_area(), but then the code
threw away the capped value, and only looked at the error return, and
used the uncapped size)

NOTE! UNTESTED! It compiles, but I didn't actually boot to check.
Maybe it has some stupid thinko in it. Also, the patch ended up being
a bit bigger than it needed to be, because I found some annoying
whitespace issues in rw_copy_check_uvector (spaces at the beginning of
lines that had tabs in them) that I couldn't keep myself from not
fixing.

Comments?

                                          Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 4756 bytes --]

 fs/compat.c        |   12 +++++-----
 fs/read_write.c    |   62 +++++++++++++++++++++++++++------------------------
 include/linux/fs.h |    1 +
 3 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index 52cfeb6..ff66c0d 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -606,14 +606,14 @@ ssize_t compat_rw_copy_check_uvector(int type,
 	/*
 	 * Single unix specification:
 	 * We should -EINVAL if an element length is not >= 0 and fitting an
-	 * ssize_t.  The total length is fitting an ssize_t
+	 * ssize_t.
 	 *
-	 * Be careful here because iov_len is a size_t not an ssize_t
+	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
+	 * no overflow possibility.
 	 */
 	tot_len = 0;
 	ret = -EINVAL;
 	for (seg = 0; seg < nr_segs; seg++) {
-		compat_ssize_t tmp = tot_len;
 		compat_uptr_t buf;
 		compat_ssize_t len;
 
@@ -624,13 +624,13 @@ ssize_t compat_rw_copy_check_uvector(int type,
 		}
 		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
 			goto out;
-		tot_len += len;
-		if (tot_len < tmp) /* maths overflow on the compat_ssize_t */
-			goto out;
 		if (!access_ok(vrfy_dir(type), compat_ptr(buf), len)) {
 			ret = -EFAULT;
 			goto out;
 		}
+		if (len > MAX_RW_COUNT - tot_len)
+			len = MAX_RW_COUNT - tot_len;
+		tot_len += len;
 		iov->iov_base = compat_ptr(buf);
 		iov->iov_len = (compat_size_t) len;
 		uvector++;
diff --git a/fs/read_write.c b/fs/read_write.c
index 9cd9d14..431a0ed 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -243,8 +243,6 @@ bad:
  * them to something that fits in "int" so that others
  * won't have to do range checks all the time.
  */
-#define MAX_RW_COUNT (INT_MAX & PAGE_CACHE_MASK)
-
 int rw_verify_area(int read_write, struct file *file, loff_t *ppos, size_t count)
 {
 	struct inode *inode;
@@ -584,65 +582,71 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
 			      unsigned long nr_segs, unsigned long fast_segs,
 			      struct iovec *fast_pointer,
 			      struct iovec **ret_pointer)
-  {
+{
 	unsigned long seg;
-  	ssize_t ret;
+	ssize_t ret;
 	struct iovec *iov = fast_pointer;
 
-  	/*
-  	 * SuS says "The readv() function *may* fail if the iovcnt argument
-  	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-  	 * traditionally returned zero for zero segments, so...
-  	 */
+	/*
+	 * SuS says "The readv() function *may* fail if the iovcnt argument
+	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
+	 * traditionally returned zero for zero segments, so...
+	 */
 	if (nr_segs == 0) {
 		ret = 0;
-  		goto out;
+		goto out;
 	}
 
-  	/*
-  	 * First get the "struct iovec" from user memory and
-  	 * verify all the pointers
-  	 */
+	/*
+	 * First get the "struct iovec" from user memory and
+	 * verify all the pointers
+	 */
 	if (nr_segs > UIO_MAXIOV) {
 		ret = -EINVAL;
-  		goto out;
+		goto out;
 	}
 	if (nr_segs > fast_segs) {
-  		iov = kmalloc(nr_segs*sizeof(struct iovec), GFP_KERNEL);
+		iov = kmalloc(nr_segs*sizeof(struct iovec), GFP_KERNEL);
 		if (iov == NULL) {
 			ret = -ENOMEM;
-  			goto out;
+			goto out;
 		}
-  	}
+	}
 	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
 		ret = -EFAULT;
-  		goto out;
+		goto out;
 	}
 
-  	/*
+	/*
 	 * According to the Single Unix Specification we should return EINVAL
 	 * if an element length is < 0 when cast to ssize_t or if the
 	 * total length would overflow the ssize_t return value of the
 	 * system call.
-  	 */
+	 *
+	 * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
+	 * overflow case.
+	 */
 	ret = 0;
-  	for (seg = 0; seg < nr_segs; seg++) {
-  		void __user *buf = iov[seg].iov_base;
-  		ssize_t len = (ssize_t)iov[seg].iov_len;
+	for (seg = 0; seg < nr_segs; seg++) {
+		void __user *buf = iov[seg].iov_base;
+		ssize_t len = (ssize_t)iov[seg].iov_len;
 
 		/* see if we we're about to use an invalid len or if
 		 * it's about to overflow ssize_t */
-		if (len < 0 || (ret + len < ret)) {
+		if (len < 0) {
 			ret = -EINVAL;
-  			goto out;
+			goto out;
 		}
 		if (unlikely(!access_ok(vrfy_dir(type), buf, len))) {
 			ret = -EFAULT;
-  			goto out;
+			goto out;
+		}
+		if (len > MAX_RW_COUNT - ret) {
+			len = MAX_RW_COUNT - ret;
+			iov[seg].iov_len = len;
 		}
-
 		ret += len;
-  	}
+	}
 out:
 	*ret_pointer = iov;
 	return ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4d07902..7b7b507 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1867,6 +1867,7 @@ extern int current_umask(void);
 /* /sys/fs */
 extern struct kobject *fs_kobj;
 
+#define MAX_RW_COUNT (INT_MAX & PAGE_CACHE_MASK)
 extern int rw_verify_area(int, struct file *, loff_t *, size_t);
 
 #define FLOCK_VERIFY_READ  1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-29 16:21       ` Linus Torvalds
@ 2010-10-29 16:45         ` Al Viro
  2010-10-29 17:01           ` Linus Torvalds
  2010-10-29 19:32         ` David Miller
  1 sibling, 1 reply; 16+ messages in thread
From: Al Viro @ 2010-10-29 16:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, netdev, jon.maloy, allan.stephens, Dan Rosenberg

On Fri, Oct 29, 2010 at 09:21:16AM -0700, Linus Torvalds wrote:
>  	ret = -EINVAL;
>  	for (seg = 0; seg < nr_segs; seg++) {
> -		compat_ssize_t tmp = tot_len;
>  		compat_uptr_t buf;
>  		compat_ssize_t len;
>  
> @@ -624,13 +624,13 @@ ssize_t compat_rw_copy_check_uvector(int type,
>  		}
>  		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
>  			goto out;
> -		tot_len += len;
> -		if (tot_len < tmp) /* maths overflow on the compat_ssize_t */
> -			goto out;
>  		if (!access_ok(vrfy_dir(type), compat_ptr(buf), len)) {
>  			ret = -EFAULT;
>  			goto out;
>  		}
> +		if (len > MAX_RW_COUNT - tot_len)
> +			len = MAX_RW_COUNT - tot_len;
> +		tot_len += len;
>  		iov->iov_base = compat_ptr(buf);
>  		iov->iov_len = (compat_size_t) len;
>  		uvector++;

Interesting...  Had anybody tested vectors with 0 iov_len in the end and/or
middle?  Looks like something rarely hit in practice...

I don't see anything obviously broken (and we obviously have allowed
iov_len == 0 cases all along, so if anything, breakage won't be new).
However, I wonder if things like sendmsg() for datagrams have warranties
against silent truncation.  Davem?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-29 16:45         ` Al Viro
@ 2010-10-29 17:01           ` Linus Torvalds
  2010-10-29 17:32             ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2010-10-29 17:01 UTC (permalink / raw)
  To: Al Viro; +Cc: David Miller, netdev, jon.maloy, allan.stephens, Dan Rosenberg

On Fri, Oct 29, 2010 at 9:45 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> I don't see anything obviously broken (and we obviously have allowed
> iov_len == 0 cases all along, so if anything, breakage won't be new).
> However, I wonder if things like sendmsg() for datagrams have warranties
> against silent truncation.  Davem?

You missed that discussion - my argument is that anybody who thinks
that they can send a single packet that is 2GB+ in size are already
screwed. And the packet protocol will have some inherent upper limit
anyway (possibly introduced by just allocation issues, but quite
likely inherent to the protocol itself)

And yes, the iov_len = 0 case has always been possible and accepted so
my patch doesn't really change anything. In fact, I think it even
happens (simple example: the easiest way for user space to resume a
partial writev() is to basically subtract out the return value from
the iovec and then re-submit it - so getting zero iovec entries at the
beginning in particular would not necessarily even be odd)

                            Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-29 17:01           ` Linus Torvalds
@ 2010-10-29 17:32             ` Al Viro
  0 siblings, 0 replies; 16+ messages in thread
From: Al Viro @ 2010-10-29 17:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, netdev, jon.maloy, allan.stephens, Dan Rosenberg

On Fri, Oct 29, 2010 at 10:01:19AM -0700, Linus Torvalds wrote:
> On Fri, Oct 29, 2010 at 9:45 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > I don't see anything obviously broken (and we obviously have allowed
> > iov_len == 0 cases all along, so if anything, breakage won't be new).
> > However, I wonder if things like sendmsg() for datagrams have warranties
> > against silent truncation. ?Davem?
> 
> You missed that discussion - my argument is that anybody who thinks
> that they can send a single packet that is 2GB+ in size are already
> screwed. And the packet protocol will have some inherent upper limit
> anyway (possibly introduced by just allocation issues, but quite
> likely inherent to the protocol itself)

Sure, but... do we want to send something truncated in that case or
should we just fail?  Note that with your change previously deliberately
b0rken iovecs (anything with sum of lengths equal to 1<<31 on 32bit)
will get a chance to be accepted *OR* (much more likely) get rejected with
unexpected error value.  It may well be OK, but I'd like to hear from
network folks...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-29 15:28     ` Linus Torvalds
  2010-10-29 16:21       ` Linus Torvalds
@ 2010-10-29 18:51       ` Rick Jones
  2010-10-29 18:59         ` Linus Torvalds
  1 sibling, 1 reply; 16+ messages in thread
From: Rick Jones @ 2010-10-29 18:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dan Rosenberg, David Miller, netdev, jon.maloy, allan.stephens, Al Viro

It may be but a paint splatter on the bikeshed, or considered a case of "Doctor! 
Doctor! It hurts when I do this" "Then don't do that!" but elsewhere (not in the 
context of Linux) I've seen mention made of ISV software posting some 
particularly large receives and such - one case I saw was over 1GB, where that 
was tied to the size of a log buffer the creation of which I believe was not 
limited to ~INT_MAX by the application software.

rick jones

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-29 18:51       ` Rick Jones
@ 2010-10-29 18:59         ` Linus Torvalds
  0 siblings, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2010-10-29 18:59 UTC (permalink / raw)
  To: Rick Jones
  Cc: Dan Rosenberg, David Miller, netdev, jon.maloy, allan.stephens, Al Viro

On Fri, Oct 29, 2010 at 11:51 AM, Rick Jones <rick.jones2@hp.com> wrote:
> It may be but a paint splatter on the bikeshed, or considered a case of
> "Doctor! Doctor! It hurts when I do this" "Then don't do that!" but
> elsewhere (not in the context of Linux) I've seen mention made of ISV
> software posting some particularly large receives and such - one case I saw
> was over 1GB, where that was tied to the size of a log buffer the creation
> of which I believe was not limited to ~INT_MAX by the application software.

Sure. And that is why I very much don't think it's a good idea to
disallow large reads or writes. Returning an error would be bad,
because it's not entirely unreasonable for some user to just have a
really big object (presumably for unrelated reasons), and do IO on it
in one go.

But expecting anybody to _fill_ that really big object in one go is
unreasonable. Even for regular files, anybody who has ever worked with
NFS and interruptible mounts knows that they have to be able to handle
partial IO. And with non-files you obviously have that all the time.

So I think it's perfectly fine to do a terabyte read() or write(). The
fact that the kernel will then internally limit it, and won't ever
actually then write more than 2GB-1 at a time ends up being just an
"implementation issue" that happens to protect the kernel against
subsystems that happen to have issues.

And an application that cannot handle partial reads or writes, yet
works with gigabyte+ data sets is _not_ an application that I consider
reasonable. That's a user space bug, pure and simple, and no amount of
"but but ..." makes any difference what-so-ever.

                        Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-29 16:21       ` Linus Torvalds
  2010-10-29 16:45         ` Al Viro
@ 2010-10-29 19:32         ` David Miller
  2010-10-29 19:37           ` Linus Torvalds
  1 sibling, 1 reply; 16+ messages in thread
From: David Miller @ 2010-10-29 19:32 UTC (permalink / raw)
  To: torvalds; +Cc: viro, netdev, jon.maloy, allan.stephens, drosenberg

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 29 Oct 2010 09:21:16 -0700

> NOTE! UNTESTED! It compiles, but I didn't actually boot to check.
> Maybe it has some stupid thinko in it. Also, the patch ended up being
> a bit bigger than it needed to be, because I found some annoying
> whitespace issues in rw_copy_check_uvector (spaces at the beginning of
> lines that had tabs in them) that I couldn't keep myself from not
> fixing.
> 
> Comments?

I just got out of a long dentist appointment, will look at this right
now, thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-29 19:32         ` David Miller
@ 2010-10-29 19:37           ` Linus Torvalds
  2010-10-29 19:55             ` David Miller
  0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2010-10-29 19:37 UTC (permalink / raw)
  To: David Miller; +Cc: viro, netdev, jon.maloy, allan.stephens, drosenberg

On Fri, Oct 29, 2010 at 12:32 PM, David Miller <davem@davemloft.net> wrote:
>
> I just got out of a long dentist appointment, will look at this right
> now, thanks!

I booted with it and committed it as "obvious". Let's see if there is
any fallout. I doubt it, but I also doubt we'll find any until we have
lots of testers, unless I made some subtly totally buggy change that
just didn't happen to show up during a normal boot.

                      Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-29 19:37           ` Linus Torvalds
@ 2010-10-29 19:55             ` David Miller
  2010-10-29 20:22               ` Dan Rosenberg
  0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2010-10-29 19:55 UTC (permalink / raw)
  To: torvalds; +Cc: viro, netdev, jon.maloy, allan.stephens, drosenberg

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 29 Oct 2010 12:37:29 -0700

> On Fri, Oct 29, 2010 at 12:32 PM, David Miller <davem@davemloft.net> wrote:
>>
>> I just got out of a long dentist appointment, will look at this right
>> now, thanks!
> 
> I booted with it and committed it as "obvious". Let's see if there is
> any fallout. I doubt it, but I also doubt we'll find any until we have
> lots of testers, unless I made some subtly totally buggy change that
> just didn't happen to show up during a normal boot.

It ought to be ok.

Let me send you a pull request so you can get the verify_iovec() change.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
  2010-10-29 19:55             ` David Miller
@ 2010-10-29 20:22               ` Dan Rosenberg
  0 siblings, 0 replies; 16+ messages in thread
From: Dan Rosenberg @ 2010-10-29 20:22 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, viro, netdev, jon.maloy, allan.stephens

Thanks for your work on this.  Just a friendly reminder not to forget
the compat code.  :)

-Dan

On Fri, 2010-10-29 at 12:55 -0700, David Miller wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 29 Oct 2010 12:37:29 -0700
> 
> > On Fri, Oct 29, 2010 at 12:32 PM, David Miller <davem@davemloft.net> wrote:
> >>
> >> I just got out of a long dentist appointment, will look at this right
> >> now, thanks!
> > 
> > I booted with it and committed it as "obvious". Let's see if there is
> > any fallout. I doubt it, but I also doubt we'll find any until we have
> > lots of testers, unless I made some subtly totally buggy change that
> > just didn't happen to show up during a normal boot.
> 
> It ought to be ok.
> 
> Let me send you a pull request so you can get the verify_iovec() change.



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2010-10-29 20:22 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-28 18:22 [PATCH] net: Limit socket I/O iovec total length to INT_MAX David Miller
2010-10-28 18:33 ` Linus Torvalds
2010-10-28 18:37   ` David Miller
2010-10-29  6:40 ` Linus Torvalds
2010-10-29 14:00   ` Dan Rosenberg
2010-10-29 15:28     ` Linus Torvalds
2010-10-29 16:21       ` Linus Torvalds
2010-10-29 16:45         ` Al Viro
2010-10-29 17:01           ` Linus Torvalds
2010-10-29 17:32             ` Al Viro
2010-10-29 19:32         ` David Miller
2010-10-29 19:37           ` Linus Torvalds
2010-10-29 19:55             ` David Miller
2010-10-29 20:22               ` Dan Rosenberg
2010-10-29 18:51       ` Rick Jones
2010-10-29 18:59         ` Linus Torvalds

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.