linux-kernel.vger.kernel.org archive mirror
* recvmmsg() timeout behavior strangeness [RESEND]
@ 2014-04-30 13:59 Michael Kerrisk (man-pages)
  2014-05-03 10:28 ` Michael Kerrisk (man-pages)
  2014-05-12 10:15 ` Michael Kerrisk (man-pages)
  0 siblings, 2 replies; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-04-30 13:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, lkml
  Cc: mtk.manpages, linux-man, netdev, Ondřej Bílka,
	Caitlin Bestler, Neil Horman, Elie De Brauwer, David Miller,
	Steven Whitehouse, Rémi Denis-Courmont, Paul Moore,
	Chris Friesen

Arnaldo,

I raised this issue somewhat more than a year ago, here:
http://thread.gmane.org/gmane.linux.man/3477
but got no reply from you. (Chris Friesen in that thread agreed 
that there is a problem though.)

Here is a slightly revised version of that mail, since I've just bumped
into a related problem in a different context...

As part of his attempt to better document the recvmmsg() syscall that
you added in commit a2e2725541fad72416326798c2d7fa4dafb7d337, Elie de
Brauwer alerted me to some strangeness in the timeout behavior of
the syscall. I suspect there's a bug that needs fixing, as detailed
below.

AFAICT, the timeout argument was added to this syscall as a result of
the discussion here:
http://markmail.org/message/m5l2ap4hiiimut6k#query:+page:1+mid:m5l2ap4hiiimut6k+state:results
(20-21 May 2009, "[RFC 1/2] net: Introduce recvmmsg...")

If I understand correctly, the *intended* purpose of the timeout
argument is to set a limit on how long to wait for additional
datagrams after the arrival of an initial datagram. However, the
syscall behaves in quite a different way. Instead, it potentially
blocks forever, regardless of the timeout. The way the timeout seems
to work is as follows:

1. The timeout, T, is armed on receipt of the first datagram, starting at time X.
2. After each further datagram is received, a check is made if we have
reached time X+T. If we have reached that time, then the syscall
returns.

Since the timeout is only checked after the arrival of each datagram,
we can have scenarios like the following:

0. Assume a timeout of 10 seconds, and that vlen is 5.
1. First datagram arrives at time X.
2. Second datagram arrives at time X+2 secs
3. No more datagrams arrive.

In this case, the call blocks forever. Is that intended behavior?
(Basically, if up to vlen-1 datagrams arrive before X+T, but then no 
more datagrams arrive, the call will remain blocked forever.) If it's
intended behavior, could you elaborate the use case, since it would be
good to add that to the man page. If not, a fix seems to be needed,
since otherwise, it's hard to see how the recvmmsg() timeout argument
can sanely be used.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-04-30 13:59 recvmmsg() timeout behavior strangeness [RESEND] Michael Kerrisk (man-pages)
@ 2014-05-03 10:28 ` Michael Kerrisk (man-pages)
  2014-05-03 11:29   ` Florian Westphal
  2014-05-12 10:15 ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-03 10:28 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, lkml
  Cc: Michael Kerrisk, linux-man, netdev, Ondřej Bílka,
	Caitlin Bestler, Neil Horman, Elie De Brauwer, David Miller,
	Steven Whitehouse, Rémi Denis-Courmont, Paul Moore,
	Chris Friesen

Arnaldo,

On Wed, Apr 30, 2014 at 3:59 PM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> Arnaldo,
>
> I raised this issue somewhat more than a year ago, here:
> http://thread.gmane.org/gmane.linux.man/3477
> but got no reply from you. (Chris Friesen in that thread agreed
> that there is a problem though.)
>
> Here is a slightly revised version of that mail, since I've just bumped
> into a related problem in a different context...
>
> As part of his attempt to better document the recvmmsg() syscall that
> you added in commit a2e2725541fad72416326798c2d7fa4dafb7d337, Elie de
> Brauwer alerted me to some strangeness in the timeout behavior of
> the syscall. I suspect there's a bug that needs fixing, as detailed
> below.
>
> AFAICT, the timeout argument was added to this syscall as a result of
> the discussion here:
> http://markmail.org/message/m5l2ap4hiiimut6k#query:+page:1+mid:m5l2ap4hiiimut6k+state:results
> (20-21 May 2009, "[RFC 1/2] net: Introduce recvmmsg...")
>
> If I understand correctly, the *intended* purpose of the timeout
> argument is to set a limit on how long to wait for additional
> datagrams after the arrival of an initial datagram. However, the
> syscall behaves in quite a different way. Instead, it potentially
> blocks forever, regardless of the timeout. The way the timeout seems
> to work is as follows:

So that the report does not get lost, I've created
https://bugzilla.kernel.org/show_bug.cgi?id=75371
to track it.

Reinvestigating the problem, I see that I got my description of the
behavior slightly wrong, although the fundamental problem remains.
Here's my improved formulation:

int recvmmsg(int sockfd, struct mmsghdr *msgvec, unsigned int vlen,
             unsigned int flags, struct timespec *timeout);

As currently implemented, the recvmmsg() timeout feature appears to be
fit for no sane use. The timeout argument is implemented as follows:

    Timer is armed at the time of the call

    while (datagrams-received < vlen) {
        Wait for the next datagram

        Check if timeout has been exceeded; if yes, break out of loop
    }


Since the timeout is only checked after the arrival of each datagram,
we can have scenarios like the following:

0. Assume a timeout of 10 (T) seconds, that vlen is 5, and the call
   is made at time X

1. First datagram arrives at time X+2.

2. Second datagram arrives at time X+4 secs

3. Third datagram arrives at time X+6 secs

4. No more datagrams arrive.

In this case, the call blocks forever. It hardly seems that this could
be intended behavior. The problem, of course, is that the timeout is
checked only after receipt of a datagram.

Basically, if up to vlen-1 datagrams arrive before X+T, but then no
more datagrams arrive, the call will remain blocked forever. If it's
intended behavior (seems unlikely), the rationale needs to be
elaborated so that it can be documented in the man page. If not, a fix
seems to be needed, since otherwise, it's hard to see how the
recvmmsg() timeout argument can sanely be used.
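
To make the scenario concrete, here is a small user-space model of the
loop described above. It is only a sketch of the algorithm as summarized
in this mail, not the kernel code: the function name, the BLOCKS_FOREVER
sentinel, and the use of whole seconds are assumptions made for
illustration.

```c
#include <stddef.h>

/* Sentinel meaning "the modelled call never returns". */
#define BLOCKS_FOREVER (-1L)

/*
 * Model of the loop described above. arrivals[] holds datagram arrival
 * times in seconds relative to the call (time X), in ascending order.
 * Returns the time at which the modelled call returns, or BLOCKS_FOREVER
 * if it would wait for a datagram that never arrives. The number of
 * datagrams received so far is stored in *received.
 */
static long model_recvmmsg_return_time(const long *arrivals, size_t narrivals,
                                       size_t vlen, long timeout_secs,
                                       size_t *received)
{
    size_t got = 0;
    long now = 0;

    while (got < vlen) {
        if (got == narrivals) {
            /* Nothing more will arrive, and nothing wakes us up. */
            *received = got;
            return BLOCKS_FOREVER;
        }
        now = arrivals[got++];      /* wait for the next datagram */
        if (now >= timeout_secs)    /* the timeout is checked only here */
            break;
    }
    *received = got;
    return now;
}
```

With arrivals at X+2, X+4 and X+6, vlen 5 and a 10 second timeout, the
model reports that the call never returns. Adding a fourth datagram at
X+12 makes it return at X+12 with four datagrams, two seconds after the
timeout should have expired.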

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-03 10:28 ` Michael Kerrisk (man-pages)
@ 2014-05-03 11:29   ` Florian Westphal
  2014-05-03 11:39     ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 37+ messages in thread
From: Florian Westphal @ 2014-05-03 11:29 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Arnaldo Carvalho de Melo, lkml, linux-man, netdev,
	Ondřej Bílka, Caitlin Bestler, Neil Horman,
	Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote:
> Reinvestigating the problem, I see that I got my description of the
> behavior slightly wrong, although the fundamental problem remains.
> Here's my improved formulation:
[..]

> Since the timeout is only checked after the arrival of each datagram,
> we can have scenarios like the following:
> 
> 0. Assume a timeout of 10 (T) seconds, that vlen is 5, and the call
>    is made at time X
> 
> 1. First datagram arrives at time X+2.
> 
> 2. Second datagram arrives at time X+4 secs
> 
> 3. Third datagram arrives at time X+6 secs
> 
> 4. No more datagrams arrive.
> 
> In this case, the call blocks forever. It hardly seems that this could
> be intended behavior. The problem, of course is that the timeout is
> checked only after receipt of a datagram.

Isn't that what MSG_WAITFORONE is supposed to solve?


* Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-03 11:29   ` Florian Westphal
@ 2014-05-03 11:39     ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-03 11:39 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Arnaldo Carvalho de Melo, lkml, linux-man, netdev,
	Ondřej Bílka, Caitlin Bestler, Neil Horman,
	Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

On Sat, May 3, 2014 at 1:29 PM, Florian Westphal <fw@strlen.de> wrote:
> Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote:
>> Reinvestigating the problem, I see that I got my description of the
>> behavior slightly wrong, although the fundamental problem remains.
>> Here's my improved formulation:
> [..]
>
>> Since the timeout is only checked after the arrival of each datagram,
>> we can have scenarios like the following:
>>
>> 0. Assume a timeout of 10 (T) seconds, that vlen is 5, and the call
>>    is made at time X
>>
>> 1. First datagram arrives at time X+2.
>>
>> 2. Second datagram arrives at time X+4 secs
>>
>> 3. Third datagram arrives at time X+6 secs
>>
>> 4. No more datagrams arrive.
>>
>> In this case, the call blocks forever. It hardly seems that this could
>> be intended behavior. The problem, of course is that the timeout is
>> checked only after receipt of a datagram.
>
> Isn't that what MSG_WAITFORONE is supposed to solve?

I don't think so. I understand the idea of the timeout to be: get as
many datagrams as you can within a certain interval. MSG_WAITFORONE is
orthogonal to that goal (you can specify MSG_WAITFORONE without an
infinite timeout, for example).

Also, consider the algorithm above: if no datagrams arrive, the
timeout is in effect ignored.
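
For comparison, the "get as many datagrams as you can within a certain
interval" semantics can be approximated from user space without relying
on the recvmmsg() timeout argument at all. The following is a minimal
sketch under stated assumptions: recv_batch_deadline() and now_ms() are
hypothetical helper names, the deadline is in milliseconds, and the
caller has already set up the iovecs in msgvec. It uses poll() to bound
the wait and recvmmsg() with MSG_DONTWAIT and a NULL timeout to drain
whatever is queued.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for recvmmsg() and struct mmsghdr */
#endif
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <time.h>

/* Current time in milliseconds on the monotonic clock. */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: receive up to vlen datagrams, but return no later
 * than roughly timeout_ms after the call, however many have arrived.
 * Returns the number of datagrams received, or -1 if an error occurred
 * before any datagram was received.
 */
static int recv_batch_deadline(int fd, struct mmsghdr *msgvec,
                               unsigned int vlen, int timeout_ms)
{
    long long deadline = now_ms() + timeout_ms;
    unsigned int got = 0;

    while (got < vlen) {
        long long left = deadline - now_ms();
        if (left <= 0)
            break;                          /* deadline reached */

        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, (int)left);
        if (ready == 0)
            break;                          /* nothing arrived in time */
        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return got ? (int)got : -1;
        }

        /* Drain whatever is queued; never block inside recvmmsg(). */
        int n = recvmmsg(fd, msgvec + got, vlen - got, MSG_DONTWAIT, NULL);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;
            return got ? (int)got : -1;
        }
        if (n == 0)
            break;                          /* e.g. receive side shut down */
        got += (unsigned int)n;
    }
    return (int)got;
}
```

Because the wait happens in poll(), the deadline is enforced even when no
further datagram ever arrives, which is exactly the case the in-kernel
check misses.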

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-04-30 13:59 recvmmsg() timeout behavior strangeness [RESEND] Michael Kerrisk (man-pages)
  2014-05-03 10:28 ` Michael Kerrisk (man-pages)
@ 2014-05-12 10:15 ` Michael Kerrisk (man-pages)
  2014-05-12 14:34   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-12 10:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, lkml
  Cc: Michael Kerrisk, linux-man, netdev, Ondřej Bílka,
	Caitlin Bestler, Neil Horman, Elie De Brauwer, David Miller,
	Steven Whitehouse, Rémi Denis-Courmont, Paul Moore,
	Chris Friesen

Hi Arnaldo,

Ping!

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-12 10:15 ` Michael Kerrisk (man-pages)
@ 2014-05-12 14:34   ` Arnaldo Carvalho de Melo
  2014-05-21 21:05     ` [PATCH/RFC] " Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-05-12 14:34 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: lkml, linux-man, netdev, Ondřej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

On Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Arnaldo,
> 
> Ping!

I acknowledge the problem. The timeout has to be passed to the
underlying ->recvmsg() implementations, which should return the time
spent waiting for each packet, so that we can accrue that at the
recvmmsg level.

We can either pass an extra timeout parameter to the recvmsg
implementations or use some struct sock member to specify that
timeout.

The first approach is intrusive, touches tons of files, so I'll try
making it all mostly transparent by hooking into sock_rcvtimeo()
somehow.

- Arnaldo
 


* [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-12 14:34   ` Arnaldo Carvalho de Melo
@ 2014-05-21 21:05     ` Arnaldo Carvalho de Melo
  2014-05-22 14:27       ` Michael Kerrisk (man-pages)
  2014-05-23 19:00       ` David Miller
  0 siblings, 2 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-05-21 21:05 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: lkml, linux-man, netdev, Ondřej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

[-- Attachment #1: Type: text/plain, Size: 1452 bytes --]

On Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages) wrote:
> > Hi Arnaldo,
 
> > Ping!

> I acknowledge the problem, the timeout has to be passed to the
> underlying ->recvmsg() implementations that should return the time spent
> waiting for each packet, so that we can accrue that at recvmmsg level.
 
> We can do either passing an extra timeout parameter to the recvmsg
> implementations or using some struct sock member to specify that
> timeout.
 
> The first approach is intrusive, touches tons of files, so I'll try
> making it all mostly transparent by hooking into sock_rcvtimeo()
> somehow.

But after thinking a bit more, it looks like we do need to pass the
extra parameter after all. Please take a look at the attached patch to
see if it addresses the problem.

Mostly it adds a new timeop parameter to the per-protocol recvmsg()
implementations which, if not NULL, should be used instead of
SO_RCVTIMEO.

Since the underlying recvmsg implementations already check that timeout,
they return what is remaining, which is then used in subsequent recvmsg
calls; at the end we just convert it back to timespec format.

In most cases it is just passed to skb_recv_datagram, which checks the
pointer and, if it is not NULL, uses and updates it.

There should be no problems, but I have only booted a system with this
patch applied; I noticed no problems in a normal desktop session, ssh,
etc.
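
The in/out behavior of timeop described above can be illustrated with a
small user-space model. This is a sketch under stated assumptions, not
the patch itself: pick_timeout() copies the selection rule of the
sock_rcvtimeop() helper in the attached diff, while model_recvmsg(),
model_recvmmsg(), the gaps[] array (time between consecutive datagrams)
and the 1 ms tick are hypothetical stand-ins for the kernel paths.

```c
#include <limits.h>
#include <stddef.h>
#include <time.h>

#define TICKS_PER_SEC 1000L                       /* assumed: 1 tick = 1 ms */
#define NSEC_PER_TICK (1000000000L / TICKS_PER_SEC)

/* Same selection rule as sock_rcvtimeop() in the patch. */
static long pick_timeout(long sk_rcvtimeo, const long *timeop, int noblock)
{
    return noblock ? 0 : timeop ? *timeop : sk_rcvtimeo;
}

/*
 * Stand-in for one recvmsg() call: the next datagram needs wait_needed
 * ticks to arrive. Returns 0 if it arrives within the budget, -1 on
 * timeout. If timeop is not NULL it is updated with the remaining time.
 */
static int model_recvmsg(long wait_needed, long sk_rcvtimeo, long *timeop)
{
    long budget = pick_timeout(sk_rcvtimeo, timeop, 0);
    long spent = wait_needed < budget ? wait_needed : budget;

    if (timeop)
        *timeop = budget - spent;   /* hand back what is left */
    return wait_needed <= budget ? 0 : -1;
}

/*
 * Stand-in for recvmmsg(): one shared budget is carried across all
 * per-datagram waits, then converted back to timespec format.
 * Once gaps[] is exhausted, no further datagram ever arrives.
 */
static size_t model_recvmmsg(const long *gaps, size_t ngaps, size_t vlen,
                             struct timespec *timeout)
{
    long remaining = (long)timeout->tv_sec * TICKS_PER_SEC
                     + timeout->tv_nsec / NSEC_PER_TICK;
    size_t got = 0;

    while (got < vlen) {
        long need = got < ngaps ? gaps[got] : LONG_MAX;

        if (model_recvmsg(need, 0, &remaining) != 0)
            break;                  /* budget used up while waiting */
        got++;
    }
    timeout->tv_sec = remaining / TICKS_PER_SEC;
    timeout->tv_nsec = (remaining % TICKS_PER_SEC) * NSEC_PER_TICK;
    return got;
}
```

In this model, the scenario from the bug report (datagrams at X+2, X+4
and X+6, vlen 5, a 10 second timeout) returns three datagrams with zero
time remaining, instead of blocking forever.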

 Arnaldo

[-- Attachment #2: recvmmsg-timeout.patch --]
[-- Type: text/plain, Size: 72596 bytes --]

diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 850246206b12..e5d36f815083 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -151,7 +151,7 @@ unlock:
 }
 
 static int hash_recvmsg(struct kiocb *unused, struct socket *sock,
-			struct msghdr *msg, size_t len, int flags)
+			struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index a19c027b29bd..4bde01591174 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -419,7 +419,7 @@ unlock:
 }
 
 static int skcipher_recvmsg(struct kiocb *unused, struct socket *sock,
-			    struct msghdr *msg, size_t ignored, int flags)
+			    struct msghdr *msg, size_t ignored, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 1be82284cf9d..254515f71793 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -113,7 +113,7 @@ mISDN_sock_cmsg(struct sock *sk, struct msghdr *msg, struct sk_buff *skb)
 
 static int
 mISDN_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-		   struct msghdr *msg, size_t len, int flags)
+		   struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sk_buff		*skb;
 	struct sock		*sk = sock->sk;
@@ -130,7 +130,7 @@ mISDN_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (sk->sk_state == MISDN_CLOSED)
 		return 0;
 
-	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err);
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err, timeop);
 	if (!skb)
 		return err;
 
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 3381c4f91a8c..13d12ef322f2 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -1111,7 +1111,7 @@ static int macvtap_sendmsg(struct kiocb *iocb, struct socket *sock,
 
 static int macvtap_recvmsg(struct kiocb *iocb, struct socket *sock,
 			   struct msghdr *m, size_t total_len,
-			   int flags)
+			   int flags, long *timeop)
 {
 	struct macvtap_queue *q = container_of(sock, struct macvtap_queue, sock);
 	int ret;
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 2ea7efd11857..30194c6e3fe8 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -963,7 +963,7 @@ static const struct ppp_channel_ops pppoe_chan_ops = {
 };
 
 static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,
-		  struct msghdr *m, size_t total_len, int flags)
+		  struct msghdr *m, size_t total_len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -975,7 +975,7 @@ static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,
 	}
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &error);
+				flags & MSG_DONTWAIT, &error, timeop);
 	if (error < 0)
 		goto end;
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ee328ba101e7..fda4b3ac215c 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1474,7 +1474,7 @@ static int tun_sendmsg(struct kiocb *iocb, struct socket *sock,
 
 static int tun_recvmsg(struct kiocb *iocb, struct socket *sock,
 		       struct msghdr *m, size_t total_len,
-		       int flags)
+		       int flags, long *timeop)
 {
 	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
 	struct tun_struct *tun = __tun_get(tfile);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index be414d2b2b22..46a706378d79 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -601,7 +601,7 @@ static void handle_rx(struct vhost_net *net)
 		if (unlikely(headcount > UIO_MAXIOV)) {
 			msg.msg_iovlen = 1;
 			err = sock->ops->recvmsg(NULL, sock, &msg,
-						 1, MSG_DONTWAIT | MSG_TRUNC);
+						 1, MSG_DONTWAIT | MSG_TRUNC, NULL);
 			pr_debug("Discarded rx packet: len %zd\n", sock_len);
 			continue;
 		}
@@ -627,7 +627,7 @@ static void handle_rx(struct vhost_net *net)
 			copy_iovec_hdr(vq->iov, nvq->hdr, sock_hlen, in);
 		msg.msg_iovlen = in;
 		err = sock->ops->recvmsg(NULL, sock, &msg,
-					 sock_len, MSG_DONTWAIT | MSG_TRUNC);
+					 sock_len, MSG_DONTWAIT | MSG_TRUNC, NULL);
 		/* Userspace might have consumed the packet meanwhile:
 		 * it's not supposed to do this usually, but might be hard
 		 * to prevent. Discard data we got (if any) and keep going. */
diff --git a/include/linux/net.h b/include/linux/net.h
index 94734a6259a4..6cf620f2b8f0 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -171,10 +171,13 @@ struct proto_ops {
 	 * returning uninitialized memory to user space.  The recvfrom
 	 * handlers can assume that msg.msg_name is either NULL or has
 	 * a minimum size of sizeof(struct sockaddr_storage).
+	 * timeop contains a per-call timeout (as opposed to per-socket),
+	 * used by recvmmsg; set it to NULL to disable it. It should return
+	 * the remaining time, if not NULL.
 	 */
 	int		(*recvmsg)   (struct kiocb *iocb, struct socket *sock,
 				      struct msghdr *m, size_t total_len,
-				      int flags);
+				      int flags, long *timeop);
 	int		(*mmap)	     (struct file *file, struct socket *sock,
 				      struct vm_area_struct * vma);
 	ssize_t		(*sendpage)  (struct socket *sock, struct page *page,
@@ -215,7 +218,7 @@ int sock_create_lite(int family, int type, int proto, struct socket **res);
 void sock_release(struct socket *sock);
 int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len);
 int sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
-		 int flags);
+		 int flags, long *timeop);
 struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname);
 struct socket *sockfd_lookup(int fd, int *err);
 struct socket *sock_from_file(struct file *file, int *err);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7a9beeb1c458..cdfdd1bd6358 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2479,9 +2479,9 @@ static inline void skb_frag_add_head(struct sk_buff *skb, struct sk_buff *frag)
 	for (iter = skb_shinfo(skb)->frag_list; iter; iter = iter->next)
 
 struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
-				    int *peeked, int *off, int *err);
+				    int *peeked, int *off, int *err, long *timeop);
 struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock,
-				  int *err);
+				  int *err, long *timeop);
 unsigned int datagram_poll(struct file *file, struct socket *sock,
 			   struct poll_table_struct *wait);
 int skb_copy_datagram_iovec(const struct sk_buff *from, int offset,
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 428277869400..6c007bd57f39 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -101,7 +101,7 @@ struct vsock_transport {
 	/* DGRAM. */
 	int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
 	int (*dgram_dequeue)(struct kiocb *kiocb, struct vsock_sock *vsk,
-			     struct msghdr *msg, size_t len, int flags);
+			     struct msghdr *msg, size_t len, int flags, long *timeop);
 	int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
 			     struct iovec *, size_t len);
 	bool (*dgram_allow)(u32 cid, u32 port);
diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index 904777c1cd24..ee75e6875aab 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -246,9 +246,9 @@ void bt_sock_unregister(int proto);
 void bt_sock_link(struct bt_sock_list *l, struct sock *s);
 void bt_sock_unlink(struct bt_sock_list *l, struct sock *s);
 int  bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-				struct msghdr *msg, size_t len, int flags);
+				struct msghdr *msg, size_t len, int flags, long *timeop);
 int  bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
-			struct msghdr *msg, size_t len, int flags);
+			struct msghdr *msg, size_t len, int flags, long *timeop);
 uint bt_sock_poll(struct file *file, struct socket *sock, poll_table *wait);
 int  bt_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int  bt_sock_wait_state(struct sock *sk, int state, unsigned long timeo);
diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index fe7994c48b75..f80071949b98 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -26,7 +26,7 @@ int inet_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
 		      size_t size, int flags);
 int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		 size_t size, int flags);
+		 size_t size, int flags, long *timeop);
 int inet_shutdown(struct socket *sock, int how);
 int inet_listen(struct socket *sock, int backlog);
 void inet_sock_destruct(struct sock *sk);
diff --git a/include/net/ping.h b/include/net/ping.h
index 026479b61a2d..c259ba72c811 100644
--- a/include/net/ping.h
+++ b/include/net/ping.h
@@ -76,7 +76,7 @@ int  ping_getfrag(void *from, char *to, int offset, int fraglen, int odd,
 		  struct sk_buff *);
 
 int  ping_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		  size_t len, int noblock, int flags, int *addr_len);
+		  size_t len, int noblock, int flags, int *addr_len, long *timeop);
 int  ping_common_sendmsg(int family, struct msghdr *msg, size_t len,
 			 void *user_icmph, size_t icmph_len);
 int  ping_v6_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
diff --git a/include/net/sock.h b/include/net/sock.h
index 21569cf456ed..3427cde277e3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -959,7 +959,7 @@ struct proto {
 	int			(*recvmsg)(struct kiocb *iocb, struct sock *sk,
 					   struct msghdr *msg,
 					   size_t len, int noblock, int flags,
-					   int *addr_len);
+					   int *addr_len, long *timeop);
 	int			(*sendpage)(struct sock *sk, struct page *page,
 					int offset, size_t size, int flags);
 	int			(*bind)(struct sock *sk,
@@ -1591,7 +1591,7 @@ int sock_no_getsockopt(struct socket *, int , int, char __user *, int __user *);
 int sock_no_setsockopt(struct socket *, int, int, char __user *, unsigned int);
 int sock_no_sendmsg(struct kiocb *, struct socket *, struct msghdr *, size_t);
 int sock_no_recvmsg(struct kiocb *, struct socket *, struct msghdr *, size_t,
-		    int);
+		    int, long *);
 int sock_no_mmap(struct file *file, struct socket *sock,
 		 struct vm_area_struct *vma);
 ssize_t sock_no_sendpage(struct socket *sock, struct page *page, int offset,
@@ -1604,7 +1604,7 @@ ssize_t sock_no_sendpage(struct socket *sock, struct page *page, int offset,
 int sock_common_getsockopt(struct socket *sock, int level, int optname,
 				  char __user *optval, int __user *optlen);
 int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *msg, size_t size, int flags);
+			       struct msghdr *msg, size_t size, int flags, long *timeop);
 int sock_common_setsockopt(struct socket *sock, int level, int optname,
 				  char __user *optval, unsigned int optlen);
 int compat_sock_common_getsockopt(struct socket *sock, int level,
@@ -2102,6 +2102,11 @@ static inline long sock_rcvtimeo(const struct sock *sk, bool noblock)
 	return noblock ? 0 : sk->sk_rcvtimeo;
 }
 
+static inline long sock_rcvtimeop(const struct sock *sk, long *timeop, bool noblock)
+{
+	return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;
+}
+
 static inline long sock_sndtimeo(const struct sock *sk, bool noblock)
 {
 	return noblock ? 0 : sk->sk_sndtimeo;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f5d6ca4a9d28..f3605e832499 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -437,7 +437,7 @@ int compat_tcp_setsockopt(struct sock *sk, int level, int optname,
 void tcp_set_keepalive(struct sock *sk, int val);
 void tcp_syn_ack_timeout(struct sock *sk, struct request_sock *req);
 int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int nonblock, int flags, int *addr_len);
+		size_t len, int nonblock, int flags, int *addr_len, long *timeop);
 void tcp_parse_options(const struct sk_buff *skb,
 		       struct tcp_options_received *opt_rx,
 		       int estab, struct tcp_fastopen_cookie *foc);
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 786ee2f83d5f..c35e56442fe7 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1732,7 +1732,7 @@ out:
 }
 
 static int atalk_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-			 size_t size, int flags)
+			 size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct ddpehdr *ddp;
@@ -1742,7 +1742,7 @@ static int atalk_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr
 	struct sk_buff *skb;
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-						flags & MSG_DONTWAIT, &err);
+						flags & MSG_DONTWAIT, &err, timeop);
 	lock_sock(sk);
 
 	if (!skb)
diff --git a/net/atm/common.c b/net/atm/common.c
index 7b491006eaf4..8def66eaed87 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -524,7 +524,7 @@ int vcc_connect(struct socket *sock, int itf, short vpi, int vci)
 }
 
 int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		size_t size, int flags)
+		size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct atm_vcc *vcc;
@@ -544,7 +544,7 @@ int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	    !test_bit(ATM_VF_READY, &vcc->flags))
 		return 0;
 
-	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &error);
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &error, timeop);
 	if (!skb)
 		return error;
 
diff --git a/net/atm/common.h b/net/atm/common.h
index cc3c2dae4d79..b370ffd78a39 100644
--- a/net/atm/common.h
+++ b/net/atm/common.h
@@ -14,7 +14,7 @@ int vcc_create(struct net *net, struct socket *sock, int protocol, int family);
 int vcc_release(struct socket *sock);
 int vcc_connect(struct socket *sock, int itf, short vpi, int vci);
 int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		size_t size, int flags);
+		size_t size, int flags, long *timeop);
 int vcc_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
 		size_t total_len);
 unsigned int vcc_poll(struct file *file, struct socket *sock, poll_table *wait);
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index c35c3f48fc0f..ee0411920216 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1600,7 +1600,7 @@ out:
 }
 
 static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
-	struct msghdr *msg, size_t size, int flags)
+	struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -1619,7 +1619,7 @@ static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	/* Now we can treat all alike */
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &err);
+				flags & MSG_DONTWAIT, &err, timeop);
 	if (skb == NULL)
 		goto out;
 
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 2021c481cdb6..4896bd954293 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -209,7 +209,7 @@ struct sock *bt_accept_dequeue(struct sock *parent, struct socket *newsock)
 EXPORT_SYMBOL(bt_accept_dequeue);
 
 int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-				struct msghdr *msg, size_t len, int flags)
+				struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -222,7 +222,7 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (flags & (MSG_OOB))
 		return -EOPNOTSUPP;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb) {
 		if (sk->sk_shutdown & RCV_SHUTDOWN)
 			return 0;
@@ -282,7 +282,7 @@ static long bt_sock_data_wait(struct sock *sk, long timeo)
 }
 
 int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *msg, size_t size, int flags)
+			       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	int err = 0;
@@ -297,7 +297,7 @@ int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 	lock_sock(sk);
 
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, size);
-	timeo  = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	timeo  = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 
 	do {
 		struct sk_buff *skb;
@@ -381,6 +381,8 @@ int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 	} while (size);
 
 out:
+	if (timeop)
+		*timeop = timeo;
 	release_sock(sk);
 	return copied ? : err;
 }
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index f608bffdb8b9..f24413835e2c 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -829,7 +829,7 @@ static void hci_sock_cmsg(struct sock *sk, struct msghdr *msg,
 }
 
 static int hci_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			    struct msghdr *msg, size_t len, int flags)
+			    struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -844,7 +844,7 @@ static int hci_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (sk->sk_state == BT_CLOSED)
 		return 0;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		return err;
 
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index ef5e5b04f34f..19a90e0d8172 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -976,7 +976,7 @@ static int l2cap_sock_sendmsg(struct kiocb *iocb, struct socket *sock,
 }
 
 static int l2cap_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			      struct msghdr *msg, size_t len, int flags)
+			      struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct l2cap_pinfo *pi = l2cap_pi(sk);
@@ -1003,9 +1003,9 @@ static int l2cap_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	release_sock(sk);
 
 	if (sock->type == SOCK_STREAM)
-		err = bt_sock_stream_recvmsg(iocb, sock, msg, len, flags);
+		err = bt_sock_stream_recvmsg(iocb, sock, msg, len, flags, timeop);
 	else
-		err = bt_sock_recvmsg(iocb, sock, msg, len, flags);
+		err = bt_sock_recvmsg(iocb, sock, msg, len, flags, timeop);
 
 	if (pi->chan->mode != L2CAP_MODE_ERTM)
 		return err;
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index c603a5eb4720..a3cbf8c4daf5 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -617,7 +617,7 @@ done:
 }
 
 static int rfcomm_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *msg, size_t size, int flags)
+			       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct rfcomm_dlc *d = rfcomm_pi(sk)->dlc;
@@ -628,7 +628,7 @@ static int rfcomm_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 		return 0;
 	}
 
-	len = bt_sock_stream_recvmsg(iocb, sock, msg, size, flags);
+	len = bt_sock_stream_recvmsg(iocb, sock, msg, size, flags, timeop);
 
 	lock_sock(sk);
 	if (!(flags & MSG_PEEK) && len > 0)
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index c06dbd3938e8..bfaa16bdc366 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -700,7 +700,7 @@ static void sco_conn_defer_accept(struct hci_conn *conn, u16 setting)
 }
 
 static int sco_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			    struct msghdr *msg, size_t len, int flags)
+			    struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sco_pinfo *pi = sco_pi(sk);
@@ -718,7 +718,7 @@ static int sco_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	release_sock(sk);
 
-	return bt_sock_recvmsg(iocb, sock, msg, len, flags);
+	return bt_sock_recvmsg(iocb, sock, msg, len, flags, timeop);
 }
 
 static int sco_sock_setsockopt(struct socket *sock, int level, int optname, char __user *optval, unsigned int optlen)
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index e8437094d15f..069eb2ffde29 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -272,7 +272,7 @@ static void caif_check_flow_release(struct sock *sk)
  * changed locking, address handling and added MSG_TRUNC.
  */
 static int caif_seqpkt_recvmsg(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *m, size_t len, int flags)
+			       struct msghdr *m, size_t len, int flags, long *timeop)
 
 {
 	struct sock *sk = sock->sk;
@@ -284,7 +284,7 @@ static int caif_seqpkt_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (m->msg_flags&MSG_OOB)
 		goto read_error;
 
-	skb = skb_recv_datagram(sk, flags, 0 , &ret);
+	skb = skb_recv_datagram(sk, flags, 0 , &ret, timeop);
 	if (!skb)
 		goto read_error;
 	copylen = skb->len;
@@ -345,7 +345,7 @@ static long caif_stream_data_wait(struct sock *sk, long timeo)
  */
 static int caif_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 			       struct msghdr *msg, size_t size,
-			       int flags)
+			       int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	int copied = 0;
@@ -367,7 +367,7 @@ static int caif_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	caif_read_lock(sk);
 	target = sock_rcvlowat(sk, flags&MSG_WAITALL, size);
-	timeo = sock_rcvtimeo(sk, flags&MSG_DONTWAIT);
+	timeo = sock_rcvtimeop(sk, timeop, flags&MSG_DONTWAIT);
 
 	do {
 		int chunk;
@@ -450,6 +450,8 @@ unlock:
 	caif_read_unlock(sk);
 
 out:
+	if (timeop)
+		*timeop = timeo;
 	return copied ? : err;
 }
 
diff --git a/net/can/bcm.c b/net/can/bcm.c
index dcb75c0e66c1..dc12c80ec5cd 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1541,7 +1541,7 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len,
 }
 
 static int bcm_recvmsg(struct kiocb *iocb, struct socket *sock,
-		       struct msghdr *msg, size_t size, int flags)
+		       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -1551,7 +1551,7 @@ static int bcm_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	noblock =  flags & MSG_DONTWAIT;
 	flags   &= ~MSG_DONTWAIT;
-	skb = skb_recv_datagram(sk, flags, noblock, &error);
+	skb = skb_recv_datagram(sk, flags, noblock, &error, timeop);
 	if (!skb)
 		return error;
 
diff --git a/net/can/raw.c b/net/can/raw.c
index 081e81fd017f..0a4aa9d98e5e 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -731,7 +731,7 @@ send_failed:
 }
 
 static int raw_recvmsg(struct kiocb *iocb, struct socket *sock,
-		       struct msghdr *msg, size_t size, int flags)
+		       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -741,7 +741,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct socket *sock,
 	noblock =  flags & MSG_DONTWAIT;
 	flags   &= ~MSG_DONTWAIT;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		return err;
 
diff --git a/net/core/datagram.c b/net/core/datagram.c
index a16ed7bbe376..a08c4c9dcd23 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -138,6 +138,9 @@ out_noerr:
  *	@off: an offset in bytes to peek skb from. Returns an offset
  *	      within an skb where data actually starts
  *	@err: error code returned
+ *	@timeop: per call timeout (as opposed to per socket via SO_RCVTIMEO),
+ *		 will return the remaining time, used in recvmmsg, ignored
+ *		 if set to NULL.
  *
  *	Get a datagram skbuff, understands the peeking, nonblocking wakeups
  *	and possible races. This replaces identical code in packet, raw and
@@ -162,7 +165,7 @@ out_noerr:
  *	the standard around please.
  */
 struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
-				    int *peeked, int *off, int *err)
+				    int *peeked, int *off, int *err, long *timeop)
 {
 	struct sk_buff *skb, *last;
 	long timeo;
@@ -174,7 +177,7 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 	if (error)
 		goto no_packet;
 
-	timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	timeo = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 
 	do {
 		/* Again only user level code calls this function, so nothing
@@ -205,6 +208,8 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 
 			spin_unlock_irqrestore(&queue->lock, cpu_flags);
 			*off = _off;
+			if (timeop)
+				*timeop = timeo;
 			return skb;
 		}
 		spin_unlock_irqrestore(&queue->lock, cpu_flags);
@@ -229,12 +234,12 @@ no_packet:
 EXPORT_SYMBOL(__skb_recv_datagram);
 
 struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned int flags,
-				  int noblock, int *err)
+				  int noblock, int *err, long *timeop)
 {
 	int peeked, off = 0;
 
 	return __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
-				   &peeked, &off, err);
+				   &peeked, &off, err, timeop);
 }
 EXPORT_SYMBOL(skb_recv_datagram);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 664ee4295b6f..898fd9b5fd0b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2191,7 +2191,7 @@ int sock_no_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
 EXPORT_SYMBOL(sock_no_sendmsg);
 
 int sock_no_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
-		    size_t len, int flags)
+		    size_t len, int flags, long *timeop)
 {
 	return -EOPNOTSUPP;
 }
@@ -2577,14 +2577,14 @@ EXPORT_SYMBOL(compat_sock_common_getsockopt);
 #endif
 
 int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock,
-			struct msghdr *msg, size_t size, int flags)
+			struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	int addr_len = 0;
 	int err;
 
 	err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
-				   flags & ~MSG_DONTWAIT, &addr_len);
+				   flags & ~MSG_DONTWAIT, &addr_len, timeop);
 	if (err >= 0)
 		msg->msg_namelen = addr_len;
 	return err;
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index c67816647cce..fbf4cc113ffe 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -314,7 +314,7 @@ int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		 size_t size);
 int dccp_recvmsg(struct kiocb *iocb, struct sock *sk,
 		 struct msghdr *msg, size_t len, int nonblock, int flags,
-		 int *addr_len);
+		 int *addr_len, long *timeop);
 void dccp_shutdown(struct sock *sk, int how);
 int inet_dccp_listen(struct socket *sock, int backlog);
 unsigned int dccp_poll(struct file *file, struct socket *sock,
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index de2c1e719305..92ae3d37c7f0 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -808,7 +808,7 @@ out_discard:
 EXPORT_SYMBOL_GPL(dccp_sendmsg);
 
 int dccp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		 size_t len, int nonblock, int flags, int *addr_len)
+		 size_t len, int nonblock, int flags, int *addr_len, long *timeop)
 {
 	const struct dccp_hdr *dh;
 	long timeo;
@@ -820,7 +820,7 @@ int dccp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		goto out;
 	}
 
-	timeo = sock_rcvtimeo(sk, nonblock);
+	timeo = sock_rcvtimeop(sk, timeop, nonblock);
 
 	do {
 		struct sk_buff *skb = skb_peek(&sk->sk_receive_queue);
@@ -910,6 +910,8 @@ verify_sock_status:
 	} while (1);
 out:
 	release_sock(sk);
+	if (timeop)
+		*timeop = timeo;
 	return len;
 }
 
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 4c04848953bd..d18eba03643f 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -1669,7 +1669,7 @@ static int dn_data_ready(struct sock *sk, struct sk_buff_head *q, int flags, int
 
 
 static int dn_recvmsg(struct kiocb *iocb, struct socket *sock,
-	struct msghdr *msg, size_t size, int flags)
+	struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct dn_scp *scp = DN_SK(sk);
@@ -1680,7 +1680,7 @@ static int dn_recvmsg(struct kiocb *iocb, struct socket *sock,
 	struct sk_buff *skb, *n;
 	struct dn_skb_cb *cb = NULL;
 	unsigned char eor = 0;
-	long timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	long timeo = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 
 	lock_sock(sk);
 
@@ -1814,7 +1814,8 @@ out:
 	}
 
 	release_sock(sk);
-
+	if (timeop)
+		*timeop = timeo;
 	return rv;
 }
 
diff --git a/net/ieee802154/dgram.c b/net/ieee802154/dgram.c
index 4f0ed8780194..dd7de8959d07 100644
--- a/net/ieee802154/dgram.c
+++ b/net/ieee802154/dgram.c
@@ -305,14 +305,14 @@ out:
 
 static int dgram_recvmsg(struct kiocb *iocb, struct sock *sk,
 		struct msghdr *msg, size_t len, int noblock, int flags,
-		int *addr_len)
+		int *addr_len, long *timeop)
 {
 	size_t copied = 0;
 	int err = -EOPNOTSUPP;
 	struct sk_buff *skb;
 	DECLARE_SOCKADDR(struct sockaddr_ieee802154 *, saddr, msg->msg_name);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ieee802154/raw.c b/net/ieee802154/raw.c
index 74d54fae33d7..0303aa66a9e2 100644
--- a/net/ieee802154/raw.c
+++ b/net/ieee802154/raw.c
@@ -179,13 +179,13 @@ out:
 }
 
 static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		       size_t len, int noblock, int flags, int *addr_len)
+		       size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	size_t copied = 0;
 	int err = -EOPNOTSUPP;
 	struct sk_buff *skb;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 279132bcadd9..dfc8b9ff41bd 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -760,7 +760,7 @@ ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
 EXPORT_SYMBOL(inet_sendpage);
 
 int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		 size_t size, int flags)
+		 size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	int addr_len = 0;
@@ -769,7 +769,7 @@ int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	sock_rps_record_flow(sk);
 
 	err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
-				   flags & ~MSG_DONTWAIT, &addr_len);
+				   flags & ~MSG_DONTWAIT, &addr_len, timeop);
 	if (err >= 0)
 		msg->msg_namelen = addr_len;
 	return err;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 044a0ddf6a79..791be60b38f1 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -840,7 +840,7 @@ do_confirm:
 }
 
 int ping_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		 size_t len, int noblock, int flags, int *addr_len)
+		 size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct inet_sock *isk = inet_sk(sk);
 	int family = sk->sk_family;
@@ -864,7 +864,7 @@ int ping_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		}
 	}
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index a9dbe58bdfe7..32aee7472bb3 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -685,7 +685,7 @@ out:	return ret;
  */
 
 static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		       size_t len, int noblock, int flags, int *addr_len)
+		       size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	size_t copied = 0;
@@ -701,7 +701,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		goto out;
 	}
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index eb1dde37e678..bc506ffbc8d0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1601,7 +1601,7 @@ EXPORT_SYMBOL(tcp_read_sock);
  */
 
 int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int nonblock, int flags, int *addr_len)
+		size_t len, int nonblock, int flags, int *addr_len, long *timeop)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int copied = 0;
@@ -1626,7 +1626,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	if (sk->sk_state == TCP_LISTEN)
 		goto out;
 
-	timeo = sock_rcvtimeo(sk, nonblock);
+	timeo = sock_rcvtimeop(sk, timeop, nonblock);
 
 	/* Urgent data needs to be handled specially. */
 	if (flags & MSG_OOB)
@@ -1993,20 +1993,18 @@ skip_copy:
 
 	/* Clean up data we have read: This will do ACK frames. */
 	tcp_cleanup_rbuf(sk, copied);
-
-	release_sock(sk);
-	return copied;
-
 out:
 	release_sock(sk);
-	return err;
+	if (timeop)
+		*timeop = timeo;
+	return copied;
 
 recv_urg:
-	err = tcp_recv_urg(sk, msg, len, flags);
+	copied = tcp_recv_urg(sk, msg, len, flags);
 	goto out;
 
 recv_sndq:
-	err = tcp_peek_sndq(sk, msg, len);
+	copied = tcp_peek_sndq(sk, msg, len);
 	goto out;
 }
 EXPORT_SYMBOL(tcp_recvmsg);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 590532a7bd2d..6585abd935c8 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1224,7 +1224,7 @@ EXPORT_SYMBOL(udp_ioctl);
  */
 
 int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int noblock, int flags, int *addr_len)
+		size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name);
@@ -1240,7 +1240,7 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 
 try_again:
 	skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
-				  &peeked, &off, &err);
+				  &peeked, &off, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h
index f3c27899f62b..a39aa9996b72 100644
--- a/net/ipv4/udp_impl.h
+++ b/net/ipv4/udp_impl.h
@@ -22,7 +22,7 @@ int compat_udp_getsockopt(struct sock *sk, int level, int optname,
 			  char __user *optval, int __user *optlen);
 #endif
 int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int noblock, int flags, int *addr_len);
+		size_t len, int noblock, int flags, int *addr_len, long *timeop);
 int udp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
 		 int flags);
 int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index dddfb5fa2b7a..56a58ed107f0 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -458,7 +458,7 @@ int rawv6_rcv(struct sock *sk, struct sk_buff *skb)
 
 static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 		  struct msghdr *msg, size_t len,
-		  int noblock, int flags, int *addr_len)
+		  int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	DECLARE_SOCKADDR(struct sockaddr_in6 *, sin6, msg->msg_name);
@@ -475,7 +475,7 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (np->rxpmtu && np->rxopt.bits.rxpmtu)
 		return ipv6_recv_rxpmtu(sk, msg, len, addr_len);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 7edf096867c4..b5364be6b2b6 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -380,7 +380,7 @@ EXPORT_SYMBOL_GPL(udp6_lib_lookup);
 
 int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 		  struct msghdr *msg, size_t len,
-		  int noblock, int flags, int *addr_len)
+		  int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct inet_sock *inet = inet_sk(sk);
@@ -400,7 +400,7 @@ int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 
 try_again:
 	skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
-				  &peeked, &off, &err);
+				  &peeked, &off, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h
index c779c3c90b9d..cd414d719977 100644
--- a/net/ipv6/udp_impl.h
+++ b/net/ipv6/udp_impl.h
@@ -26,7 +26,7 @@ int compat_udpv6_getsockopt(struct sock *sk, int level, int optname,
 int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		  size_t len);
 int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		  size_t len, int noblock, int flags, int *addr_len);
+		  size_t len, int noblock, int flags, int *addr_len, long *timeop);
 int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
 void udpv6_destroy_sock(struct sock *sk);
 
diff --git a/net/ipx/af_ipx.c b/net/ipx/af_ipx.c
index 41e4e93cb3aa..f68d874bae32 100644
--- a/net/ipx/af_ipx.c
+++ b/net/ipx/af_ipx.c
@@ -1756,7 +1756,7 @@ out:
 
 
 static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
-		struct msghdr *msg, size_t size, int flags)
+		struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct ipx_sock *ipxs = ipx_sk(sk);
@@ -1791,7 +1791,7 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto out;
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &rc);
+				flags & MSG_DONTWAIT, &rc, timeop);
 	if (!skb) {
 		if (rc == -EAGAIN && (sk->sk_shutdown & RCV_SHUTDOWN))
 			rc = 0;
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 54747c25c86c..feaacaa0c970 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1373,7 +1373,7 @@ out:
  *    after being read, regardless of how much the user actually read
  */
 static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
-			      struct msghdr *msg, size_t size, int flags)
+			      struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct irda_sock *self = irda_sk(sk);
@@ -1384,7 +1384,7 @@ static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
 	IRDA_DEBUG(4, "%s()\n", __func__);
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &err);
+				flags & MSG_DONTWAIT, &err, timeop);
 	if (!skb)
 		return err;
 
@@ -1422,7 +1422,7 @@ static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
  * Function irda_recvmsg_stream (iocb, sock, msg, size, flags)
  */
 static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *msg, size_t size, int flags)
+			       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct irda_sock *self = irda_sk(sk);
@@ -1445,7 +1445,7 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
 
 	err = 0;
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, size);
-	timeo = sock_rcvtimeo(sk, noblock);
+	timeo = sock_rcvtimeop(sk, timeop, noblock);
 
 	do {
 		int chunk;
@@ -1534,6 +1534,8 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
 		}
 	}
 
+	if (timeop)
+		*timeop = timeo;
 	return copied;
 }
 
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 01e77b0ae075..c90c8f88f5f2 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1314,7 +1314,7 @@ static void iucv_process_message_q(struct sock *sk)
 }
 
 static int iucv_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			     struct msghdr *msg, size_t len, int flags)
+			     struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -1335,7 +1335,7 @@ static int iucv_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	/* receive/dequeue next skb:
 	 * the function understands MSG_PEEK and, thus, does not dequeue skb */
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb) {
 		if (sk->sk_shutdown & RCV_SHUTDOWN)
 			return 0;
diff --git a/net/key/af_key.c b/net/key/af_key.c
index f3c83073afc4..4983307d3ba5 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3653,7 +3653,7 @@ out:
 
 static int pfkey_recvmsg(struct kiocb *kiocb,
 			 struct socket *sock, struct msghdr *msg, size_t len,
-			 int flags)
+			 int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct pfkey_sock *pfk = pfkey_sk(sk);
@@ -3664,7 +3664,7 @@ static int pfkey_recvmsg(struct kiocb *kiocb,
 	if (flags & ~(MSG_PEEK|MSG_DONTWAIT|MSG_TRUNC|MSG_CMSG_COMPAT))
 		goto out;
 
-	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err);
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err, timeop);
 	if (skb == NULL)
 		goto out;
 
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 3397fe6897c0..48dc625c258c 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -507,7 +507,7 @@ no_route:
 }
 
 static int l2tp_ip_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-			   size_t len, int noblock, int flags, int *addr_len)
+			   size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	size_t copied = 0;
@@ -518,7 +518,7 @@ static int l2tp_ip_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m
 	if (flags & MSG_OOB)
 		goto out;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index e472d44a3b91..24a62ea7fa9d 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -645,7 +645,7 @@ do_confirm:
 
 static int l2tp_ip6_recvmsg(struct kiocb *iocb, struct sock *sk,
 			    struct msghdr *msg, size_t len, int noblock,
-			    int flags, int *addr_len)
+			    int flags, int *addr_len, long *timeop)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	DECLARE_SOCKADDR(struct sockaddr_l2tpip6 *, lsa, msg->msg_name);
@@ -662,7 +662,7 @@ static int l2tp_ip6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (flags & MSG_ERRQUEUE)
 		return ipv6_recv_error(sk, msg, len, addr_len);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 950909f04ee6..9e6db6946e4f 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -187,7 +187,7 @@ static int pppol2tp_recv_payload_hook(struct sk_buff *skb)
  */
 static int pppol2tp_recvmsg(struct kiocb *iocb, struct socket *sock,
 			    struct msghdr *msg, size_t len,
-			    int flags)
+			    int flags, long *timeop)
 {
 	int err;
 	struct sk_buff *skb;
@@ -199,7 +199,7 @@ static int pppol2tp_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	err = 0;
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &err);
+				flags & MSG_DONTWAIT, &err, timeop);
 	if (!skb)
 		goto end;
 
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 0080d2b0a8ae..b5edf838f9fa 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -705,7 +705,7 @@ out:
  *	Returns non-negative upon success, negative otherwise.
  */
 static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
-			  struct msghdr *msg, size_t len, int flags)
+			  struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	DECLARE_SOCKADDR(struct sockaddr_llc *, uaddr, msg->msg_name);
 	const int nonblock = flags & MSG_DONTWAIT;
@@ -725,7 +725,7 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (unlikely(sk->sk_type == SOCK_STREAM && sk->sk_state == TCP_LISTEN))
 		goto out;
 
-	timeo = sock_rcvtimeo(sk, nonblock);
+	timeo = sock_rcvtimeop(sk, timeop, nonblock);
 
 	seq = &llc->copied_seq;
 	if (flags & MSG_PEEK) {
@@ -851,6 +851,8 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 out:
 	release_sock(sk);
+	if (timeop)
+		*timeop = timeo;
 	return copied;
 copy_uaddr:
 	if (uaddr != NULL && skb != NULL) {
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index e0ccd84d4d67..d0b39b90d41a 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2399,7 +2399,7 @@ out:
 
 static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 			   struct msghdr *msg, size_t len,
-			   int flags)
+			   int flags, long *timeop)
 {
 	struct sock_iocb *siocb = kiocb_to_siocb(kiocb);
 	struct scm_cookie scm;
@@ -2415,7 +2415,7 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 
 	copied = 0;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (skb == NULL)
 		goto out;
 
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index ede50d197e10..4a9078e2bf7a 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1134,7 +1134,7 @@ out:
 }
 
 static int nr_recvmsg(struct kiocb *iocb, struct socket *sock,
-		      struct msghdr *msg, size_t size, int flags)
+		      struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	DECLARE_SOCKADDR(struct sockaddr_ax25 *, sax, msg->msg_name);
@@ -1154,7 +1154,7 @@ static int nr_recvmsg(struct kiocb *iocb, struct socket *sock,
 	}
 
 	/* Now we can treat all alike */
-	if ((skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &er)) == NULL) {
+	if ((skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &er, timeop)) == NULL) {
 		release_sock(sk);
 		return er;
 	}
diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 51f077a92fa9..0b233d1f1a57 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -794,7 +794,7 @@ static int llcp_sock_sendmsg(struct kiocb *iocb, struct socket *sock,
 }
 
 static int llcp_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			     struct msghdr *msg, size_t len, int flags)
+			     struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -817,7 +817,7 @@ static int llcp_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (flags & (MSG_OOB))
 		return -EOPNOTSUPP;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb) {
 		pr_err("Recv datagram failed state %d %d %d",
 		       sk->sk_state, err, sock_error(sk));
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index c27a6e86cae4..665d9523ce5c 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -228,7 +228,7 @@ static int rawsock_sendmsg(struct kiocb *iocb, struct socket *sock,
 }
 
 static int rawsock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			   struct msghdr *msg, size_t len, int flags)
+			   struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -238,7 +238,7 @@ static int rawsock_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	pr_debug("sock=%p sk=%p len=%zu flags=%d\n", sock, sk, len, flags);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &rc);
+	skb = skb_recv_datagram(sk, flags, noblock, &rc, timeop);
 	if (!skb)
 		return rc;
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b85c67ccb797..f56d816340e2 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2852,7 +2852,7 @@ out:
  */
 
 static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
-			  struct msghdr *msg, size_t len, int flags)
+			  struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -2884,7 +2884,7 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
 	 *	but then it will block.
 	 */
 
-	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err);
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err, timeop);
 
 	/*
 	 *	An error occurred so return it. Because skb_recv_datagram()
diff --git a/net/phonet/datagram.c b/net/phonet/datagram.c
index 290352c0e6b4..77eff48eeb83 100644
--- a/net/phonet/datagram.c
+++ b/net/phonet/datagram.c
@@ -127,7 +127,7 @@ static int pn_sendmsg(struct kiocb *iocb, struct sock *sk,
 
 static int pn_recvmsg(struct kiocb *iocb, struct sock *sk,
 			struct msghdr *msg, size_t len, int noblock,
-			int flags, int *addr_len)
+			int flags, int *addr_len, long *timeop)
 {
 	struct sk_buff *skb = NULL;
 	struct sockaddr_pn sa;
@@ -138,7 +138,7 @@ static int pn_recvmsg(struct kiocb *iocb, struct sock *sk,
 			MSG_CMSG_COMPAT))
 		goto out_nofree;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &rval);
+	skb = skb_recv_datagram(sk, flags, noblock, &rval, timeop);
 	if (skb == NULL)
 		goto out_nofree;
 
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index 70a547ea5177..c5832e1958f8 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -783,7 +783,7 @@ static struct sock *pep_sock_accept(struct sock *sk, int flags, int *errp)
 	u8 pipe_handle, enabled, n_sb;
 	u8 aligned = 0;
 
-	skb = skb_recv_datagram(sk, 0, flags & O_NONBLOCK, errp);
+	skb = skb_recv_datagram(sk, 0, flags & O_NONBLOCK, errp, NULL);
 	if (!skb)
 		return NULL;
 
@@ -1248,7 +1248,7 @@ struct sk_buff *pep_read(struct sock *sk)
 
 static int pep_recvmsg(struct kiocb *iocb, struct sock *sk,
 			struct msghdr *msg, size_t len, int noblock,
-			int flags, int *addr_len)
+			int flags, int *addr_len, long *timeop)
 {
 	struct sk_buff *skb;
 	int err;
@@ -1277,7 +1277,7 @@ static int pep_recvmsg(struct kiocb *iocb, struct sock *sk,
 			return -EINVAL;
 	}
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	lock_sock(sk);
 	if (skb == NULL) {
 		if (err == -ENOTCONN && sk->sk_state == TCP_CLOSE_WAIT)
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 48f8ffc60f8f..e511e569bbc9 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -706,7 +706,7 @@ void rds_inc_put(struct rds_incoming *inc);
 void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr,
 		       struct rds_incoming *inc, gfp_t gfp);
 int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		size_t size, int msg_flags);
+		size_t size, int msg_flags, long *timeop);
 void rds_clear_recv_queue(struct rds_sock *rs);
 int rds_notify_queue_get(struct rds_sock *rs, struct msghdr *msg);
 void rds_inc_info_copy(struct rds_incoming *inc,
diff --git a/net/rds/recv.c b/net/rds/recv.c
index bd82522534fc..6223a4b0fded 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -396,7 +396,7 @@ static int rds_cmsg_recv(struct rds_incoming *inc, struct msghdr *msg)
 }
 
 int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		size_t size, int msg_flags)
+		size_t size, int msg_flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct rds_sock *rs = rds_sk_to_rs(sk);
@@ -406,7 +406,7 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	struct rds_incoming *inc = NULL;
 
 	/* udp_recvmsg()->sock_recvtimeo() gets away without locking too.. */
-	timeo = sock_rcvtimeo(sk, nonblock);
+	timeo = sock_rcvtimeop(sk, timeop, nonblock);
 
 	rdsdebug("size %zu flags 0x%x timeo %ld\n", size, msg_flags, timeo);
 
@@ -493,6 +493,8 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 		rds_inc_put(inc);
 
 out:
+	if (timeop)
+		*timeop = timeo;
 	return ret;
 }
 
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 8451c8cdc9de..2cfc75a1cbbb 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1212,7 +1212,7 @@ static int rose_sendmsg(struct kiocb *iocb, struct socket *sock,
 
 
 static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
-			struct msghdr *msg, size_t size, int flags)
+			struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct rose_sock *rose = rose_sk(sk);
@@ -1229,7 +1229,7 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
 		return -ENOTCONN;
 
 	/* Now we can treat all alike */
-	if ((skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &er)) == NULL)
+	if ((skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &er, timeop)) == NULL)
 		return er;
 
 	qbit = (skb->data[0] & ROSE_Q_BIT) == ROSE_Q_BIT;
diff --git a/net/rxrpc/ar-input.c b/net/rxrpc/ar-input.c
index 63b21e580de9..2319fae4b1f6 100644
--- a/net/rxrpc/ar-input.c
+++ b/net/rxrpc/ar-input.c
@@ -655,7 +655,7 @@ void rxrpc_data_ready(struct sock *sk)
 		return;
 	}
 
-	skb = skb_recv_datagram(sk, 0, 1, &ret);
+	skb = skb_recv_datagram(sk, 0, 1, &ret, NULL);
 	if (!skb) {
 		rxrpc_put_local(local);
 		if (ret == -EAGAIN)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index ba9fd36d3f15..a21e51937e27 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -573,7 +573,7 @@ extern const struct file_operations rxrpc_connection_seq_fops;
  */
 void rxrpc_remove_user_ID(struct rxrpc_sock *, struct rxrpc_call *);
 int rxrpc_recvmsg(struct kiocb *, struct socket *, struct msghdr *, size_t,
-		  int);
+		  int, long *);
 
 /*
  * ar-security.c
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index e9aaa65c0778..e8b8bb3d50ab 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -44,7 +44,7 @@ void rxrpc_remove_user_ID(struct rxrpc_sock *rx, struct rxrpc_call *call)
  *   simultaneously
  */
 int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
-		  struct msghdr *msg, size_t len, int flags)
+		  struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct rxrpc_skb_priv *sp;
 	struct rxrpc_call *call = NULL, *continue_call = NULL;
@@ -63,7 +63,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	ullen = msg->msg_flags & MSG_CMSG_COMPAT ? 4 : sizeof(unsigned long);
 
-	timeo = sock_rcvtimeo(&rx->sk, flags & MSG_DONTWAIT);
+	timeo = sock_rcvtimeop(&rx->sk, timeop, flags & MSG_DONTWAIT);
 	msg->msg_flags |= MSG_MORE;
 
 	lock_sock(&rx->sk);
@@ -251,6 +251,8 @@ out:
 		rxrpc_put_call(call);
 	if (continue_call)
 		rxrpc_put_call(continue_call);
+	if (timeop)
+		*timeop = timeo;
 	_leave(" = %d [data]", copied);
 	return copied;
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 2af76eaba8f7..3a1e70a22594 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2042,11 +2042,11 @@ static int sctp_skb_pull(struct sk_buff *skb, int len)
  *  flags   - flags sent or received with the user message, see Section
  *            5 for complete description of the flags.
  */
-static struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int, int *);
+static struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int, int *, long *);
 
 static int sctp_recvmsg(struct kiocb *iocb, struct sock *sk,
 			struct msghdr *msg, size_t len, int noblock,
-			int flags, int *addr_len)
+			int flags, int *addr_len, long *timeop)
 {
 	struct sctp_ulpevent *event = NULL;
 	struct sctp_sock *sp = sctp_sk(sk);
@@ -2066,7 +2066,7 @@ static int sctp_recvmsg(struct kiocb *iocb, struct sock *sk,
 		goto out;
 	}
 
-	skb = sctp_skb_recv_datagram(sk, flags, noblock, &err);
+	skb = sctp_skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
@@ -6519,13 +6519,13 @@ out:
  * with a few changes to make lksctp work.
  */
 static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
-					      int noblock, int *err)
+					      int noblock, int *err, long *timeop)
 {
 	int error;
 	struct sk_buff *skb;
 	long timeo;
 
-	timeo = sock_rcvtimeo(sk, noblock);
+	timeo = sock_rcvtimeop(sk, timeop, noblock);
 
 	pr_debug("%s: timeo:%ld, max:%ld\n", __func__, timeo,
 		 MAX_SCHEDULE_TIMEOUT);
@@ -6548,8 +6548,11 @@ static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
 			skb = skb_dequeue(&sk->sk_receive_queue);
 		}
 
-		if (skb)
+		if (skb) {
+			if (timeop)
+				*timeop = timeo;
 			return skb;
+		}
 
 		/* Caller is allowed not to check sk->sk_err before calling. */
 		error = sock_error(sk);
diff --git a/net/socket.c b/net/socket.c
index abf56b2a14f9..310a50971769 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -772,7 +772,7 @@ void __sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
 EXPORT_SYMBOL_GPL(__sock_recv_ts_and_drops);
 
 static inline int __sock_recvmsg_nosec(struct kiocb *iocb, struct socket *sock,
-				       struct msghdr *msg, size_t size, int flags)
+				       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock_iocb *si = kiocb_to_siocb(iocb);
 
@@ -782,19 +782,19 @@ static inline int __sock_recvmsg_nosec(struct kiocb *iocb, struct socket *sock,
 	si->size = size;
 	si->flags = flags;
 
-	return sock->ops->recvmsg(iocb, sock, msg, size, flags);
+	return sock->ops->recvmsg(iocb, sock, msg, size, flags, timeop);
 }
 
 static inline int __sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-				 struct msghdr *msg, size_t size, int flags)
+				 struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	int err = security_socket_recvmsg(sock, msg, size, flags);
 
-	return err ?: __sock_recvmsg_nosec(iocb, sock, msg, size, flags);
+	return err ?: __sock_recvmsg_nosec(iocb, sock, msg, size, flags, timeop);
 }
 
 int sock_recvmsg(struct socket *sock, struct msghdr *msg,
-		 size_t size, int flags)
+		 size_t size, int flags, long *timeop)
 {
 	struct kiocb iocb;
 	struct sock_iocb siocb;
@@ -802,7 +802,7 @@ int sock_recvmsg(struct socket *sock, struct msghdr *msg,
 
 	init_sync_kiocb(&iocb, NULL);
 	iocb.private = &siocb;
-	ret = __sock_recvmsg(&iocb, sock, msg, size, flags);
+	ret = __sock_recvmsg(&iocb, sock, msg, size, flags, timeop);
 	if (-EIOCBQUEUED == ret)
 		ret = wait_on_sync_kiocb(&iocb);
 	return ret;
@@ -810,7 +810,7 @@ int sock_recvmsg(struct socket *sock, struct msghdr *msg,
 EXPORT_SYMBOL(sock_recvmsg);
 
 static int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
-			      size_t size, int flags)
+			      size_t size, int flags, long *timeop)
 {
 	struct kiocb iocb;
 	struct sock_iocb siocb;
@@ -818,7 +818,7 @@ static int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
 
 	init_sync_kiocb(&iocb, NULL);
 	iocb.private = &siocb;
-	ret = __sock_recvmsg_nosec(&iocb, sock, msg, size, flags);
+	ret = __sock_recvmsg_nosec(&iocb, sock, msg, size, flags, timeop);
 	if (-EIOCBQUEUED == ret)
 		ret = wait_on_sync_kiocb(&iocb);
 	return ret;
@@ -851,7 +851,7 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
 	 * iovec are identical, yielding the same in-core layout and alignment
 	 */
 	msg->msg_iov = (struct iovec *)vec, msg->msg_iovlen = num;
-	result = sock_recvmsg(sock, msg, size, flags);
+	result = sock_recvmsg(sock, msg, size, flags, NULL);
 	set_fs(oldfs);
 	return result;
 }
@@ -914,7 +914,7 @@ static ssize_t do_sock_read(struct msghdr *msg, struct kiocb *iocb,
 	msg->msg_iovlen = nr_segs;
 	msg->msg_flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
 
-	return __sock_recvmsg(iocb, sock, msg, size, msg->msg_flags);
+	return __sock_recvmsg(iocb, sock, msg, size, msg->msg_flags, NULL);
 }
 
 static ssize_t sock_aio_read(struct kiocb *iocb, const struct iovec *iov,
@@ -1862,7 +1862,7 @@ SYSCALL_DEFINE6(recvfrom, int, fd, void __user *, ubuf, size_t, size,
 	msg.msg_namelen = 0;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
-	err = sock_recvmsg(sock, &msg, size, flags);
+	err = sock_recvmsg(sock, &msg, size, flags, NULL);
 
 	if (err >= 0 && addr != NULL) {
 		err2 = move_addr_to_user(&address,
@@ -2207,7 +2207,7 @@ SYSCALL_DEFINE4(sendmmsg, int, fd, struct mmsghdr __user *, mmsg,
 }
 
 static int ___sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
-			 struct msghdr *msg_sys, unsigned int flags, int nosec)
+			 struct msghdr *msg_sys, unsigned int flags, int nosec, long *timeop)
 {
 	struct compat_msghdr __user *msg_compat =
 	    (struct compat_msghdr __user *)msg;
@@ -2265,7 +2265,7 @@ static int ___sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	err = (nosec ? sock_recvmsg_nosec : sock_recvmsg)(sock, msg_sys,
-							  total_len, flags);
+							  total_len, flags, timeop);
 	if (err < 0)
 		goto out_freeiov;
 	len = err;
@@ -2312,7 +2312,7 @@ long __sys_recvmsg(int fd, struct msghdr __user *msg, unsigned flags)
 	if (!sock)
 		goto out;
 
-	err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
+	err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0, NULL);
 
 	fput_light(sock->file, fput_needed);
 out:
@@ -2327,6 +2327,30 @@ SYSCALL_DEFINE3(recvmsg, int, fd, struct msghdr __user *, msg,
 	return __sys_recvmsg(fd, msg, flags);
 }
 
+static int sock_set_timeout_ts(long *timeo_p, struct timespec *ts)
+{
+	if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
+		return -EDOM;
+
+	if (ts->tv_sec < 0) {
+		static int warned __read_mostly;
+
+		*timeo_p = 0;
+		if (warned < 10 && net_ratelimit()) {
+			warned++;
+			pr_info("%s: `%s' (pid %d) tries to set negative timeout\n",
+				__func__, current->comm, task_pid_nr(current));
+		}
+		return 0;
+	}
+	*timeo_p = MAX_SCHEDULE_TIMEOUT;
+	if (ts->tv_sec == 0 && ts->tv_nsec == 0)
+		return 0;
+	if (ts->tv_sec < (MAX_SCHEDULE_TIMEOUT / HZ - 1))
+		*timeo_p = ts->tv_sec * HZ + (ts->tv_nsec + (NSEC_PER_SEC / HZ - 1)) / (NSEC_PER_SEC / HZ);
+	return 0;
+}
+
 /*
  *     Linux recvmmsg interface
  */
@@ -2339,12 +2363,14 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;
 	struct msghdr msg_sys;
-	struct timespec end_time;
+	long timeout_hz, *timeop = NULL;
 
-	if (timeout &&
-	    poll_select_set_timeout(&end_time, timeout->tv_sec,
-				    timeout->tv_nsec))
-		return -EINVAL;
+	if (timeout) {
+		err = sock_set_timeout_ts(&timeout_hz, timeout);
+		if (err)
+			return err;
+		timeop = &timeout_hz;
+	}
 
 	datagrams = 0;
 
@@ -2366,7 +2392,7 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		if (MSG_CMSG_COMPAT & flags) {
 			err = ___sys_recvmsg(sock, (struct msghdr __user *)compat_entry,
 					     &msg_sys, flags & ~MSG_WAITFORONE,
-					     datagrams);
+					     datagrams, timeop);
 			if (err < 0)
 				break;
 			err = __put_user(err, &compat_entry->msg_len);
@@ -2375,7 +2401,7 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 			err = ___sys_recvmsg(sock,
 					     (struct msghdr __user *)entry,
 					     &msg_sys, flags & ~MSG_WAITFORONE,
-					     datagrams);
+					     datagrams, timeop);
 			if (err < 0)
 				break;
 			err = put_user(err, &entry->msg_len);
@@ -2390,17 +2416,11 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		if (flags & MSG_WAITFORONE)
 			flags |= MSG_DONTWAIT;
 
-		if (timeout) {
-			ktime_get_ts(timeout);
-			*timeout = timespec_sub(end_time, *timeout);
-			if (timeout->tv_sec < 0) {
-				timeout->tv_sec = timeout->tv_nsec = 0;
-				break;
-			}
-
+		if (timeout && timeout_hz == 0) {
 			/* Timeout, return less than vlen datagrams */
-			if (timeout->tv_nsec == 0 && timeout->tv_sec == 0)
-				break;
+			timeout->tv_sec = timeout->tv_nsec = 0;
+			timeop = NULL;
+			break;
 		}
 
 		/* Out of band data, return right away */
@@ -2411,6 +2431,11 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 out_put:
 	fput_light(sock->file, fput_needed);
 
+	if (timeop) {
+		timeout->tv_sec	 = timeout_hz / HZ;
+		timeout->tv_nsec = (timeout_hz % HZ) * (NSEC_PER_SEC / HZ);
+	}
+
 	if (err == 0)
 		return datagrams;
 
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 43bcb4699d69..e1e61082f45d 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -545,7 +545,7 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
 	err = kernel_recvmsg(svsk->sk_sock, &msg, NULL,
 			     0, 0, MSG_PEEK | MSG_DONTWAIT);
 	if (err >= 0)
-		skb = skb_recv_datagram(svsk->sk_sk, 0, 1, &err);
+		skb = skb_recv_datagram(svsk->sk_sk, 0, 1, &err, NULL);
 
 	if (skb == NULL) {
 		if (err != -EAGAIN) {
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 25a3dcf15cae..16b2194f2add 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -967,7 +967,7 @@ static void xs_local_data_ready(struct sock *sk)
 	if (xprt == NULL)
 		goto out;
 
-	skb = skb_recv_datagram(sk, 0, 1, &err);
+	skb = skb_recv_datagram(sk, 0, 1, &err, NULL);
 	if (skb == NULL)
 		goto out;
 
@@ -1029,7 +1029,7 @@ static void xs_udp_data_ready(struct sock *sk)
 	if (!(xprt = xprt_from_sock(sk)))
 		goto out;
 
-	if ((skb = skb_recv_datagram(sk, 0, 1, &err)) == NULL)
+	if ((skb = skb_recv_datagram(sk, 0, 1, &err, NULL)) == NULL)
 		goto out;
 
 	repsize = skb->len - sizeof(struct udphdr);
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 08d87fc80b10..b4f7d923c9e2 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1031,7 +1031,7 @@ static int tipc_wait_for_rcvmsg(struct socket *sock, long *timeop)
  * Returns size of returned message data, errno otherwise
  */
 static int tipc_recvmsg(struct kiocb *iocb, struct socket *sock,
-			struct msghdr *m, size_t buf_len, int flags)
+			struct msghdr *m, size_t buf_len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct tipc_sock *tsk = tipc_sk(sk);
@@ -1054,7 +1054,7 @@ static int tipc_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto exit;
 	}
 
-	timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	timeo = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 restart:
 
 	/* Look for a message in receive queue; wait if necessary */
@@ -1109,6 +1109,8 @@ restart:
 		advance_rx_queue(sk);
 	}
 exit:
+	if (timeop)
+		*timeop = timeo;
 	release_sock(sk);
 	return res;
 }
@@ -1126,7 +1128,7 @@ exit:
  * Returns size of returned message data, errno otherwise
  */
 static int tipc_recv_stream(struct kiocb *iocb, struct socket *sock,
-			    struct msghdr *m, size_t buf_len, int flags)
+			    struct msghdr *m, size_t buf_len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct tipc_sock *tsk = tipc_sk(sk);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 7b9114e0a5b1..721904c37359 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -519,17 +519,17 @@ static int unix_shutdown(struct socket *, int);
 static int unix_stream_sendmsg(struct kiocb *, struct socket *,
 			       struct msghdr *, size_t);
 static int unix_stream_recvmsg(struct kiocb *, struct socket *,
-			       struct msghdr *, size_t, int);
+			       struct msghdr *, size_t, int, long *);
 static int unix_dgram_sendmsg(struct kiocb *, struct socket *,
 			      struct msghdr *, size_t);
 static int unix_dgram_recvmsg(struct kiocb *, struct socket *,
-			      struct msghdr *, size_t, int);
+			      struct msghdr *, size_t, int, long *);
 static int unix_dgram_connect(struct socket *, struct sockaddr *,
 			      int, int);
 static int unix_seqpacket_sendmsg(struct kiocb *, struct socket *,
 				  struct msghdr *, size_t);
 static int unix_seqpacket_recvmsg(struct kiocb *, struct socket *,
-				  struct msghdr *, size_t, int);
+				  struct msghdr *, size_t, int, long *);
 
 static int unix_set_peek_off(struct sock *sk, int val)
 {
@@ -1283,7 +1283,7 @@ static int unix_accept(struct socket *sock, struct socket *newsock, int flags)
 	 * so that no locks are necessary.
 	 */
 
-	skb = skb_recv_datagram(sk, 0, flags&O_NONBLOCK, &err);
+	skb = skb_recv_datagram(sk, 0, flags&O_NONBLOCK, &err, NULL);
 	if (!skb) {
 		/* This means receive shutdown. */
 		if (err == 0)
@@ -1755,14 +1755,14 @@ static int unix_seqpacket_sendmsg(struct kiocb *kiocb, struct socket *sock,
 
 static int unix_seqpacket_recvmsg(struct kiocb *iocb, struct socket *sock,
 			      struct msghdr *msg, size_t size,
-			      int flags)
+			      int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 
 	if (sk->sk_state != TCP_ESTABLISHED)
 		return -ENOTCONN;
 
-	return unix_dgram_recvmsg(iocb, sock, msg, size, flags);
+	return unix_dgram_recvmsg(iocb, sock, msg, size, flags, timeop);
 }
 
 static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
@@ -1777,7 +1777,7 @@ static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
 
 static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 			      struct msghdr *msg, size_t size,
-			      int flags)
+			      int flags, long *timeop)
 {
 	struct sock_iocb *siocb = kiocb_to_siocb(iocb);
 	struct scm_cookie tmp_scm;
@@ -1803,7 +1803,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	skip = sk_peek_offset(sk, flags);
 
-	skb = __skb_recv_datagram(sk, flags, &peeked, &skip, &err);
+	skb = __skb_recv_datagram(sk, flags, &peeked, &skip, &err, timeop);
 	if (!skb) {
 		unix_state_lock(sk);
 		/* Signal EOF on disconnected non-blocking SEQPACKET socket. */
@@ -1914,7 +1914,7 @@ static unsigned int unix_skb_len(const struct sk_buff *skb)
 
 static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 			       struct msghdr *msg, size_t size,
-			       int flags)
+			       int flags, long *timeop)
 {
 	struct sock_iocb *siocb = kiocb_to_siocb(iocb);
 	struct scm_cookie tmp_scm;
@@ -1938,7 +1938,7 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto out;
 
 	target = sock_rcvlowat(sk, flags&MSG_WAITALL, size);
-	timeo = sock_rcvtimeo(sk, noblock);
+	timeo = sock_rcvtimeop(sk, timeop, noblock);
 
 	/* Lock the socket to prevent queue disordering
 	 * while sleeps in memcpy_tomsg
@@ -2070,6 +2070,8 @@ again:
 
 	mutex_unlock(&u->readlock);
 	scm_recv(sock, msg, siocb->scm, flags);
+	if (timeop)
+		*timeop = timeo;
 out:
 	return copied ? : err;
 }
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 85d232bed87d..2e784d976133 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1063,10 +1063,10 @@ out:
 }
 
 static int vsock_dgram_recvmsg(struct kiocb *kiocb, struct socket *sock,
-			       struct msghdr *msg, size_t len, int flags)
+			       struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	return transport->dgram_dequeue(kiocb, vsock_sk(sock->sk), msg, len,
-					flags);
+					flags, timeop);
 }
 
 static const struct proto_ops vsock_dgram_ops = {
@@ -1646,7 +1646,7 @@ out:
 static int
 vsock_stream_recvmsg(struct kiocb *kiocb,
 		     struct socket *sock,
-		     struct msghdr *msg, size_t len, int flags)
+		     struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk;
 	struct vsock_sock *vsk;
@@ -1711,7 +1711,7 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
 		err = -ENOMEM;
 		goto out;
 	}
-	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	timeout = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 	copied = 0;
 
 	err = transport->notify_recv_init(vsk, target, &recv_data);
@@ -1820,6 +1820,8 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
 
 out_wait:
 	finish_wait(sk_sleep(sk), &wait);
+	if (timeop)
+		*timeop = timeout;
 out:
 	release_sock(sk);
 	return err;
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 9bb63ffec4f2..9c9e43c17b34 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -1733,7 +1733,7 @@ static int vmci_transport_dgram_enqueue(
 static int vmci_transport_dgram_dequeue(struct kiocb *kiocb,
 					struct vsock_sock *vsk,
 					struct msghdr *msg, size_t len,
-					int flags)
+					int flags, long *timeop)
 {
 	int err;
 	int noblock;
@@ -1748,7 +1748,7 @@ static int vmci_transport_dgram_dequeue(struct kiocb *kiocb,
 
 	/* Retrieve the head sk_buff from the socket's receive queue. */
 	err = 0;
-	skb = skb_recv_datagram(&vsk->sk, flags, noblock, &err);
+	skb = skb_recv_datagram(&vsk->sk, flags, noblock, &err, timeop);
 	if (err)
 		return err;
 
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 5ad4418ef093..da22c042469a 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1254,7 +1254,7 @@ out_kfree_skb:
 
 static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
 		       struct msghdr *msg, size_t size,
-		       int flags)
+		       int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct x25_sock *x25 = x25_sk(sk);
@@ -1306,7 +1306,7 @@ static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
 		/* Now we can treat all alike */
 		release_sock(sk);
 		skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-					flags & MSG_DONTWAIT, &rc);
+					flags & MSG_DONTWAIT, &rc, timeop);
 		lock_sock(sk);
 		if (!skb)
 			goto out;

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-21 21:05     ` [PATCH/RFC] " Arnaldo Carvalho de Melo
@ 2014-05-22 14:27       ` Michael Kerrisk (man-pages)
  2014-05-24  6:13         ` Michael Kerrisk (man-pages)
  2014-05-26 13:46         ` Arnaldo Carvalho de Melo
  2014-05-23 19:00       ` David Miller
  1 sibling, 2 replies; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-22 14:27 UTC
  To: Arnaldo Carvalho de Melo
  Cc: mtk.manpages, lkml, linux-man, netdev, Ondřej Bílka,
	Caitlin Bestler, Neil Horman, Elie De Brauwer, David Miller,
	Steven Whitehouse, Rémi Denis-Courmont, Paul Moore,
	Chris Friesen

Hi Arnaldo,

On 05/21/2014 11:05 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages) escreveu:
>>> Hi Arnaldo,
>  
>>> Ping!
> 
>> I acknowledge the problem, the timeout has to be passed to the
>> underlying ->recvmsg() implementations that should return the time spent
>> waiting for each packet, so that we can accrue that at recvmmsg level.
>  
>> We can do either passing an extra timeout parameter to the recvmsg
>> implementations or using some struct sock member to specify that
>> timeout.
>  
>> The first approach is intrusive, touches tons of files, so I'll try
>> making it all mostly transparent by hooking into sock_rcvtimeo()
>> somehow.
> 
> But after thinking a bit more, looks like we need to do that, please
> take a look at the attached patch to see if it addresses the problem.
> 
> Mostly it adds a new timeop to the per protocol recvmsg()
> implementations, that, if not NULL, should be used instead of
> SO_RCVTIMEO.
> 
> since the underlying recvmsg implementations already check that timeout,
> return what is remaining, that will then be used in subsequent recvmsg
> calls, at the end we just convert it back to timespec format.
> 
> In most cases it is just passed to skb_recv_datagram, that will check
> the pointer, use it and update if not NULL.
> 
> Should have no problems, but I only did a boot with a system with this
> patch applied, no problems noticed on a normal desktop session, ssh,
> etc.
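
As a rough illustration of the conversion described in the quoted text,
the user-space sketch below mirrors the arithmetic used to turn a
timespec into scheduler ticks and back again. HZ_ASSUMED (250 here),
ts_to_ticks() and ticks_to_ts() are names made up for this example and
are not kernel interfaces; negative and very large values, which the
real code has to handle, are left out.

```c
#include <time.h>

#define HZ_ASSUMED  250L		/* assumption: ticks per second */
#define NS_PER_SEC  1000000000L
#define NS_PER_TICK (NS_PER_SEC / HZ_ASSUMED)

/* Round a non-negative timespec up to a whole number of ticks. */
long ts_to_ticks(const struct timespec *ts)
{
	return ts->tv_sec * HZ_ASSUMED +
	       (ts->tv_nsec + NS_PER_TICK - 1) / NS_PER_TICK;
}

/* Convert remaining ticks back to a timespec (tick granularity only). */
struct timespec ticks_to_ts(long ticks)
{
	struct timespec ts;

	ts.tv_sec = ticks / HZ_ASSUMED;
	ts.tv_nsec = (ticks % HZ_ASSUMED) * NS_PER_TICK;
	return ts;
}
```

Under these assumptions a timeout of 1 s + 1 ns becomes 251 ticks, and
converting 251 ticks back gives 1.004 s, so a remaining time reported
this way is only accurate to one tick.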

Thanks! I applied this patch against 3.15-rc6.

recvmmsg() now (mostly) does what I expect:
* It waits until either the timeout expires or vlen messages
  have been received.
* If no message is received before the timeout expires, it returns
  -1/EAGAIN.
* If vlen messages are received before the timeout expires, then
  the remaining time is returned in timeout.
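
The behavior listed above could be exercised from user space with
something like the following sketch. recv_batch(), MAX_BATCH and
BUF_SIZE are hypothetical names chosen for this example; only
recvmmsg() itself is the real interface. Whether the timeout bounds the
whole call, and whether the remaining time is written back, depends on
the kernel having the patch under discussion applied.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>

#define MAX_BATCH 8	/* hypothetical cap on datagrams per call */
#define BUF_SIZE  1500	/* hypothetical per-datagram buffer size */

/*
 * Receive up to vlen datagrams (at most MAX_BATCH) from fd, passing
 * *timeout straight through to recvmmsg(). Returns the number of
 * datagrams read, 0 if none arrived before the call gave up with
 * EAGAIN, or -1 on any other error (errno is left set).
 */
int recv_batch(int fd, unsigned int vlen, struct timespec *timeout)
{
	static char bufs[MAX_BATCH][BUF_SIZE];	/* static: not reentrant */
	struct mmsghdr msgs[MAX_BATCH];
	struct iovec iov[MAX_BATCH];
	unsigned int i;
	int n;

	if (vlen > MAX_BATCH)
		vlen = MAX_BATCH;

	/* One buffer per message header. */
	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < vlen; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len = BUF_SIZE;
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}

	n = recvmmsg(fd, msgs, vlen, 0, timeout);
	if (n == -1 && errno == EAGAIN)
		return 0;	/* nothing arrived in time */
	return n;
}
```

A caller would pass, say, a one-second timespec and inspect it after
the call to see how much time, if any, was reported as remaining.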

One question: in the event that the call is interrupted by a signal 
handler, it fails (as expected) with EINTR, but the 'timeout' value is 
not updated with the remaining time on the timer. Would it be desirable 
to emulate the behavior of select() (and other syscalls) in this 
respect, and instead return the remaining time if interrupted by 
a signal?
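
Until the timeout is updated on EINTR, a caller that wants to retry
with the time that is left has to track the deadline itself. A minimal
sketch of that bookkeeping, assuming both timestamps come from the same
clock (for example clock_gettime() with CLOCK_MONOTONIC); time_left()
is a name made up for this example:

```c
#include <time.h>

/*
 * Return the time left until 'deadline' as seen from 'now', clamped
 * to zero once the deadline has passed.
 */
struct timespec time_left(const struct timespec *deadline,
			  const struct timespec *now)
{
	struct timespec left;

	left.tv_sec = deadline->tv_sec - now->tv_sec;
	left.tv_nsec = deadline->tv_nsec - now->tv_nsec;
	if (left.tv_nsec < 0) {		/* borrow one second */
		left.tv_sec--;
		left.tv_nsec += 1000000000L;
	}
	if (left.tv_sec < 0)		/* deadline already passed */
		left.tv_sec = left.tv_nsec = 0;
	return left;
}
```

The caller would compute the deadline once before the first call and,
after each EINTR, pass the result of time_left() as the new timeout.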

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-21 21:05     ` [PATCH/RFC] " Arnaldo Carvalho de Melo
  2014-05-22 14:27       ` Michael Kerrisk (man-pages)
@ 2014-05-23 19:00       ` David Miller
  2014-05-23 19:55         ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 37+ messages in thread
From: David Miller @ 2014-05-23 19:00 UTC (permalink / raw)
  To: acme
  Cc: mtk.manpages, linux-kernel, linux-man, netdev, neleai,
	caitlin.bestler, nhorman, eliedebrauwer, steve,
	remi.denis-courmont, paul, chris.friesen

From: Arnaldo Carvalho de Melo <acme@kernel.org>
Date: Wed, 21 May 2014 18:05:35 -0300

> But after thinking a bit more, looks like we need to do that, please
> take a look at the attached patch to see if it addresses the problem.
> 
> Mostly it adds a new timeop to the per protocol recvmsg()
> implementations, that, if not NULL, should be used instead of
> SO_RCVTIMEO.
> 
> since the underlying recvmsg implementations already check that timeout,
> return what is remaining, that will then be used in subsequent recvmsg
> calls, at the end we just convert it back to timespec format.
> 
> In most cases it is just passed to skb_recv_datagram, that will check
> the pointer, use it and update if not NULL.
> 
> Should have no problems, but I only did a boot with a system with this
> patch applied, no problems noticed on a normal desktop session, ssh,
> etc.

This looks fine to me, but I have a small request:

+	return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;

I keep forgetting which way these expressions associate, so if you could
parenthesize the innermost ?: I'd appreciate it. :)

Thanks!

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-23 19:00       ` David Miller
@ 2014-05-23 19:55         ` Arnaldo Carvalho de Melo
  2014-05-24  6:13           ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-05-23 19:55 UTC (permalink / raw)
  To: David Miller
  Cc: mtk.manpages, linux-kernel, linux-man, netdev, neleai,
	caitlin.bestler, nhorman, eliedebrauwer, steve,
	remi.denis-courmont, paul, chris.friesen

Em Fri, May 23, 2014 at 03:00:55PM -0400, David Miller escreveu:
> From: Arnaldo Carvalho de Melo <acme@kernel.org>
> Date: Wed, 21 May 2014 18:05:35 -0300

> > But after thinking a bit more, looks like we need to do that, please
> > take a look at the attached patch to see if it addresses the problem.

> > Mostly it adds a new timeop to the per protocol recvmsg()
> > implementations, that, if not NULL, should be used instead of
> > SO_RCVTIMEO.

> > since the underlying recvmsg implementations already check that timeout,
> > return what is remaining, that will then be used in subsequent recvmsg
> > calls, at the end we just convert it back to timespec format.

> > In most cases it is just passed to skb_recv_datagram, that will check
> > the pointer, use it and update if not NULL.

> > Should have no problems, but I only did a boot with a system with this
> > patch applied, no problems noticed on a normal desktop session, ssh,
> > etc.
 
> This looks fine to me, but I have a small request:
 
> +	return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;
 
> I keep forgetting which way these expressions associate, so if you could
> parenthesize the innermost ?: I'd appreciate it. :)

Ok, I actually wrote a sample program to verify that these ternaries did
what I meant 8)
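
For anyone following along, a check of that kind could look roughly like
the sketch below. It is not the program I actually wrote; the function
names are made up for illustration, and the socket field is replaced by
a plain 'rcvtimeo' argument. In C the conditional operator associates
right to left, so the unparenthesized form and the explicitly
parenthesized one should agree, while the left-grouped reading differs
when noblock is set.

```c
#include <stddef.h>

/* The expression as originally written in the patch. */
static long
as_written(int noblock, const long *timeop, long rcvtimeo)
{
    return noblock ? 0 : timeop ? *timeop : rcvtimeo;
}

/* The same expression with the inner ?: parenthesized. */
static long
parenthesized(int noblock, const long *timeop, long rcvtimeo)
{
    return noblock ? 0 : (timeop ? *timeop : rcvtimeo);
}

/* The other possible reading, which C's grammar rules out:
   here the first ?: yields a pointer (NULL when noblock is set)
   that is then used as the condition of the second ?: */
static long
left_grouped(int noblock, const long *timeop, long rcvtimeo)
{
    return (noblock ? 0 : timeop) ? *timeop : rcvtimeo;
}
```

With noblock set, a non-NULL timeop pointing at 7 and rcvtimeo of 30,
the first two return 0 while the left-grouped variant returns 30.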

I'll finish the cset log and do this clarification change.

Would be great to get Acked-by tags from the original reporter, Michael
and whoever had a look at this change, if possible. Michael, Elie?
 
> Thanks!

Thanks a lot for reviewing it!

- Arnaldo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-23 19:55         ` Arnaldo Carvalho de Melo
@ 2014-05-24  6:13           ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-24  6:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, David Miller
  Cc: mtk.manpages, linux-kernel, linux-man, netdev, neleai,
	caitlin.bestler, nhorman, eliedebrauwer, steve,
	remi.denis-courmont, paul, chris.friesen

On 05/23/2014 09:55 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, May 23, 2014 at 03:00:55PM -0400, David Miller escreveu:
>> From: Arnaldo Carvalho de Melo <acme@kernel.org>
>> Date: Wed, 21 May 2014 18:05:35 -0300
> 
>>> But after thinking a bit more, looks like we need to do that, please
>>> take a look at the attached patch to see if it addresses the problem.
> 
>>> Mostly it adds a new timeop to the per protocol recvmsg()
>>> implementations, that, if not NULL, should be used instead of
>>> SO_RCVTIMEO.
> 
>>> since the underlying recvmsg implementations already check that timeout,
>>> return what is remaining, that will then be used in subsequent recvmsg
>>> calls, at the end we just convert it back to timespec format.
> 
>>> In most cases it is just passed to skb_recv_datagram, that will check
>>> the pointer, use it and update if not NULL.
> 
>>> Should have no problems, but I only did a boot with a system with this
>>> patch applied, no problems noticed on a normal desktop session, ssh,
>>> etc.
>  
>> This looks fine to me, but I have a small request:
>  
>> +	return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;
>  
>> I keep forgetting which way these expressions associate, so if you could
>> parenthesize the innermost ?: I'd appreciate it. :)
> 
> Ok, I actually wrote a sample program to verify that these ternaries did
> what I meant 8)
> 
> I'll finish the cset log and do this clarification change.
> 
> Would be great to get Acked-by tags from the original reporter, Michael
> and whoever had a look at this change, if possible. Michael, Elie?

Arnaldo, I already sent you a reply (will reping on that one),
but got no response. My light testing got the expected results,
but I still had one question about the semantics.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-22 14:27       ` Michael Kerrisk (man-pages)
@ 2014-05-24  6:13         ` Michael Kerrisk (man-pages)
  2014-05-26 13:46         ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-24  6:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: mtk.manpages, lkml, linux-man, netdev, Ondřej Bílka,
	Caitlin Bestler, Neil Horman, Elie De Brauwer, David Miller,
	Steven Whitehouse, Rémi Denis-Courmont, Paul Moore,
	Chris Friesen

Ping!

On 05/22/2014 04:27 PM, Michael Kerrisk (man-pages) wrote:
> Hi Arnaldo,
> 
> On 05/21/2014 11:05 PM, Arnaldo Carvalho de Melo wrote:
>> Em Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo escreveu:
>>> Em Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages) escreveu:
>>>> Hi Arnaldo,
>>  
>>>> Ping!
>>
>>> I acknowledge the problem, the timeout has to be passed to the
>>> underlying ->recvmsg() implementations that should return the time spent
>>> waiting for each packet, so that we can accrue that at recvmmsg level.
>>  
>>> We can do either passing an extra timeout parameter to the recvmsg
>>> implementations or using some struct sock member to specify that
>>> timeout.
>>  
>>> The first approach is intrusive, touches tons of files, so I'll try
>>> making it all mostly transparent by hooking into sock_rcvtimeo()
>>> somehow.
>>
>> But after thinking a bit more, looks like we need to do that, please
>> take a look at the attached patch to see if it addresses the problem.
>>
>> Mostly it adds a new timeop to the per protocol recvmsg()
>> implementations, that, if not NULL, should be used instead of
>> SO_RCVTIMEO.
>>
>> since the underlying recvmsg implementations already check that timeout,
>> return what is remaining, that will then be used in subsequent recvmsg
>> calls, at the end we just convert it back to timespec format.
>>
>> In most cases it is just passed to skb_recv_datagram, that will check
>> the pointer, use it and update if not NULL.
>>
>> Should have no problems, but I only did a boot with a system with this
>> patch applied, no problems noticed on a normal desktop session, ssh,
>> etc.
> 
> Thanks! I applied this patch against 3.15-rc6.
> 
> recvmmsg() now (mostly) does what I expect: 
> * it waits until either the timeout expires or vlen messages 
>   have been received
> * If no message is received before timeout, it returns -1/EAGAIN.
> * If vlen messages are received before the timeout expires, then
>   the remaining time is returned in timeout.
> 
> One question: in the event that the call is interrupted by a signal 
> handler, it fails (as expected) with EINTR, but the 'timeout' value is 
> not updated with the remaining time on the timer. Would it be desirable 
> to emulate the behavior of select() (and other syscalls) in this 
> respect, and instead return the remaining time if interrupted by 
> a signal?
> 
> Cheers,
> 
> Michael
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-22 14:27       ` Michael Kerrisk (man-pages)
  2014-05-24  6:13         ` Michael Kerrisk (man-pages)
@ 2014-05-26 13:46         ` Arnaldo Carvalho de Melo
  2014-05-26 21:17           ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-05-26 13:46 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: lkml, linux-man, netdev, Ondřej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

Em Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) escreveu:
> Hi Arnaldo,
> 
> On 05/21/2014 11:05 PM, Arnaldo Carvalho de Melo wrote:
> > Em Mon, May 12, 2014 at 11:34:51AM -0300, Arnaldo Carvalho de Melo escreveu:
> >> Em Mon, May 12, 2014 at 12:15:25PM +0200, Michael Kerrisk (man-pages) escreveu:
> >>> Hi Arnaldo,
> >  
> >>> Ping!
> > 
> >> I acknowledge the problem, the timeout has to be passed to the
> >> underlying ->recvmsg() implementations that should return the time spent
> >> waiting for each packet, so that we can accrue that at recvmmsg level.
> >  
> >> We can do either passing an extra timeout parameter to the recvmsg
> >> implementations or using some struct sock member to specify that
> >> timeout.
> >  
> >> The first approach is intrusive, touches tons of files, so I'll try
> >> making it all mostly transparent by hooking into sock_rcvtimeo()
> >> somehow.
> > 
> > But after thinking a bit more, looks like we need to do that, please
> > take a look at the attached patch to see if it addresses the problem.
> > 
> > Mostly it adds a new timeop to the per protocol recvmsg()
> > implementations, that, if not NULL, should be used instead of
> > SO_RCVTIMEO.
> > 
> > since the underlying recvmsg implementations already check that timeout,
> > return what is remaining, that will then be used in subsequent recvmsg
> > calls, at the end we just convert it back to timespec format.
> > 
> > In most cases it is just passed to skb_recv_datagram, that will check
> > the pointer, use it and update if not NULL.
> > 
> > Should have no problems, but I only did a boot with a system with this
> > patch applied, no problems noticed on a normal desktop session, ssh,
> > etc.
> 
> Thanks! I applied this patch against 3.15-rc6.
> 
> recvmmsg() now (mostly) does what I expect: 
> * it waits until either the timeout expires or vlen messages 
>   have been received
> * If no message is received before timeout, it returns -1/EAGAIN.
> * If vlen messages are received before the timeout expires, then
>   the remaining time is returned in timeout.
> 
> One question: in the event that the call is interrupted by a signal 
> handler, it fails (as expected) with EINTR, but the 'timeout' value is 
> not updated with the remaining time on the timer. Would it be desirable 
> to emulate the behavior of select() (and other syscalls) in this 
> respect, and instead return the remaining time if interrupted by 
> a signal?

I think so, will check how to achieve that!
 
> Cheers,
> 
> Michael
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-26 13:46         ` Arnaldo Carvalho de Melo
@ 2014-05-26 21:17           ` Arnaldo Carvalho de Melo
  2014-05-27 16:35             ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-05-26 21:17 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: lkml, linux-man, netdev, Ondřej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

[-- Attachment #1: Type: text/plain, Size: 1275 bytes --]

Em Mon, May 26, 2014 at 10:46:47AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) escreveu:
> > Thanks! I applied this patch against 3.15-rc6.

> > recvmmsg() now (mostly) does what I expect: 
> > * it waits until either the timeout expires or vlen messages 
> >   have been received
> > * If no message is received before timeout, it returns -1/EAGAIN.
> > * If vlen messages are received before the timeout expires, then
> >   the remaining time is returned in timeout.

> > One question: in the event that the call is interrupted by a signal 
> > handler, it fails (as expected) with EINTR, but the 'timeout' value is 
> > not updated with the remaining time on the timer. Would it be desirable 
> > to emulate the behavior of select() (and other syscalls) in this 
> > respect, and instead return the remaining time if interrupted by 
> > a signal?
 
> I think so, will check how to achieve that!

Can you try the attached patch on top of the first one?

It starts adding explicit parentheses on a ternary, as David requested,
and then should return the remaining timeouts in cases like signals,
etc.
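
The idea behind the second part, reduced to a userspace model, is to
funnel every exit path through one label that writes the remaining time
back. This is only an illustrative sketch, not kernel code: model_recv()
and enum wait_outcome are hypothetical, 'spent' stands in for the time a
wait consumed, and the error codes are simplified.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical outcomes of one wait in the receive path. */
enum wait_outcome { WAIT_GOT_DATA, WAIT_SIGNAL, WAIT_EXPIRED };

/* Model of a recvmsg implementation: uses *timeop if given, else the
   socket-wide timeout, and reports what is left on every exit path. */
static int
model_recv(long *timeop, long sk_rcvtimeo, long spent,
           enum wait_outcome outcome)
{
    long timeo = timeop ? *timeop : sk_rcvtimeo;
    int ret;

    timeo = (spent < timeo) ? timeo - spent : 0;

    if (outcome == WAIT_SIGNAL) {
        ret = -EINTR;
        goto out;       /* A bare return here would skip the write-back */
    }
    if (outcome == WAIT_EXPIRED) {
        ret = -EAGAIN;
        goto out;
    }
    ret = 1;            /* One message received */
out:
    if (timeop)
        *timeop = timeo;
    return ret;
}
```

In this model, a call interrupted after 4 units of a 10-unit timeout
returns -EINTR and leaves 6 in *timeop.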

Please let me know if this is enough.

- Arnaldo

P.S. compile testing while sending this message :-)

[-- Attachment #2: recvmsg-return-timeout-harder.patch --]
[-- Type: text/plain, Size: 5543 bytes --]

diff --git a/include/net/sock.h b/include/net/sock.h
index aef3d7f9c3fa..c48f61c79801 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2106,7 +2106,7 @@ static inline long sock_rcvtimeo(const struct sock *sk, bool noblock)
 
 static inline long sock_rcvtimeop(const struct sock *sk, long *timeop, bool noblock)
 {
-	return noblock ? 0 : timeop ? *timeop : sk->sk_rcvtimeo;
+	return noblock ? 0 : (timeop ? *timeop : sk->sk_rcvtimeo);
 }
 
 static inline long sock_sndtimeo(const struct sock *sk, bool noblock)
diff --git a/net/core/datagram.c b/net/core/datagram.c
index a08c4c9dcd23..0dd1715374fa 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -224,12 +224,14 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 			goto no_packet;
 
 	} while (!wait_for_more_packets(sk, err, &timeo, last));
-
+out:
+	if (timeop)
+		*timeop = timeo;
 	return NULL;
 
 no_packet:
 	*err = error;
-	return NULL;
+	goto out;
 }
 EXPORT_SYMBOL(__skb_recv_datagram);
 
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index feaacaa0c970..0991da69f39d 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1480,8 +1480,10 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
 
 			finish_wait(sk_sleep(sk), &wait);
 
-			if (err)
-				return err;
+			if (err) {
+				copied = err;
+				break;
+			}
 			if (sk->sk_shutdown & RCV_SHUTDOWN)
 				break;
 
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index e8b8bb3d50ab..e9082ed598cd 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -78,7 +78,8 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 				release_sock(&rx->sk);
 				if (continue_call)
 					rxrpc_put_call(continue_call);
-				return -ENODATA;
+				copied = -ENODATA;
+				goto out_copied;
 			}
 		}
 
@@ -135,7 +136,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 				release_sock(&rx->sk);
 				rxrpc_put_call(continue_call);
 				_leave(" = %d [noncont]", copied);
-				return copied;
+				goto out_copied;
 			}
 		}
 
@@ -251,9 +252,10 @@ out:
 		rxrpc_put_call(call);
 	if (continue_call)
 		rxrpc_put_call(continue_call);
+	_leave(" = %d [data]", copied);
+out_copied:
 	if (timeop)
 		*timeop = timeo;
-	_leave(" = %d [data]", copied);
 	return copied;
 
 	/* handle non-DATA messages such as aborts, incoming connections and
@@ -330,7 +332,8 @@ terminal_message:
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	_leave(" = %d", ret);
-	return ret;
+	copied = ret;
+	goto out_copied;
 
 copy_error:
 	_debug("copy error");
@@ -339,7 +342,8 @@ copy_error:
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	_leave(" = %d", ret);
-	return ret;
+	copied = ret;
+	goto out_copied;
 
 wait_interrupted:
 	ret = sock_intr_errno(timeo);
@@ -350,8 +354,7 @@ wait_error:
 	if (copied)
 		copied = ret;
 	_leave(" = %d [waitfail %d]", copied, ret);
-	return copied;
-
+	goto out_copied;
 }
 
 /**
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index d5d3f9b42bca..d05161a168bc 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6548,11 +6548,8 @@ static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
 			skb = skb_dequeue(&sk->sk_receive_queue);
 		}
 
-		if (skb) {
-			if (timeop)
-				*timeop = timeo;
-			return skb;
-		}
+		if (skb)
+			break;
 
 		/* Caller is allowed not to check sk->sk_err before calling. */
 		error = sock_error(sk);
@@ -6572,11 +6569,15 @@ static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
 			goto no_packet;
 	} while (sctp_wait_for_packet(sk, err, &timeo) == 0);
 
-	return NULL;
+out:
+	if (timeop)
+		*timeop = timeo;
+
+	return skb;
 
 no_packet:
 	*err = error;
-	return NULL;
+	goto out;
 }
 
 /* If sndbuf has changed, wake up per association sndbuf waiters.  */
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 721904c37359..3203defdb503 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1926,7 +1926,7 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 	int check_creds = 0;
 	int target;
 	int err = 0;
-	long timeo;
+	long timeo = sock_rcvtimeop(sk, timeop, noblock);
 	int skip;
 
 	err = -EINVAL;
@@ -1938,7 +1938,6 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto out;
 
 	target = sock_rcvlowat(sk, flags&MSG_WAITALL, size);
-	timeo = sock_rcvtimeop(sk, timeop, noblock);
 
 	/* Lock the socket to prevent queue disordering
 	 * while sleeps in memcpy_tomsg
@@ -2070,9 +2069,9 @@ again:
 
 	mutex_unlock(&u->readlock);
 	scm_recv(sock, msg, siocb->scm, flags);
+out:
 	if (timeop)
 		*timeop = timeo;
-out:
 	return copied ? : err;
 }
 
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 2e784d976133..73957d47dac7 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1653,7 +1653,7 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
 	int err;
 	size_t target;
 	ssize_t copied;
-	long timeout;
+	long timeout = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 	struct vsock_transport_recv_notify_data recv_data;
 
 	DEFINE_WAIT(wait);
@@ -1711,7 +1711,6 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
 		err = -ENOMEM;
 		goto out;
 	}
-	timeout = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 	copied = 0;
 
 	err = transport->notify_recv_init(vsk, target, &recv_data);
@@ -1820,9 +1819,9 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
 
 out_wait:
 	finish_wait(sk_sleep(sk), &wait);
+out:
 	if (timeop)
 		*timeop = timeout;
-out:
 	release_sock(sk);
 	return err;
 }

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-26 21:17           ` Arnaldo Carvalho de Melo
@ 2014-05-27 16:35             ` Michael Kerrisk (man-pages)
  2014-05-27 19:21               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-27 16:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: mtk.manpages, lkml, linux-man, netdev, Ondřej Bílka,
	Caitlin Bestler, Neil Horman, Elie De Brauwer, David Miller,
	Steven Whitehouse, Rémi Denis-Courmont, Paul Moore,
	Chris Friesen

Hi Arnaldo,

On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 26, 2014 at 10:46:47AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Thu, May 22, 2014 at 04:27:45PM +0200, Michael Kerrisk (man-pages) escreveu:
>>> Thanks! I applied this patch against 3.15-rc6.
> 
>>> recvmmsg() now (mostly) does what I expect: 
>>> * it waits until either the timeout expires or vlen messages 
>>>   have been received
>>> * If no message is received before timeout, it returns -1/EAGAIN.
>>> * If vlen messages are received before the timeout expires, then
>>>   the remaining time is returned in timeout.
> 
>>> One question: in the event that the call is interrupted by a signal 
>>> handler, it fails (as expected) with EINTR, but the 'timeout' value is 
>>> not updated with the remaining time on the timer. Would it be desirable 
>>> to emulate the behavior of select() (and other syscalls) in this 
>>> respect, and instead return the remaining time if interrupted by 
>>> a signal?
>  
>> I think so, will check how to achieve that!
> 
> Can you try the attached patch on top of the first one?

Patches on patches is a way to make your testers work unnecessarily
harder. Also, it means that anyone else who was interested in this
thread likely got lost at this point, because they probably didn't 
save the first patch. All of this to say: it makes life much easier 
if you provide a complete new self-contained patch on each iteration.

> It starts adding explicit parentheses on a ternary, as David requested,
> and then should return the remaining timeouts in cases like signals,
> etc.
> 
> Please let me know if this is enough.

Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)

> P.S. compile testing while sending this message :-)

Okay -- how about some real testing for the next version ;-). I've appended
my test program below. You can use it as follows:

./t_recvmmsg <port> <timeout-in-secs> <bufsize>...

(The timeout can also be '-' meaning use NULL as the timeout argument.)

Cheers,

Michael


/* t_recvmmsg.c

   A simple test program for the Linux-specific recvmmsg() system call.
*/
#define _GNU_SOURCE
#include <sys/time.h>
#include <signal.h>
#include <sys/socket.h>
#include <netdb.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>


#define errExit(msg) 	do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)


static int		/* Returns a bound socket descriptor, or -1 on error */
createBoundSocket(const char *service, int type, socklen_t *addrlen)
{
    struct addrinfo hints;
    struct addrinfo *result, *rp;
    int sfd, optval, s;

    memset(&hints, 0, sizeof(struct addrinfo));
    hints.ai_canonname = NULL;
    hints.ai_addr = NULL;
    hints.ai_next = NULL;
    hints.ai_socktype = type;
    hints.ai_family = AF_UNSPEC;	/* Allows IPv4 or IPv6 */
    hints.ai_flags = AI_PASSIVE;	/* Use wildcard IP address */

    s = getaddrinfo(NULL, service, &hints, &result);
    if (s != 0)
        return -1;

    /* Walk through returned list until we find an address structure
       that can be used to successfully create and bind a socket */

    optval = 1;
    for (rp = result; rp != NULL; rp = rp->ai_next) {
        sfd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (sfd == -1)
            continue;			/* On error, try next address */

        if (bind(sfd, rp->ai_addr, rp->ai_addrlen) == 0)
            break;			/* Success */

        /* bind() failed: close this socket and try next address */

        close(sfd);
    }

    if (rp != NULL && addrlen != NULL)
        *addrlen = rp->ai_addrlen;  	/* Return address structure size */

    freeaddrinfo(result);

    return (rp == NULL) ? -1 : sfd;
}


static void
handler(int sig)
{
    (void) sig;		/* Unused; we just interrupt a syscall */
}


int
main(int argc, char *argv[])
{
    int sfd, vlen, j, s;
    struct mmsghdr *msgvecp;
    struct timespec ts;
    struct timespec *tsp;
    struct sigaction sa;

    if (argc < 4) {
        fprintf(stderr, "Usage: %s port tmo-secs buf-len...\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    sfd = createBoundSocket(argv[1], SOCK_DGRAM, NULL);
    if (sfd == -1) {
        fprintf(stderr, "Could not create server socket (%s)\n",
                strerror(errno));
        exit(EXIT_FAILURE);
    }

    /* Handle a signal, so we can test behaviour when recvmmsg()
       is interrupted by a signal */
    
    sa.sa_handler = handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction (SIGQUIT, &sa, NULL) == -1)
        errExit("sigaction");

    /* argv[2] specifies recvmmsg() timeout in seconds, or is '-', meaning
       using NULL argument to get infinite timeout */

    if (argv[2][0] == '-') {
        tsp = NULL;

    } else {
        ts.tv_sec = atoi(argv[2]);
        ts.tv_nsec = 0;
        tsp = &ts;
    }

    /* Remaining command-line arguments specify the size of recvmmsg()
       buffers */

    /* The second argument to recvmmsg() is a pointer to an array of
       mmsghdr structures. Each element of that array has a field, 
       'struct msghdr msg_hdr', that is used to store information from a
       single received datagram. Among the fields in the msghdr structure
       is a 'struct iovec *msg_iov'--that is, a pointer to a scatter/gather
       I/O vector. To keep things simple for this example, our scatter/gather
       vectors always consist of a single element.
    */

    /* Allocate the mmsghdr vector, whose size corresponds to the
       number of remaining command-line arguments */

    vlen = argc - 3;

    msgvecp = calloc(vlen, sizeof(struct mmsghdr));
    if (msgvecp == NULL)
        errExit("calloc");

    for (j = 0; j < vlen; j++) {
        msgvecp[j].msg_hdr.msg_name = NULL;
        msgvecp[j].msg_hdr.msg_namelen = 0;
        msgvecp[j].msg_hdr.msg_control = NULL;
        msgvecp[j].msg_hdr.msg_controllen = 0;

        /* Allocate an iovec for this mmsghdr element. The vector
           contains just a single item. */

        msgvecp[j].msg_hdr.msg_iovlen = 1;

        msgvecp[j].msg_hdr.msg_iov = malloc(sizeof(struct iovec));
        if (msgvecp[j].msg_hdr.msg_iov == NULL)
            errExit("malloc");

        /* The single iovec element contains a pointer to a buffer
           that is sized according to the number given in the 
           corresponding command-line argument */

        s = atoi(argv[j + 3]);
        msgvecp[j].msg_hdr.msg_iov[0].iov_len = s;
        msgvecp[j].msg_hdr.msg_iov[0].iov_base = malloc(s);
    }

    if (tsp != NULL)
        printf("Timespec before call = %ld.%09ld\n",
                (long) tsp->tv_sec, (long) tsp->tv_nsec);

    /* Now we're ready to make the recvmmsg() call */

    s = recvmmsg(sfd, msgvecp, vlen, 0, tsp);
    if (s == -1) {
        if (errno == EINTR)
            printf("EINTR! (interrupted system call)\n");
        else
            errExit("recvmmsg");
    } 

    printf("recvmmsg() returned %d\n", s);

    if (tsp != NULL)
        printf("Timespec after call = %ld.%09ld\n",
                (long) tsp->tv_sec, (long) tsp->tv_nsec);

    /* Display datagrams retrieved by recvmmsg() */

    for (j = 0; j < s; j++) {
        printf("%d: %u - %.*s\n", j,
                msgvecp[j].msg_len, msgvecp[j].msg_len,
                (char *) msgvecp[j].msg_hdr.msg_iov[0].iov_base);
    }

    exit(EXIT_SUCCESS);
}

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-27 16:35             ` Michael Kerrisk (man-pages)
@ 2014-05-27 19:21               ` Arnaldo Carvalho de Melo
  2014-05-27 19:22                 ` Arnaldo Carvalho de Melo
  2014-05-27 19:28                 ` Michael Kerrisk (man-pages)
  0 siblings, 2 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-05-27 19:21 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: lkml, linux-man, netdev, Ondřej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) escreveu:
> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
> > Can you try the attached patch on top of the first one?
 
> Patches on patches is a way to make your testers work unnecessarily
> harder. Also, it means that anyone else who was interested in this

It was meant to highlight the changes with regard to the previous patch,
i.e. to make things easier for reviewing.

> thread likely got lost at this point, because they probably didn't 
> save the first patch. All of this to say: it makes life much easier 
> if you provide a complete new self-contained patch on each iteration.

If you prefer it that way, find one attached, that I was about to send
(but you can wait till I use your program to test it ;-) )
 
> > It starts adding explicit parentheses on a ternary, as David requested,
> > and then should return the remaining timeouts in cases like signals,
> > etc.
> > 
> > Please let me know if this is enough.
> 
> Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)

What was the problem experienced?
 
> > P.S. compile testing while sending this message :-)
> 
> Okay -- how about some real testing for the next version ;-). I've appended

Hey, you were providing that real testing! Thanks for that! :-)

> my test program below. You can use it as follows:
> 
> ./t_recvmmsg <port> <timeout-in-secs> <bufsize>...
> 
> (The timeout can also be '-' meaning use NULL as the timeout argument.)

Thanks for the test proggie, will use it.
 
> Cheers,
> 
> Michael

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-27 19:21               ` Arnaldo Carvalho de Melo
@ 2014-05-27 19:22                 ` Arnaldo Carvalho de Melo
  2014-05-27 19:28                 ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-05-27 19:22 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: lkml, linux-man, netdev, Ondřej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

[-- Attachment #1: Type: text/plain, Size: 951 bytes --]

Em Tue, May 27, 2014 at 04:21:15PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) escreveu:
> > On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
> > > Can you try the attached patch on top of the first one?
>  
> > Patches on patches is a way to make your testers work unnecessarily
> > harder. Also, it means that anyone else who was interested in this
> 
> It was meant to highlight the changes with regard to the previous patch,
> i.e. to make things easier for reviewing.
> 
> > thread likely got lost at this point, because they probably didn't 
> > save the first patch. All of this to say: it makes life much easier 
> > if you provide a complete new self-contained patch on each iteration.
> 
> If you prefer it that way, find one attached, that I was about to send
> (but you can wait till I use your program to test it ;-) )

Really attached this time ;-\

- Arnaldo
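
As a reading aid for the attached patch, here is a small user-space model
of the timeout selection and write-back that it introduces. It is only a
sketch: struct model_sock, model_rcvtimeop() and model_recv() are
hypothetical stand-ins, not kernel code, and the "waited" argument merely
simulates time spent blocking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one struct sock field that matters here. */
struct model_sock {
	long rcvtimeo;		/* per-socket timeout, as set via SO_RCVTIMEO */
};

/*
 * Models the selection done by sock_rcvtimeop() in the patch: no waiting
 * for nonblocking calls, otherwise the per-call timeout if one was
 * supplied, otherwise the per-socket timeout.
 */
static long model_rcvtimeop(const struct model_sock *sk, const long *timeop,
			    bool noblock)
{
	if (noblock)
		return 0;
	return timeop ? *timeop : sk->rcvtimeo;
}

/*
 * Models one receive call that blocks for "waited" ticks (simulated) and
 * then stores the remaining time through timeop, so that the next call
 * in a recvmmsg()-style loop starts from what is left.
 */
static long model_recv(const struct model_sock *sk, long *timeop,
		       bool noblock, long waited)
{
	long timeo = model_rcvtimeop(sk, timeop, noblock);
	long remaining = waited >= timeo ? 0 : timeo - waited;

	if (timeop)
		*timeop = remaining;
	return remaining;
}
```

With a per-call budget of 30 ticks, a first call that waits 10 ticks
leaves 20 for the next one. A NULL timeop falls back to the per-socket
value and nothing is written back.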

[-- Attachment #2: recvmmsg-timeout-v2.patch --]
[-- Type: text/plain, Size: 77757 bytes --]

commit 370d6ac73d53c582086f959d659e2e021f58d122
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Tue May 20 15:36:40 2014 -0300

    net: Fix recvmmsg timeout handling
    
    As reported by Elie de Brauwer the timeout handling in the recvmmsg
    syscall had issues that boil down to it not properly passing the
    remaining time to each underlying recvmsg() call.
    
    Fix it by adding a timeout pointer to the recvmsg implementations. A
    variation of sock_rcvtimeo(), sock_rcvtimeop(), uses the timeout passed
    through that pointer instead of the SO_RCVTIMEO value, and each
    implementation stores the remaining time back through the pointer. This
    way each underlying recvmsg() call receives the remaining time.
    
    It ends up in most cases being just a forward of this pointer from the
    per protocol recvmsg() implementations to skb_recv_datagram().
    
    Reported-by: Elie De Brauwer <eliedebrauwer@gmail.com>
    Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
    Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Caitlin Bestler <caitlin.bestler@gmail.com>
    Cc: Chris Friesen <chris.friesen@windriver.com>
    Cc: Elie De Brauwer <eliedebrauwer@gmail.com>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Neil Horman <nhorman@tuxdriver.com>
    Cc: Ondřej Bílka <neleai@seznam.cz>
    Cc: Paul Moore <paul@paul-moore.com>
    Cc: Rémi Denis-Courmont <remi@remlab.net>
    Cc: Steven Whitehouse <steve@chygwyn.com>
    Link: http://lkml.kernel.org/n/net-next-c99v9e01bp1galgfvp4bagu5@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 850246206b12..e5d36f815083 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -151,7 +151,7 @@ unlock:
 }
 
 static int hash_recvmsg(struct kiocb *unused, struct socket *sock,
-			struct msghdr *msg, size_t len, int flags)
+			struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index a19c027b29bd..4bde01591174 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -419,7 +419,7 @@ unlock:
 }
 
 static int skcipher_recvmsg(struct kiocb *unused, struct socket *sock,
-			    struct msghdr *msg, size_t ignored, int flags)
+			    struct msghdr *msg, size_t ignored, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 1be82284cf9d..254515f71793 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -113,7 +113,7 @@ mISDN_sock_cmsg(struct sock *sk, struct msghdr *msg, struct sk_buff *skb)
 
 static int
 mISDN_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-		   struct msghdr *msg, size_t len, int flags)
+		   struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sk_buff		*skb;
 	struct sock		*sk = sock->sk;
@@ -130,7 +130,7 @@ mISDN_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (sk->sk_state == MISDN_CLOSED)
 		return 0;
 
-	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err);
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err, timeop);
 	if (!skb)
 		return err;
 
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 3381c4f91a8c..13d12ef322f2 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -1111,7 +1111,7 @@ static int macvtap_sendmsg(struct kiocb *iocb, struct socket *sock,
 
 static int macvtap_recvmsg(struct kiocb *iocb, struct socket *sock,
 			   struct msghdr *m, size_t total_len,
-			   int flags)
+			   int flags, long *timeop)
 {
 	struct macvtap_queue *q = container_of(sock, struct macvtap_queue, sock);
 	int ret;
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 2ea7efd11857..30194c6e3fe8 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -963,7 +963,7 @@ static const struct ppp_channel_ops pppoe_chan_ops = {
 };
 
 static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,
-		  struct msghdr *m, size_t total_len, int flags)
+		  struct msghdr *m, size_t total_len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -975,7 +975,7 @@ static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,
 	}
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &error);
+				flags & MSG_DONTWAIT, &error, timeop);
 	if (error < 0)
 		goto end;
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 98bad1fb1bfb..6a41a841e7e8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1327,7 +1327,7 @@ done:
 }
 
 static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
-			   const struct iovec *iv, ssize_t len, int noblock)
+			   const struct iovec *iv, ssize_t len, int noblock, long *timeop)
 {
 	struct sk_buff *skb;
 	ssize_t ret = 0;
@@ -1343,7 +1343,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 
 	/* Read frames from queue */
 	skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
-				  &peeked, &off, &err);
+				  &peeked, &off, &err, timeop);
 	if (skb) {
 		ret = tun_put_user(tun, tfile, skb, iv, len);
 		kfree_skb(skb);
@@ -1370,7 +1370,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 	}
 
 	ret = tun_do_read(tun, tfile, iv, len,
-			  file->f_flags & O_NONBLOCK);
+			  file->f_flags & O_NONBLOCK, NULL);
 	ret = min_t(ssize_t, ret, len);
 	if (ret > 0)
 		iocb->ki_pos = ret;
@@ -1452,7 +1452,7 @@ static int tun_sendmsg(struct kiocb *iocb, struct socket *sock,
 
 static int tun_recvmsg(struct kiocb *iocb, struct socket *sock,
 		       struct msghdr *m, size_t total_len,
-		       int flags)
+		       int flags, long *timeop)
 {
 	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
 	struct tun_struct *tun = __tun_get(tfile);
@@ -1471,7 +1471,7 @@ static int tun_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto out;
 	}
 	ret = tun_do_read(tun, tfile, m->msg_iov, total_len,
-			  flags & MSG_DONTWAIT);
+			  flags & MSG_DONTWAIT, timeop);
 	if (ret > total_len) {
 		m->msg_flags |= MSG_TRUNC;
 		ret = flags & MSG_TRUNC ? ret : total_len;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index be414d2b2b22..46a706378d79 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -601,7 +601,7 @@ static void handle_rx(struct vhost_net *net)
 		if (unlikely(headcount > UIO_MAXIOV)) {
 			msg.msg_iovlen = 1;
 			err = sock->ops->recvmsg(NULL, sock, &msg,
-						 1, MSG_DONTWAIT | MSG_TRUNC);
+						 1, MSG_DONTWAIT | MSG_TRUNC, NULL);
 			pr_debug("Discarded rx packet: len %zd\n", sock_len);
 			continue;
 		}
@@ -627,7 +627,7 @@ static void handle_rx(struct vhost_net *net)
 			copy_iovec_hdr(vq->iov, nvq->hdr, sock_hlen, in);
 		msg.msg_iovlen = in;
 		err = sock->ops->recvmsg(NULL, sock, &msg,
-					 sock_len, MSG_DONTWAIT | MSG_TRUNC);
+					 sock_len, MSG_DONTWAIT | MSG_TRUNC, NULL);
 		/* Userspace might have consumed the packet meanwhile:
 		 * it's not supposed to do this usually, but might be hard
 		 * to prevent. Discard data we got (if any) and keep going. */
diff --git a/include/linux/net.h b/include/linux/net.h
index 17d83393afcc..f908cdd8cdd3 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -171,10 +171,13 @@ struct proto_ops {
 	 * returning uninitialized memory to user space.  The recvfrom
 	 * handlers can assume that msg.msg_name is either NULL or has
 	 * a minimum size of sizeof(struct sockaddr_storage).
+	 * timeop contains a per-call timeout (as opposed to the per-socket
+	 * one), used by recvmmsg; set it to NULL to disable it. It should return
+	 * the remaining time, if not NULL, even when interrupted by a signal.
 	 */
 	int		(*recvmsg)   (struct kiocb *iocb, struct socket *sock,
 				      struct msghdr *m, size_t total_len,
-				      int flags);
+				      int flags, long *timeop);
 	int		(*mmap)	     (struct file *file, struct socket *sock,
 				      struct vm_area_struct * vma);
 	ssize_t		(*sendpage)  (struct socket *sock, struct page *page,
@@ -215,7 +218,7 @@ int sock_create_lite(int family, int type, int proto, struct socket **res);
 void sock_release(struct socket *sock);
 int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len);
 int sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
-		 int flags);
+		 int flags, long *timeop);
 struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname);
 struct socket *sockfd_lookup(int fd, int *err);
 struct socket *sock_from_file(struct file *file, int *err);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7a9beeb1c458..cdfdd1bd6358 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2479,9 +2479,9 @@ static inline void skb_frag_add_head(struct sk_buff *skb, struct sk_buff *frag)
 	for (iter = skb_shinfo(skb)->frag_list; iter; iter = iter->next)
 
 struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
-				    int *peeked, int *off, int *err);
+				    int *peeked, int *off, int *err, long *timeop);
 struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock,
-				  int *err);
+				  int *err, long *timeop);
 unsigned int datagram_poll(struct file *file, struct socket *sock,
 			   struct poll_table_struct *wait);
 int skb_copy_datagram_iovec(const struct sk_buff *from, int offset,
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 428277869400..6c007bd57f39 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -101,7 +101,7 @@ struct vsock_transport {
 	/* DGRAM. */
 	int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
 	int (*dgram_dequeue)(struct kiocb *kiocb, struct vsock_sock *vsk,
-			     struct msghdr *msg, size_t len, int flags);
+			     struct msghdr *msg, size_t len, int flags, long *timeop);
 	int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
 			     struct iovec *, size_t len);
 	bool (*dgram_allow)(u32 cid, u32 port);
diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index 904777c1cd24..ee75e6875aab 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -246,9 +246,9 @@ void bt_sock_unregister(int proto);
 void bt_sock_link(struct bt_sock_list *l, struct sock *s);
 void bt_sock_unlink(struct bt_sock_list *l, struct sock *s);
 int  bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-				struct msghdr *msg, size_t len, int flags);
+				struct msghdr *msg, size_t len, int flags, long *timeop);
 int  bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
-			struct msghdr *msg, size_t len, int flags);
+			struct msghdr *msg, size_t len, int flags, long *timeop);
 uint bt_sock_poll(struct file *file, struct socket *sock, poll_table *wait);
 int  bt_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int  bt_sock_wait_state(struct sock *sk, int state, unsigned long timeo);
diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index fe7994c48b75..f80071949b98 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -26,7 +26,7 @@ int inet_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
 		      size_t size, int flags);
 int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		 size_t size, int flags);
+		 size_t size, int flags, long *timeop);
 int inet_shutdown(struct socket *sock, int how);
 int inet_listen(struct socket *sock, int backlog);
 void inet_sock_destruct(struct sock *sk);
diff --git a/include/net/ping.h b/include/net/ping.h
index 026479b61a2d..c259ba72c811 100644
--- a/include/net/ping.h
+++ b/include/net/ping.h
@@ -76,7 +76,7 @@ int  ping_getfrag(void *from, char *to, int offset, int fraglen, int odd,
 		  struct sk_buff *);
 
 int  ping_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		  size_t len, int noblock, int flags, int *addr_len);
+		  size_t len, int noblock, int flags, int *addr_len, long *timeop);
 int  ping_common_sendmsg(int family, struct msghdr *msg, size_t len,
 			 void *user_icmph, size_t icmph_len);
 int  ping_v6_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
diff --git a/include/net/sock.h b/include/net/sock.h
index 07b7fcd60d80..c48f61c79801 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -961,7 +961,7 @@ struct proto {
 	int			(*recvmsg)(struct kiocb *iocb, struct sock *sk,
 					   struct msghdr *msg,
 					   size_t len, int noblock, int flags,
-					   int *addr_len);
+					   int *addr_len, long *timeop);
 	int			(*sendpage)(struct sock *sk, struct page *page,
 					int offset, size_t size, int flags);
 	int			(*bind)(struct sock *sk,
@@ -1593,7 +1593,7 @@ int sock_no_getsockopt(struct socket *, int , int, char __user *, int __user *);
 int sock_no_setsockopt(struct socket *, int, int, char __user *, unsigned int);
 int sock_no_sendmsg(struct kiocb *, struct socket *, struct msghdr *, size_t);
 int sock_no_recvmsg(struct kiocb *, struct socket *, struct msghdr *, size_t,
-		    int);
+		    int, long *);
 int sock_no_mmap(struct file *file, struct socket *sock,
 		 struct vm_area_struct *vma);
 ssize_t sock_no_sendpage(struct socket *sock, struct page *page, int offset,
@@ -1606,7 +1606,7 @@ ssize_t sock_no_sendpage(struct socket *sock, struct page *page, int offset,
 int sock_common_getsockopt(struct socket *sock, int level, int optname,
 				  char __user *optval, int __user *optlen);
 int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *msg, size_t size, int flags);
+			       struct msghdr *msg, size_t size, int flags, long *timeop);
 int sock_common_setsockopt(struct socket *sock, int level, int optname,
 				  char __user *optval, unsigned int optlen);
 int compat_sock_common_getsockopt(struct socket *sock, int level,
@@ -2104,6 +2104,11 @@ static inline long sock_rcvtimeo(const struct sock *sk, bool noblock)
 	return noblock ? 0 : sk->sk_rcvtimeo;
 }
 
+static inline long sock_rcvtimeop(const struct sock *sk, long *timeop, bool noblock)
+{
+	return noblock ? 0 : (timeop ? *timeop : sk->sk_rcvtimeo);
+}
+
 static inline long sock_sndtimeo(const struct sock *sk, bool noblock)
 {
 	return noblock ? 0 : sk->sk_sndtimeo;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index e80abe4486cb..60b72ae2cdda 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -437,7 +437,7 @@ int compat_tcp_setsockopt(struct sock *sk, int level, int optname,
 void tcp_set_keepalive(struct sock *sk, int val);
 void tcp_syn_ack_timeout(struct sock *sk, struct request_sock *req);
 int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int nonblock, int flags, int *addr_len);
+		size_t len, int nonblock, int flags, int *addr_len, long *timeop);
 void tcp_parse_options(const struct sk_buff *skb,
 		       struct tcp_options_received *opt_rx,
 		       int estab, struct tcp_fastopen_cookie *foc);
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 01a1082e02b3..11893a32a3c0 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1732,7 +1732,7 @@ out:
 }
 
 static int atalk_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-			 size_t size, int flags)
+			 size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct ddpehdr *ddp;
@@ -1742,7 +1742,7 @@ static int atalk_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr
 	struct sk_buff *skb;
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-						flags & MSG_DONTWAIT, &err);
+						flags & MSG_DONTWAIT, &err, timeop);
 	lock_sock(sk);
 
 	if (!skb)
diff --git a/net/atm/common.c b/net/atm/common.c
index 7b491006eaf4..8def66eaed87 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -524,7 +524,7 @@ int vcc_connect(struct socket *sock, int itf, short vpi, int vci)
 }
 
 int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		size_t size, int flags)
+		size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct atm_vcc *vcc;
@@ -544,7 +544,7 @@ int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	    !test_bit(ATM_VF_READY, &vcc->flags))
 		return 0;
 
-	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &error);
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &error, timeop);
 	if (!skb)
 		return error;
 
diff --git a/net/atm/common.h b/net/atm/common.h
index cc3c2dae4d79..b370ffd78a39 100644
--- a/net/atm/common.h
+++ b/net/atm/common.h
@@ -14,7 +14,7 @@ int vcc_create(struct net *net, struct socket *sock, int protocol, int family);
 int vcc_release(struct socket *sock);
 int vcc_connect(struct socket *sock, int itf, short vpi, int vci);
 int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		size_t size, int flags);
+		size_t size, int flags, long *timeop);
 int vcc_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
 		size_t total_len);
 unsigned int vcc_poll(struct file *file, struct socket *sock, poll_table *wait);
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index c35c3f48fc0f..ee0411920216 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1600,7 +1600,7 @@ out:
 }
 
 static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
-	struct msghdr *msg, size_t size, int flags)
+	struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -1619,7 +1619,7 @@ static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	/* Now we can treat all alike */
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &err);
+				flags & MSG_DONTWAIT, &err, timeop);
 	if (skb == NULL)
 		goto out;
 
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 2021c481cdb6..4896bd954293 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -209,7 +209,7 @@ struct sock *bt_accept_dequeue(struct sock *parent, struct socket *newsock)
 EXPORT_SYMBOL(bt_accept_dequeue);
 
 int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-				struct msghdr *msg, size_t len, int flags)
+				struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -222,7 +222,7 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (flags & (MSG_OOB))
 		return -EOPNOTSUPP;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb) {
 		if (sk->sk_shutdown & RCV_SHUTDOWN)
 			return 0;
@@ -282,7 +282,7 @@ static long bt_sock_data_wait(struct sock *sk, long timeo)
 }
 
 int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *msg, size_t size, int flags)
+			       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	int err = 0;
@@ -297,7 +297,7 @@ int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 	lock_sock(sk);
 
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, size);
-	timeo  = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	timeo  = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 
 	do {
 		struct sk_buff *skb;
@@ -381,6 +381,8 @@ int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 	} while (size);
 
 out:
+	if (timeop)
+		*timeop = timeo;
 	release_sock(sk);
 	return copied ? : err;
 }
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index f608bffdb8b9..f24413835e2c 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -829,7 +829,7 @@ static void hci_sock_cmsg(struct sock *sk, struct msghdr *msg,
 }
 
 static int hci_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			    struct msghdr *msg, size_t len, int flags)
+			    struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -844,7 +844,7 @@ static int hci_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (sk->sk_state == BT_CLOSED)
 		return 0;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		return err;
 
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index ef5e5b04f34f..19a90e0d8172 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -976,7 +976,7 @@ static int l2cap_sock_sendmsg(struct kiocb *iocb, struct socket *sock,
 }
 
 static int l2cap_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			      struct msghdr *msg, size_t len, int flags)
+			      struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct l2cap_pinfo *pi = l2cap_pi(sk);
@@ -1003,9 +1003,9 @@ static int l2cap_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	release_sock(sk);
 
 	if (sock->type == SOCK_STREAM)
-		err = bt_sock_stream_recvmsg(iocb, sock, msg, len, flags);
+		err = bt_sock_stream_recvmsg(iocb, sock, msg, len, flags, timeop);
 	else
-		err = bt_sock_recvmsg(iocb, sock, msg, len, flags);
+		err = bt_sock_recvmsg(iocb, sock, msg, len, flags, timeop);
 
 	if (pi->chan->mode != L2CAP_MODE_ERTM)
 		return err;
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index c603a5eb4720..a3cbf8c4daf5 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -617,7 +617,7 @@ done:
 }
 
 static int rfcomm_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *msg, size_t size, int flags)
+			       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct rfcomm_dlc *d = rfcomm_pi(sk)->dlc;
@@ -628,7 +628,7 @@ static int rfcomm_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 		return 0;
 	}
 
-	len = bt_sock_stream_recvmsg(iocb, sock, msg, size, flags);
+	len = bt_sock_stream_recvmsg(iocb, sock, msg, size, flags, timeop);
 
 	lock_sock(sk);
 	if (!(flags & MSG_PEEK) && len > 0)
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index c06dbd3938e8..bfaa16bdc366 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -700,7 +700,7 @@ static void sco_conn_defer_accept(struct hci_conn *conn, u16 setting)
 }
 
 static int sco_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			    struct msghdr *msg, size_t len, int flags)
+			    struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sco_pinfo *pi = sco_pi(sk);
@@ -718,7 +718,7 @@ static int sco_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	release_sock(sk);
 
-	return bt_sock_recvmsg(iocb, sock, msg, len, flags);
+	return bt_sock_recvmsg(iocb, sock, msg, len, flags, timeop);
 }
 
 static int sco_sock_setsockopt(struct socket *sock, int level, int optname, char __user *optval, unsigned int optlen)
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index e8437094d15f..069eb2ffde29 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -272,7 +272,7 @@ static void caif_check_flow_release(struct sock *sk)
  * changed locking, address handling and added MSG_TRUNC.
  */
 static int caif_seqpkt_recvmsg(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *m, size_t len, int flags)
+			       struct msghdr *m, size_t len, int flags, long *timeop)
 
 {
 	struct sock *sk = sock->sk;
@@ -284,7 +284,7 @@ static int caif_seqpkt_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (m->msg_flags&MSG_OOB)
 		goto read_error;
 
-	skb = skb_recv_datagram(sk, flags, 0 , &ret);
+	skb = skb_recv_datagram(sk, flags, 0 , &ret, timeop);
 	if (!skb)
 		goto read_error;
 	copylen = skb->len;
@@ -345,7 +345,7 @@ static long caif_stream_data_wait(struct sock *sk, long timeo)
  */
 static int caif_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 			       struct msghdr *msg, size_t size,
-			       int flags)
+			       int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	int copied = 0;
@@ -367,7 +367,7 @@ static int caif_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	caif_read_lock(sk);
 	target = sock_rcvlowat(sk, flags&MSG_WAITALL, size);
-	timeo = sock_rcvtimeo(sk, flags&MSG_DONTWAIT);
+	timeo = sock_rcvtimeop(sk, timeop, flags&MSG_DONTWAIT);
 
 	do {
 		int chunk;
@@ -450,6 +450,8 @@ unlock:
 	caif_read_unlock(sk);
 
 out:
+	if (timeop)
+		*timeop = timeo;
 	return copied ? : err;
 }
 
diff --git a/net/can/bcm.c b/net/can/bcm.c
index dcb75c0e66c1..dc12c80ec5cd 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1541,7 +1541,7 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len,
 }
 
 static int bcm_recvmsg(struct kiocb *iocb, struct socket *sock,
-		       struct msghdr *msg, size_t size, int flags)
+		       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -1551,7 +1551,7 @@ static int bcm_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	noblock =  flags & MSG_DONTWAIT;
 	flags   &= ~MSG_DONTWAIT;
-	skb = skb_recv_datagram(sk, flags, noblock, &error);
+	skb = skb_recv_datagram(sk, flags, noblock, &error, timeop);
 	if (!skb)
 		return error;
 
diff --git a/net/can/raw.c b/net/can/raw.c
index 081e81fd017f..0a4aa9d98e5e 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -731,7 +731,7 @@ send_failed:
 }
 
 static int raw_recvmsg(struct kiocb *iocb, struct socket *sock,
-		       struct msghdr *msg, size_t size, int flags)
+		       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -741,7 +741,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct socket *sock,
 	noblock =  flags & MSG_DONTWAIT;
 	flags   &= ~MSG_DONTWAIT;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		return err;
 
diff --git a/net/core/datagram.c b/net/core/datagram.c
index a16ed7bbe376..0dd1715374fa 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -138,6 +138,9 @@ out_noerr:
  *	@off: an offset in bytes to peek skb from. Returns an offset
  *	      within an skb where data actually starts
  *	@err: error code returned
+ *	@timeop: per-call timeout (as opposed to per-socket via SO_RCVTIMEO),
+ *		 will return the remaining time, used in recvmmsg, ignored
+ *		 if set to NULL.
  *
  *	Get a datagram skbuff, understands the peeking, nonblocking wakeups
  *	and possible races. This replaces identical code in packet, raw and
@@ -162,7 +165,7 @@ out_noerr:
  *	the standard around please.
  */
 struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
-				    int *peeked, int *off, int *err)
+				    int *peeked, int *off, int *err, long *timeop)
 {
 	struct sk_buff *skb, *last;
 	long timeo;
@@ -174,7 +177,7 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 	if (error)
 		goto no_packet;
 
-	timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	timeo = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 
 	do {
 		/* Again only user level code calls this function, so nothing
@@ -205,6 +208,8 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 
 			spin_unlock_irqrestore(&queue->lock, cpu_flags);
 			*off = _off;
+			if (timeop)
+				*timeop = timeo;
 			return skb;
 		}
 		spin_unlock_irqrestore(&queue->lock, cpu_flags);
@@ -219,22 +224,24 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 			goto no_packet;
 
 	} while (!wait_for_more_packets(sk, err, &timeo, last));
-
+out:
+	if (timeop)
+		*timeop = timeo;
 	return NULL;
 
 no_packet:
 	*err = error;
-	return NULL;
+	goto out;
 }
 EXPORT_SYMBOL(__skb_recv_datagram);
 
 struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned int flags,
-				  int noblock, int *err)
+				  int noblock, int *err, long *timeop)
 {
 	int peeked, off = 0;
 
 	return __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
-				   &peeked, &off, err);
+				   &peeked, &off, err, timeop);
 }
 EXPORT_SYMBOL(skb_recv_datagram);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 026e01f70274..b462e38785af 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2191,7 +2191,7 @@ int sock_no_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
 EXPORT_SYMBOL(sock_no_sendmsg);
 
 int sock_no_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
-		    size_t len, int flags)
+		    size_t len, int flags, long *timeop)
 {
 	return -EOPNOTSUPP;
 }
@@ -2577,14 +2577,14 @@ EXPORT_SYMBOL(compat_sock_common_getsockopt);
 #endif
 
 int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock,
-			struct msghdr *msg, size_t size, int flags)
+			struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	int addr_len = 0;
 	int err;
 
 	err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
-				   flags & ~MSG_DONTWAIT, &addr_len);
+				   flags & ~MSG_DONTWAIT, &addr_len, timeop);
 	if (err >= 0)
 		msg->msg_namelen = addr_len;
 	return err;
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index c67816647cce..fbf4cc113ffe 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -314,7 +314,7 @@ int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		 size_t size);
 int dccp_recvmsg(struct kiocb *iocb, struct sock *sk,
 		 struct msghdr *msg, size_t len, int nonblock, int flags,
-		 int *addr_len);
+		 int *addr_len, long *timeop);
 void dccp_shutdown(struct sock *sk, int how);
 int inet_dccp_listen(struct socket *sock, int backlog);
 unsigned int dccp_poll(struct file *file, struct socket *sock,
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index de2c1e719305..92ae3d37c7f0 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -808,7 +808,7 @@ out_discard:
 EXPORT_SYMBOL_GPL(dccp_sendmsg);
 
 int dccp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		 size_t len, int nonblock, int flags, int *addr_len)
+		 size_t len, int nonblock, int flags, int *addr_len, long *timeop)
 {
 	const struct dccp_hdr *dh;
 	long timeo;
@@ -820,7 +820,7 @@ int dccp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		goto out;
 	}
 
-	timeo = sock_rcvtimeo(sk, nonblock);
+	timeo = sock_rcvtimeop(sk, timeop, nonblock);
 
 	do {
 		struct sk_buff *skb = skb_peek(&sk->sk_receive_queue);
@@ -910,6 +910,8 @@ verify_sock_status:
 	} while (1);
 out:
 	release_sock(sk);
+	if (timeop)
+		*timeop = timeo;
 	return len;
 }
 
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index ae011b46c071..86dfcbe505de 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -1669,7 +1669,7 @@ static int dn_data_ready(struct sock *sk, struct sk_buff_head *q, int flags, int
 
 
 static int dn_recvmsg(struct kiocb *iocb, struct socket *sock,
-	struct msghdr *msg, size_t size, int flags)
+	struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct dn_scp *scp = DN_SK(sk);
@@ -1680,7 +1680,7 @@ static int dn_recvmsg(struct kiocb *iocb, struct socket *sock,
 	struct sk_buff *skb, *n;
 	struct dn_skb_cb *cb = NULL;
 	unsigned char eor = 0;
-	long timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	long timeo = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 
 	lock_sock(sk);
 
@@ -1814,7 +1814,8 @@ out:
 	}
 
 	release_sock(sk);
-
+	if (timeop)
+		*timeop = timeo;
 	return rv;
 }
 
diff --git a/net/ieee802154/dgram.c b/net/ieee802154/dgram.c
index 4f0ed8780194..dd7de8959d07 100644
--- a/net/ieee802154/dgram.c
+++ b/net/ieee802154/dgram.c
@@ -305,14 +305,14 @@ out:
 
 static int dgram_recvmsg(struct kiocb *iocb, struct sock *sk,
 		struct msghdr *msg, size_t len, int noblock, int flags,
-		int *addr_len)
+		int *addr_len, long *timeop)
 {
 	size_t copied = 0;
 	int err = -EOPNOTSUPP;
 	struct sk_buff *skb;
 	DECLARE_SOCKADDR(struct sockaddr_ieee802154 *, saddr, msg->msg_name);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ieee802154/raw.c b/net/ieee802154/raw.c
index 74d54fae33d7..0303aa66a9e2 100644
--- a/net/ieee802154/raw.c
+++ b/net/ieee802154/raw.c
@@ -179,13 +179,13 @@ out:
 }
 
 static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		       size_t len, int noblock, int flags, int *addr_len)
+		       size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	size_t copied = 0;
 	int err = -EOPNOTSUPP;
 	struct sk_buff *skb;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 0e9bb08a91e4..5e0a9dc931e6 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -757,7 +757,7 @@ ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
 EXPORT_SYMBOL(inet_sendpage);
 
 int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		 size_t size, int flags)
+		 size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	int addr_len = 0;
@@ -766,7 +766,7 @@ int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	sock_rps_record_flow(sk);
 
 	err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
-				   flags & ~MSG_DONTWAIT, &addr_len);
+				   flags & ~MSG_DONTWAIT, &addr_len, timeop);
 	if (err >= 0)
 		msg->msg_namelen = addr_len;
 	return err;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 044a0ddf6a79..791be60b38f1 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -840,7 +840,7 @@ do_confirm:
 }
 
 int ping_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		 size_t len, int noblock, int flags, int *addr_len)
+		 size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct inet_sock *isk = inet_sk(sk);
 	int family = sk->sk_family;
@@ -864,7 +864,7 @@ int ping_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		}
 	}
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index a9dbe58bdfe7..32aee7472bb3 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -685,7 +685,7 @@ out:	return ret;
  */
 
 static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		       size_t len, int noblock, int flags, int *addr_len)
+		       size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	size_t copied = 0;
@@ -701,7 +701,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		goto out;
 	}
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index eb1dde37e678..bc506ffbc8d0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1601,7 +1601,7 @@ EXPORT_SYMBOL(tcp_read_sock);
  */
 
 int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int nonblock, int flags, int *addr_len)
+		size_t len, int nonblock, int flags, int *addr_len, long *timeop)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int copied = 0;
@@ -1626,7 +1626,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	if (sk->sk_state == TCP_LISTEN)
 		goto out;
 
-	timeo = sock_rcvtimeo(sk, nonblock);
+	timeo = sock_rcvtimeop(sk, timeop, nonblock);
 
 	/* Urgent data needs to be handled specially. */
 	if (flags & MSG_OOB)
@@ -1993,20 +1993,18 @@ skip_copy:
 
 	/* Clean up data we have read: This will do ACK frames. */
 	tcp_cleanup_rbuf(sk, copied);
-
-	release_sock(sk);
-	return copied;
-
 out:
 	release_sock(sk);
-	return err;
+	if (timeop)
+		*timeop = timeo;
+	return copied;
 
 recv_urg:
-	err = tcp_recv_urg(sk, msg, len, flags);
+	copied = tcp_recv_urg(sk, msg, len, flags);
 	goto out;
 
 recv_sndq:
-	err = tcp_peek_sndq(sk, msg, len);
+	copied = tcp_peek_sndq(sk, msg, len);
 	goto out;
 }
 EXPORT_SYMBOL(tcp_recvmsg);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e07d52b8617a..039ac25be82f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1224,7 +1224,7 @@ EXPORT_SYMBOL(udp_ioctl);
  */
 
 int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int noblock, int flags, int *addr_len)
+		size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name);
@@ -1240,7 +1240,7 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 
 try_again:
 	skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
-				  &peeked, &off, &err);
+				  &peeked, &off, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h
index f3c27899f62b..a39aa9996b72 100644
--- a/net/ipv4/udp_impl.h
+++ b/net/ipv4/udp_impl.h
@@ -22,7 +22,7 @@ int compat_udp_getsockopt(struct sock *sk, int level, int optname,
 			  char __user *optval, int __user *optlen);
 #endif
 int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int noblock, int flags, int *addr_len);
+		size_t len, int noblock, int flags, int *addr_len, long *timeop);
 int udp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
 		 int flags);
 int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index b2dc60b0c764..1d267f89eb71 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -458,7 +458,7 @@ int rawv6_rcv(struct sock *sk, struct sk_buff *skb)
 
 static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 		  struct msghdr *msg, size_t len,
-		  int noblock, int flags, int *addr_len)
+		  int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	DECLARE_SOCKADDR(struct sockaddr_in6 *, sin6, msg->msg_name);
@@ -475,7 +475,7 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (np->rxpmtu && np->rxopt.bits.rxpmtu)
 		return ipv6_recv_rxpmtu(sk, msg, len, addr_len);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 60325236446a..d1f1b63cfbcd 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -380,7 +380,7 @@ EXPORT_SYMBOL_GPL(udp6_lib_lookup);
 
 int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 		  struct msghdr *msg, size_t len,
-		  int noblock, int flags, int *addr_len)
+		  int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct inet_sock *inet = inet_sk(sk);
@@ -400,7 +400,7 @@ int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 
 try_again:
 	skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
-				  &peeked, &off, &err);
+				  &peeked, &off, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h
index c779c3c90b9d..cd414d719977 100644
--- a/net/ipv6/udp_impl.h
+++ b/net/ipv6/udp_impl.h
@@ -26,7 +26,7 @@ int compat_udpv6_getsockopt(struct sock *sk, int level, int optname,
 int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		  size_t len);
 int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		  size_t len, int noblock, int flags, int *addr_len);
+		  size_t len, int noblock, int flags, int *addr_len, long *timeop);
 int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
 void udpv6_destroy_sock(struct sock *sk);
 
diff --git a/net/ipx/af_ipx.c b/net/ipx/af_ipx.c
index 91729b807c7d..4964c1e0ab03 100644
--- a/net/ipx/af_ipx.c
+++ b/net/ipx/af_ipx.c
@@ -1756,7 +1756,7 @@ out:
 
 
 static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
-		struct msghdr *msg, size_t size, int flags)
+		struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct ipx_sock *ipxs = ipx_sk(sk);
@@ -1791,7 +1791,7 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto out;
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &rc);
+				flags & MSG_DONTWAIT, &rc, timeop);
 	if (!skb) {
 		if (rc == -EAGAIN && (sk->sk_shutdown & RCV_SHUTDOWN))
 			rc = 0;
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 54747c25c86c..0991da69f39d 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1373,7 +1373,7 @@ out:
  *    after being read, regardless of how much the user actually read
  */
 static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
-			      struct msghdr *msg, size_t size, int flags)
+			      struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct irda_sock *self = irda_sk(sk);
@@ -1384,7 +1384,7 @@ static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
 	IRDA_DEBUG(4, "%s()\n", __func__);
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &err);
+				flags & MSG_DONTWAIT, &err, timeop);
 	if (!skb)
 		return err;
 
@@ -1422,7 +1422,7 @@ static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
  * Function irda_recvmsg_stream (iocb, sock, msg, size, flags)
  */
 static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *msg, size_t size, int flags)
+			       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct irda_sock *self = irda_sk(sk);
@@ -1445,7 +1445,7 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
 
 	err = 0;
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, size);
-	timeo = sock_rcvtimeo(sk, noblock);
+	timeo = sock_rcvtimeop(sk, timeop, noblock);
 
 	do {
 		int chunk;
@@ -1480,8 +1480,10 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
 
 			finish_wait(sk_sleep(sk), &wait);
 
-			if (err)
-				return err;
+			if (err) {
+				copied = err;
+				break;
+			}
 			if (sk->sk_shutdown & RCV_SHUTDOWN)
 				break;
 
@@ -1534,6 +1536,8 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
 		}
 	}
 
+	if (timeop)
+		*timeop = timeo;
 	return copied;
 }
 
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 8c9d7302c846..0714cf592dc2 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1314,7 +1314,7 @@ static void iucv_process_message_q(struct sock *sk)
 }
 
 static int iucv_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			     struct msghdr *msg, size_t len, int flags)
+			     struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -1335,7 +1335,7 @@ static int iucv_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	/* receive/dequeue next skb:
 	 * the function understands MSG_PEEK and, thus, does not dequeue skb */
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb) {
 		if (sk->sk_shutdown & RCV_SHUTDOWN)
 			return 0;
diff --git a/net/key/af_key.c b/net/key/af_key.c
index b47f8e542aae..925b45078c03 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3635,7 +3635,7 @@ out:
 
 static int pfkey_recvmsg(struct kiocb *kiocb,
 			 struct socket *sock, struct msghdr *msg, size_t len,
-			 int flags)
+			 int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct pfkey_sock *pfk = pfkey_sk(sk);
@@ -3646,7 +3646,7 @@ static int pfkey_recvmsg(struct kiocb *kiocb,
 	if (flags & ~(MSG_PEEK|MSG_DONTWAIT|MSG_TRUNC|MSG_CMSG_COMPAT))
 		goto out;
 
-	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err);
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err, timeop);
 	if (skb == NULL)
 		goto out;
 
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 369a9822488c..4347233855cb 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -507,7 +507,7 @@ no_route:
 }
 
 static int l2tp_ip_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-			   size_t len, int noblock, int flags, int *addr_len)
+			   size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	size_t copied = 0;
@@ -518,7 +518,7 @@ static int l2tp_ip_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m
 	if (flags & MSG_OOB)
 		goto out;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index f3f98a156cee..6c839ba9d299 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -645,7 +645,7 @@ do_confirm:
 
 static int l2tp_ip6_recvmsg(struct kiocb *iocb, struct sock *sk,
 			    struct msghdr *msg, size_t len, int noblock,
-			    int flags, int *addr_len)
+			    int flags, int *addr_len, long *timeop)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	DECLARE_SOCKADDR(struct sockaddr_l2tpip6 *, lsa, msg->msg_name);
@@ -662,7 +662,7 @@ static int l2tp_ip6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (flags & MSG_ERRQUEUE)
 		return ipv6_recv_error(sk, msg, len, addr_len);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 950909f04ee6..9e6db6946e4f 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -187,7 +187,7 @@ static int pppol2tp_recv_payload_hook(struct sk_buff *skb)
  */
 static int pppol2tp_recvmsg(struct kiocb *iocb, struct socket *sock,
 			    struct msghdr *msg, size_t len,
-			    int flags)
+			    int flags, long *timeop)
 {
 	int err;
 	struct sk_buff *skb;
@@ -199,7 +199,7 @@ static int pppol2tp_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	err = 0;
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &err);
+				flags & MSG_DONTWAIT, &err, timeop);
 	if (!skb)
 		goto end;
 
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 0080d2b0a8ae..b5edf838f9fa 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -705,7 +705,7 @@ out:
  *	Returns non-negative upon success, negative otherwise.
  */
 static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
-			  struct msghdr *msg, size_t len, int flags)
+			  struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	DECLARE_SOCKADDR(struct sockaddr_llc *, uaddr, msg->msg_name);
 	const int nonblock = flags & MSG_DONTWAIT;
@@ -725,7 +725,7 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (unlikely(sk->sk_type == SOCK_STREAM && sk->sk_state == TCP_LISTEN))
 		goto out;
 
-	timeo = sock_rcvtimeo(sk, nonblock);
+	timeo = sock_rcvtimeop(sk, timeop, nonblock);
 
 	seq = &llc->copied_seq;
 	if (flags & MSG_PEEK) {
@@ -851,6 +851,8 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 out:
 	release_sock(sk);
+	if (timeop)
+		*timeop = timeo;
 	return copied;
 copy_uaddr:
 	if (uaddr != NULL && skb != NULL) {
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index e0ccd84d4d67..d0b39b90d41a 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2399,7 +2399,7 @@ out:
 
 static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 			   struct msghdr *msg, size_t len,
-			   int flags)
+			   int flags, long *timeop)
 {
 	struct sock_iocb *siocb = kiocb_to_siocb(kiocb);
 	struct scm_cookie scm;
@@ -2415,7 +2415,7 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 
 	copied = 0;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (skb == NULL)
 		goto out;
 
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index ede50d197e10..4a9078e2bf7a 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1134,7 +1134,7 @@ out:
 }
 
 static int nr_recvmsg(struct kiocb *iocb, struct socket *sock,
-		      struct msghdr *msg, size_t size, int flags)
+		      struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	DECLARE_SOCKADDR(struct sockaddr_ax25 *, sax, msg->msg_name);
@@ -1154,7 +1154,7 @@ static int nr_recvmsg(struct kiocb *iocb, struct socket *sock,
 	}
 
 	/* Now we can treat all alike */
-	if ((skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &er)) == NULL) {
+	if ((skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &er, timeop)) == NULL) {
 		release_sock(sk);
 		return er;
 	}
diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 51f077a92fa9..0b233d1f1a57 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -794,7 +794,7 @@ static int llcp_sock_sendmsg(struct kiocb *iocb, struct socket *sock,
 }
 
 static int llcp_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			     struct msghdr *msg, size_t len, int flags)
+			     struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -817,7 +817,7 @@ static int llcp_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (flags & (MSG_OOB))
 		return -EOPNOTSUPP;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb) {
 		pr_err("Recv datagram failed state %d %d %d",
 		       sk->sk_state, err, sock_error(sk));
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index c27a6e86cae4..665d9523ce5c 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -228,7 +228,7 @@ static int rawsock_sendmsg(struct kiocb *iocb, struct socket *sock,
 }
 
 static int rawsock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			   struct msghdr *msg, size_t len, int flags)
+			   struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -238,7 +238,7 @@ static int rawsock_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	pr_debug("sock=%p sk=%p len=%zu flags=%d\n", sock, sk, len, flags);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &rc);
+	skb = skb_recv_datagram(sk, flags, noblock, &rc, timeop);
 	if (!skb)
 		return rc;
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b85c67ccb797..f56d816340e2 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2852,7 +2852,7 @@ out:
  */
 
 static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
-			  struct msghdr *msg, size_t len, int flags)
+			  struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -2884,7 +2884,7 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
 	 *	but then it will block.
 	 */
 
-	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err);
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err, timeop);
 
 	/*
 	 *	An error occurred so return it. Because skb_recv_datagram()
diff --git a/net/phonet/datagram.c b/net/phonet/datagram.c
index 290352c0e6b4..77eff48eeb83 100644
--- a/net/phonet/datagram.c
+++ b/net/phonet/datagram.c
@@ -127,7 +127,7 @@ static int pn_sendmsg(struct kiocb *iocb, struct sock *sk,
 
 static int pn_recvmsg(struct kiocb *iocb, struct sock *sk,
 			struct msghdr *msg, size_t len, int noblock,
-			int flags, int *addr_len)
+			int flags, int *addr_len, long *timeop)
 {
 	struct sk_buff *skb = NULL;
 	struct sockaddr_pn sa;
@@ -138,7 +138,7 @@ static int pn_recvmsg(struct kiocb *iocb, struct sock *sk,
 			MSG_CMSG_COMPAT))
 		goto out_nofree;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &rval);
+	skb = skb_recv_datagram(sk, flags, noblock, &rval, timeop);
 	if (skb == NULL)
 		goto out_nofree;
 
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index 70a547ea5177..c5832e1958f8 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -783,7 +783,7 @@ static struct sock *pep_sock_accept(struct sock *sk, int flags, int *errp)
 	u8 pipe_handle, enabled, n_sb;
 	u8 aligned = 0;
 
-	skb = skb_recv_datagram(sk, 0, flags & O_NONBLOCK, errp);
+	skb = skb_recv_datagram(sk, 0, flags & O_NONBLOCK, errp, NULL);
 	if (!skb)
 		return NULL;
 
@@ -1248,7 +1248,7 @@ struct sk_buff *pep_read(struct sock *sk)
 
 static int pep_recvmsg(struct kiocb *iocb, struct sock *sk,
 			struct msghdr *msg, size_t len, int noblock,
-			int flags, int *addr_len)
+			int flags, int *addr_len, long *timeop)
 {
 	struct sk_buff *skb;
 	int err;
@@ -1277,7 +1277,7 @@ static int pep_recvmsg(struct kiocb *iocb, struct sock *sk,
 			return -EINVAL;
 	}
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	lock_sock(sk);
 	if (skb == NULL) {
 		if (err == -ENOTCONN && sk->sk_state == TCP_CLOSE_WAIT)
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 48f8ffc60f8f..e511e569bbc9 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -706,7 +706,7 @@ void rds_inc_put(struct rds_incoming *inc);
 void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr,
 		       struct rds_incoming *inc, gfp_t gfp);
 int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		size_t size, int msg_flags);
+		size_t size, int msg_flags, long *timeop);
 void rds_clear_recv_queue(struct rds_sock *rs);
 int rds_notify_queue_get(struct rds_sock *rs, struct msghdr *msg);
 void rds_inc_info_copy(struct rds_incoming *inc,
diff --git a/net/rds/recv.c b/net/rds/recv.c
index bd82522534fc..6223a4b0fded 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -396,7 +396,7 @@ static int rds_cmsg_recv(struct rds_incoming *inc, struct msghdr *msg)
 }
 
 int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		size_t size, int msg_flags)
+		size_t size, int msg_flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct rds_sock *rs = rds_sk_to_rs(sk);
@@ -406,7 +406,7 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	struct rds_incoming *inc = NULL;
 
 	/* udp_recvmsg()->sock_recvtimeo() gets away without locking too.. */
-	timeo = sock_rcvtimeo(sk, nonblock);
+	timeo = sock_rcvtimeop(sk, timeop, nonblock);
 
 	rdsdebug("size %zu flags 0x%x timeo %ld\n", size, msg_flags, timeo);
 
@@ -493,6 +493,8 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 		rds_inc_put(inc);
 
 out:
+	if (timeop)
+		*timeop = timeo;
 	return ret;
 }
 
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 8451c8cdc9de..2cfc75a1cbbb 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1212,7 +1212,7 @@ static int rose_sendmsg(struct kiocb *iocb, struct socket *sock,
 
 
 static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
-			struct msghdr *msg, size_t size, int flags)
+			struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct rose_sock *rose = rose_sk(sk);
@@ -1229,7 +1229,7 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
 		return -ENOTCONN;
 
 	/* Now we can treat all alike */
-	if ((skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &er)) == NULL)
+	if ((skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &er, timeop)) == NULL)
 		return er;
 
 	qbit = (skb->data[0] & ROSE_Q_BIT) == ROSE_Q_BIT;
diff --git a/net/rxrpc/ar-input.c b/net/rxrpc/ar-input.c
index 63b21e580de9..2319fae4b1f6 100644
--- a/net/rxrpc/ar-input.c
+++ b/net/rxrpc/ar-input.c
@@ -655,7 +655,7 @@ void rxrpc_data_ready(struct sock *sk)
 		return;
 	}
 
-	skb = skb_recv_datagram(sk, 0, 1, &ret);
+	skb = skb_recv_datagram(sk, 0, 1, &ret, NULL);
 	if (!skb) {
 		rxrpc_put_local(local);
 		if (ret == -EAGAIN)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index ba9fd36d3f15..a21e51937e27 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -573,7 +573,7 @@ extern const struct file_operations rxrpc_connection_seq_fops;
  */
 void rxrpc_remove_user_ID(struct rxrpc_sock *, struct rxrpc_call *);
 int rxrpc_recvmsg(struct kiocb *, struct socket *, struct msghdr *, size_t,
-		  int);
+		  int, long *);
 
 /*
  * ar-security.c
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index e9aaa65c0778..e9082ed598cd 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -44,7 +44,7 @@ void rxrpc_remove_user_ID(struct rxrpc_sock *rx, struct rxrpc_call *call)
  *   simultaneously
  */
 int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
-		  struct msghdr *msg, size_t len, int flags)
+		  struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct rxrpc_skb_priv *sp;
 	struct rxrpc_call *call = NULL, *continue_call = NULL;
@@ -63,7 +63,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	ullen = msg->msg_flags & MSG_CMSG_COMPAT ? 4 : sizeof(unsigned long);
 
-	timeo = sock_rcvtimeo(&rx->sk, flags & MSG_DONTWAIT);
+	timeo = sock_rcvtimeop(&rx->sk, timeop, flags & MSG_DONTWAIT);
 	msg->msg_flags |= MSG_MORE;
 
 	lock_sock(&rx->sk);
@@ -78,7 +78,8 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 				release_sock(&rx->sk);
 				if (continue_call)
 					rxrpc_put_call(continue_call);
-				return -ENODATA;
+				copied = -ENODATA;
+				goto out_copied;
 			}
 		}
 
@@ -135,7 +136,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 				release_sock(&rx->sk);
 				rxrpc_put_call(continue_call);
 				_leave(" = %d [noncont]", copied);
-				return copied;
+				goto out_copied;
 			}
 		}
 
@@ -252,6 +253,9 @@ out:
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	_leave(" = %d [data]", copied);
+out_copied:
+	if (timeop)
+		*timeop = timeo;
 	return copied;
 
 	/* handle non-DATA messages such as aborts, incoming connections and
@@ -328,7 +332,8 @@ terminal_message:
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	_leave(" = %d", ret);
-	return ret;
+	copied = ret;
+	goto out_copied;
 
 copy_error:
 	_debug("copy error");
@@ -337,7 +342,8 @@ copy_error:
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	_leave(" = %d", ret);
-	return ret;
+	copied = ret;
+	goto out_copied;
 
 wait_interrupted:
 	ret = sock_intr_errno(timeo);
@@ -348,8 +354,7 @@ wait_error:
 	if (copied)
 		copied = ret;
 	_leave(" = %d [waitfail %d]", copied, ret);
-	return copied;
-
+	goto out_copied;
 }
 
 /**
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 429899689408..d05161a168bc 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2042,11 +2042,11 @@ static int sctp_skb_pull(struct sk_buff *skb, int len)
  *  flags   - flags sent or received with the user message, see Section
  *            5 for complete description of the flags.
  */
-static struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int, int *);
+static struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int, int *, long *);
 
 static int sctp_recvmsg(struct kiocb *iocb, struct sock *sk,
 			struct msghdr *msg, size_t len, int noblock,
-			int flags, int *addr_len)
+			int flags, int *addr_len, long *timeop)
 {
 	struct sctp_ulpevent *event = NULL;
 	struct sctp_sock *sp = sctp_sk(sk);
@@ -2066,7 +2066,7 @@ static int sctp_recvmsg(struct kiocb *iocb, struct sock *sk,
 		goto out;
 	}
 
-	skb = sctp_skb_recv_datagram(sk, flags, noblock, &err);
+	skb = sctp_skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
@@ -6519,13 +6519,13 @@ out:
  * with a few changes to make lksctp work.
  */
 static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
-					      int noblock, int *err)
+					      int noblock, int *err, long *timeop)
 {
 	int error;
 	struct sk_buff *skb;
 	long timeo;
 
-	timeo = sock_rcvtimeo(sk, noblock);
+	timeo = sock_rcvtimeop(sk, timeop, noblock);
 
 	pr_debug("%s: timeo:%ld, max:%ld\n", __func__, timeo,
 		 MAX_SCHEDULE_TIMEOUT);
@@ -6549,7 +6549,7 @@ static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
 		}
 
 		if (skb)
-			return skb;
+			break;
 
 		/* Caller is allowed not to check sk->sk_err before calling. */
 		error = sock_error(sk);
@@ -6569,11 +6569,15 @@ static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
 			goto no_packet;
 	} while (sctp_wait_for_packet(sk, err, &timeo) == 0);
 
-	return NULL;
+out:
+	if (timeop)
+		*timeop = timeo;
+
+	return skb;
 
 no_packet:
 	*err = error;
-	return NULL;
+	goto out;
 }
 
 /* If sndbuf has changed, wake up per association sndbuf waiters.  */
diff --git a/net/socket.c b/net/socket.c
index abf56b2a14f9..310a50971769 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -772,7 +772,7 @@ void __sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
 EXPORT_SYMBOL_GPL(__sock_recv_ts_and_drops);
 
 static inline int __sock_recvmsg_nosec(struct kiocb *iocb, struct socket *sock,
-				       struct msghdr *msg, size_t size, int flags)
+				       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock_iocb *si = kiocb_to_siocb(iocb);
 
@@ -782,19 +782,19 @@ static inline int __sock_recvmsg_nosec(struct kiocb *iocb, struct socket *sock,
 	si->size = size;
 	si->flags = flags;
 
-	return sock->ops->recvmsg(iocb, sock, msg, size, flags);
+	return sock->ops->recvmsg(iocb, sock, msg, size, flags, timeop);
 }
 
 static inline int __sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-				 struct msghdr *msg, size_t size, int flags)
+				 struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	int err = security_socket_recvmsg(sock, msg, size, flags);
 
-	return err ?: __sock_recvmsg_nosec(iocb, sock, msg, size, flags);
+	return err ?: __sock_recvmsg_nosec(iocb, sock, msg, size, flags, timeop);
 }
 
 int sock_recvmsg(struct socket *sock, struct msghdr *msg,
-		 size_t size, int flags)
+		 size_t size, int flags, long *timeop)
 {
 	struct kiocb iocb;
 	struct sock_iocb siocb;
@@ -802,7 +802,7 @@ int sock_recvmsg(struct socket *sock, struct msghdr *msg,
 
 	init_sync_kiocb(&iocb, NULL);
 	iocb.private = &siocb;
-	ret = __sock_recvmsg(&iocb, sock, msg, size, flags);
+	ret = __sock_recvmsg(&iocb, sock, msg, size, flags, timeop);
 	if (-EIOCBQUEUED == ret)
 		ret = wait_on_sync_kiocb(&iocb);
 	return ret;
@@ -810,7 +810,7 @@ int sock_recvmsg(struct socket *sock, struct msghdr *msg,
 EXPORT_SYMBOL(sock_recvmsg);
 
 static int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
-			      size_t size, int flags)
+			      size_t size, int flags, long *timeop)
 {
 	struct kiocb iocb;
 	struct sock_iocb siocb;
@@ -818,7 +818,7 @@ static int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
 
 	init_sync_kiocb(&iocb, NULL);
 	iocb.private = &siocb;
-	ret = __sock_recvmsg_nosec(&iocb, sock, msg, size, flags);
+	ret = __sock_recvmsg_nosec(&iocb, sock, msg, size, flags, timeop);
 	if (-EIOCBQUEUED == ret)
 		ret = wait_on_sync_kiocb(&iocb);
 	return ret;
@@ -851,7 +851,7 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
 	 * iovec are identical, yielding the same in-core layout and alignment
 	 */
 	msg->msg_iov = (struct iovec *)vec, msg->msg_iovlen = num;
-	result = sock_recvmsg(sock, msg, size, flags);
+	result = sock_recvmsg(sock, msg, size, flags, NULL);
 	set_fs(oldfs);
 	return result;
 }
@@ -914,7 +914,7 @@ static ssize_t do_sock_read(struct msghdr *msg, struct kiocb *iocb,
 	msg->msg_iovlen = nr_segs;
 	msg->msg_flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
 
-	return __sock_recvmsg(iocb, sock, msg, size, msg->msg_flags);
+	return __sock_recvmsg(iocb, sock, msg, size, msg->msg_flags, NULL);
 }
 
 static ssize_t sock_aio_read(struct kiocb *iocb, const struct iovec *iov,
@@ -1862,7 +1862,7 @@ SYSCALL_DEFINE6(recvfrom, int, fd, void __user *, ubuf, size_t, size,
 	msg.msg_namelen = 0;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
-	err = sock_recvmsg(sock, &msg, size, flags);
+	err = sock_recvmsg(sock, &msg, size, flags, NULL);
 
 	if (err >= 0 && addr != NULL) {
 		err2 = move_addr_to_user(&address,
@@ -2207,7 +2207,7 @@ SYSCALL_DEFINE4(sendmmsg, int, fd, struct mmsghdr __user *, mmsg,
 }
 
 static int ___sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
-			 struct msghdr *msg_sys, unsigned int flags, int nosec)
+			 struct msghdr *msg_sys, unsigned int flags, int nosec, long *timeop)
 {
 	struct compat_msghdr __user *msg_compat =
 	    (struct compat_msghdr __user *)msg;
@@ -2265,7 +2265,7 @@ static int ___sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	err = (nosec ? sock_recvmsg_nosec : sock_recvmsg)(sock, msg_sys,
-							  total_len, flags);
+							  total_len, flags, timeop);
 	if (err < 0)
 		goto out_freeiov;
 	len = err;
@@ -2312,7 +2312,7 @@ long __sys_recvmsg(int fd, struct msghdr __user *msg, unsigned flags)
 	if (!sock)
 		goto out;
 
-	err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
+	err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0, NULL);
 
 	fput_light(sock->file, fput_needed);
 out:
@@ -2327,6 +2327,30 @@ SYSCALL_DEFINE3(recvmsg, int, fd, struct msghdr __user *, msg,
 	return __sys_recvmsg(fd, msg, flags);
 }
 
+static int sock_set_timeout_ts(long *timeo_p, struct timespec *ts)
+{
+	if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
+		return -EDOM;
+
+	if (ts->tv_sec < 0) {
+		static int warned __read_mostly;
+
+		*timeo_p = 0;
+		if (warned < 10 && net_ratelimit()) {
+			warned++;
+			pr_info("%s: `%s' (pid %d) tries to set negative timeout\n",
+				__func__, current->comm, task_pid_nr(current));
+		}
+		return 0;
+	}
+	*timeo_p = MAX_SCHEDULE_TIMEOUT;
+	if (ts->tv_sec == 0 && ts->tv_nsec == 0)
+		return 0;
+	if (ts->tv_sec < (MAX_SCHEDULE_TIMEOUT / HZ - 1))
+		*timeo_p = ts->tv_sec * HZ + (ts->tv_nsec + (NSEC_PER_SEC / HZ - 1)) / (NSEC_PER_SEC / HZ);
+	return 0;
+}
+
 /*
  *     Linux recvmmsg interface
  */
@@ -2339,12 +2363,14 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;
 	struct msghdr msg_sys;
-	struct timespec end_time;
+	long timeout_hz, *timeop = NULL;
 
-	if (timeout &&
-	    poll_select_set_timeout(&end_time, timeout->tv_sec,
-				    timeout->tv_nsec))
-		return -EINVAL;
+	if (timeout) {
+		err = sock_set_timeout_ts(&timeout_hz, timeout);
+		if (err)
+			return err;
+		timeop = &timeout_hz;
+	}
 
 	datagrams = 0;
 
@@ -2366,7 +2392,7 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		if (MSG_CMSG_COMPAT & flags) {
 			err = ___sys_recvmsg(sock, (struct msghdr __user *)compat_entry,
 					     &msg_sys, flags & ~MSG_WAITFORONE,
-					     datagrams);
+					     datagrams, timeop);
 			if (err < 0)
 				break;
 			err = __put_user(err, &compat_entry->msg_len);
@@ -2375,7 +2401,7 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 			err = ___sys_recvmsg(sock,
 					     (struct msghdr __user *)entry,
 					     &msg_sys, flags & ~MSG_WAITFORONE,
-					     datagrams);
+					     datagrams, timeop);
 			if (err < 0)
 				break;
 			err = put_user(err, &entry->msg_len);
@@ -2390,17 +2416,11 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		if (flags & MSG_WAITFORONE)
 			flags |= MSG_DONTWAIT;
 
-		if (timeout) {
-			ktime_get_ts(timeout);
-			*timeout = timespec_sub(end_time, *timeout);
-			if (timeout->tv_sec < 0) {
-				timeout->tv_sec = timeout->tv_nsec = 0;
-				break;
-			}
-
+		if (timeout && timeout_hz == 0) {
 			/* Timeout, return less than vlen datagrams */
-			if (timeout->tv_nsec == 0 && timeout->tv_sec == 0)
-				break;
+			timeout->tv_sec = timeout->tv_nsec = 0;
+			timeop = NULL;
+			break;
 		}
 
 		/* Out of band data, return right away */
@@ -2411,6 +2431,11 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 out_put:
 	fput_light(sock->file, fput_needed);
 
+	if (timeop) {
+		timeout->tv_sec	 = timeout_hz / HZ;
+		timeout->tv_nsec = (timeout_hz % HZ) * (NSEC_PER_SEC / HZ);
+	}
+
 	if (err == 0)
 		return datagrams;
 
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 43bcb4699d69..e1e61082f45d 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -545,7 +545,7 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
 	err = kernel_recvmsg(svsk->sk_sock, &msg, NULL,
 			     0, 0, MSG_PEEK | MSG_DONTWAIT);
 	if (err >= 0)
-		skb = skb_recv_datagram(svsk->sk_sk, 0, 1, &err);
+		skb = skb_recv_datagram(svsk->sk_sk, 0, 1, &err, NULL);
 
 	if (skb == NULL) {
 		if (err != -EAGAIN) {
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 1dec6043e4de..f0008257ca68 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -965,7 +965,7 @@ static void xs_local_data_ready(struct sock *sk)
 	if (xprt == NULL)
 		goto out;
 
-	skb = skb_recv_datagram(sk, 0, 1, &err);
+	skb = skb_recv_datagram(sk, 0, 1, &err, NULL);
 	if (skb == NULL)
 		goto out;
 
@@ -1027,7 +1027,7 @@ static void xs_udp_data_ready(struct sock *sk)
 	if (!(xprt = xprt_from_sock(sk)))
 		goto out;
 
-	if ((skb = skb_recv_datagram(sk, 0, 1, &err)) == NULL)
+	if ((skb = skb_recv_datagram(sk, 0, 1, &err, NULL)) == NULL)
 		goto out;
 
 	repsize = skb->len - sizeof(struct udphdr);
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 08d87fc80b10..b4f7d923c9e2 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1031,7 +1031,7 @@ static int tipc_wait_for_rcvmsg(struct socket *sock, long *timeop)
  * Returns size of returned message data, errno otherwise
  */
 static int tipc_recvmsg(struct kiocb *iocb, struct socket *sock,
-			struct msghdr *m, size_t buf_len, int flags)
+			struct msghdr *m, size_t buf_len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct tipc_sock *tsk = tipc_sk(sk);
@@ -1054,7 +1054,7 @@ static int tipc_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto exit;
 	}
 
-	timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	timeo = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 restart:
 
 	/* Look for a message in receive queue; wait if necessary */
@@ -1109,6 +1109,8 @@ restart:
 		advance_rx_queue(sk);
 	}
 exit:
+	if (timeop)
+		*timeop = timeo;
 	release_sock(sk);
 	return res;
 }
@@ -1126,7 +1128,7 @@ exit:
  * Returns size of returned message data, errno otherwise
  */
 static int tipc_recv_stream(struct kiocb *iocb, struct socket *sock,
-			    struct msghdr *m, size_t buf_len, int flags)
+			    struct msghdr *m, size_t buf_len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct tipc_sock *tsk = tipc_sk(sk);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 7b9114e0a5b1..3203defdb503 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -519,17 +519,17 @@ static int unix_shutdown(struct socket *, int);
 static int unix_stream_sendmsg(struct kiocb *, struct socket *,
 			       struct msghdr *, size_t);
 static int unix_stream_recvmsg(struct kiocb *, struct socket *,
-			       struct msghdr *, size_t, int);
+			       struct msghdr *, size_t, int, long *);
 static int unix_dgram_sendmsg(struct kiocb *, struct socket *,
 			      struct msghdr *, size_t);
 static int unix_dgram_recvmsg(struct kiocb *, struct socket *,
-			      struct msghdr *, size_t, int);
+			      struct msghdr *, size_t, int, long *);
 static int unix_dgram_connect(struct socket *, struct sockaddr *,
 			      int, int);
 static int unix_seqpacket_sendmsg(struct kiocb *, struct socket *,
 				  struct msghdr *, size_t);
 static int unix_seqpacket_recvmsg(struct kiocb *, struct socket *,
-				  struct msghdr *, size_t, int);
+				  struct msghdr *, size_t, int, long *);
 
 static int unix_set_peek_off(struct sock *sk, int val)
 {
@@ -1283,7 +1283,7 @@ static int unix_accept(struct socket *sock, struct socket *newsock, int flags)
 	 * so that no locks are necessary.
 	 */
 
-	skb = skb_recv_datagram(sk, 0, flags&O_NONBLOCK, &err);
+	skb = skb_recv_datagram(sk, 0, flags&O_NONBLOCK, &err, NULL);
 	if (!skb) {
 		/* This means receive shutdown. */
 		if (err == 0)
@@ -1755,14 +1755,14 @@ static int unix_seqpacket_sendmsg(struct kiocb *kiocb, struct socket *sock,
 
 static int unix_seqpacket_recvmsg(struct kiocb *iocb, struct socket *sock,
 			      struct msghdr *msg, size_t size,
-			      int flags)
+			      int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 
 	if (sk->sk_state != TCP_ESTABLISHED)
 		return -ENOTCONN;
 
-	return unix_dgram_recvmsg(iocb, sock, msg, size, flags);
+	return unix_dgram_recvmsg(iocb, sock, msg, size, flags, timeop);
 }
 
 static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
@@ -1777,7 +1777,7 @@ static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
 
 static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 			      struct msghdr *msg, size_t size,
-			      int flags)
+			      int flags, long *timeop)
 {
 	struct sock_iocb *siocb = kiocb_to_siocb(iocb);
 	struct scm_cookie tmp_scm;
@@ -1803,7 +1803,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	skip = sk_peek_offset(sk, flags);
 
-	skb = __skb_recv_datagram(sk, flags, &peeked, &skip, &err);
+	skb = __skb_recv_datagram(sk, flags, &peeked, &skip, &err, timeop);
 	if (!skb) {
 		unix_state_lock(sk);
 		/* Signal EOF on disconnected non-blocking SEQPACKET socket. */
@@ -1914,7 +1914,7 @@ static unsigned int unix_skb_len(const struct sk_buff *skb)
 
 static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 			       struct msghdr *msg, size_t size,
-			       int flags)
+			       int flags, long *timeop)
 {
 	struct sock_iocb *siocb = kiocb_to_siocb(iocb);
 	struct scm_cookie tmp_scm;
@@ -1926,7 +1926,7 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 	int check_creds = 0;
 	int target;
 	int err = 0;
-	long timeo;
+	long timeo = sock_rcvtimeop(sk, timeop, noblock);
 	int skip;
 
 	err = -EINVAL;
@@ -1938,7 +1938,6 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto out;
 
 	target = sock_rcvlowat(sk, flags&MSG_WAITALL, size);
-	timeo = sock_rcvtimeo(sk, noblock);
 
 	/* Lock the socket to prevent queue disordering
 	 * while sleeps in memcpy_tomsg
@@ -2071,6 +2070,8 @@ again:
 	mutex_unlock(&u->readlock);
 	scm_recv(sock, msg, siocb->scm, flags);
 out:
+	if (timeop)
+		*timeop = timeo;
 	return copied ? : err;
 }
 
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 85d232bed87d..10568565f57d 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1063,10 +1063,10 @@ out:
 }
 
 static int vsock_dgram_recvmsg(struct kiocb *kiocb, struct socket *sock,
-			       struct msghdr *msg, size_t len, int flags)
+			       struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	return transport->dgram_dequeue(kiocb, vsock_sk(sock->sk), msg, len,
-					flags);
+					flags, timeop);
 }
 
 static const struct proto_ops vsock_dgram_ops = {
@@ -1646,7 +1646,7 @@ out:
 static int
 vsock_stream_recvmsg(struct kiocb *kiocb,
 		     struct socket *sock,
-		     struct msghdr *msg, size_t len, int flags)
+		     struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk;
 	struct vsock_sock *vsk;
@@ -1661,6 +1661,7 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
 	err = 0;
+	timeout = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 
 	lock_sock(sk);
 
@@ -1711,7 +1712,6 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
 		err = -ENOMEM;
 		goto out;
 	}
-	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
 	copied = 0;
 
 	err = transport->notify_recv_init(vsk, target, &recv_data);
@@ -1821,6 +1821,8 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
 out_wait:
 	finish_wait(sk_sleep(sk), &wait);
 out:
+	if (timeop)
+		*timeop = timeout;
 	release_sock(sk);
 	return err;
 }
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 9bb63ffec4f2..9c9e43c17b34 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -1733,7 +1733,7 @@ static int vmci_transport_dgram_enqueue(
 static int vmci_transport_dgram_dequeue(struct kiocb *kiocb,
 					struct vsock_sock *vsk,
 					struct msghdr *msg, size_t len,
-					int flags)
+					int flags, long *timeop)
 {
 	int err;
 	int noblock;
@@ -1748,7 +1748,7 @@ static int vmci_transport_dgram_dequeue(struct kiocb *kiocb,
 
 	/* Retrieve the head sk_buff from the socket's receive queue. */
 	err = 0;
-	skb = skb_recv_datagram(&vsk->sk, flags, noblock, &err);
+	skb = skb_recv_datagram(&vsk->sk, flags, noblock, &err, timeop);
 	if (err)
 		return err;
 
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 5ad4418ef093..da22c042469a 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1254,7 +1254,7 @@ out_kfree_skb:
 
 static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
 		       struct msghdr *msg, size_t size,
-		       int flags)
+		       int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct x25_sock *x25 = x25_sk(sk);
@@ -1306,7 +1306,7 @@ static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
 		/* Now we can treat all alike */
 		release_sock(sk);
 		skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-					flags & MSG_DONTWAIT, &rc);
+					flags & MSG_DONTWAIT, &rc, timeop);
 		lock_sock(sk);
 		if (!skb)
 			goto out;


* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-27 19:21               ` Arnaldo Carvalho de Melo
  2014-05-27 19:22                 ` Arnaldo Carvalho de Melo
@ 2014-05-27 19:28                 ` Michael Kerrisk (man-pages)
  2014-05-27 20:30                   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-27 19:28 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, linux-man, netdev, Ondřej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen,
	Arnaldo Carvalho de Melo

On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
<acme@ghostprotocols.net> wrote:
> Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) escreveu:
>> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
>> > Can you try the attached patch on top of the first one?
>
>> Patches on patches is a way to make your testers work unnecessarily
>> harder. Also, it means that anyone else who was interested in this
>
> It was meant to highlight the changes with regard to the previous patch,
> i.e. to make things easier for reviewing.

(I don't think that works...)

>> thread likely got lost at this point, because they probably didn't
>> save the first patch. All of this to say: it makes life much easier
>> if you provide a complete new self-contained patch on each iteration.
>
> If you prefer it that way, find one attached, that I was about to send
> (but you can wait till I use your program to test it ;-) )
>
>> > It starts adding explicit parentheses on a ternary, as David requested,
>> > and then should return the remaining timeouts in cases like signals,
>> > etc.
>> >
>> > Please let me know if this is enough.
>>
>> Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)
>
> What was the problem experienced?

The problem is that after EINTR, the timeout is not updated with the
remaining time until expiry. (This was true with just patch 1 applied,
and is also true with both patch 1 and patch 2 applied.)
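
For reference, the expected behaviour can be modelled in userspace: given
the deadline armed on entry and the time at which the call was interrupted,
the value written back to the timeout argument should be the difference
between the two, clamped at zero. A minimal sketch of that calculation
follows; timespec_remaining() is a name made up for illustration and is
not a kernel or libc function.

```c
#include <assert.h>
#include <time.h>

#ifndef NSEC_PER_SEC
#define NSEC_PER_SEC 1000000000L
#endif

/*
 * Hypothetical helper: time left until 'deadline' as seen at 'now'.
 * Both arguments must be normalized (0 <= tv_nsec < NSEC_PER_SEC).
 * Returns {0, 0} once the deadline has passed.
 */
static struct timespec timespec_remaining(struct timespec deadline,
					  struct timespec now)
{
	struct timespec left;

	assert(deadline.tv_nsec >= 0 && deadline.tv_nsec < NSEC_PER_SEC);
	assert(now.tv_nsec >= 0 && now.tv_nsec < NSEC_PER_SEC);

	left.tv_sec = deadline.tv_sec - now.tv_sec;
	left.tv_nsec = deadline.tv_nsec - now.tv_nsec;
	if (left.tv_nsec < 0) {
		/* Borrow one second to keep tv_nsec in range. */
		left.tv_sec--;
		left.tv_nsec += NSEC_PER_SEC;
	}
	if (left.tv_sec < 0)
		left.tv_sec = left.tv_nsec = 0;	/* already expired */
	return left;
}
```

A test program would compare the timeout reported after EINTR against this
value, allowing for scheduling jitter.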

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-27 19:28                 ` Michael Kerrisk (man-pages)
@ 2014-05-27 20:30                   ` Arnaldo Carvalho de Melo
  2014-05-28  5:00                     ` Michael Kerrisk (man-pages)
  2014-05-28 12:20                     ` Michael Kerrisk (man-pages)
  0 siblings, 2 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-05-27 20:30 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: lkml, linux-man, netdev, Ondřej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

[-- Attachment #1: Type: text/plain, Size: 3908 bytes --]

Em Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) escreveu:
> On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
> <acme@ghostprotocols.net> wrote:
> > Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) escreveu:
> >> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
> >> > Can you try the attached patch on top of the first one?
> >
> >> Patches on patches is a way to make your testers work unnecessarily
> >> harder. Also, it means that anyone else who was interested in this
> >
> > It was meant to highlight the changes with regard to the previous patch,
> > i.e. to make things easier for reviewing.
> 
> (I don't think that works...)

Let's try both then: attached is the updated patch, and this is the
diff against the last combined one:

diff --git a/net/socket.c b/net/socket.c
index 310a50971769..379be43879db 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg,
 
 	datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, &timeout_sys);
 
-	if (datagrams > 0 &&
-	    copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
+	if (copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
 		datagrams = -EFAULT;
 
 	return datagrams;
 
------------------------------------------

This is a quick change just to show where the problem lies. I still need
to think about how to report an -EFAULT properly at this point, i.e. look
at __sys_recvmmsg for something related (returning the number of
datagrams successfully copied to userspace while storing the error for
subsequent reporting):

        if (err == 0)
                return datagrams;

        if (datagrams != 0) {
                /*
                 * We may return less entries than requested (vlen) if the
                 * sock is non block and there aren't enough datagrams...
                 */
                if (err != -EAGAIN) {
                        /*
                         * ... or  if recvmsg returns an error after we
                         * received some datagrams, where we record the
                         * error to return on the next call or if the
                         * app asks about it using getsockopt(SO_ERROR).
                         */
                        sock->sk->sk_err = -err;
                }

                return datagrams;
        }

I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
more about it, sidetracked now, will be back to this.
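
The "return the count now, report the error later" pattern quoted above can
be modelled in a few lines of plain C. This is only a sketch: struct
fake_sock, batch_result() and take_pending_error() are hypothetical names
standing in for struct sock, the tail of __sys_recvmmsg() and
getsockopt(SO_ERROR). The final "return err" for the case where nothing was
received is assumed from the code that follows the quoted excerpt.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the sk_err field of struct sock. */
struct fake_sock {
	int sk_err;
};

/*
 * Models the tail of __sys_recvmmsg(): 'err' is 0 or a negative errno,
 * 'datagrams' is the number of messages already copied to userspace.
 */
static int batch_result(struct fake_sock *sk, int datagrams, int err)
{
	assert(err <= 0 && datagrams >= 0);

	if (err == 0)
		return datagrams;

	if (datagrams != 0) {
		/* -EAGAIN only means "no more data yet"; nothing to record. */
		if (err != -EAGAIN)
			sk->sk_err = -err;	/* reported on a later call */
		return datagrams;
	}

	/* Nothing received at all: the error can be returned directly. */
	return err;
}

/* Models getsockopt(SO_ERROR): fetch the pending error and clear it. */
static int take_pending_error(struct fake_sock *sk)
{
	int pending = sk->sk_err;

	sk->sk_err = 0;
	return pending;
}
```

With this model, an -EFAULT raised after two datagrams have been delivered
yields a return value of 2, and the EFAULT only becomes visible through the
pending-error query.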

Anyway, attached goes the current combined patch.
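
For readers following the arithmetic: the patch posted earlier in this
thread converts the user-supplied timespec to scheduler ticks in
sock_set_timeout_ts() and converts the remainder back on return. The
userspace sketch below mirrors that round trip. HZ = 1000 is an assumption
(the kernel value is configuration dependent), LONG_MAX stands in for
MAX_SCHEDULE_TIMEOUT, the function names are illustrative, and the rate
limited warning for negative timeouts is left out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <time.h>

#define HZ			1000L		/* assumption: config dependent */
#ifndef NSEC_PER_SEC
#define NSEC_PER_SEC		1000000000L
#endif
#define MAX_SCHEDULE_TIMEOUT	LONG_MAX	/* stand-in for the kernel constant */

/* Mirrors sock_set_timeout_ts(): timespec -> ticks, rounding up. */
static int timespec_to_ticks(long *ticks, const struct timespec *ts)
{
	if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
		return -EDOM;

	if (ts->tv_sec < 0) {
		*ticks = 0;		/* negative timeout: do not wait */
		return 0;
	}

	*ticks = MAX_SCHEDULE_TIMEOUT;
	if (ts->tv_sec == 0 && ts->tv_nsec == 0)
		return 0;		/* {0, 0} is treated as "no limit" */

	if (ts->tv_sec < MAX_SCHEDULE_TIMEOUT / HZ - 1)
		*ticks = ts->tv_sec * HZ +
			 (ts->tv_nsec + (NSEC_PER_SEC / HZ - 1)) /
			 (NSEC_PER_SEC / HZ);
	return 0;
}

/* Mirrors the write-back at out_put: remaining ticks -> timespec. */
static struct timespec ticks_to_timespec(long ticks)
{
	struct timespec ts;

	assert(ticks >= 0);
	ts.tv_sec = ticks / HZ;
	ts.tv_nsec = (ticks % HZ) * (NSEC_PER_SEC / HZ);
	return ts;
}
```

Because the conversion rounds up to a whole tick, a request of 1 s plus
500 us becomes 1001 ticks under this assumed HZ and reads back as 1 s plus
1 ms if no time has elapsed.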

- Arnaldo

> >> thread likely got lost at this point, because they probably didn't
> >> save the first patch. All of this to say: it makes life much easier
> >> if you provide a complete new self-contained patch on each iteration.
> >
> > If you prefer it that way, find one attached, that I was about to send
> > (but you can wait till I use your program to test it ;-) )
> >
> >> > It starts adding explicit parentheses on a ternary, as David requested,
> >> > and then should return the remaining timeouts in cases like signals,
> >> > etc.
> >> >
> >> > Please let me know if this is enough.
> >>
> >> Nope, it doesn't fix the problem. (I applied both patches against 3.15-rc7)
> >
> > What was the problem experienced?
> 
> The problem is that after EINTR, the timeout is not updated with the
> remaining time until expiry. (This was true with just patch 1 applied,
> and is also true with both patch 1 and patch 2 applied.)
> 
> Cheers,
> 
> Michael
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

[-- Attachment #2: recvmmsg-timeout-v3.patch --]
[-- Type: text/plain, Size: 78113 bytes --]

commit 27a6d6949bcbc80577aec9c9795e2b5547f83fb3
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Tue May 20 15:36:40 2014 -0300

    net: Fix recvmmsg timeout handling
    
    As reported by Elie de Brauwer the timeout handling in the recvmmsg
    syscall had issues that boil down to it not properly passing the
    remaining time to each underlying recvmsg() call.
    
    Fix it by adding a timeout pointer to the recvmsg implementations, so
    that it can use that in a variation of sock_rcvtimeo() that overrides
    the value in SO_RCVTIMEO with the timeout passed and returns the
    remaining time in that pointer, this way each underlying recvmsg call
    receives the remaining time.
    
    It ends up in most cases being just a forward of this pointer from the
    per protocol recvmsg() implementations to skb_recv_datagram().
    
    Reported-by: Elie De Brauwer <eliedebrauwer@gmail.com>
    Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
    Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Caitlin Bestler <caitlin.bestler@gmail.com>
    Cc: Chris Friesen <chris.friesen@windriver.com>
    Cc: Elie De Brauwer <eliedebrauwer@gmail.com>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Neil Horman <nhorman@tuxdriver.com>
    Cc: Ondřej Bílka <neleai@seznam.cz>
    Cc: Paul Moore <paul@paul-moore.com>
    Cc: Rémi Denis-Courmont <remi@remlab.net>
    Cc: Steven Whitehouse <steve@chygwyn.com>
    Link: http://lkml.kernel.org/n/net-next-vppnur0ix2gzpbbasnkr4p8p@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 850246206b12..e5d36f815083 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -151,7 +151,7 @@ unlock:
 }
 
 static int hash_recvmsg(struct kiocb *unused, struct socket *sock,
-			struct msghdr *msg, size_t len, int flags)
+			struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index a19c027b29bd..4bde01591174 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -419,7 +419,7 @@ unlock:
 }
 
 static int skcipher_recvmsg(struct kiocb *unused, struct socket *sock,
-			    struct msghdr *msg, size_t ignored, int flags)
+			    struct msghdr *msg, size_t ignored, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 1be82284cf9d..254515f71793 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -113,7 +113,7 @@ mISDN_sock_cmsg(struct sock *sk, struct msghdr *msg, struct sk_buff *skb)
 
 static int
 mISDN_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-		   struct msghdr *msg, size_t len, int flags)
+		   struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sk_buff		*skb;
 	struct sock		*sk = sock->sk;
@@ -130,7 +130,7 @@ mISDN_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (sk->sk_state == MISDN_CLOSED)
 		return 0;
 
-	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err);
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err, timeop);
 	if (!skb)
 		return err;
 
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 3381c4f91a8c..13d12ef322f2 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -1111,7 +1111,7 @@ static int macvtap_sendmsg(struct kiocb *iocb, struct socket *sock,
 
 static int macvtap_recvmsg(struct kiocb *iocb, struct socket *sock,
 			   struct msghdr *m, size_t total_len,
-			   int flags)
+			   int flags, long *timeop)
 {
 	struct macvtap_queue *q = container_of(sock, struct macvtap_queue, sock);
 	int ret;
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 2ea7efd11857..30194c6e3fe8 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -963,7 +963,7 @@ static const struct ppp_channel_ops pppoe_chan_ops = {
 };
 
 static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,
-		  struct msghdr *m, size_t total_len, int flags)
+		  struct msghdr *m, size_t total_len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -975,7 +975,7 @@ static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,
 	}
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &error);
+				flags & MSG_DONTWAIT, &error, timeop);
 	if (error < 0)
 		goto end;
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 98bad1fb1bfb..6a41a841e7e8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1327,7 +1327,7 @@ done:
 }
 
 static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
-			   const struct iovec *iv, ssize_t len, int noblock)
+			   const struct iovec *iv, ssize_t len, int noblock, long *timeop)
 {
 	struct sk_buff *skb;
 	ssize_t ret = 0;
@@ -1343,7 +1343,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 
 	/* Read frames from queue */
 	skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
-				  &peeked, &off, &err);
+				  &peeked, &off, &err, timeop);
 	if (skb) {
 		ret = tun_put_user(tun, tfile, skb, iv, len);
 		kfree_skb(skb);
@@ -1370,7 +1370,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 	}
 
 	ret = tun_do_read(tun, tfile, iv, len,
-			  file->f_flags & O_NONBLOCK);
+			  file->f_flags & O_NONBLOCK, NULL);
 	ret = min_t(ssize_t, ret, len);
 	if (ret > 0)
 		iocb->ki_pos = ret;
@@ -1452,7 +1452,7 @@ static int tun_sendmsg(struct kiocb *iocb, struct socket *sock,
 
 static int tun_recvmsg(struct kiocb *iocb, struct socket *sock,
 		       struct msghdr *m, size_t total_len,
-		       int flags)
+		       int flags, long *timeop)
 {
 	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
 	struct tun_struct *tun = __tun_get(tfile);
@@ -1471,7 +1471,7 @@ static int tun_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto out;
 	}
 	ret = tun_do_read(tun, tfile, m->msg_iov, total_len,
-			  flags & MSG_DONTWAIT);
+			  flags & MSG_DONTWAIT, timeop);
 	if (ret > total_len) {
 		m->msg_flags |= MSG_TRUNC;
 		ret = flags & MSG_TRUNC ? ret : total_len;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index be414d2b2b22..46a706378d79 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -601,7 +601,7 @@ static void handle_rx(struct vhost_net *net)
 		if (unlikely(headcount > UIO_MAXIOV)) {
 			msg.msg_iovlen = 1;
 			err = sock->ops->recvmsg(NULL, sock, &msg,
-						 1, MSG_DONTWAIT | MSG_TRUNC);
+						 1, MSG_DONTWAIT | MSG_TRUNC, NULL);
 			pr_debug("Discarded rx packet: len %zd\n", sock_len);
 			continue;
 		}
@@ -627,7 +627,7 @@ static void handle_rx(struct vhost_net *net)
 			copy_iovec_hdr(vq->iov, nvq->hdr, sock_hlen, in);
 		msg.msg_iovlen = in;
 		err = sock->ops->recvmsg(NULL, sock, &msg,
-					 sock_len, MSG_DONTWAIT | MSG_TRUNC);
+					 sock_len, MSG_DONTWAIT | MSG_TRUNC, NULL);
 		/* Userspace might have consumed the packet meanwhile:
 		 * it's not supposed to do this usually, but might be hard
 		 * to prevent. Discard data we got (if any) and keep going. */
diff --git a/include/linux/net.h b/include/linux/net.h
index 17d83393afcc..f908cdd8cdd3 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -171,10 +171,13 @@ struct proto_ops {
 	 * returning uninitialized memory to user space.  The recvfrom
 	 * handlers can assume that msg.msg_name is either NULL or has
 	 * a minimum size of sizeof(struct sockaddr_storage).
+	 * timeop contains a per-call timeout (as opposed to the per-socket
+	 * one) used by recvmmsg; set it to NULL to disable it. It should return
+	 * the remaining time, if not NULL, even when interrupted by a signal.
 	 */
 	int		(*recvmsg)   (struct kiocb *iocb, struct socket *sock,
 				      struct msghdr *m, size_t total_len,
-				      int flags);
+				      int flags, long *timeop);
 	int		(*mmap)	     (struct file *file, struct socket *sock,
 				      struct vm_area_struct * vma);
 	ssize_t		(*sendpage)  (struct socket *sock, struct page *page,
@@ -215,7 +218,7 @@ int sock_create_lite(int family, int type, int proto, struct socket **res);
 void sock_release(struct socket *sock);
 int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len);
 int sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
-		 int flags);
+		 int flags, long *timeop);
 struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname);
 struct socket *sockfd_lookup(int fd, int *err);
 struct socket *sock_from_file(struct file *file, int *err);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7a9beeb1c458..cdfdd1bd6358 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2479,9 +2479,9 @@ static inline void skb_frag_add_head(struct sk_buff *skb, struct sk_buff *frag)
 	for (iter = skb_shinfo(skb)->frag_list; iter; iter = iter->next)
 
 struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
-				    int *peeked, int *off, int *err);
+				    int *peeked, int *off, int *err, long *timeop);
 struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock,
-				  int *err);
+				  int *err, long *timeop);
 unsigned int datagram_poll(struct file *file, struct socket *sock,
 			   struct poll_table_struct *wait);
 int skb_copy_datagram_iovec(const struct sk_buff *from, int offset,
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 428277869400..6c007bd57f39 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -101,7 +101,7 @@ struct vsock_transport {
 	/* DGRAM. */
 	int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
 	int (*dgram_dequeue)(struct kiocb *kiocb, struct vsock_sock *vsk,
-			     struct msghdr *msg, size_t len, int flags);
+			     struct msghdr *msg, size_t len, int flags, long *timeop);
 	int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
 			     struct iovec *, size_t len);
 	bool (*dgram_allow)(u32 cid, u32 port);
diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index 904777c1cd24..ee75e6875aab 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -246,9 +246,9 @@ void bt_sock_unregister(int proto);
 void bt_sock_link(struct bt_sock_list *l, struct sock *s);
 void bt_sock_unlink(struct bt_sock_list *l, struct sock *s);
 int  bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-				struct msghdr *msg, size_t len, int flags);
+				struct msghdr *msg, size_t len, int flags, long *timeop);
 int  bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
-			struct msghdr *msg, size_t len, int flags);
+			struct msghdr *msg, size_t len, int flags, long *timeop);
 uint bt_sock_poll(struct file *file, struct socket *sock, poll_table *wait);
 int  bt_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
 int  bt_sock_wait_state(struct sock *sk, int state, unsigned long timeo);
diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index fe7994c48b75..f80071949b98 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -26,7 +26,7 @@ int inet_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
 		      size_t size, int flags);
 int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		 size_t size, int flags);
+		 size_t size, int flags, long *timeop);
 int inet_shutdown(struct socket *sock, int how);
 int inet_listen(struct socket *sock, int backlog);
 void inet_sock_destruct(struct sock *sk);
diff --git a/include/net/ping.h b/include/net/ping.h
index 026479b61a2d..c259ba72c811 100644
--- a/include/net/ping.h
+++ b/include/net/ping.h
@@ -76,7 +76,7 @@ int  ping_getfrag(void *from, char *to, int offset, int fraglen, int odd,
 		  struct sk_buff *);
 
 int  ping_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		  size_t len, int noblock, int flags, int *addr_len);
+		  size_t len, int noblock, int flags, int *addr_len, long *timeop);
 int  ping_common_sendmsg(int family, struct msghdr *msg, size_t len,
 			 void *user_icmph, size_t icmph_len);
 int  ping_v6_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
diff --git a/include/net/sock.h b/include/net/sock.h
index 07b7fcd60d80..c48f61c79801 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -961,7 +961,7 @@ struct proto {
 	int			(*recvmsg)(struct kiocb *iocb, struct sock *sk,
 					   struct msghdr *msg,
 					   size_t len, int noblock, int flags,
-					   int *addr_len);
+					   int *addr_len, long *timeop);
 	int			(*sendpage)(struct sock *sk, struct page *page,
 					int offset, size_t size, int flags);
 	int			(*bind)(struct sock *sk,
@@ -1593,7 +1593,7 @@ int sock_no_getsockopt(struct socket *, int , int, char __user *, int __user *);
 int sock_no_setsockopt(struct socket *, int, int, char __user *, unsigned int);
 int sock_no_sendmsg(struct kiocb *, struct socket *, struct msghdr *, size_t);
 int sock_no_recvmsg(struct kiocb *, struct socket *, struct msghdr *, size_t,
-		    int);
+		    int, long *);
 int sock_no_mmap(struct file *file, struct socket *sock,
 		 struct vm_area_struct *vma);
 ssize_t sock_no_sendpage(struct socket *sock, struct page *page, int offset,
@@ -1606,7 +1606,7 @@ ssize_t sock_no_sendpage(struct socket *sock, struct page *page, int offset,
 int sock_common_getsockopt(struct socket *sock, int level, int optname,
 				  char __user *optval, int __user *optlen);
 int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *msg, size_t size, int flags);
+			       struct msghdr *msg, size_t size, int flags, long *timeop);
 int sock_common_setsockopt(struct socket *sock, int level, int optname,
 				  char __user *optval, unsigned int optlen);
 int compat_sock_common_getsockopt(struct socket *sock, int level,
@@ -2104,6 +2104,11 @@ static inline long sock_rcvtimeo(const struct sock *sk, bool noblock)
 	return noblock ? 0 : sk->sk_rcvtimeo;
 }
 
+static inline long sock_rcvtimeop(const struct sock *sk, long *timeop, bool noblock)
+{
+	return noblock ? 0 : (timeop ? *timeop : sk->sk_rcvtimeo);
+}
+
 static inline long sock_sndtimeo(const struct sock *sk, bool noblock)
 {
 	return noblock ? 0 : sk->sk_sndtimeo;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index e80abe4486cb..60b72ae2cdda 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -437,7 +437,7 @@ int compat_tcp_setsockopt(struct sock *sk, int level, int optname,
 void tcp_set_keepalive(struct sock *sk, int val);
 void tcp_syn_ack_timeout(struct sock *sk, struct request_sock *req);
 int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int nonblock, int flags, int *addr_len);
+		size_t len, int nonblock, int flags, int *addr_len, long *timeop);
 void tcp_parse_options(const struct sk_buff *skb,
 		       struct tcp_options_received *opt_rx,
 		       int estab, struct tcp_fastopen_cookie *foc);
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 01a1082e02b3..11893a32a3c0 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1732,7 +1732,7 @@ out:
 }
 
 static int atalk_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-			 size_t size, int flags)
+			 size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct ddpehdr *ddp;
@@ -1742,7 +1742,7 @@ static int atalk_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr
 	struct sk_buff *skb;
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-						flags & MSG_DONTWAIT, &err);
+						flags & MSG_DONTWAIT, &err, timeop);
 	lock_sock(sk);
 
 	if (!skb)
diff --git a/net/atm/common.c b/net/atm/common.c
index 7b491006eaf4..8def66eaed87 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -524,7 +524,7 @@ int vcc_connect(struct socket *sock, int itf, short vpi, int vci)
 }
 
 int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		size_t size, int flags)
+		size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct atm_vcc *vcc;
@@ -544,7 +544,7 @@ int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	    !test_bit(ATM_VF_READY, &vcc->flags))
 		return 0;
 
-	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &error);
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &error, timeop);
 	if (!skb)
 		return error;
 
diff --git a/net/atm/common.h b/net/atm/common.h
index cc3c2dae4d79..b370ffd78a39 100644
--- a/net/atm/common.h
+++ b/net/atm/common.h
@@ -14,7 +14,7 @@ int vcc_create(struct net *net, struct socket *sock, int protocol, int family);
 int vcc_release(struct socket *sock);
 int vcc_connect(struct socket *sock, int itf, short vpi, int vci);
 int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		size_t size, int flags);
+		size_t size, int flags, long *timeop);
 int vcc_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
 		size_t total_len);
 unsigned int vcc_poll(struct file *file, struct socket *sock, poll_table *wait);
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index c35c3f48fc0f..ee0411920216 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1600,7 +1600,7 @@ out:
 }
 
 static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
-	struct msghdr *msg, size_t size, int flags)
+	struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -1619,7 +1619,7 @@ static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	/* Now we can treat all alike */
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &err);
+				flags & MSG_DONTWAIT, &err, timeop);
 	if (skb == NULL)
 		goto out;
 
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 2021c481cdb6..4896bd954293 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -209,7 +209,7 @@ struct sock *bt_accept_dequeue(struct sock *parent, struct socket *newsock)
 EXPORT_SYMBOL(bt_accept_dequeue);
 
 int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-				struct msghdr *msg, size_t len, int flags)
+				struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -222,7 +222,7 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (flags & (MSG_OOB))
 		return -EOPNOTSUPP;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb) {
 		if (sk->sk_shutdown & RCV_SHUTDOWN)
 			return 0;
@@ -282,7 +282,7 @@ static long bt_sock_data_wait(struct sock *sk, long timeo)
 }
 
 int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *msg, size_t size, int flags)
+			       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	int err = 0;
@@ -297,7 +297,7 @@ int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 	lock_sock(sk);
 
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, size);
-	timeo  = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	timeo  = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 
 	do {
 		struct sk_buff *skb;
@@ -381,6 +381,8 @@ int bt_sock_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 	} while (size);
 
 out:
+	if (timeop)
+		*timeop = timeo;
 	release_sock(sk);
 	return copied ? : err;
 }
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index f608bffdb8b9..f24413835e2c 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -829,7 +829,7 @@ static void hci_sock_cmsg(struct sock *sk, struct msghdr *msg,
 }
 
 static int hci_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			    struct msghdr *msg, size_t len, int flags)
+			    struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -844,7 +844,7 @@ static int hci_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (sk->sk_state == BT_CLOSED)
 		return 0;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		return err;
 
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index ef5e5b04f34f..19a90e0d8172 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -976,7 +976,7 @@ static int l2cap_sock_sendmsg(struct kiocb *iocb, struct socket *sock,
 }
 
 static int l2cap_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			      struct msghdr *msg, size_t len, int flags)
+			      struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct l2cap_pinfo *pi = l2cap_pi(sk);
@@ -1003,9 +1003,9 @@ static int l2cap_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	release_sock(sk);
 
 	if (sock->type == SOCK_STREAM)
-		err = bt_sock_stream_recvmsg(iocb, sock, msg, len, flags);
+		err = bt_sock_stream_recvmsg(iocb, sock, msg, len, flags, timeop);
 	else
-		err = bt_sock_recvmsg(iocb, sock, msg, len, flags);
+		err = bt_sock_recvmsg(iocb, sock, msg, len, flags, timeop);
 
 	if (pi->chan->mode != L2CAP_MODE_ERTM)
 		return err;
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index c603a5eb4720..a3cbf8c4daf5 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -617,7 +617,7 @@ done:
 }
 
 static int rfcomm_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *msg, size_t size, int flags)
+			       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct rfcomm_dlc *d = rfcomm_pi(sk)->dlc;
@@ -628,7 +628,7 @@ static int rfcomm_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 		return 0;
 	}
 
-	len = bt_sock_stream_recvmsg(iocb, sock, msg, size, flags);
+	len = bt_sock_stream_recvmsg(iocb, sock, msg, size, flags, timeop);
 
 	lock_sock(sk);
 	if (!(flags & MSG_PEEK) && len > 0)
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index c06dbd3938e8..bfaa16bdc366 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -700,7 +700,7 @@ static void sco_conn_defer_accept(struct hci_conn *conn, u16 setting)
 }
 
 static int sco_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			    struct msghdr *msg, size_t len, int flags)
+			    struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sco_pinfo *pi = sco_pi(sk);
@@ -718,7 +718,7 @@ static int sco_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	release_sock(sk);
 
-	return bt_sock_recvmsg(iocb, sock, msg, len, flags);
+	return bt_sock_recvmsg(iocb, sock, msg, len, flags, timeop);
 }
 
 static int sco_sock_setsockopt(struct socket *sock, int level, int optname, char __user *optval, unsigned int optlen)
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index e8437094d15f..069eb2ffde29 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -272,7 +272,7 @@ static void caif_check_flow_release(struct sock *sk)
  * changed locking, address handling and added MSG_TRUNC.
  */
 static int caif_seqpkt_recvmsg(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *m, size_t len, int flags)
+			       struct msghdr *m, size_t len, int flags, long *timeop)
 
 {
 	struct sock *sk = sock->sk;
@@ -284,7 +284,7 @@ static int caif_seqpkt_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (m->msg_flags&MSG_OOB)
 		goto read_error;
 
-	skb = skb_recv_datagram(sk, flags, 0 , &ret);
+	skb = skb_recv_datagram(sk, flags, 0 , &ret, timeop);
 	if (!skb)
 		goto read_error;
 	copylen = skb->len;
@@ -345,7 +345,7 @@ static long caif_stream_data_wait(struct sock *sk, long timeo)
  */
 static int caif_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 			       struct msghdr *msg, size_t size,
-			       int flags)
+			       int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	int copied = 0;
@@ -367,7 +367,7 @@ static int caif_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	caif_read_lock(sk);
 	target = sock_rcvlowat(sk, flags&MSG_WAITALL, size);
-	timeo = sock_rcvtimeo(sk, flags&MSG_DONTWAIT);
+	timeo = sock_rcvtimeop(sk, timeop, flags&MSG_DONTWAIT);
 
 	do {
 		int chunk;
@@ -450,6 +450,8 @@ unlock:
 	caif_read_unlock(sk);
 
 out:
+	if (timeop)
+		*timeop = timeo;
 	return copied ? : err;
 }
 
diff --git a/net/can/bcm.c b/net/can/bcm.c
index dcb75c0e66c1..dc12c80ec5cd 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1541,7 +1541,7 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len,
 }
 
 static int bcm_recvmsg(struct kiocb *iocb, struct socket *sock,
-		       struct msghdr *msg, size_t size, int flags)
+		       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -1551,7 +1551,7 @@ static int bcm_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	noblock =  flags & MSG_DONTWAIT;
 	flags   &= ~MSG_DONTWAIT;
-	skb = skb_recv_datagram(sk, flags, noblock, &error);
+	skb = skb_recv_datagram(sk, flags, noblock, &error, timeop);
 	if (!skb)
 		return error;
 
diff --git a/net/can/raw.c b/net/can/raw.c
index 081e81fd017f..0a4aa9d98e5e 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -731,7 +731,7 @@ send_failed:
 }
 
 static int raw_recvmsg(struct kiocb *iocb, struct socket *sock,
-		       struct msghdr *msg, size_t size, int flags)
+		       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -741,7 +741,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct socket *sock,
 	noblock =  flags & MSG_DONTWAIT;
 	flags   &= ~MSG_DONTWAIT;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		return err;
 
diff --git a/net/core/datagram.c b/net/core/datagram.c
index a16ed7bbe376..0dd1715374fa 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -138,6 +138,9 @@ out_noerr:
  *	@off: an offset in bytes to peek skb from. Returns an offset
  *	      within an skb where data actually starts
  *	@err: error code returned
+ *	@timeop: per-call timeout (as opposed to the per-socket SO_RCVTIMEO);
+ *		 updated on return with the remaining time. Used by recvmmsg;
+ *		 ignored if NULL.
  *
  *	Get a datagram skbuff, understands the peeking, nonblocking wakeups
  *	and possible races. This replaces identical code in packet, raw and
@@ -162,7 +165,7 @@ out_noerr:
  *	the standard around please.
  */
 struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
-				    int *peeked, int *off, int *err)
+				    int *peeked, int *off, int *err, long *timeop)
 {
 	struct sk_buff *skb, *last;
 	long timeo;
@@ -174,7 +177,7 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 	if (error)
 		goto no_packet;
 
-	timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	timeo = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 
 	do {
 		/* Again only user level code calls this function, so nothing
@@ -205,6 +208,8 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 
 			spin_unlock_irqrestore(&queue->lock, cpu_flags);
 			*off = _off;
+			if (timeop)
+				*timeop = timeo;
 			return skb;
 		}
 		spin_unlock_irqrestore(&queue->lock, cpu_flags);
@@ -219,22 +224,24 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 			goto no_packet;
 
 	} while (!wait_for_more_packets(sk, err, &timeo, last));
-
+out:
+	if (timeop)
+		*timeop = timeo;
 	return NULL;
 
 no_packet:
 	*err = error;
-	return NULL;
+	goto out;
 }
 EXPORT_SYMBOL(__skb_recv_datagram);
 
 struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned int flags,
-				  int noblock, int *err)
+				  int noblock, int *err, long *timeop)
 {
 	int peeked, off = 0;
 
 	return __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
-				   &peeked, &off, err);
+				   &peeked, &off, err, timeop);
 }
 EXPORT_SYMBOL(skb_recv_datagram);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 026e01f70274..b462e38785af 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2191,7 +2191,7 @@ int sock_no_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
 EXPORT_SYMBOL(sock_no_sendmsg);
 
 int sock_no_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
-		    size_t len, int flags)
+		    size_t len, int flags, long *timeop)
 {
 	return -EOPNOTSUPP;
 }
@@ -2577,14 +2577,14 @@ EXPORT_SYMBOL(compat_sock_common_getsockopt);
 #endif
 
 int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock,
-			struct msghdr *msg, size_t size, int flags)
+			struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	int addr_len = 0;
 	int err;
 
 	err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
-				   flags & ~MSG_DONTWAIT, &addr_len);
+				   flags & ~MSG_DONTWAIT, &addr_len, timeop);
 	if (err >= 0)
 		msg->msg_namelen = addr_len;
 	return err;
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index c67816647cce..fbf4cc113ffe 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -314,7 +314,7 @@ int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		 size_t size);
 int dccp_recvmsg(struct kiocb *iocb, struct sock *sk,
 		 struct msghdr *msg, size_t len, int nonblock, int flags,
-		 int *addr_len);
+		 int *addr_len, long *timeop);
 void dccp_shutdown(struct sock *sk, int how);
 int inet_dccp_listen(struct socket *sock, int backlog);
 unsigned int dccp_poll(struct file *file, struct socket *sock,
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index de2c1e719305..92ae3d37c7f0 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -808,7 +808,7 @@ out_discard:
 EXPORT_SYMBOL_GPL(dccp_sendmsg);
 
 int dccp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		 size_t len, int nonblock, int flags, int *addr_len)
+		 size_t len, int nonblock, int flags, int *addr_len, long *timeop)
 {
 	const struct dccp_hdr *dh;
 	long timeo;
@@ -820,7 +820,7 @@ int dccp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		goto out;
 	}
 
-	timeo = sock_rcvtimeo(sk, nonblock);
+	timeo = sock_rcvtimeop(sk, timeop, nonblock);
 
 	do {
 		struct sk_buff *skb = skb_peek(&sk->sk_receive_queue);
@@ -910,6 +910,8 @@ verify_sock_status:
 	} while (1);
 out:
 	release_sock(sk);
+	if (timeop)
+		*timeop = timeo;
 	return len;
 }
 
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index ae011b46c071..86dfcbe505de 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -1669,7 +1669,7 @@ static int dn_data_ready(struct sock *sk, struct sk_buff_head *q, int flags, int
 
 
 static int dn_recvmsg(struct kiocb *iocb, struct socket *sock,
-	struct msghdr *msg, size_t size, int flags)
+	struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct dn_scp *scp = DN_SK(sk);
@@ -1680,7 +1680,7 @@ static int dn_recvmsg(struct kiocb *iocb, struct socket *sock,
 	struct sk_buff *skb, *n;
 	struct dn_skb_cb *cb = NULL;
 	unsigned char eor = 0;
-	long timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	long timeo = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 
 	lock_sock(sk);
 
@@ -1814,7 +1814,8 @@ out:
 	}
 
 	release_sock(sk);
-
+	if (timeop)
+		*timeop = timeo;
 	return rv;
 }
 
diff --git a/net/ieee802154/dgram.c b/net/ieee802154/dgram.c
index 4f0ed8780194..dd7de8959d07 100644
--- a/net/ieee802154/dgram.c
+++ b/net/ieee802154/dgram.c
@@ -305,14 +305,14 @@ out:
 
 static int dgram_recvmsg(struct kiocb *iocb, struct sock *sk,
 		struct msghdr *msg, size_t len, int noblock, int flags,
-		int *addr_len)
+		int *addr_len, long *timeop)
 {
 	size_t copied = 0;
 	int err = -EOPNOTSUPP;
 	struct sk_buff *skb;
 	DECLARE_SOCKADDR(struct sockaddr_ieee802154 *, saddr, msg->msg_name);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ieee802154/raw.c b/net/ieee802154/raw.c
index 74d54fae33d7..0303aa66a9e2 100644
--- a/net/ieee802154/raw.c
+++ b/net/ieee802154/raw.c
@@ -179,13 +179,13 @@ out:
 }
 
 static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		       size_t len, int noblock, int flags, int *addr_len)
+		       size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	size_t copied = 0;
 	int err = -EOPNOTSUPP;
 	struct sk_buff *skb;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 0e9bb08a91e4..5e0a9dc931e6 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -757,7 +757,7 @@ ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
 EXPORT_SYMBOL(inet_sendpage);
 
 int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		 size_t size, int flags)
+		 size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	int addr_len = 0;
@@ -766,7 +766,7 @@ int inet_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	sock_rps_record_flow(sk);
 
 	err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
-				   flags & ~MSG_DONTWAIT, &addr_len);
+				   flags & ~MSG_DONTWAIT, &addr_len, timeop);
 	if (err >= 0)
 		msg->msg_namelen = addr_len;
 	return err;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 044a0ddf6a79..791be60b38f1 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -840,7 +840,7 @@ do_confirm:
 }
 
 int ping_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		 size_t len, int noblock, int flags, int *addr_len)
+		 size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct inet_sock *isk = inet_sk(sk);
 	int family = sk->sk_family;
@@ -864,7 +864,7 @@ int ping_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		}
 	}
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index a9dbe58bdfe7..32aee7472bb3 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -685,7 +685,7 @@ out:	return ret;
  */
 
 static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		       size_t len, int noblock, int flags, int *addr_len)
+		       size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	size_t copied = 0;
@@ -701,7 +701,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		goto out;
 	}
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index eb1dde37e678..bc506ffbc8d0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1601,7 +1601,7 @@ EXPORT_SYMBOL(tcp_read_sock);
  */
 
 int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int nonblock, int flags, int *addr_len)
+		size_t len, int nonblock, int flags, int *addr_len, long *timeop)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int copied = 0;
@@ -1626,7 +1626,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	if (sk->sk_state == TCP_LISTEN)
 		goto out;
 
-	timeo = sock_rcvtimeo(sk, nonblock);
+	timeo = sock_rcvtimeop(sk, timeop, nonblock);
 
 	/* Urgent data needs to be handled specially. */
 	if (flags & MSG_OOB)
@@ -1993,20 +1993,18 @@ skip_copy:
 
 	/* Clean up data we have read: This will do ACK frames. */
 	tcp_cleanup_rbuf(sk, copied);
-
-	release_sock(sk);
-	return copied;
-
 out:
 	release_sock(sk);
-	return err;
+	if (timeop)
+		*timeop = timeo;
+	return copied;
 
 recv_urg:
-	err = tcp_recv_urg(sk, msg, len, flags);
+	copied = tcp_recv_urg(sk, msg, len, flags);
 	goto out;
 
 recv_sndq:
-	err = tcp_peek_sndq(sk, msg, len);
+	copied = tcp_peek_sndq(sk, msg, len);
 	goto out;
 }
 EXPORT_SYMBOL(tcp_recvmsg);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e07d52b8617a..039ac25be82f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1224,7 +1224,7 @@ EXPORT_SYMBOL(udp_ioctl);
  */
 
 int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int noblock, int flags, int *addr_len)
+		size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name);
@@ -1240,7 +1240,7 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 
 try_again:
 	skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
-				  &peeked, &off, &err);
+				  &peeked, &off, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h
index f3c27899f62b..a39aa9996b72 100644
--- a/net/ipv4/udp_impl.h
+++ b/net/ipv4/udp_impl.h
@@ -22,7 +22,7 @@ int compat_udp_getsockopt(struct sock *sk, int level, int optname,
 			  char __user *optval, int __user *optlen);
 #endif
 int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		size_t len, int noblock, int flags, int *addr_len);
+		size_t len, int noblock, int flags, int *addr_len, long *timeop);
 int udp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
 		 int flags);
 int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index b2dc60b0c764..1d267f89eb71 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -458,7 +458,7 @@ int rawv6_rcv(struct sock *sk, struct sk_buff *skb)
 
 static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 		  struct msghdr *msg, size_t len,
-		  int noblock, int flags, int *addr_len)
+		  int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	DECLARE_SOCKADDR(struct sockaddr_in6 *, sin6, msg->msg_name);
@@ -475,7 +475,7 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (np->rxpmtu && np->rxopt.bits.rxpmtu)
 		return ipv6_recv_rxpmtu(sk, msg, len, addr_len);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 60325236446a..d1f1b63cfbcd 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -380,7 +380,7 @@ EXPORT_SYMBOL_GPL(udp6_lib_lookup);
 
 int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 		  struct msghdr *msg, size_t len,
-		  int noblock, int flags, int *addr_len)
+		  int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct inet_sock *inet = inet_sk(sk);
@@ -400,7 +400,7 @@ int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 
 try_again:
 	skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
-				  &peeked, &off, &err);
+				  &peeked, &off, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h
index c779c3c90b9d..cd414d719977 100644
--- a/net/ipv6/udp_impl.h
+++ b/net/ipv6/udp_impl.h
@@ -26,7 +26,7 @@ int compat_udpv6_getsockopt(struct sock *sk, int level, int optname,
 int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		  size_t len);
 int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-		  size_t len, int noblock, int flags, int *addr_len);
+		  size_t len, int noblock, int flags, int *addr_len, long *timeop);
 int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
 void udpv6_destroy_sock(struct sock *sk);
 
diff --git a/net/ipx/af_ipx.c b/net/ipx/af_ipx.c
index 91729b807c7d..4964c1e0ab03 100644
--- a/net/ipx/af_ipx.c
+++ b/net/ipx/af_ipx.c
@@ -1756,7 +1756,7 @@ out:
 
 
 static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
-		struct msghdr *msg, size_t size, int flags)
+		struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct ipx_sock *ipxs = ipx_sk(sk);
@@ -1791,7 +1791,7 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto out;
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &rc);
+				flags & MSG_DONTWAIT, &rc, timeop);
 	if (!skb) {
 		if (rc == -EAGAIN && (sk->sk_shutdown & RCV_SHUTDOWN))
 			rc = 0;
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 54747c25c86c..0991da69f39d 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1373,7 +1373,7 @@ out:
  *    after being read, regardless of how much the user actually read
  */
 static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
-			      struct msghdr *msg, size_t size, int flags)
+			      struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct irda_sock *self = irda_sk(sk);
@@ -1384,7 +1384,7 @@ static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
 	IRDA_DEBUG(4, "%s()\n", __func__);
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &err);
+				flags & MSG_DONTWAIT, &err, timeop);
 	if (!skb)
 		return err;
 
@@ -1422,7 +1422,7 @@ static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
  * Function irda_recvmsg_stream (iocb, sock, msg, size, flags)
  */
 static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
-			       struct msghdr *msg, size_t size, int flags)
+			       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct irda_sock *self = irda_sk(sk);
@@ -1445,7 +1445,7 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
 
 	err = 0;
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, size);
-	timeo = sock_rcvtimeo(sk, noblock);
+	timeo = sock_rcvtimeop(sk, timeop, noblock);
 
 	do {
 		int chunk;
@@ -1480,8 +1480,10 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
 
 			finish_wait(sk_sleep(sk), &wait);
 
-			if (err)
-				return err;
+			if (err) {
+				copied = err;
+				break;
+			}
 			if (sk->sk_shutdown & RCV_SHUTDOWN)
 				break;
 
@@ -1534,6 +1536,8 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
 		}
 	}
 
+	if (timeop)
+		*timeop = timeo;
 	return copied;
 }
 
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 8c9d7302c846..0714cf592dc2 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1314,7 +1314,7 @@ static void iucv_process_message_q(struct sock *sk)
 }
 
 static int iucv_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			     struct msghdr *msg, size_t len, int flags)
+			     struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -1335,7 +1335,7 @@ static int iucv_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	/* receive/dequeue next skb:
 	 * the function understands MSG_PEEK and, thus, does not dequeue skb */
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb) {
 		if (sk->sk_shutdown & RCV_SHUTDOWN)
 			return 0;
diff --git a/net/key/af_key.c b/net/key/af_key.c
index b47f8e542aae..925b45078c03 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3635,7 +3635,7 @@ out:
 
 static int pfkey_recvmsg(struct kiocb *kiocb,
 			 struct socket *sock, struct msghdr *msg, size_t len,
-			 int flags)
+			 int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct pfkey_sock *pfk = pfkey_sk(sk);
@@ -3646,7 +3646,7 @@ static int pfkey_recvmsg(struct kiocb *kiocb,
 	if (flags & ~(MSG_PEEK|MSG_DONTWAIT|MSG_TRUNC|MSG_CMSG_COMPAT))
 		goto out;
 
-	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err);
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err, timeop);
 	if (skb == NULL)
 		goto out;
 
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 369a9822488c..4347233855cb 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -507,7 +507,7 @@ no_route:
 }
 
 static int l2tp_ip_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
-			   size_t len, int noblock, int flags, int *addr_len)
+			   size_t len, int noblock, int flags, int *addr_len, long *timeop)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	size_t copied = 0;
@@ -518,7 +518,7 @@ static int l2tp_ip_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m
 	if (flags & MSG_OOB)
 		goto out;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index f3f98a156cee..6c839ba9d299 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -645,7 +645,7 @@ do_confirm:
 
 static int l2tp_ip6_recvmsg(struct kiocb *iocb, struct sock *sk,
 			    struct msghdr *msg, size_t len, int noblock,
-			    int flags, int *addr_len)
+			    int flags, int *addr_len, long *timeop)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	DECLARE_SOCKADDR(struct sockaddr_l2tpip6 *, lsa, msg->msg_name);
@@ -662,7 +662,7 @@ static int l2tp_ip6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (flags & MSG_ERRQUEUE)
 		return ipv6_recv_error(sk, msg, len, addr_len);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 950909f04ee6..9e6db6946e4f 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -187,7 +187,7 @@ static int pppol2tp_recv_payload_hook(struct sk_buff *skb)
  */
 static int pppol2tp_recvmsg(struct kiocb *iocb, struct socket *sock,
 			    struct msghdr *msg, size_t len,
-			    int flags)
+			    int flags, long *timeop)
 {
 	int err;
 	struct sk_buff *skb;
@@ -199,7 +199,7 @@ static int pppol2tp_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	err = 0;
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-				flags & MSG_DONTWAIT, &err);
+				flags & MSG_DONTWAIT, &err, timeop);
 	if (!skb)
 		goto end;
 
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 0080d2b0a8ae..b5edf838f9fa 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -705,7 +705,7 @@ out:
  *	Returns non-negative upon success, negative otherwise.
  */
 static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
-			  struct msghdr *msg, size_t len, int flags)
+			  struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	DECLARE_SOCKADDR(struct sockaddr_llc *, uaddr, msg->msg_name);
 	const int nonblock = flags & MSG_DONTWAIT;
@@ -725,7 +725,7 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (unlikely(sk->sk_type == SOCK_STREAM && sk->sk_state == TCP_LISTEN))
 		goto out;
 
-	timeo = sock_rcvtimeo(sk, nonblock);
+	timeo = sock_rcvtimeop(sk, timeop, nonblock);
 
 	seq = &llc->copied_seq;
 	if (flags & MSG_PEEK) {
@@ -851,6 +851,8 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 out:
 	release_sock(sk);
+	if (timeop)
+		*timeop = timeo;
 	return copied;
 copy_uaddr:
 	if (uaddr != NULL && skb != NULL) {
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index e0ccd84d4d67..d0b39b90d41a 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2399,7 +2399,7 @@ out:
 
 static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 			   struct msghdr *msg, size_t len,
-			   int flags)
+			   int flags, long *timeop)
 {
 	struct sock_iocb *siocb = kiocb_to_siocb(kiocb);
 	struct scm_cookie scm;
@@ -2415,7 +2415,7 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 
 	copied = 0;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (skb == NULL)
 		goto out;
 
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index ede50d197e10..4a9078e2bf7a 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1134,7 +1134,7 @@ out:
 }
 
 static int nr_recvmsg(struct kiocb *iocb, struct socket *sock,
-		      struct msghdr *msg, size_t size, int flags)
+		      struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	DECLARE_SOCKADDR(struct sockaddr_ax25 *, sax, msg->msg_name);
@@ -1154,7 +1154,7 @@ static int nr_recvmsg(struct kiocb *iocb, struct socket *sock,
 	}
 
 	/* Now we can treat all alike */
-	if ((skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &er)) == NULL) {
+	if ((skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &er, timeop)) == NULL) {
 		release_sock(sk);
 		return er;
 	}
diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 51f077a92fa9..0b233d1f1a57 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -794,7 +794,7 @@ static int llcp_sock_sendmsg(struct kiocb *iocb, struct socket *sock,
 }
 
 static int llcp_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			     struct msghdr *msg, size_t len, int flags)
+			     struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -817,7 +817,7 @@ static int llcp_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (flags & (MSG_OOB))
 		return -EOPNOTSUPP;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb) {
 		pr_err("Recv datagram failed state %d %d %d",
 		       sk->sk_state, err, sock_error(sk));
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index c27a6e86cae4..665d9523ce5c 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -228,7 +228,7 @@ static int rawsock_sendmsg(struct kiocb *iocb, struct socket *sock,
 }
 
 static int rawsock_recvmsg(struct kiocb *iocb, struct socket *sock,
-			   struct msghdr *msg, size_t len, int flags)
+			   struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	int noblock = flags & MSG_DONTWAIT;
 	struct sock *sk = sock->sk;
@@ -238,7 +238,7 @@ static int rawsock_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	pr_debug("sock=%p sk=%p len=%zu flags=%d\n", sock, sk, len, flags);
 
-	skb = skb_recv_datagram(sk, flags, noblock, &rc);
+	skb = skb_recv_datagram(sk, flags, noblock, &rc, timeop);
 	if (!skb)
 		return rc;
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b85c67ccb797..f56d816340e2 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2852,7 +2852,7 @@ out:
  */
 
 static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
-			  struct msghdr *msg, size_t len, int flags)
+			  struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
@@ -2884,7 +2884,7 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
 	 *	but then it will block.
 	 */
 
-	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err);
+	skb = skb_recv_datagram(sk, flags, flags & MSG_DONTWAIT, &err, timeop);
 
 	/*
 	 *	An error occurred so return it. Because skb_recv_datagram()
diff --git a/net/phonet/datagram.c b/net/phonet/datagram.c
index 290352c0e6b4..77eff48eeb83 100644
--- a/net/phonet/datagram.c
+++ b/net/phonet/datagram.c
@@ -127,7 +127,7 @@ static int pn_sendmsg(struct kiocb *iocb, struct sock *sk,
 
 static int pn_recvmsg(struct kiocb *iocb, struct sock *sk,
 			struct msghdr *msg, size_t len, int noblock,
-			int flags, int *addr_len)
+			int flags, int *addr_len, long *timeop)
 {
 	struct sk_buff *skb = NULL;
 	struct sockaddr_pn sa;
@@ -138,7 +138,7 @@ static int pn_recvmsg(struct kiocb *iocb, struct sock *sk,
 			MSG_CMSG_COMPAT))
 		goto out_nofree;
 
-	skb = skb_recv_datagram(sk, flags, noblock, &rval);
+	skb = skb_recv_datagram(sk, flags, noblock, &rval, timeop);
 	if (skb == NULL)
 		goto out_nofree;
 
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index 70a547ea5177..c5832e1958f8 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -783,7 +783,7 @@ static struct sock *pep_sock_accept(struct sock *sk, int flags, int *errp)
 	u8 pipe_handle, enabled, n_sb;
 	u8 aligned = 0;
 
-	skb = skb_recv_datagram(sk, 0, flags & O_NONBLOCK, errp);
+	skb = skb_recv_datagram(sk, 0, flags & O_NONBLOCK, errp, NULL);
 	if (!skb)
 		return NULL;
 
@@ -1248,7 +1248,7 @@ struct sk_buff *pep_read(struct sock *sk)
 
 static int pep_recvmsg(struct kiocb *iocb, struct sock *sk,
 			struct msghdr *msg, size_t len, int noblock,
-			int flags, int *addr_len)
+			int flags, int *addr_len, long *timeop)
 {
 	struct sk_buff *skb;
 	int err;
@@ -1277,7 +1277,7 @@ static int pep_recvmsg(struct kiocb *iocb, struct sock *sk,
 			return -EINVAL;
 	}
 
-	skb = skb_recv_datagram(sk, flags, noblock, &err);
+	skb = skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	lock_sock(sk);
 	if (skb == NULL) {
 		if (err == -ENOTCONN && sk->sk_state == TCP_CLOSE_WAIT)
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 48f8ffc60f8f..e511e569bbc9 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -706,7 +706,7 @@ void rds_inc_put(struct rds_incoming *inc);
 void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr,
 		       struct rds_incoming *inc, gfp_t gfp);
 int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		size_t size, int msg_flags);
+		size_t size, int msg_flags, long *timeop);
 void rds_clear_recv_queue(struct rds_sock *rs);
 int rds_notify_queue_get(struct rds_sock *rs, struct msghdr *msg);
 void rds_inc_info_copy(struct rds_incoming *inc,
diff --git a/net/rds/recv.c b/net/rds/recv.c
index bd82522534fc..6223a4b0fded 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -396,7 +396,7 @@ static int rds_cmsg_recv(struct rds_incoming *inc, struct msghdr *msg)
 }
 
 int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
-		size_t size, int msg_flags)
+		size_t size, int msg_flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct rds_sock *rs = rds_sk_to_rs(sk);
@@ -406,7 +406,7 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	struct rds_incoming *inc = NULL;
 
 	/* udp_recvmsg()->sock_recvtimeo() gets away without locking too.. */
-	timeo = sock_rcvtimeo(sk, nonblock);
+	timeo = sock_rcvtimeop(sk, timeop, nonblock);
 
 	rdsdebug("size %zu flags 0x%x timeo %ld\n", size, msg_flags, timeo);
 
@@ -493,6 +493,8 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 		rds_inc_put(inc);
 
 out:
+	if (timeop)
+		*timeop = timeo;
 	return ret;
 }
 
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 8451c8cdc9de..2cfc75a1cbbb 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1212,7 +1212,7 @@ static int rose_sendmsg(struct kiocb *iocb, struct socket *sock,
 
 
 static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
-			struct msghdr *msg, size_t size, int flags)
+			struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct rose_sock *rose = rose_sk(sk);
@@ -1229,7 +1229,7 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
 		return -ENOTCONN;
 
 	/* Now we can treat all alike */
-	if ((skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &er)) == NULL)
+	if ((skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &er, timeop)) == NULL)
 		return er;
 
 	qbit = (skb->data[0] & ROSE_Q_BIT) == ROSE_Q_BIT;
diff --git a/net/rxrpc/ar-input.c b/net/rxrpc/ar-input.c
index 63b21e580de9..2319fae4b1f6 100644
--- a/net/rxrpc/ar-input.c
+++ b/net/rxrpc/ar-input.c
@@ -655,7 +655,7 @@ void rxrpc_data_ready(struct sock *sk)
 		return;
 	}
 
-	skb = skb_recv_datagram(sk, 0, 1, &ret);
+	skb = skb_recv_datagram(sk, 0, 1, &ret, NULL);
 	if (!skb) {
 		rxrpc_put_local(local);
 		if (ret == -EAGAIN)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index ba9fd36d3f15..a21e51937e27 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -573,7 +573,7 @@ extern const struct file_operations rxrpc_connection_seq_fops;
  */
 void rxrpc_remove_user_ID(struct rxrpc_sock *, struct rxrpc_call *);
 int rxrpc_recvmsg(struct kiocb *, struct socket *, struct msghdr *, size_t,
-		  int);
+		  int, long *);
 
 /*
  * ar-security.c
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index e9aaa65c0778..e9082ed598cd 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -44,7 +44,7 @@ void rxrpc_remove_user_ID(struct rxrpc_sock *rx, struct rxrpc_call *call)
  *   simultaneously
  */
 int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
-		  struct msghdr *msg, size_t len, int flags)
+		  struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct rxrpc_skb_priv *sp;
 	struct rxrpc_call *call = NULL, *continue_call = NULL;
@@ -63,7 +63,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	ullen = msg->msg_flags & MSG_CMSG_COMPAT ? 4 : sizeof(unsigned long);
 
-	timeo = sock_rcvtimeo(&rx->sk, flags & MSG_DONTWAIT);
+	timeo = sock_rcvtimeop(&rx->sk, timeop, flags & MSG_DONTWAIT);
 	msg->msg_flags |= MSG_MORE;
 
 	lock_sock(&rx->sk);
@@ -78,7 +78,8 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 				release_sock(&rx->sk);
 				if (continue_call)
 					rxrpc_put_call(continue_call);
-				return -ENODATA;
+				copied = -ENODATA;
+				goto out_copied;
 			}
 		}
 
@@ -135,7 +136,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 				release_sock(&rx->sk);
 				rxrpc_put_call(continue_call);
 				_leave(" = %d [noncont]", copied);
-				return copied;
+				goto out_copied;
 			}
 		}
 
@@ -252,6 +253,9 @@ out:
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	_leave(" = %d [data]", copied);
+out_copied:
+	if (timeop)
+		*timeop = timeo;
 	return copied;
 
 	/* handle non-DATA messages such as aborts, incoming connections and
@@ -328,7 +332,8 @@ terminal_message:
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	_leave(" = %d", ret);
-	return ret;
+	copied = ret;
+	goto out_copied;
 
 copy_error:
 	_debug("copy error");
@@ -337,7 +342,8 @@ copy_error:
 	if (continue_call)
 		rxrpc_put_call(continue_call);
 	_leave(" = %d", ret);
-	return ret;
+	copied = ret;
+	goto out_copied;
 
 wait_interrupted:
 	ret = sock_intr_errno(timeo);
@@ -348,8 +354,7 @@ wait_error:
 	if (copied)
 		copied = ret;
 	_leave(" = %d [waitfail %d]", copied, ret);
-	return copied;
-
+	goto out_copied;
 }
 
 /**
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 429899689408..d05161a168bc 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2042,11 +2042,11 @@ static int sctp_skb_pull(struct sk_buff *skb, int len)
  *  flags   - flags sent or received with the user message, see Section
  *            5 for complete description of the flags.
  */
-static struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int, int *);
+static struct sk_buff *sctp_skb_recv_datagram(struct sock *, int, int, int *, long *);
 
 static int sctp_recvmsg(struct kiocb *iocb, struct sock *sk,
 			struct msghdr *msg, size_t len, int noblock,
-			int flags, int *addr_len)
+			int flags, int *addr_len, long *timeop)
 {
 	struct sctp_ulpevent *event = NULL;
 	struct sctp_sock *sp = sctp_sk(sk);
@@ -2066,7 +2066,7 @@ static int sctp_recvmsg(struct kiocb *iocb, struct sock *sk,
 		goto out;
 	}
 
-	skb = sctp_skb_recv_datagram(sk, flags, noblock, &err);
+	skb = sctp_skb_recv_datagram(sk, flags, noblock, &err, timeop);
 	if (!skb)
 		goto out;
 
@@ -6519,13 +6519,13 @@ out:
  * with a few changes to make lksctp work.
  */
 static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
-					      int noblock, int *err)
+					      int noblock, int *err, long *timeop)
 {
 	int error;
 	struct sk_buff *skb;
 	long timeo;
 
-	timeo = sock_rcvtimeo(sk, noblock);
+	timeo = sock_rcvtimeop(sk, timeop, noblock);
 
 	pr_debug("%s: timeo:%ld, max:%ld\n", __func__, timeo,
 		 MAX_SCHEDULE_TIMEOUT);
@@ -6549,7 +6549,7 @@ static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
 		}
 
 		if (skb)
-			return skb;
+			break;
 
 		/* Caller is allowed not to check sk->sk_err before calling. */
 		error = sock_error(sk);
@@ -6569,11 +6569,15 @@ static struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
 			goto no_packet;
 	} while (sctp_wait_for_packet(sk, err, &timeo) == 0);
 
-	return NULL;
+out:
+	if (timeop)
+		*timeop = timeo;
+
+	return skb;
 
 no_packet:
 	*err = error;
-	return NULL;
+	goto out;
 }
 
 /* If sndbuf has changed, wake up per association sndbuf waiters.  */
diff --git a/net/socket.c b/net/socket.c
index abf56b2a14f9..379be43879db 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -772,7 +772,7 @@ void __sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
 EXPORT_SYMBOL_GPL(__sock_recv_ts_and_drops);
 
 static inline int __sock_recvmsg_nosec(struct kiocb *iocb, struct socket *sock,
-				       struct msghdr *msg, size_t size, int flags)
+				       struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	struct sock_iocb *si = kiocb_to_siocb(iocb);
 
@@ -782,19 +782,19 @@ static inline int __sock_recvmsg_nosec(struct kiocb *iocb, struct socket *sock,
 	si->size = size;
 	si->flags = flags;
 
-	return sock->ops->recvmsg(iocb, sock, msg, size, flags);
+	return sock->ops->recvmsg(iocb, sock, msg, size, flags, timeop);
 }
 
 static inline int __sock_recvmsg(struct kiocb *iocb, struct socket *sock,
-				 struct msghdr *msg, size_t size, int flags)
+				 struct msghdr *msg, size_t size, int flags, long *timeop)
 {
 	int err = security_socket_recvmsg(sock, msg, size, flags);
 
-	return err ?: __sock_recvmsg_nosec(iocb, sock, msg, size, flags);
+	return err ?: __sock_recvmsg_nosec(iocb, sock, msg, size, flags, timeop);
 }
 
 int sock_recvmsg(struct socket *sock, struct msghdr *msg,
-		 size_t size, int flags)
+		 size_t size, int flags, long *timeop)
 {
 	struct kiocb iocb;
 	struct sock_iocb siocb;
@@ -802,7 +802,7 @@ int sock_recvmsg(struct socket *sock, struct msghdr *msg,
 
 	init_sync_kiocb(&iocb, NULL);
 	iocb.private = &siocb;
-	ret = __sock_recvmsg(&iocb, sock, msg, size, flags);
+	ret = __sock_recvmsg(&iocb, sock, msg, size, flags, timeop);
 	if (-EIOCBQUEUED == ret)
 		ret = wait_on_sync_kiocb(&iocb);
 	return ret;
@@ -810,7 +810,7 @@ int sock_recvmsg(struct socket *sock, struct msghdr *msg,
 EXPORT_SYMBOL(sock_recvmsg);
 
 static int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
-			      size_t size, int flags)
+			      size_t size, int flags, long *timeop)
 {
 	struct kiocb iocb;
 	struct sock_iocb siocb;
@@ -818,7 +818,7 @@ static int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
 
 	init_sync_kiocb(&iocb, NULL);
 	iocb.private = &siocb;
-	ret = __sock_recvmsg_nosec(&iocb, sock, msg, size, flags);
+	ret = __sock_recvmsg_nosec(&iocb, sock, msg, size, flags, timeop);
 	if (-EIOCBQUEUED == ret)
 		ret = wait_on_sync_kiocb(&iocb);
 	return ret;
@@ -851,7 +851,7 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
 	 * iovec are identical, yielding the same in-core layout and alignment
 	 */
 	msg->msg_iov = (struct iovec *)vec, msg->msg_iovlen = num;
-	result = sock_recvmsg(sock, msg, size, flags);
+	result = sock_recvmsg(sock, msg, size, flags, NULL);
 	set_fs(oldfs);
 	return result;
 }
@@ -914,7 +914,7 @@ static ssize_t do_sock_read(struct msghdr *msg, struct kiocb *iocb,
 	msg->msg_iovlen = nr_segs;
 	msg->msg_flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
 
-	return __sock_recvmsg(iocb, sock, msg, size, msg->msg_flags);
+	return __sock_recvmsg(iocb, sock, msg, size, msg->msg_flags, NULL);
 }
 
 static ssize_t sock_aio_read(struct kiocb *iocb, const struct iovec *iov,
@@ -1862,7 +1862,7 @@ SYSCALL_DEFINE6(recvfrom, int, fd, void __user *, ubuf, size_t, size,
 	msg.msg_namelen = 0;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
-	err = sock_recvmsg(sock, &msg, size, flags);
+	err = sock_recvmsg(sock, &msg, size, flags, NULL);
 
 	if (err >= 0 && addr != NULL) {
 		err2 = move_addr_to_user(&address,
@@ -2207,7 +2207,7 @@ SYSCALL_DEFINE4(sendmmsg, int, fd, struct mmsghdr __user *, mmsg,
 }
 
 static int ___sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
-			 struct msghdr *msg_sys, unsigned int flags, int nosec)
+			 struct msghdr *msg_sys, unsigned int flags, int nosec, long *timeop)
 {
 	struct compat_msghdr __user *msg_compat =
 	    (struct compat_msghdr __user *)msg;
@@ -2265,7 +2265,7 @@ static int ___sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	err = (nosec ? sock_recvmsg_nosec : sock_recvmsg)(sock, msg_sys,
-							  total_len, flags);
+							  total_len, flags, timeop);
 	if (err < 0)
 		goto out_freeiov;
 	len = err;
@@ -2312,7 +2312,7 @@ long __sys_recvmsg(int fd, struct msghdr __user *msg, unsigned flags)
 	if (!sock)
 		goto out;
 
-	err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
+	err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0, NULL);
 
 	fput_light(sock->file, fput_needed);
 out:
@@ -2327,6 +2327,30 @@ SYSCALL_DEFINE3(recvmsg, int, fd, struct msghdr __user *, msg,
 	return __sys_recvmsg(fd, msg, flags);
 }
 
+static int sock_set_timeout_ts(long *timeo_p, struct timespec *ts)
+{
+	if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
+		return -EDOM;
+
+	if (ts->tv_sec < 0) {
+		static int warned __read_mostly;
+
+		*timeo_p = 0;
+		if (warned < 10 && net_ratelimit()) {
+			warned++;
+			pr_info("%s: `%s' (pid %d) tries to set negative timeout\n",
+				__func__, current->comm, task_pid_nr(current));
+		}
+		return 0;
+	}
+	*timeo_p = MAX_SCHEDULE_TIMEOUT;
+	if (ts->tv_sec == 0 && ts->tv_nsec == 0)
+		return 0;
+	if (ts->tv_sec < (MAX_SCHEDULE_TIMEOUT / HZ - 1))
+		*timeo_p = ts->tv_sec * HZ + (ts->tv_nsec + (NSEC_PER_SEC / HZ - 1)) / (NSEC_PER_SEC / HZ);
+	return 0;
+}
+
 /*
  *     Linux recvmmsg interface
  */
@@ -2339,12 +2363,14 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;
 	struct msghdr msg_sys;
-	struct timespec end_time;
+	long timeout_hz, *timeop = NULL;
 
-	if (timeout &&
-	    poll_select_set_timeout(&end_time, timeout->tv_sec,
-				    timeout->tv_nsec))
-		return -EINVAL;
+	if (timeout) {
+		err = sock_set_timeout_ts(&timeout_hz, timeout);
+		if (err)
+			return err;
+		timeop = &timeout_hz;
+	}
 
 	datagrams = 0;
 
@@ -2366,7 +2392,7 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		if (MSG_CMSG_COMPAT & flags) {
 			err = ___sys_recvmsg(sock, (struct msghdr __user *)compat_entry,
 					     &msg_sys, flags & ~MSG_WAITFORONE,
-					     datagrams);
+					     datagrams, timeop);
 			if (err < 0)
 				break;
 			err = __put_user(err, &compat_entry->msg_len);
@@ -2375,7 +2401,7 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 			err = ___sys_recvmsg(sock,
 					     (struct msghdr __user *)entry,
 					     &msg_sys, flags & ~MSG_WAITFORONE,
-					     datagrams);
+					     datagrams, timeop);
 			if (err < 0)
 				break;
 			err = put_user(err, &entry->msg_len);
@@ -2390,17 +2416,11 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		if (flags & MSG_WAITFORONE)
 			flags |= MSG_DONTWAIT;
 
-		if (timeout) {
-			ktime_get_ts(timeout);
-			*timeout = timespec_sub(end_time, *timeout);
-			if (timeout->tv_sec < 0) {
-				timeout->tv_sec = timeout->tv_nsec = 0;
-				break;
-			}
-
+		if (timeout && timeout_hz == 0) {
 			/* Timeout, return less than vlen datagrams */
-			if (timeout->tv_nsec == 0 && timeout->tv_sec == 0)
-				break;
+			timeout->tv_sec = timeout->tv_nsec = 0;
+			timeop = NULL;
+			break;
 		}
 
 		/* Out of band data, return right away */
@@ -2411,6 +2431,11 @@ int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 out_put:
 	fput_light(sock->file, fput_needed);
 
+	if (timeop) {
+		timeout->tv_sec	 = timeout_hz / HZ;
+		timeout->tv_nsec = (timeout_hz % HZ) * (NSEC_PER_SEC / HZ);
+	}
+
 	if (err == 0)
 		return datagrams;
 
@@ -2453,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg,
 
 	datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, &timeout_sys);
 
-	if (datagrams > 0 &&
-	    copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
+	if (copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
 		datagrams = -EFAULT;
 
 	return datagrams;
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 43bcb4699d69..e1e61082f45d 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -545,7 +545,7 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
 	err = kernel_recvmsg(svsk->sk_sock, &msg, NULL,
 			     0, 0, MSG_PEEK | MSG_DONTWAIT);
 	if (err >= 0)
-		skb = skb_recv_datagram(svsk->sk_sk, 0, 1, &err);
+		skb = skb_recv_datagram(svsk->sk_sk, 0, 1, &err, NULL);
 
 	if (skb == NULL) {
 		if (err != -EAGAIN) {
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 1dec6043e4de..f0008257ca68 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -965,7 +965,7 @@ static void xs_local_data_ready(struct sock *sk)
 	if (xprt == NULL)
 		goto out;
 
-	skb = skb_recv_datagram(sk, 0, 1, &err);
+	skb = skb_recv_datagram(sk, 0, 1, &err, NULL);
 	if (skb == NULL)
 		goto out;
 
@@ -1027,7 +1027,7 @@ static void xs_udp_data_ready(struct sock *sk)
 	if (!(xprt = xprt_from_sock(sk)))
 		goto out;
 
-	if ((skb = skb_recv_datagram(sk, 0, 1, &err)) == NULL)
+	if ((skb = skb_recv_datagram(sk, 0, 1, &err, NULL)) == NULL)
 		goto out;
 
 	repsize = skb->len - sizeof(struct udphdr);
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 08d87fc80b10..b4f7d923c9e2 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1031,7 +1031,7 @@ static int tipc_wait_for_rcvmsg(struct socket *sock, long *timeop)
  * Returns size of returned message data, errno otherwise
  */
 static int tipc_recvmsg(struct kiocb *iocb, struct socket *sock,
-			struct msghdr *m, size_t buf_len, int flags)
+			struct msghdr *m, size_t buf_len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct tipc_sock *tsk = tipc_sk(sk);
@@ -1054,7 +1054,7 @@ static int tipc_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto exit;
 	}
 
-	timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	timeo = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 restart:
 
 	/* Look for a message in receive queue; wait if necessary */
@@ -1109,6 +1109,8 @@ restart:
 		advance_rx_queue(sk);
 	}
 exit:
+	if (timeop)
+		*timeop = timeo;
 	release_sock(sk);
 	return res;
 }
@@ -1126,7 +1128,7 @@ exit:
  * Returns size of returned message data, errno otherwise
  */
 static int tipc_recv_stream(struct kiocb *iocb, struct socket *sock,
-			    struct msghdr *m, size_t buf_len, int flags)
+			    struct msghdr *m, size_t buf_len, int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct tipc_sock *tsk = tipc_sk(sk);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 7b9114e0a5b1..3203defdb503 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -519,17 +519,17 @@ static int unix_shutdown(struct socket *, int);
 static int unix_stream_sendmsg(struct kiocb *, struct socket *,
 			       struct msghdr *, size_t);
 static int unix_stream_recvmsg(struct kiocb *, struct socket *,
-			       struct msghdr *, size_t, int);
+			       struct msghdr *, size_t, int, long *);
 static int unix_dgram_sendmsg(struct kiocb *, struct socket *,
 			      struct msghdr *, size_t);
 static int unix_dgram_recvmsg(struct kiocb *, struct socket *,
-			      struct msghdr *, size_t, int);
+			      struct msghdr *, size_t, int, long *);
 static int unix_dgram_connect(struct socket *, struct sockaddr *,
 			      int, int);
 static int unix_seqpacket_sendmsg(struct kiocb *, struct socket *,
 				  struct msghdr *, size_t);
 static int unix_seqpacket_recvmsg(struct kiocb *, struct socket *,
-				  struct msghdr *, size_t, int);
+				  struct msghdr *, size_t, int, long *);
 
 static int unix_set_peek_off(struct sock *sk, int val)
 {
@@ -1283,7 +1283,7 @@ static int unix_accept(struct socket *sock, struct socket *newsock, int flags)
 	 * so that no locks are necessary.
 	 */
 
-	skb = skb_recv_datagram(sk, 0, flags&O_NONBLOCK, &err);
+	skb = skb_recv_datagram(sk, 0, flags&O_NONBLOCK, &err, NULL);
 	if (!skb) {
 		/* This means receive shutdown. */
 		if (err == 0)
@@ -1755,14 +1755,14 @@ static int unix_seqpacket_sendmsg(struct kiocb *kiocb, struct socket *sock,
 
 static int unix_seqpacket_recvmsg(struct kiocb *iocb, struct socket *sock,
 			      struct msghdr *msg, size_t size,
-			      int flags)
+			      int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 
 	if (sk->sk_state != TCP_ESTABLISHED)
 		return -ENOTCONN;
 
-	return unix_dgram_recvmsg(iocb, sock, msg, size, flags);
+	return unix_dgram_recvmsg(iocb, sock, msg, size, flags, timeop);
 }
 
 static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
@@ -1777,7 +1777,7 @@ static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
 
 static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 			      struct msghdr *msg, size_t size,
-			      int flags)
+			      int flags, long *timeop)
 {
 	struct sock_iocb *siocb = kiocb_to_siocb(iocb);
 	struct scm_cookie tmp_scm;
@@ -1803,7 +1803,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	skip = sk_peek_offset(sk, flags);
 
-	skb = __skb_recv_datagram(sk, flags, &peeked, &skip, &err);
+	skb = __skb_recv_datagram(sk, flags, &peeked, &skip, &err, timeop);
 	if (!skb) {
 		unix_state_lock(sk);
 		/* Signal EOF on disconnected non-blocking SEQPACKET socket. */
@@ -1914,7 +1914,7 @@ static unsigned int unix_skb_len(const struct sk_buff *skb)
 
 static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 			       struct msghdr *msg, size_t size,
-			       int flags)
+			       int flags, long *timeop)
 {
 	struct sock_iocb *siocb = kiocb_to_siocb(iocb);
 	struct scm_cookie tmp_scm;
@@ -1926,7 +1926,7 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 	int check_creds = 0;
 	int target;
 	int err = 0;
-	long timeo;
+	long timeo = sock_rcvtimeop(sk, timeop, noblock);
 	int skip;
 
 	err = -EINVAL;
@@ -1938,7 +1938,6 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto out;
 
 	target = sock_rcvlowat(sk, flags&MSG_WAITALL, size);
-	timeo = sock_rcvtimeo(sk, noblock);
 
 	/* Lock the socket to prevent queue disordering
 	 * while sleeps in memcpy_tomsg
@@ -2071,6 +2070,8 @@ again:
 	mutex_unlock(&u->readlock);
 	scm_recv(sock, msg, siocb->scm, flags);
 out:
+	if (timeop)
+		*timeop = timeo;
 	return copied ? : err;
 }
 
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 85d232bed87d..10568565f57d 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1063,10 +1063,10 @@ out:
 }
 
 static int vsock_dgram_recvmsg(struct kiocb *kiocb, struct socket *sock,
-			       struct msghdr *msg, size_t len, int flags)
+			       struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	return transport->dgram_dequeue(kiocb, vsock_sk(sock->sk), msg, len,
-					flags);
+					flags, timeop);
 }
 
 static const struct proto_ops vsock_dgram_ops = {
@@ -1646,7 +1646,7 @@ out:
 static int
 vsock_stream_recvmsg(struct kiocb *kiocb,
 		     struct socket *sock,
-		     struct msghdr *msg, size_t len, int flags)
+		     struct msghdr *msg, size_t len, int flags, long *timeop)
 {
 	struct sock *sk;
 	struct vsock_sock *vsk;
@@ -1661,6 +1661,7 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
 	err = 0;
+	timeout = sock_rcvtimeop(sk, timeop, flags & MSG_DONTWAIT);
 
 	lock_sock(sk);
 
@@ -1711,7 +1712,6 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
 		err = -ENOMEM;
 		goto out;
 	}
-	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
 	copied = 0;
 
 	err = transport->notify_recv_init(vsk, target, &recv_data);
@@ -1821,6 +1821,8 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
 out_wait:
 	finish_wait(sk_sleep(sk), &wait);
 out:
+	if (timeop)
+		*timeop = timeout;
 	release_sock(sk);
 	return err;
 }
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 9bb63ffec4f2..9c9e43c17b34 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -1733,7 +1733,7 @@ static int vmci_transport_dgram_enqueue(
 static int vmci_transport_dgram_dequeue(struct kiocb *kiocb,
 					struct vsock_sock *vsk,
 					struct msghdr *msg, size_t len,
-					int flags)
+					int flags, long *timeop)
 {
 	int err;
 	int noblock;
@@ -1748,7 +1748,7 @@ static int vmci_transport_dgram_dequeue(struct kiocb *kiocb,
 
 	/* Retrieve the head sk_buff from the socket's receive queue. */
 	err = 0;
-	skb = skb_recv_datagram(&vsk->sk, flags, noblock, &err);
+	skb = skb_recv_datagram(&vsk->sk, flags, noblock, &err, timeop);
 	if (err)
 		return err;
 
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 5ad4418ef093..da22c042469a 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1254,7 +1254,7 @@ out_kfree_skb:
 
 static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
 		       struct msghdr *msg, size_t size,
-		       int flags)
+		       int flags, long *timeop)
 {
 	struct sock *sk = sock->sk;
 	struct x25_sock *x25 = x25_sk(sk);
@@ -1306,7 +1306,7 @@ static int x25_recvmsg(struct kiocb *iocb, struct socket *sock,
 		/* Now we can treat all alike */
 		release_sock(sk);
 		skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
-					flags & MSG_DONTWAIT, &rc);
+					flags & MSG_DONTWAIT, &rc, timeop);
 		lock_sock(sk);
 		if (!skb)
 			goto out;


* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-27 20:30                   ` Arnaldo Carvalho de Melo
@ 2014-05-28  5:00                     ` Michael Kerrisk (man-pages)
  2014-05-28 12:20                     ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-28  5:00 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: mtk.manpages, lkml, linux-man, netdev, Ondřej Bílka,
	Caitlin Bestler, Neil Horman, Elie De Brauwer, David Miller,
	Steven Whitehouse, Rémi Denis-Courmont, Paul Moore,
	Chris Friesen

On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) escreveu:
>> On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
>> <acme@ghostprotocols.net> wrote:
>>> Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) escreveu:
>>>> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
>>>>> Can you try the attached patch on top of the first one?
>>>
>>>> Patches on patches is a way to make your testers work unnecessarily
>>>> harder. Also, it means that anyone else who was interested in this
>>>
>>> It was meant to highlight the changes with regard to the previous patch,
>>> i.e. to make things easier for reviewing.
>>
>> (I don't think that works...)
> 
> Lets try both then, attached goes the updated patch, and this is the
> diff to the last combined one:

What tree does this apply to? I tried applying to 3.15-rc7, but a piece 
was rejected, and the fix was not obvious.

Cheers,

Michael


drivers/net/tun.c.rej

--- drivers/net/tun.c
+++ drivers/net/tun.c
@@ -1343,7 +1343,7 @@
 
        /* Read frames from queue */
        skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
-                                 &peeked, &off, &err);
+                                 &peeked, &off, &err, timeop);
        if (skb) {
                ret = tun_put_user(tun, tfile, skb, iv, len);
                kfree_skb(skb);


* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-27 20:30                   ` Arnaldo Carvalho de Melo
  2014-05-28  5:00                     ` Michael Kerrisk (man-pages)
@ 2014-05-28 12:20                     ` Michael Kerrisk (man-pages)
  2014-05-28 15:07                       ` Arnaldo Carvalho de Melo
  2014-06-27 11:37                       ` Michael Kerrisk (man-pages)
  1 sibling, 2 replies; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-28 12:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: mtk.manpages, lkml, linux-man, netdev, Ondřej Bílka,
	Caitlin Bestler, Neil Horman, Elie De Brauwer, David Miller,
	Steven Whitehouse, Rémi Denis-Courmont, Paul Moore,
	Chris Friesen

On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) escreveu:
>> On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
>> <acme@ghostprotocols.net> wrote:
>>> Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) escreveu:
>>>> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
>>>>> Can you try the attached patch on top of the first one?
>>>
>>>> Patches on patches is a way to make your testers work unnecessarily
>>>> harder. Also, it means that anyone else who was interested in this
>>>
>>> It was meant to highlight the changes with regard to the previous patch,
>>> i.e. to make things easier for reviewing.
>>
>> (I don't think that works...)
> 
> Lets try both then, 

That's better!

> attached goes the updated patch, and this is the
> diff to the last combined one:
> 
> diff --git a/net/socket.c b/net/socket.c
> index 310a50971769..379be43879db 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg,
>  
>  	datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, &timeout_sys);
>  
> -	if (datagrams > 0 &&
> -	    copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
> +	if (copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
>  		datagrams = -EFAULT;
>  
>  	return datagrams;
>  
> ------------------------------------------
> 
> This is a quick thing just to show where the problem lies, need to think
> how to report an -EFAULT at this point properly, i.e. look at
> __sys_recvmmsg for something related (returning the number of
> successfully copied datagrams to userspace while storing the error for
> subsequent reporting):
> 
>         if (err == 0)
>                 return datagrams;
> 
>         if (datagrams != 0) {
>                 /*
>                  * We may return less entries than requested (vlen) if
>                  * the
>                  * sock is non block and there aren't enough
>                  * datagrams...
>                  */
>                 if (err != -EAGAIN) {
>                         /*
>                          * ... or  if recvmsg returns an error after we
>                          * received some datagrams, where we record the
>                          * error to return on the next call or if the
>                          * app asks about it using getsockopt(SO_ERROR).
>                          */
>                         sock->sk->sk_err = -err;
>                 }
> 
>                 return datagrams;
>         }
> 
> I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
> more about it, sidetracked now, will be back to this.
> 
> Anyway, attached goes the current combined patch.

So, I applied against net-next as you suggested offlist.
Builds and generally tests fine. Some observations:

* In the case that the call is interrupted by a signal handler and no
  datagrams have been received, the call fails with EINTR, as expected.

* The call always updates 'timeout', both in the success case and in the
  EINTR case. (That seems fine.)

But, another question...

In the case that the call is interrupted by a signal handler and some
datagrams have already been received, then the call succeeds, and
returns the number of datagrams received, and 'timeout' is updated with
the remaining time. Maybe that's the right behavior, but I just want to
check. There is at least one other possibility:

* Fetch no datagrams (i.e., the datagrams are left to receive in a
  future call), and the call fails with EINTR, and 'timeout' is updated.

Maybe that possibility is hard to implement (not sure). But my main point
is to make the current behavior clear, note the alternative, and ask:
is the current behavior the best choice? (I'm not saying it's not, but I
do want the choice to be a conscious one.)
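
For readers following along from userspace, here is a minimal sketch of a
caller that separates the cases listed above. It is only an illustration:
recv_batch(), BATCH and BUFSZ are made-up names, and whether the remaining
time is written back to 'timeout' depends on the kernel being run (the
behavior described in this mail is that of the patch under test).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>

#define BATCH 8     /* hypothetical batch size */
#define BUFSZ 256   /* hypothetical per-datagram buffer size */

/*
 * Hypothetical helper: receive up to BATCH datagrams in one recvmmsg() call.
 * Returns the number of datagrams received, or -1 with errno set (for
 * example EINTR when a signal arrives before any datagram was received).
 * On return the caller can inspect *timeout to see what the kernel left
 * there.
 */
int recv_batch(int fd, char bufs[BATCH][BUFSZ], unsigned int lens[BATCH],
               int flags, struct timespec *timeout)
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];
    int i, n;

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BUFSZ;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    n = recvmmsg(fd, msgs, BATCH, flags, timeout);
    if (n < 0)
        return -1;      /* nothing received; errno says why */

    assert(n <= BATCH);
    /* A short count is still success: record the length of each datagram. */
    for (i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}
```

Passing MSG_DONTWAIT in 'flags' makes the call return as soon as the queue
is empty, so a caller that only wants to drain what is already queued does
not depend on the timeout at all.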

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-28 12:20                     ` Michael Kerrisk (man-pages)
@ 2014-05-28 15:07                       ` Arnaldo Carvalho de Melo
  2014-05-28 15:17                         ` David Laight
  2014-06-27 11:37                       ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-05-28 15:07 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: lkml, linux-man, netdev, Ondřej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

Em Wed, May 28, 2014 at 02:20:10PM +0200, Michael Kerrisk (man-pages) escreveu:
> On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
> > attached goes the updated patch, and this is the
> > diff to the last combined one:
> > 
> > diff --git a/net/socket.c b/net/socket.c
> > index 310a50971769..379be43879db 100644
> > --- a/net/socket.c
> > +++ b/net/socket.c
> > @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg,
> >  
> >  	datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, &timeout_sys);
> >  
> > -	if (datagrams > 0 &&
> > -	    copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
> > +	if (copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
> >  		datagrams = -EFAULT;
> >  
> >  	return datagrams;
> >  
> > ------------------------------------------
> > 
> > This is a quick thing just to show where the problem lies, need to think
> > how to report an -EFAULT at this point properly, i.e. look at

Ok, so I can live with the way things were before this fix, i.e. if the
user specifies a timeout and copying the remaining time to userspace
fails (copy_to_user call above), then we return -EFAULT.

I.e. there would be no change in behaviour, but then perhaps we should
go with the interface that is in place when we received some datagrams
and then some error happens, see comment in the existing code, below:

> > __sys_recvmmsg for something related (returning the number of
> > successfully copied datagrams to userspace while storing the error for
> > subsequent reporting):
> > 
> >         if (err == 0)
> >                 return datagrams;
> > 
> >         if (datagrams != 0) {
> >                 /*
> >                  * We may return less entries than requested (vlen) if
> >                  * the sock is non block and there aren't enough
> >                  * datagrams...
> >                  */
> >                 if (err != -EAGAIN) {
> >                         /*
> >                          * ... or  if recvmsg returns an error after we
> >                          * received some datagrams, where we record the
> >                          * error to return on the next call or if the
> >                          * app asks about it using getsockopt(SO_ERROR).
> >                          */
> >                         sock->sk->sk_err = -err;
> >                 }
> > 
> >                 return datagrams;
> >         }
> > 
> > I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
> > more about it, sidetracked now, will be back to this.
> > 
> > Anyway, attached goes the current combined patch.
 
> So, I applied against net-next as you suggested offlist.
> Builds and generally tests fine. Some observations:
 
> * In the case that the call is interrupted by a signal handler and no
>   datagrams have been received, the call fails with EINTR, as expected.

Ok
 
> * The call always updates 'timeout', both in the success case and in the
>   EINTR case. (That seems fine.)

Agreed that it is how it should behave.
 
> But, another question...
> 
> In the case that the call is interrupted by a signal handler and some
> datagrams have already been received, then the call succeeds, and
> returns the number of datagrams received, and 'timeout' is updated with
> the remaining time. Maybe that's the right behavior, but I just want to

Note that what the comment in the existing code says should apply here,
namely that the next recv (m or mmsg) syscall on this socket will return
what is in sock->sk->sk_err, that is the signal:

  sys_recvmmsg()
      sock->ops->recvmsg() (e.g. inet_recvmsg)
          sk->prot->recvmsg() (e.g., udp_recvmsg)
              skb_recv_datagram()
                  wait_for_more_packets()
                      sock_intr_errno()
                        *err = -EINTR
      sock->sk->sk_err = -err

Next recv will end up calling skb_recv_datagram and that does:

struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
                                    int *peeked, int *off, int *err, long *timeop)
{
        struct sk_buff *skb, *last;
        long timeo;
        /*
         * Caller is allowed not to check sk->sk_err before
         * skb_recv_datagram()
         */
        int error = sock_error(sk);

        if (error)
                goto no_packet;
<SNIP>
out:
	if (timeop)
		*timeop = timeo;
	return NULL;

no_packet:
	*err = error;
	goto out;
}

So, yes, the user _can_ process the packets already copied to userspace,
i.e. no packet loss, and then, on the next call, will receive the signal
notification.
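
The mechanism described above can be modeled in a few lines of plain C.
This is a userspace toy, not kernel code: struct model_sock and the model_*
functions are invented for illustration, and they only mirror the
__sys_recvmmsg() tail quoted earlier and the fetch-and-clear behavior of
sock_error().

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for struct sock: only the pending-error field is modeled. */
struct model_sock {
    int sk_err;     /* positive errno value, 0 when nothing is pending */
};

/* Models sock_error(): report the pending error once, then clear it. */
int model_sock_error(struct model_sock *sk)
{
    int err = sk->sk_err;

    sk->sk_err = 0;
    return -err;    /* negative errno, or 0 if nothing was pending */
}

/*
 * Models the tail of __sys_recvmmsg(): 'datagrams' is how many messages
 * were already copied out, 'err' is 0 or the negative errno that stopped
 * the batch.
 */
int model_recvmmsg_result(struct model_sock *sk, int datagrams, int err)
{
    assert(err <= 0);

    if (err == 0)
        return datagrams;

    if (datagrams != 0) {
        /* Partial batch: return the count, remember the error for later. */
        if (err != -EAGAIN)
            sk->sk_err = -err;
        return datagrams;
    }

    return err;     /* nothing received: report the error right away */
}
```

With three datagrams received and err == -EINTR, the model returns 3 and
the following model_sock_error() call yields -EINTR exactly once.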

So, the user can just try the next call and see the signal, and it is
also possible to notice that the timeout didn't expire and fewer than
vlen packets were received, so something went wrong and calling
getsockopt(SO_ERROR) will clarify things.

This is not some new error reporting facility, it predates recvmmsg,
that merely uses it for consistency.
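
From userspace, asking the socket about a recorded error looks roughly like
this. pending_socket_error() is a hypothetical helper name used only for
this sketch; getsockopt(SO_ERROR) itself is the standard interface, and
reading it clears the pending error.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fetch and clear the error pending on a socket.
 * Returns 0 if nothing is pending, a positive errno value if an error was
 * recorded, or -1 if getsockopt() itself failed.
 */
int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;

    assert(len == sizeof(err));
    return err;
}
```

A caller that got fewer than vlen datagrams back while time was still left
on the timeout could call this helper before issuing the next receive.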

How to properly report the -EFAULT when copying the remaining timeout to
userspace is the special case here, with my patches it will:

. copy n (less than vlen) packets to userspace successfully
. return -EFAULT, not n, just as before the patches being cooked now.

> check. There is at least one other possibility:
> 
> * Fetch no datagrams (i.e., the datagrams are left to receive in a
>   future call), and the call fails with EINTR, and 'timeout' is updated.

Humm, then we would have to go back to the protocol layer and re-add the
packets to queues, etc, etc. Not feasible, I'd say: too much state was
lost already, and we would need some sort of commit interface
(perhaps using peek, but that gets crazy quickly with multithreaded
apps). I guess it's a super intrusive path to follow, not worth it, I
think.

> Maybe that possibility is hard to implement (not sure). But my main point
> is to make the current behavior clear, note the alternative, and ask:
> is the current behavior the best choice. (I'm not saying it's not, but I
> do want the choice to be a conscious one.)

Well, my main pet peeve here is how to report that we managed to copy
the datagrams but failed to copy the remaining timeout and then don't
report how many datagrams were successfully copied.

I'm inclined to say that failing to copy the timeout is something so
unlikely, even more since we managed to copy it from userspace to kernel
space at that point in time, that we should keep the current behaviour
and report that something terribly wrong happened, i.e. -EFAULT when
copying the timeout.
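
To make the two variants of the syscall tail from the diff above concrete,
here is a toy model. All names are invented for this sketch, and a NULL
destination stands in for a faulting user address (in the real syscall a
NULL timeout means "no timeout" and is handled before this point).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Stand-in for copy_to_user(): returns nonzero when the copy "faults". */
static int model_copy_to_user(struct timespec *dst, const struct timespec *src)
{
    assert(src != NULL);
    if (dst == NULL)
        return 1;
    *dst = *src;
    return 0;
}

/* Tail before the change: the timeout is only copied back on success. */
int tail_before(int datagrams, struct timespec *utimeout,
                const struct timespec *remaining)
{
    if (datagrams > 0 && model_copy_to_user(utimeout, remaining))
        datagrams = -EFAULT;
    return datagrams;
}

/* Tail after the change: the timeout is always copied back. */
int tail_after(int datagrams, struct timespec *utimeout,
               const struct timespec *remaining)
{
    if (model_copy_to_user(utimeout, remaining))
        datagrams = -EFAULT;
    return datagrams;
}
```

In both variants a failed copy after a successful batch replaces the
datagram count with -EFAULT. The second variant additionally updates the
timeout when the call is returning an error such as -EINTR, and in this
model a failed copy in that case replaces the error with -EFAULT.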

Thanks a lot for testing all this, it was a pity you were not around when
we first designed and implemented this syscall.

And also it would be really nice if the people in the CC list commented
on this last round of discussion about fixing the timeout behavior.

- Arnaldo


* RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-28 15:07                       ` Arnaldo Carvalho de Melo
@ 2014-05-28 15:17                         ` David Laight
  2014-05-28 19:50                           ` 'Arnaldo Carvalho de Melo'
  0 siblings, 1 reply; 37+ messages in thread
From: David Laight @ 2014-05-28 15:17 UTC (permalink / raw)
  To: 'Arnaldo Carvalho de Melo', Michael Kerrisk (man-pages)
  Cc: lkml, linux-man, netdev, Ondrej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

From: Arnaldo Carvalho de Melo
...
> > But, another question...
> >
> > In the case that the call is interrupted by a signal handler and some
> > datagrams have already been received, then the call succeeds, and
> > returns the number of datagrams received, and 'timeout' is updated with
> > the remaining time. Maybe that's the right behavior, but I just want to
> 
> Note that what the comment in the existing code says should apply here,
> namely that the next recv (m or mmsg) syscall on this socket will return
> what is in sock->sk->sk_err, that is the signal:
> 
...
> 
> So, yes, the user _can_ process the packets already copied to userspace,
> i.e. no packet loss, and then, on the next call, will receive the signal
> notification.

The application shouldn't need to see an EINTR response, any signal handler
should be run when the system call returns to user (regardless of the
system call result code).
If that doesn't happen Linux is badly broken!
From an application point of view this is exactly the same as the signal
occurring just before/after the kernel entry/exit for the system call.

The call should just return early with success status.
No need to preserve the EINTR response for later.

The same might be appropriate for other errors - maybe including EFAULT
copying non-initial messages to userspace.
Put the message being processed back on the socket queue and return
success with the (non-zero) partial message count.

	David





* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-28 15:17                         ` David Laight
@ 2014-05-28 19:50                           ` 'Arnaldo Carvalho de Melo'
  2014-05-28 21:33                             ` Chris Friesen
  2014-05-29 10:53                             ` David Laight
  0 siblings, 2 replies; 37+ messages in thread
From: 'Arnaldo Carvalho de Melo' @ 2014-05-28 19:50 UTC (permalink / raw)
  To: David Laight
  Cc: Michael Kerrisk (man-pages),
	lkml, linux-man, netdev, Ondrej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

Em Wed, May 28, 2014 at 03:17:40PM +0000, David Laight escreveu:
> From: Arnaldo Carvalho de Melo
> ...
> > > But, another question...
> > >
> > > In the case that the call is interrupted by a signal handler and some
> > > datagrams have already been received, then the call succeeds, and
> > > returns the number of datagrams received, and 'timeout' is updated with
> > > the remaining time. Maybe that's the right behavior, but I just want to
 
> > Note that what the comment in the existing code says should apply here,
> > namely that the next recv (m or mmsg) syscall on this socket will return
> > what is in sock->sk->sk_err, that is the signal:
 
> ...
 
> > So, yes, the user _can_ process the packets already copied to userspace,
> > i.e. no packet loss, and then, on the next call, will receive the signal
> > notification.
 
> The application shouldn't need to see an EINTR response, any signal handler
> should be run when the system call returns to user (regardless of the
> system call result code).
> If that doesn't happen Linux is badly broken!
> From an application point of view this is exactly the same as the signal
> occurring just before/after the kernel entry/exit for the system call.
> 
> The call should just return early with success status.
> No need to preserve the EINTR response for later.
> 
> The same might be appropriate for other errors - maybe including EFAULT
> copying non-initial messages to userspace.
> Put the message being processed back on the socket queue and return
> success with the (non-zero) partial message count.

We don't need to put anything back, if we get an EFAULT for a datagram,
then we stop processing that packet, _dropping_ it (and that is just
like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
and stop the batch, and if no datagrams were received, return the error
straight away.

But if some datagrams were successfully received, and at that point
_already_ removed from queues and sent successfully to userspace,
recvmmsg will return the number of successfully copied datagrams and
store the error so that it can be returned on the next syscall.

Please refer to the original discussion on how to report how many
successfully copied datagrams and also report that it stopped before the
timeout and the number of requested datagrams in a batch:

http://lkml.kernel.org/r/200905221022.48790.remi.denis-courmont@nokia.com

What is being discussed here is how to return the EFAULT that may happen
_after_ datagram processing, be it interrupted by an EFAULT, signal, or
plain returning all that was requested, with no errors.

This EFAULT _after_ datagram processing may happen when updating the
remaining timeout, because then how can userspace both receive the
number of successfully copied datagrams (in any of the cases mentioned
in the previous paragraph) and know that that timeout can't be used
because there was a problem while trying to copy it to userspace
(EFAULT)?

- Arnaldo


* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-28 19:50                           ` 'Arnaldo Carvalho de Melo'
@ 2014-05-28 21:33                             ` Chris Friesen
  2014-05-28 21:49                               ` 'Arnaldo Carvalho de Melo'
  2014-05-29 10:53                             ` David Laight
  1 sibling, 1 reply; 37+ messages in thread
From: Chris Friesen @ 2014-05-28 21:33 UTC (permalink / raw)
  To: 'Arnaldo Carvalho de Melo', David Laight
  Cc: Michael Kerrisk (man-pages),
	lkml, linux-man, netdev, Ondrej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore

On 05/28/2014 01:50 PM, 'Arnaldo Carvalho de Melo' wrote:

> What is being discussed here is how to return the EFAULT that may happen
> _after_ datagram processing, be it interrupted by an EFAULT, signal, or
> plain returning all that was requested, with no errors.
>
> This EFAULT _after_ datagram processing may happen when updating the
> remaining timeout, because then how can userspace both receive the
> number of successfully copied datagrams (in any of the cases mentioned
> in the previous paragraph) and know that that timeout can't be used
> because there was a problem while trying to copy it to userspace
> (EFAULT)?


How does select() handle this problem?  It updates the timeout and also 
modifies other data.

Could we just check whether the timeout pointer is valid before doing 
anything else?  Of course we could still fault the page out while 
waiting for messages and then fail to fault it back in later, but that 
seems like a not-very-likely scenario.

Chris


* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-28 21:33                             ` Chris Friesen
@ 2014-05-28 21:49                               ` 'Arnaldo Carvalho de Melo'
  0 siblings, 0 replies; 37+ messages in thread
From: 'Arnaldo Carvalho de Melo' @ 2014-05-28 21:49 UTC (permalink / raw)
  To: Chris Friesen
  Cc: David Laight, Michael Kerrisk (man-pages),
	lkml, linux-man, netdev, Ondrej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore

Em Wed, May 28, 2014 at 03:33:51PM -0600, Chris Friesen escreveu:
> On 05/28/2014 01:50 PM, 'Arnaldo Carvalho de Melo' wrote:
 
> >What is being discussed here is how to return the EFAULT that may happen
> >_after_ datagram processing, be it interrupted by an EFAULT, signal, or
> >plain returning all that was requested, with no errors.

> >This EFAULT _after_ datagram processing may happen when updating the
> >remaining timeout, because then how can userspace both receive the
> >number of successfully copied datagrams (in any of the cases mentioned
> >in the previous paragraph) and know that that timeout can't be used
> >because there was a problem while trying to copy it to userspace
> >(EFAULT)?
 
> How does select() handle this problem?  It updates the timeout and also
> modifies other data.
 
> Could we just check whether the timeout pointer is valid before doing
> anything else?  Of course we could still fault the page out while waiting
> for messages and then fail to fault it back in later, but that seems like a
> not-very-likely scenario.

I'll check how select behaves, and yes, I think it is not-very-likely
and what we're doing now is reasonable for datagram protocols, i.e. to
return -EFAULT when updating the timeout fails, not reporting if packets
were successfully received, i.e. they end up being "dropped", as
userspace can't easily figure out if some were received short of painting
the buffers with some pattern and then checking which ones no longer hold
that pattern.

- Arnaldo


* RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-28 19:50                           ` 'Arnaldo Carvalho de Melo'
  2014-05-28 21:33                             ` Chris Friesen
@ 2014-05-29 10:53                             ` David Laight
  2014-05-29 13:55                               ` 'Arnaldo Carvalho de Melo'
  2014-05-29 14:07                               ` Michael Kerrisk (man-pages)
  1 sibling, 2 replies; 37+ messages in thread
From: David Laight @ 2014-05-29 10:53 UTC (permalink / raw)
  To: 'Arnaldo Carvalho de Melo'
  Cc: Michael Kerrisk (man-pages),
	lkml, linux-man, netdev, Ondrej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

From: 'Arnaldo Carvalho de 
...
> > > So, yes, the user _can_ process the packets already copied to userspace,
> > > i.e. no packet loss, and then, on the next call, will receive the signal
> > > notification.
> 
> > The application shouldn't need to see an EINTR response, any signal handler
> > should be run when the system call returns to user (regardless of the
> > system call result code).
> > If that doesn't happen Linux is badly broken!
> > From an application point of view this is exactly the same as the signal
> > occurring just before/after the kernel entry/exit for the system call.
> >
> > The call should just return early with success status.
> > No need to preserve the EINTR response for later.
> >
> > The same might be appropriate for other errors - maybe including EFAULT
> > copying non-initial messages to userspace.
> > Put the message being processed back on the socket queue and return
> > success with the (non-zero) partial message count.
> 
> We don't need to put anything back, if we get an EFAULT for a datagram,
> then we stop processing that packet, _dropping_ it (and that is just
> like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
> and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
> and stop the batch, and if no datagrams were received, return the error
> straight away.
> 
> But if some datagrams were successfully received, and at that point
> _already_ removed from queues and sent successfully to userspace,
> recvmmsg will return the number of successfully copied datagrams and
> store the error so that it can return on the next syscall,

That just doesn't make any sense.
Saving an errno code would only make any sense if the error were a
property of the socket - but EFAULT is a property of the system call,
and EINTR a property of the process (it exists so that the process
can return to userspace to execute a signal handler - relying on
SIGALRM to timeout blocking system calls is a recipe for disaster).

The next system call could be from an entirely different process,
neither EFAULT nor EINTR would mean anything to it at all.

ISTR that returning EFAULT generates a signal that will typically
terminate the process.
You definitely don't want to send one to a different process.

> Please refer to the original discussion on how to report how many
> successfully copied datagrams and also report that it stopped before the
> timeout and the number of requested datagrams in a batch:
> 
> http://lkml.kernel.org/r/200905221022.48790.remi.denis-courmont@nokia.com

I do remember the original problem.
I don't recall error reporting being referenced.

> What is being discussed here is how to return the EFAULT that may happen
> _after_ datagram processing, be it interrupted by an EFAULT, signal, or
> plain returning all that was requested, with no errors.

I remember some discussions from an XNET standards meeting (I've forgotten
exactly which errors on which calls were being discussed).
My recollection is that you return success with a partial transfer
count for ANY error that happens after some data has been transferred.
The actual error will be returned when it happens again on the next
system call - Note the AGAIN, not a saved error.

Things like blocking send/write being interrupted spring to mind.
Possibly even copyin/out failures part way through a read/write call.

> This EFAULT _after_ datagram processing may happen when updating the
> remaining timeout, because then how can userspace both receive the
> number of successfully copied datagrams (in any of the cases mentioned
> in the previous paragraph) and know that that timeout can't be used
> because there was a problem while trying to copy it to userspace
> (EFAULT)?

Failure to write the control structure back to userspace probably
deserves an EFAULT return - the application is buggy.
IIRC normal recvmsg() copies out the control structure at the end
of processing - that can fail.
I wouldn't worry about datagram discards on any of those late
EFAULT conditions.

	David




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-29 10:53                             ` David Laight
@ 2014-05-29 13:55                               ` 'Arnaldo Carvalho de Melo'
  2014-05-29 14:06                                 ` David Laight
  2014-05-29 14:07                               ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 37+ messages in thread
From: 'Arnaldo Carvalho de Melo' @ 2014-05-29 13:55 UTC (permalink / raw)
  To: David Laight
  Cc: Michael Kerrisk (man-pages),
	lkml, linux-man, netdev, Ondrej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

On Thu, May 29, 2014 at 10:53:22AM +0000, David Laight wrote:
> From: 'Arnaldo Carvalho de 
> ...
> > > > So, yes, the user _can_ process the packets already copied to userspace,
> > > > i.e. no packet loss, and then, on the next call, will receive the signal
> > > > notification.
> > 
> > > The application shouldn't need to see an EINTR response, any signal handler
> > > should be run when the system call returns to user (regardless of the
> > > system call result code).
> > > If that doesn't happen Linux is badly broken!
> > > > From an application point of view this is exactly the same as the signal
> > > occurring just before/after the kernel entry/exit for the system call.

> > > The call should just return early with success status.
> > > No need to preserve the EINTR response for later.

> > > The same might be appropriate for other errors - maybe including EFAULT
> > > copying non-initial messages to userspace.
> > > Put the message being processed back on the socket queue and return
> > > success with the (non-zero) partial message count.
 
> > We don't need to put anything back, if we get an EFAULT for a datagram,
> > then we stop processing that packet, _dropping_ it (and that is just
> > like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
> > and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
> > and stop the batch, and if no datagrams were received, return the error
> > straight away.
 
> > But if some datagrams were successfully received, and at that point
> > _already_ removed from queues and sent successfully to userspace,
> > recvmmsg will return the number of successfully copied datagrams and
> > store the error so that it can return on the next syscall,
 
> That just doesn't make any sense.

Yeah for things like EFAULT, storing it in a per socket area for later
reporting is a bug, a separate bug.

> Saving an errno code would only make any sense if the error were a
> property of the socket - but EFAULT is a property of the system call,

Agreed, so for the errors that are socket related, the mechanism should
work, but not for things that are thread specific. For those we should
either signal the error straight away, regardless of any packets
successfully received so far in the current recvmmsg syscall, or mimic
what would happen if the user issued multiple recvmsg syscalls instead,
i.e. in the next call _for this thread_, the EFAULT will take place.

> and EINTR a property of the process (it exists so that the process
> can return to userspace to execute a signal handler - relying on
> SIGALRM to timeout blocking system calls is a recipe for disaster).
> 
> The next system call could be from an entirely different process,
> neither EFAULT nor EINTR would mean anything to it at all.

Right, storing thread specific errors on the socket is a bug and has to
be fixed. I.e. _if_ we keep the strategy of saving the error for the next
syscall, then that error has, for the per thread cases, to be stored in a
per thread error field for socket operations.
 
> ISTR that returning EFAULT generates a signal that will typically
> terminate the process.
> You definitely don't want to send one to a different process.

Right.
 
> > Please refer to the original discussion on how to report how many
> > successfully copied datagrams and also report that it stopped before the
> > timeout and the number of requested datagrams in a batch:
 
> > http://lkml.kernel.org/r/200905221022.48790.remi.denis-courmont@nokia.com
 
> I do remember the original problem.
> I don't recall error reporting being referenced.
 
> > What is being discussed here is how to return the EFAULT that may happen
> > _after_ datagram processing, be it interrupted by an EFAULT, signal, or
> > plain returning all that was requested, with no errors.

> I remember some discussions from an XNET standards meeting (I've forgotten
> exactly which errors on which calls were being discussed).
> My recollection is that you return success with a partial transfer
> count for ANY error that happens after some data has been transferred.
> The actual error will be returned when it happens again on the next
> system call - Note the AGAIN, not a saved error.

A saved error, for the right entity, in the recvmmsg case, that
basically is batching multiple recvmsg syscalls, doesn't sound like a
problem, i.e. the idea is to, as much as possible, mimic what multiple
recvmsg calls would do, but reduce its in/out kernel (and inside kernel
subsystems) overhead.

Perhaps we can have something in between, i.e. for things like EFAULT,
we should report straight away, effectively dropping whatever datagrams
successfully received in the current batch, do you agree?

For transient errors the existing mechanism, fixed so that only per
socket errors are saved for later, as today, could be kept?
 
> Things like blocking send/write being interrupted spring to mind.
> Possibly even copyin/out failures part way through a read/write call.
> 
> > This EFAULT _after_ datagram processing may happen when updating the
> > remaining timeout, because then how can userspace both receive the
> > number of successfully copied datagrams (in any of the cases mentioned
> > in the previous paragraph) and know that that timeout can't be used
> > because there was a problem while trying to copy it to userspace
> > (EFAULT)?
> 
> Failure to write the control structure back to userspace probably
> deserves an EFAULT return - the application is buggy.
> IIRC normal recvmsg() copies out the control structure at the end
> of processing - that can fail.
> I wouldn't worry about datagram discards on any of those late
> EFAULT conditions.

We all seem to be in agreement on this part, so I'll just leave it as is,
i.e. it doesn't matter that the actual packet receiving part was
(partially) successful, if the copy_to_user(remaining timeout) fails,
EFAULT should be returned.

- Arnaldo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-29 13:55                               ` 'Arnaldo Carvalho de Melo'
@ 2014-05-29 14:06                                 ` David Laight
  2014-05-29 14:17                                   ` 'Arnaldo Carvalho de Melo'
  0 siblings, 1 reply; 37+ messages in thread
From: David Laight @ 2014-05-29 14:06 UTC (permalink / raw)
  To: 'Arnaldo Carvalho de Melo'
  Cc: Michael Kerrisk (man-pages),
	lkml, linux-man, netdev, Ondrej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

From: 'Arnaldo Carvalho de Melo'
...
> > I remember some discussions from an XNET standards meeting (I've forgotten
> > exactly which errors on which calls were being discussed).
> > My recollection is that you return success with a partial transfer
> > count for ANY error that happens after some data has been transferred.
> > The actual error will be returned when it happens again on the next
> > system call - Note the AGAIN, not a saved error.
> 
> A saved error, for the right entity, in the recvmmsg case, that
> basically is batching multiple recvmsg syscalls, doesn't sound like a
> problem, i.e. the idea is to, as much as possible, mimic what multiple
> recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> subsystems) overhead.
> 
> Perhaps we can have something in between, i.e. for things like EFAULT,
> we should report straight away, effectively dropping whatever datagrams
> successfully received in the current batch, do you agree?

Not unreasonable - EFAULT shouldn't happen unless the application
is buggy.

> For transient errors the existing mechanism, fixed so that only per
> socket errors are saved for later, as today, could be kept?

I don't think it is ever necessary to save an errno value for the
next system call at all.
Just process the next system call and see what happens.

If the call returns with less than the maximum number of datagrams
and with a non-zero timeout left - then the application can infer
that it was terminated by an abnormal event of some kind.
This might be a signal.
I'm not sure if an icmp error on a connected datagram socket could
generate a 'disconnect'. It might happen if the interface is being
used for something like SCTP.
In either case the next call will detect the error.
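
To make that inference concrete, here is a minimal userspace sketch. It
assumes the kernel writes the remaining time back into the timeout
argument, as discussed in this thread; classify_batch() and the enum are
made-up names for illustration, not an existing API. The caller is
expected to have handled a negative return value before calling it.

```c
#include <time.h>

/* Hypothetical classification of how a recvmmsg() batch ended. */
enum batch_end {
    BATCH_FULL,       /* got every datagram that was asked for */
    BATCH_TIMED_OUT,  /* short batch, no time left */
    BATCH_CUT_SHORT   /* short batch with time left: signal, error, ... */
};

/*
 * received:  non-negative return value of recvmmsg()
 * requested: the vlen that was passed in
 * left:      the timeout as (assumed to be) updated by the kernel
 */
static enum batch_end classify_batch(int received, unsigned int requested,
                                     const struct timespec *left)
{
    if (received >= 0 && (unsigned int)received >= requested)
        return BATCH_FULL;
    if (left->tv_sec == 0 && left->tv_nsec == 0)
        return BATCH_TIMED_OUT;
    return BATCH_CUT_SHORT;
}
```

In the BATCH_CUT_SHORT case the application would simply process what
it has and issue the next call, which then reports any error that is
still pending.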

	David




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-29 10:53                             ` David Laight
  2014-05-29 13:55                               ` 'Arnaldo Carvalho de Melo'
@ 2014-05-29 14:07                               ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-29 14:07 UTC (permalink / raw)
  To: David Laight, 'Arnaldo Carvalho de Melo'
  Cc: mtk.manpages, lkml, linux-man, netdev, Ondrej Bílka,
	Caitlin Bestler, Neil Horman, Elie De Brauwer, David Miller,
	Steven Whitehouse, Rémi Denis-Courmont, Paul Moore,
	Chris Friesen

On 05/29/2014 12:53 PM, David Laight wrote:
> From: 'Arnaldo Carvalho de 
> ...
>>>> So, yes, the user _can_ process the packets already copied to userspace,
>>>> i.e. no packet loss, and then, on the next call, will receive the signal
>>>> notification.
>>
>>> The application shouldn't need to see an EINTR response, any signal handler
>>> should be run when the system call returns to user (regardless of the
>>> system call result code).
>>> If that doesn't happen Linux is badly broken!
>>> From an application point of view this is exactly the same as the signal
>>> occurring just before/after the kernel entry/exit for the system call.
>>>
>>> The call should just return early with success status.
>>> No need to preserve the EINTR response for later.
>>>
>>> The same might be appropriate for other errors - maybe including EFAULT
>>> copying non-initial messages to userspace.
>>> Put the message being processed back on the socket queue and return
>>> success with the (non-zero) partial message count.
>>
>> We don't need to put anything back, if we get an EFAULT for a datagram,
>> then we stop processing that packet, _dropping_ it (and that is just
>> like recvmsg works, look at __skb_recv_datagram, the skb_unlink there,
>> and udp_recvmsg, what happens if skb_copy_and_csum_datagram_iovec fails)
>> and stop the batch, and if no datagrams were received, return the error
>> straight away.
>>
>> But if some datagrams were successfully received, and at that point
>> _already_ removed from queues and sent successfully to userspace,
>> recvmmsg will return the number of successfully copied datagrams and
>> store the error so that it can return on the next syscall,
> 
> That just doesn't make any sense.

Agreed.

> Saving an errno code would only make any sense if the error were a
> property of the socket - 

Back in http://marc.info/?l=linux-netdev&m=124298156121906&w=2
(the follow-on from the discussion that Arnaldo mentions below), 
it was noted:

: Normally you'd expect the call to return what it has read without an
: error, and then the socket error would be picked up on the next call.

and the key point in that sentence was "*socket* error."

> but EFAULT is a property of the system call,
> and EINTR a property of the process (it exists so that the process
> can return to userspace to execute a signal handler - relying on
> SIGALRM to timeout blocking system calls is a recipe for disaster).

Exactly. Interruption by a signal should just result in an early
success return, unless no datagrams have been received so far, in
which case it should produce an EINTR failure. No error should be 
saved for a future call.
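
As a tiny sketch of that rule (a model only, not kernel code;
result_on_signal() is a hypothetical name):

```c
#include <errno.h>

/*
 * Result of a recvmmsg() call that is interrupted by a signal after
 * datagrams_so_far datagrams have been copied to userspace.
 * Nothing is stored for a later call.
 */
static int result_on_signal(int datagrams_so_far)
{
    if (datagrams_so_far > 0)
        return datagrams_so_far;  /* early success with a partial count */
    return -EINTR;                /* nothing received yet */
}
```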

> The next system call could be from an entirely different process,
> neither EFAULT nor EINTR would mean anything to it at all.
> 
> ISTR that returning EFAULT generates a signal that will typically
> terminate the process.

Not generally, I think. (You're probably thinking of the SIGSEGV sent
when a process touches a nonexistent address in user mode.)

> You definitely don't want to send one to a different process.

But it's true that the EFAULT or EINTR shouldn't be returned
to another process.
 
>> Please refer to the original discussion on how to report how many
>> successfully copied datagrams and also report that it stopped before the
>> timeout and the number of requested datagrams in a batch:
>>
>> http://lkml.kernel.org/r/200905221022.48790.remi.denis-courmont@nokia.com
> 
> I do remember the original problem.
> I don't recall error reporting being referenced.

(See above.)

>> What is being discussed here is how to return the EFAULT that may happen
>> _after_ datagram processing, be it interrupted by an EFAULT, signal, or
>> plain returning all that was requested, with no errors.
> 
> I remember some discussions from an XNET standards meeting (I've forgotten
> exactly which errors on which calls were being discussed).
> My recollection is that you return success with a partial transfer
> count for ANY error that happens after some data has been transferred.
> The actual error will be returned when it happens again on the next
> system call - Note the AGAIN, not a saved error.
> 
> Things like blocking send/write being interrupted spring to mind.
> Possibly even copyin/out failures part way through a read/write call.
> 
>> This EFAULT _after_ datagram processing may happen when updating the
>> remaining timeout, because then how can userspace both receive the
>> number of successfully copied datagrams (in any of the cases mentioned
>> in the previous paragraph) and know that that timeout can't be used
>> because there was a problem while trying to copy it to userspace
>> (EFAULT)?
> 
> Failure to write the control structure back to userspace probably
> deserves an EFAULT return - the application is buggy.
> IIRC normal recvmsg() copies out the control structure at the end
> of processing - that can fail.
> I wouldn't worry about datagram discards on any of those late
> EFAULT conditions.

Agree on all of the above, and that last point certainly seems
like the right approach to me.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-29 14:06                                 ` David Laight
@ 2014-05-29 14:17                                   ` 'Arnaldo Carvalho de Melo'
  2014-05-29 14:40                                     ` David Laight
                                                       ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: 'Arnaldo Carvalho de Melo' @ 2014-05-29 14:17 UTC (permalink / raw)
  To: David Laight
  Cc: Michael Kerrisk (man-pages),
	lkml, linux-man, netdev, Ondrej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

On Thu, May 29, 2014 at 02:06:04PM +0000, David Laight wrote:
> From: 'Arnaldo Carvalho de Melo'
> ...
> > > I remember some discussions from an XNET standards meeting (I've forgotten
> > > exactly which errors on which calls were being discussed).
> > > My recollection is that you return success with a partial transfer
> > > count for ANY error that happens after some data has been transferred.
> > > The actual error will be returned when it happens again on the next
> > > system call - Note the AGAIN, not a saved error.

> > A saved error, for the right entity, in the recvmmsg case, that
> > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > subsystems) overhead.

> > Perhaps we can have something in between, i.e. for things like EFAULT,
> > we should report straight away, effectively dropping whatever datagrams
> > successfully received in the current batch, do you agree?
 
> Not unreasonable - EFAULT shouldn't happen unless the application
> is buggy.

Ok.
 
> > For transient errors the existing mechanism, fixed so that only per
> > socket errors are saved for later, as today, could be kept?
 
> I don't think it is ever necessary to save an errno value for the
> next system call at all.
> Just process the next system call and see what happens.
 
> If the call returns with less than the maximum number of datagrams
> and with a non-zero timeout left - then the application can infer
> that it was terminated by an abnormal event of some kind.
> This might be a signal.

Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
error on the next call, but we provide a way for the app to retrieve the
reason for the smaller than expected batch?
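
For reference, reading the pending error from userspace would look
roughly like this. getsockopt(SO_ERROR) is the existing API; the wrapper
name pending_socket_error() is only an illustration. Note that reading
SO_ERROR also clears the pending error.

```c
#include <sys/socket.h>

/*
 * Returns the pending socket error as a positive errno value,
 * 0 if there is none, or -1 if getsockopt() itself fails.
 */
static int pending_socket_error(int fd)
{
    int soerr = 0;
    socklen_t len = sizeof(soerr);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0)
        return -1;
    return soerr;
}
```

An application would call this only after a batch that came back
smaller than requested with time still left on the timeout.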

> I'm not sure if an icmp error on a connected datagram socket could
> generate a 'disconnect'. It might happen if the interface is being
> used for something like SCTP.
> In either case the next call will detect the error.

- Arnaldo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-29 14:17                                   ` 'Arnaldo Carvalho de Melo'
@ 2014-05-29 14:40                                     ` David Laight
  2014-05-29 15:33                                     ` [PATCH/RFC] Handle EFAULT in partial recvmmsg was " 'Arnaldo Carvalho de Melo'
  2014-06-16  9:58                                     ` Michael Kerrisk (man-pages)
  2 siblings, 0 replies; 37+ messages in thread
From: David Laight @ 2014-05-29 14:40 UTC (permalink / raw)
  To: 'Arnaldo Carvalho de Melo'
  Cc: Michael Kerrisk (man-pages),
	lkml, linux-man, netdev, Ondrej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

From: 'Arnaldo Carvalho de Melo'
> On Thu, May 29, 2014 at 02:06:04PM +0000, David Laight wrote:
> > From: 'Arnaldo Carvalho de Melo'
> > ...
> > > > I remember some discussions from an XNET standards meeting (I've forgotten
> > > > exactly which errors on which calls were being discussed).
> > > > My recollection is that you return success with a partial transfer
> > > > count for ANY error that happens after some data has been transferred.
> > > > The actual error will be returned when it happens again on the next
> > > > system call - Note the AGAIN, not a saved error.
> 
> > > A saved error, for the right entity, in the recvmmsg case, that
> > > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > > subsystems) overhead.
> 
> > > Perhaps we can have something in between, i.e. for things like EFAULT,
> > > we should report straight away, effectively dropping whatever datagrams
> > > successfully received in the current batch, do you agree?
> 
> > Not unreasonable - EFAULT shouldn't happen unless the application
> > is buggy.
> 
> Ok.
> 
> > > For transient errors the existing mechanism, fixed so that only per
> > > socket errors are saved for later, as today, could be kept?
> 
> > I don't think it is ever necessary to save an errno value for the
> > next system call at all.
> > Just process the next system call and see what happens.
> 
> > If the call returns with less than the maximum number of datagrams
> > and with a non-zero timeout left - then the application can infer
> > that it was terminated by an abnormal event of some kind.
> > This might be a signal.
> 
> Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
> error on the next call, but we provide a way for the app to retrieve the
> reason for the smaller than expected batch?

If you really think it is necessary, then you want a field in the
control structure.
But IMHO returning the 'time left' is more than enough.

IIRC the original problem was that the user-specified timeout
was used as an inter-datagram timer instead of an overall timeout.

I suspect that most applications won't actually care about the
'time left', nor the actual number of returned datagrams.
They will just process what they are given and then wait for
the next batch.
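
To illustrate the difference between the two readings of the timeout:
with an overall timeout, one absolute deadline is fixed when the call
starts and every wait is bounded by the time still left until it. A
minimal sketch of the 'time left' computation follows; time_left() is a
hypothetical helper, not the kernel's code.

```c
#include <time.h>

/* Time remaining until an absolute deadline, clamped at zero. */
static struct timespec time_left(struct timespec deadline,
                                 struct timespec now)
{
    struct timespec left = { 0, 0 };

    /* Deadline already reached or passed: nothing left. */
    if (now.tv_sec > deadline.tv_sec ||
        (now.tv_sec == deadline.tv_sec && now.tv_nsec >= deadline.tv_nsec))
        return left;

    left.tv_sec = deadline.tv_sec - now.tv_sec;
    left.tv_nsec = deadline.tv_nsec - now.tv_nsec;
    if (left.tv_nsec < 0) {
        /* Borrow one second to keep tv_nsec in [0, 1e9). */
        left.tv_sec -= 1;
        left.tv_nsec += 1000000000L;
    }
    return left;
}
```

An inter-datagram timer would instead restart the full interval after
every datagram, so a steady trickle of packets could keep the call
blocked far longer than the value the user asked for.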

	David




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH/RFC] Handle EFAULT in partial recvmmsg was Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-29 14:17                                   ` 'Arnaldo Carvalho de Melo'
  2014-05-29 14:40                                     ` David Laight
@ 2014-05-29 15:33                                     ` 'Arnaldo Carvalho de Melo'
  2014-06-16  9:58                                     ` Michael Kerrisk (man-pages)
  2 siblings, 0 replies; 37+ messages in thread
From: 'Arnaldo Carvalho de Melo' @ 2014-05-29 15:33 UTC (permalink / raw)
  To: David Laight
  Cc: Michael Kerrisk (man-pages),
	lkml, linux-man, netdev, Ondrej Bílka, Caitlin Bestler,
	Neil Horman, Elie De Brauwer, David Miller, Steven Whitehouse,
	Rémi Denis-Courmont, Paul Moore, Chris Friesen

On Thu, May 29, 2014 at 11:17:05AM -0300, 'Arnaldo Carvalho de Melo' wrote:
> On Thu, May 29, 2014 at 02:06:04PM +0000, David Laight wrote:
> > From: 'Arnaldo Carvalho de Melo'
> > ...
> > > > I remember some discussions from an XNET standards meeting (I've forgotten
> > > > exactly which errors on which calls were being discussed).
> > > > My recollection is that you return success with a partial transfer
> > > > count for ANY error that happens after some data has been transferred.
> > > > The actual error will be returned when it happens again on the next
> > > > system call - Note the AGAIN, not a saved error.
 
> > > A saved error, for the right entity, in the recvmmsg case, that
> > > basically is batching multiple recvmsg syscalls, doesn't sound like a
> > > problem, i.e. the idea is to, as much as possible, mimic what multiple
> > > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> > > subsystems) overhead.
 
> > > Perhaps we can have something in between, i.e. for things like EFAULT,
> > > we should report straight away, effectively dropping whatever datagrams
> > > successfully received in the current batch, do you agree?
  
> > Not unreasonable - EFAULT shouldn't happen unless the application
> > is buggy.
 
> Ok.

So the patch below should handle it, and record that the packets were
dropped. The drop happens not at the transport level, as
UDP_MIB_INERRORS, for instance, would indicate, but at the batching
(recvmmsg) level, so perhaps we'll need a MIB variable for that.

We may also want a counterpart to the trace_kfree_skb(skb, udp_recvmsg)
tracepoint for dropwatch and similar tools to use, Neil?

I'm keeping this separate from the timeout update patch.

- Arnaldo

diff --git a/net/socket.c b/net/socket.c
index abf56b2a14f9..63491f015912 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2415,13 +2415,17 @@ out_put:
 		return datagrams;
 
 	if (datagrams != 0) {
+		if (err == -EFAULT) {
+			atomic_add(datagrams, &sock->sk->sk_drops);
+			return -EFAULT;
+		}
 		/*
 		 * We may return less entries than requested (vlen) if the
 		 * sock is non block and there aren't enough datagrams...
 		 */
 		if (err != -EAGAIN) {
 			/*
-			 * ... or  if recvmsg returns an error after we
+			 * ... or if recvmsg returns a socket error after we
 			 * received some datagrams, where we record the
 			 * error to return on the next call or if the
 			 * app asks about it using getsockopt(SO_ERROR).
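
For readers following along, the return path with the hunk above applied
can be modelled in plain C as below. This is only a sketch: struct
sock_model is a hypothetical stand-in for the two struct sock fields
involved, and the parts outside the hunk (the err == 0 case and the
final return) are assumed from the surrounding function rather than
shown in the patch.

```c
#include <errno.h>

/* Hypothetical stand-in for the struct sock fields touched here. */
struct sock_model {
    int sk_err;    /* saved error, seen by a later call or SO_ERROR */
    int sk_drops;  /* drop counter (an atomic_t in the kernel) */
};

/* err is 0 or a negative errno from the last recvmsg attempt. */
static int recvmmsg_result(struct sock_model *sk, int datagrams, int err)
{
    if (err == 0)
        return datagrams;

    if (datagrams != 0) {
        if (err == -EFAULT) {
            /* New: count the whole batch as dropped and fail now. */
            sk->sk_drops += datagrams;
            return -EFAULT;
        }
        if (err != -EAGAIN)
            sk->sk_err = -err;  /* saved for a later call */
        return datagrams;
    }

    return err;  /* nothing received: report the error directly */
}
```

Note that in this model any other error, including -EINTR, is still
saved in sk_err when a partial batch is returned; that is the separate
issue discussed earlier in the thread and is not addressed here.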

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-29 14:17                                   ` 'Arnaldo Carvalho de Melo'
  2014-05-29 14:40                                     ` David Laight
  2014-05-29 15:33                                     ` [PATCH/RFC] Handle EFAULT in partial recvmmsg was " 'Arnaldo Carvalho de Melo'
@ 2014-06-16  9:58                                     ` Michael Kerrisk (man-pages)
  2014-06-24 20:25                                       ` Arnaldo Carvalho de Melo
  2 siblings, 1 reply; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-06-16  9:58 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: David Laight, lkml, linux-man, netdev, Ondrej Bílka,
	Caitlin Bestler, Neil Horman, Elie De Brauwer, David Miller,
	Steven Whitehouse, Rémi Denis-Courmont, Paul Moore,
	Chris Friesen

Hi Arnaldo,

Things have gone quiet ;-). What's the current state of this patch?

Thanks,

Michael


On Thu, May 29, 2014 at 4:17 PM, Arnaldo Carvalho de Melo
<acme@ghostprotocols.net> wrote:
> On Thu, May 29, 2014 at 02:06:04PM +0000, David Laight wrote:
>> From: 'Arnaldo Carvalho de Melo'
>> ...
>> > > I remember some discussions from an XNET standards meeting (I've forgotten
>> > > exactly which errors on which calls were being discussed).
>> > > My recollection is that you return success with a partial transfer
>> > > count for ANY error that happens after some data has been transferred.
>> > > The actual error will be returned when it happens again on the next
>> > > system call - Note the AGAIN, not a saved error.
>
>> > A saved error, for the right entity, in the recvmmsg case, that
>> > basically is batching multiple recvmsg syscalls, doesn't sound like a
>> > problem, i.e. the idea is to, as much as possible, mimic what multiple
>> > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
>> > subsystems) overhead.
>
>> > Perhaps we can have something in between, i.e. for things like EFAULT,
>> > we should report straight away, effectively dropping whatever datagrams
>> > successfully received in the current batch, do you agree?
>
>> Not unreasonable - EFAULT shouldn't happen unless the application
>> is buggy.
>
> Ok.
>
>> > For transient errors the existing mechanism, fixed so that only per
>> > socket errors are saved for later, as today, could be kept?
>
>> I don't think it is ever necessary to save an errno value for the
>> next system call at all.
>> Just process the next system call and see what happens.
>
>> If the call returns with less than the maximum number of datagrams
>> and with a non-zero timeout left - then the application can infer
>> that it was terminated by an abnormal event of some kind.
>> This might be a signal.
>
> Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
> error on the next call, but we provide a way for the app to retrieve the
> reason for the smaller than expected batch?
>
>> I'm not sure if an icmp error on a connected datagram socket could
>> generate a 'disconnect'. It might happen if the interface is being
>> used for something like SCTP.
>> In either case the next call will detect the error.
>
> - Arnaldo



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-06-16  9:58                                     ` Michael Kerrisk (man-pages)
@ 2014-06-24 20:25                                       ` Arnaldo Carvalho de Melo
  2014-06-27 11:29                                         ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-06-24 20:25 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: David Laight, lkml, linux-man, netdev, Ondrej Bílka,
	Caitlin Bestler, Neil Horman, Elie De Brauwer, David Miller,
	Steven Whitehouse, Rémi Denis-Courmont, Paul Moore,
	Chris Friesen

On Mon, Jun 16, 2014 at 11:58:51AM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Arnaldo,
> 
> Things have gone quiet ;-). What's the current state of this patch?

Yeah, I kept meaning to prod the other people on this thread about what
they thought about my last messages, patches, etc. :-)

Can I have acked-by or even tested-by on those? Is it ok?

- Arnaldo

> Thanks,
> 
> Michael
> 
> 
> On Thu, May 29, 2014 at 4:17 PM, Arnaldo Carvalho de Melo
> <acme@ghostprotocols.net> wrote:
> > On Thu, May 29, 2014 at 02:06:04PM +0000, David Laight wrote:
> >> From: 'Arnaldo Carvalho de Melo'
> >> ...
> >> > > I remember some discussions from an XNET standards meeting (I've forgotten
> >> > > exactly which errors on which calls were being discussed).
> >> > > My recollection is that you return success with a partial transfer
> >> > > count for ANY error that happens after some data has been transferred.
> >> > > The actual error will be returned when it happens again on the next
> >> > > system call - Note the AGAIN, not a saved error.
> >
> >> > A saved error, for the right entity, in the recvmmsg case, that
> >> > basically is batching multiple recvmsg syscalls, doesn't sound like a
> >> > problem, i.e. the idea is to, as much as possible, mimic what multiple
> >> > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
> >> > subsystems) overhead.
> >
> >> > Perhaps we can have something in between, i.e. for things like EFAULT,
> >> > we should report straight away, effectively dropping whatever datagrams
> >> > successfully received in the current batch, do you agree?
> >
> >> Not unreasonable - EFAULT shouldn't happen unless the application
> >> is buggy.
> >
> > Ok.
> >
> >> > For transient errors the existing mechanism, fixed so that only per
> >> > socket errors are saved for later, as today, could be kept?
> >
> >> I don't think it is ever necessary to save an errno value for the
> >> next system call at all.
> >> Just process the next system call and see what happens.
> >
> >> If the call returns with less than the maximum number of datagrams
> >> and with a non-zero timeout left - then the application can infer
> >> that it was terminated by an abnormal event of some kind.
> >> This might be a signal.
> >
> > Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
> > error on the next call, but we provide a way for the app to retrieve the
> > reason for the smaller than expected batch?
> >
> >> I'm not sure if an icmp error on a connected datagram socket could
> >> generate a 'disconnect'. It might happen if the interface is being
> >> used for something like SCTP.
> >> In either case the next call will detect the error.
> >
> > - Arnaldo
> 
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-06-24 20:25                                       ` Arnaldo Carvalho de Melo
@ 2014-06-27 11:29                                         ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-06-27 11:29 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: mtk.manpages, David Laight, lkml, linux-man, netdev,
	Ondřej Bílka, Caitlin Bestler, Neil Horman, Elie De Brauwer,
	David Miller, Steven Whitehouse, Rémi Denis-Courmont,
	Paul Moore, Chris Friesen

On 06/24/2014 10:25 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jun 16, 2014 at 11:58:51AM +0200, Michael Kerrisk (man-pages) escreveu:
>> Hi Arnaldo,
>>
>> Things have gone quiet ;-). What's the current state of this patch?
> 
> Yeah, I kept meaning to prod the other people on this thread about what
> they thought about my last messages, patches, etc. :-)
> 
> Can I have acked-by or even tested-by on those? Is it ok?

I just need to go back and test one point that sounds like it might still be 
broken.

Cheers,

Michael




-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]
  2014-05-28 12:20                     ` Michael Kerrisk (man-pages)
  2014-05-28 15:07                       ` Arnaldo Carvalho de Melo
@ 2014-06-27 11:37                       ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 37+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-06-27 11:37 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: mtk.manpages, lkml, linux-man, netdev, Ondřej Bílka,
	Caitlin Bestler, Neil Horman, Elie De Brauwer, David Miller,
	Steven Whitehouse, David Laight, Paul Moore, Chris Friesen

Hi Arnaldo,

On 05/28/2014 02:20 PM, Michael Kerrisk (man-pages) wrote:
> On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
>> Em Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) escreveu:
>>> On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
>>> <acme@ghostprotocols.net> wrote:
>>>> Em Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) escreveu:
>>>>> On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
>>>>>> Can you try the attached patch on top of the first one?
>>>>
>>>>> Patches on patches is a way to make your testers work unnecessarily
>>>>> harder. Also, it means that anyone else who was interested in this
>>>>
>>>> It was meant to highlight the changes with regard to the previous patch,
>>>> i.e. to make things easier for reviewing.
>>>
>>> (I don't think that works...)
>>
>> Let's try both then,
> 
> That's better!
> 
>> attached goes the updated patch, and this is the
>> diff to the last combined one:
>>
>> diff --git a/net/socket.c b/net/socket.c
>> index 310a50971769..379be43879db 100644
>> --- a/net/socket.c
>> +++ b/net/socket.c
>> @@ -2478,8 +2478,7 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg,
>>  
>>  	datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, &timeout_sys);
>>  
>> -	if (datagrams > 0 &&
>> -	    copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
>> +	if (copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
>>  		datagrams = -EFAULT;
>>  
>>  	return datagrams;
>>  
>> ------------------------------------------
>>
>> This is a quick thing just to show where the problem lies, need to think
>> how to report an -EFAULT at this point properly, i.e. look at
>> __sys_recvmmsg for something related (returning the number of
>> successfully copied datagrams to userspace while storing the error for
>> subsequent reporting):
>>
>>         if (err == 0)
>>                 return datagrams;
>>
>>         if (datagrams != 0) {
>>                 /*
>>                  * We may return less entries than requested (vlen) if the
>>                  * sock is non block and there aren't enough datagrams...
>>                  */
>>                 if (err != -EAGAIN) {
>>                         /*
>>                          * ... or  if recvmsg returns an error after we
>>                          * received some datagrams, where we record the
>>                          * error to return on the next call or if the
>>                          * app asks about it using getsockopt(SO_ERROR).
>>                          */
>>                         sock->sk->sk_err = -err;
>>                 }
>>
>>                 return datagrams;
>>         }
>>
>> I.e. userspace would have to use getsockopt(SO_ERROR)... need to think
>> more about it, sidetracked now, will be back to this.
>>
>> Anyway, attached goes the current combined patch.
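For reference, the userspace side of that getsockopt(SO_ERROR) idea would
be small. A minimal sketch follows; the helper name take_pending_error()
is made up for illustration, and it assumes only the usual SO_ERROR
semantics (reading the option returns the pending error and clears it):

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fetch and clear the socket's pending error.
 * Returns 0 if nothing is pending, a positive errno-style value if an
 * error was pending, or -1 if getsockopt() itself failed.
 */
static int take_pending_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;

    /* SO_ERROR is an int-sized option; anything else is unexpected. */
    assert(len == sizeof(err));
    return err;
}
```

An application would call this after a recvmmsg() that returned fewer
datagrams than requested, to find out why the batch was cut short.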
> 
> So, I applied against net-next as you suggested offlist.
> Builds and generally tests fine. Some observations:
> 
> * In the case that the call is interrupted by a signal handler and no
>   datagrams have been received, the call fails with EINTR, as expected.
> 
> * The call always updates 'timeout', both in the success case and in the
>   EINTR case. (That seems fine.)

So, returning to your recvmmsg-timeout-v3.patch: I think the behavior as
implemented, and as described above, is okay.
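
In case it helps anyone testing, here is a rough sketch of how a caller
might lean on that behavior: one overall time budget, with the call
re-issued after EINTR using whatever the kernel left in 'timeout'. The
helper name recv_with_budget() and the BATCH/BUFSZ sizes are made up for
illustration, and the loop assumes the patched behavior (timeout written
back in both the success and the EINTR case). On a kernel that does not
write the timeout back on EINTR, the retry simply starts from the old
value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

#define BATCH 5   /* illustrative maximum batch size */
#define BUFSZ 64  /* illustrative per-datagram buffer size */

/*
 * Hypothetical helper: collect up to 'want' datagrams (want <= BATCH)
 * within one overall budget. Returns the number of datagrams stored in
 * bufs[], or -1 if the first call failed with something other than EINTR.
 */
static int recv_with_budget(int fd, char bufs[][BUFSZ], int want,
                            struct timespec *budget)
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];
    int got = 0;

    assert(want >= 0 && want <= BATCH);

    while (got < want && (budget->tv_sec > 0 || budget->tv_nsec > 0)) {
        int n;

        /* Point the remaining slots at the unused buffers. */
        memset(msgs, 0, sizeof(msgs));
        for (int i = 0; i < want - got; i++) {
            iov[i].iov_base = bufs[got + i];
            iov[i].iov_len = BUFSZ;
            msgs[i].msg_hdr.msg_iov = &iov[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }

        n = recvmmsg(fd, msgs, want - got, 0, budget);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* retry with the remaining budget */
            return got > 0 ? got : -1;
        }
        if (n == 0)
            break;                   /* nothing more to read */
        got += n;
    }
    return got;
}
```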

> But, another question...
> 
> In the case that the call is interrupted by a signal handler and some
> datagrams have already been received, then the call succeeds, and
> returns the number of datagrams received, and 'timeout' is updated with
> the remaining time. Maybe that's the right behavior, but I just want to
> check. There is at least one other possibility:
>
> * Fetch no datagrams (i.e., the datagrams are left to receive in a
>   future call), and the call fails with EINTR, and 'timeout' is updated.
> 
> Maybe that possibility is hard to implement (not sure). But my main point
> is to make the current behavior clear, note the alternative, and ask:
> is the current behavior the best choice? (I'm not saying it's not, but I
> do want the choice to be a conscious one.)

So, I think (can't find the mail right now) that you explained elsewhere
that the above would be hard to implement. And in any case, I'm not sure
it's desirable; I only wanted to check that the choice was a deliberate one.

However, there is still a weirdness, which relates to the discussion you
and David Laight had. 

Suppose the following scenario.

1. We do a recvmmsg() with a 10-second timeout, asking for 5 messages.
2. 3 messages arrive.
3. 6 seconds after the call, a signal handler interrupts the call.
4. recvmmsg() returns success, telling us it got 3 messages.

So far, so good. But:

5. We make a further recvmmsg() call.
6. That call returns immediately, with an EINTR error.

That really should not be happening. As noted elsewhere in this
thread, EINTR is a property of a specific system call, not of the
thread or the socket. By the time of step 5, the kernel should
already have forgotten about the signal that occurred at step 3.
I don't think I saw any other patch that fixes that behavior.
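
A scaled-down reproducer for the scenario above might look like the
sketch below. The assumptions are mine: an AF_UNIX datagram socketpair
stands in for the real socket, the signal arrives after 1 second rather
than 6, and the second call uses MSG_DONTWAIT so that the test cannot
hang. The names run_scenario() and recv_batch() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define VLEN 5    /* messages asked for, as in step 1 */
#define BUFSZ 64  /* illustrative buffer size */

/* Empty handler: its only job is to interrupt the blocked call. */
static void on_alarm(int sig) { (void)sig; }

/* One recvmmsg() call asking for VLEN datagrams into scratch buffers. */
static int recv_batch(int fd, int flags, struct timespec *timeout)
{
    static char bufs[VLEN][BUFSZ];
    struct mmsghdr msgs[VLEN];
    struct iovec iov[VLEN];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < VLEN; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BUFSZ;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    return recvmmsg(fd, msgs, VLEN, flags, timeout);
}

/* Runs steps 1-6; returns 0 if the setup worked, -1 otherwise. */
static int run_scenario(int *first, int *second, int *second_errno)
{
    struct timespec timeout = { .tv_sec = 10, .tv_nsec = 0 };
    struct sigaction sa;
    int sv[2];

    assert(first && second && second_errno);

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    /* No SA_RESTART, so the blocked call is interrupted, not restarted. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    for (int i = 0; i < 3; i++)          /* step 2: 3 messages arrive */
        if (send(sv[0], "x", 1, 0) != 1)
            return -1;

    alarm(1);                            /* step 3, scaled down */
    *first = recv_batch(sv[1], 0, &timeout);
    alarm(0);

    errno = 0;                           /* step 5: a further call */
    *second = recv_batch(sv[1], MSG_DONTWAIT, NULL);
    *second_errno = errno;

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

I would expect 'first' to be 3, and the second call to fail with EAGAIN
because nothing is queued. A stale error left over from step 3 showing up
in 'second_errno' instead is the weirdness described above.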

I recall now that this was why I was waiting for you to follow up 
in this thread with a new patch.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

Thread overview: 37+ messages
2014-04-30 13:59 recvmmsg() timeout behavior strangeness [RESEND] Michael Kerrisk (man-pages)
2014-05-03 10:28 ` Michael Kerrisk (man-pages)
2014-05-03 11:29   ` Florian Westphal
2014-05-03 11:39     ` Michael Kerrisk (man-pages)
2014-05-12 10:15 ` Michael Kerrisk (man-pages)
2014-05-12 14:34   ` Arnaldo Carvalho de Melo
2014-05-21 21:05     ` [PATCH/RFC] " Arnaldo Carvalho de Melo
2014-05-22 14:27       ` Michael Kerrisk (man-pages)
2014-05-24  6:13         ` Michael Kerrisk (man-pages)
2014-05-26 13:46         ` Arnaldo Carvalho de Melo
2014-05-26 21:17           ` Arnaldo Carvalho de Melo
2014-05-27 16:35             ` Michael Kerrisk (man-pages)
2014-05-27 19:21               ` Arnaldo Carvalho de Melo
2014-05-27 19:22                 ` Arnaldo Carvalho de Melo
2014-05-27 19:28                 ` Michael Kerrisk (man-pages)
2014-05-27 20:30                   ` Arnaldo Carvalho de Melo
2014-05-28  5:00                     ` Michael Kerrisk (man-pages)
2014-05-28 12:20                     ` Michael Kerrisk (man-pages)
2014-05-28 15:07                       ` Arnaldo Carvalho de Melo
2014-05-28 15:17                         ` David Laight
2014-05-28 19:50                           ` 'Arnaldo Carvalho de Melo'
2014-05-28 21:33                             ` Chris Friesen
2014-05-28 21:49                               ` 'Arnaldo Carvalho de Melo'
2014-05-29 10:53                             ` David Laight
2014-05-29 13:55                               ` 'Arnaldo Carvalho de Melo'
2014-05-29 14:06                                 ` David Laight
2014-05-29 14:17                                   ` 'Arnaldo Carvalho de Melo'
2014-05-29 14:40                                     ` David Laight
2014-05-29 15:33                                     ` [PATCH/RFC] Handle EFAULT in partial recvmmsg was " 'Arnaldo Carvalho de Melo'
2014-06-16  9:58                                     ` Michael Kerrisk (man-pages)
2014-06-24 20:25                                       ` Arnaldo Carvalho de Melo
2014-06-27 11:29                                         ` Michael Kerrisk (man-pages)
2014-05-29 14:07                               ` Michael Kerrisk (man-pages)
2014-06-27 11:37                       ` Michael Kerrisk (man-pages)
2014-05-23 19:00       ` David Miller
2014-05-23 19:55         ` Arnaldo Carvalho de Melo
2014-05-24  6:13           ` Michael Kerrisk (man-pages)
