All of lore.kernel.org
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	DaeLyong Jeong <threeearcat@gmail.com>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Network Development <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Byoungyoung Lee <byoungyoung@purdue.edu>,
	Kyungtae Kim <kt0755@gmail.com>,
	bammanag@purdue.edu, Willem de Bruijn <willemb@google.com>
Subject: Re: WARNING in ip_recv_error
Date: Wed, 23 May 2018 11:40:26 -0400	[thread overview]
Message-ID: <CAF=yD-KTfUbXGvU7qQy4=eHbuUB88=g_tQ8sp8TEebhW=rzKVQ@mail.gmail.com> (raw)
In-Reply-To: <CAF=yD-JObh3u+41V609=zph0XfcgOxpmqm3hL5WPKWFrSnnV3Q@mail.gmail.com>

On Sun, May 20, 2018 at 7:13 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Fri, May 18, 2018 at 2:59 PM, Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>> On Fri, May 18, 2018 at 2:46 PM, Willem de Bruijn
>> <willemdebruijn.kernel@gmail.com> wrote:
>>> On Fri, May 18, 2018 at 2:44 PM, Willem de Bruijn
>>> <willemdebruijn.kernel@gmail.com> wrote:
>>>> On Fri, May 18, 2018 at 1:09 PM, Willem de Bruijn
>>>> <willemdebruijn.kernel@gmail.com> wrote:
>>>>> On Fri, May 18, 2018 at 11:44 AM, David Miller <davem@davemloft.net> wrote:
>>>>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>>>>> Date: Fri, 18 May 2018 08:30:43 -0700
>>>>>>
>>>>>>> We probably need to revert Willem patch (7ce875e5ecb8562fd44040f69bda96c999e38bbc)
>>>>>>
>>>>>> Is it really valid to reach ip_recv_err with an ipv6 socket?
>>>>>
>>>>> I guess the issue is that setsockopt IPV6_ADDRFORM is not an
>>>>> atomic operation, so that the socket is neither fully ipv4 nor fully
>>>>> ipv6 by the time it reaches ip_recv_error.
>>>>>
>>>>>   sk->sk_socket->ops = &inet_dgram_ops;
>>>>>   < HERE >
>>>>>   sk->sk_family = PF_INET;
>>>>>
>>>>> Even calling inet_recv_error to demux would not necessarily help.
>>>>>
>>>>> Safest would be to look up by skb->protocol, similar to what
>>>>> ipv6_recv_error does to handle v4-mapped-v6.
>>>>>
>>>>> Or to make that function safe with PF_INET and swap the order
>>>>> of the above two operations.
>>>>>
>>>>> All sound needlessly complicated for this rare socket option, but
>>>>> I don't have a better idea yet. Dropping on the floor is not nice,
>>>>> either.
>>>>
>>>> Ensuring that ip_recv_error correctly handles packets from either
>>>> socket and removing the warning should indeed be good.
>>>>
>>>> It is robust against v4-mapped packets from an AF_INET6 socket,
>>>> but see caveat on reconnect below.
>>>>
>>>> The code between ipv6_recv_error for v4-mapped addresses and
>>>> ip_recv_error is essentially the same, the main difference being
>>>> whether to return network headers as sockaddr_in with SOL_IP
>>>> or sockaddr_in6 with SOL_IPV6.
>>>>
>>>> There are very few other locations in the stack that explicitly test
>>>> sk_family in this way and thus would be vulnerable to races with
>>>> IPV6_ADDRFORM.
>>>>
>>>> I'm not sure whether it is possible for a udpv6 socket to queue a
>>>> real ipv6 packet on the error queue, disconnect, connect to an
>>>> ipv4 address, call IPV6_ADDRFORM and then call ip_recv_error
>>>> on a true ipv6 packet. That would return buggy data, e.g., in
>>>> msg_name.
>>>
>>> In do_ipv6_setsockopt IPV6_ADDRFORM we can test that the
>>> error queue is empty, and then take its lock for the duration of the
>>> operation.
>>
>> Actually, no reason to hold the lock. This setsockopt holds the socket
>> lock, which connect would need, too. So testing that the queue
>> is empty after testing that it is connected to a v4 address is
>> sufficient to ensure that no ipv6 packets are queued for reception.
>>
>> diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
>> index 4d780c7f0130..a975d6311341 100644
>> --- a/net/ipv6/ipv6_sockglue.c
>> +++ b/net/ipv6/ipv6_sockglue.c
>> @@ -199,6 +199,11 @@ static int do_ipv6_setsockopt(struct sock *sk,
>> int level, int optname,
>>
>>                         if (ipv6_only_sock(sk) ||
>>                             !ipv6_addr_v4mapped(&sk->sk_v6_daddr)) {
>>                                 retv = -EADDRNOTAVAIL;
>>                                 break;
>>                         }
>>
>> +                       if (!skb_queue_empty(&sk->sk_error_queue)) {
>> +                               retv = -EBUSY;
>> +                               break;
>> +                       }
>> +
>>                         fl6_free_socklist(sk);
>>                         __ipv6_sock_mc_close(sk);
>>
>> After this it should be safe to remove the warning in ip_recv_error.
>
> Hmm.. nope.
>
> This ensures that the socket cannot produce any new true v6 packets.
> But it does not guarantee that they are not already in the system, e.g.
> queued in tc, and will find their way to the error queue later.
>
> We'll have to just be able to handle ipv6 packets in ip_recv_error.
> Since IPV6_ADDRFORM is used to pass to legacy v4-only
> processes and those likely are only confused by SOL_IPV6
> error messages, it is probably best to just drop them and perhaps
> WARN_ONCE.

Even more fun, this is not limited to the error queue.

I can queue a v6 packet for reception on a socket, connect to a v4
address, call IPV6_ADDRFORM and then a regular recvfrom will
return a partial v6 address as AF_INET.

We definitely do not want to have to add a check

  if (skb->protocol == htons(ETH_P_IPV6)) {
    kfree_skb(skb);
    goto try_again;
  }

to the normal recvmsg path.

An alternative may be to tighten the check on when to allow
IPV6_ADDRFORM. Not only return EBUSY if a packet is pending,
but also if any sk_{rmem, omem, wmem}_alloc is non-zero. Only,
these tightened constraints could break a legacy application.

Either way, this race is somewhat tangential to the one that
RaceFuzzer found. The sk changes that IPV6_ADDRFORM makes
to sk_prot, sk_socket->ops and sk_family are not atomic and will
not be. They need not be, because no other code assumes this
consistency.

So I'll start by removing the warning as Eric suggested.

  reply	other threads:[~2018-05-23 15:41 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-18 12:08 WARNING in ip_recv_error DaeRyong Jeong
2018-05-18 15:30 ` Eric Dumazet
2018-05-18 15:44   ` David Miller
2018-05-18 17:09     ` Willem de Bruijn
2018-05-18 18:44       ` Willem de Bruijn
2018-05-18 18:46         ` Willem de Bruijn
2018-05-18 18:59           ` Willem de Bruijn
2018-05-20 23:13             ` Willem de Bruijn
2018-05-23 15:40               ` Willem de Bruijn [this message]
2018-05-23 18:30                 ` Willem de Bruijn
2018-05-24  8:00                 ` Paolo Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAF=yD-KTfUbXGvU7qQy4=eHbuUB88=g_tQ8sp8TEebhW=rzKVQ@mail.gmail.com' \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=bammanag@purdue.edu \
    --cc=byoungyoung@purdue.edu \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=kt0755@gmail.com \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=threeearcat@gmail.com \
    --cc=willemb@google.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.