Re: [PATCH net] ip_gre: validate csum_start only if CHECKSUM_PARTIAL

From: Alexander Duyck <alexander.duyck@gmail.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Netdev <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Ido Schimmel <idosch@idosch.org>,
	chouhan.shreyansh630@gmail.com
Subject: Re: [PATCH net] ip_gre: validate csum_start only if CHECKSUM_PARTIAL
Date: Fri, 3 Sep 2021 16:13:54 -0700	[thread overview]
Message-ID: <CAKgT0Ucx+i6prW5n95dYRF=+7hz2pzNDpQfwwUY607MyQh1gGg@mail.gmail.com> (raw)
In-Reply-To: <CA+FuTSeHAd4ouwYd9tL2FHa1YdB3aLznOTnAJt+PShnr+Zd7yw@mail.gmail.com>

On Fri, Sep 3, 2021 at 12:38 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>

<snip>

> > whereas if the offset is stored somewhere in the unstripped data we
> > could then drop the packet and count it as a drop without having to
> > modify the frame via the skb_pull.
>
> This is a broader issue that userspace can pass any csum_start as long
> as it is within packet bounds. We could address it here specifically
> for the GRE header. But that still leaves many potentially bad offsets
> further in the packet in this case, and all the other cases. Checking
> that specific header seems a bit arbitrary to me, and might actually
> give false confidence.
>
> We could certainly move the validation from gre_handle_offloads to
> before skb_pull, to make it more obvious *why* the check exists.

Agreed. My main concern is that the csum_start is able to be located
somewhere where the userspace didn't write. For the most part the
csum_start and csum_offset just needs to be restricted to the regions
that the userspace actually wrote to.

> > > The negative underflow and kernel crash is very specific to
> > > lco_csum, which calculates the offset between csum_offset
> > > and transport_offset. Standard checksum offset doesn't.
> > >
> > > One alternative fix, then, would be to add a negative overflow
> > > check in lco_csum itself.
> > > That only has three callers and unfortunately by the time we hit
> > > that for GRE in gre_build_header, there no longer is a path to
> > > cleanly dropping the packet and returning an error.
> >
> > Right. We could find the problem there but it doesn't solve the issue.
> > The problem is the expectation that the offset exists in the area
> > after the checksum for the tunnel header, not before.
> >
> > > I don't find this bad input whack-a-mole elegant either. But I
> > > thought it was the least bad option..
> >
> > The part that has been confusing me is how the virtio_net_hdr could
> > have been pointing as the IP or GRE headers since they shouldn't have
> > been present on the frame provided by the userspace. I think I am
> > starting to see now.
> >
> > So in the case of af_packet it looks like it is calling
> > dev_hard_header which calls header_ops->create before doing the
> > virtio_net_hdr_to_skb call. Do I have that right?
>
> Mostly yes. That is the case for SOCK_DGRAM. For SOCK_RAW, the caller
> is expected to have prepared these headers.
>
> Note that for the ip_gre device to have header_ops, this will be
> ipgre_header_ops, which .create member function comment starts with
> "/* Nice toy. Unfortunately, useless in real life :-)". We recently
> had a small series of fixes related to this special case and packet
> sockets, such as commit fdafed459998 ("ip_gre: set
> dev->hard_header_len and dev->needed_headroom properly"). This case of
> a GRE device that receives packets with outer IP + GRE headers is out
> of the ordinary.

I'm not sure the comment is about the functionality of the function as
much as the addressing. Seems to be about setting up a multicast GRE
configuration.

> > Maybe for those
> > cases we need to look at adding an unsigned int argument to
> > virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
> > dev->hard_header_len in the cases where we have something like
> > af_packet that is transmitting over an ipgre tunnel. The general idea
> > is to prevent these virtio_net_hdr_to_skb calls from pointing the
> > csum_start into headers that userspace was not responsible for
> > populating.
>
> One issue with that is that dev->hard_header_len itself is imprecise
> for protocols with variable length link layer headers. There, too, we
> have had a variety of bug fixes in the past.
>
> It also adds cost to every user of virtio_net_hdr, while we only know
> one issue in a rare case of the IP_GRE device.

Quick question, the assumption is that the checksum should always be
performed starting no earlier than the transport header right? Looking
over virtio_net_hdr_to_skb it looks like it is already verifying the
transport header is in the linear portion of the skb. I'm wondering if
we couldn't just look at adding a check to verify the transport offset
is <= csum start? We might also be able to get rid of one of the two
calls to pskb_may_pull by doing that.