Re: [PATCH net] ip_gre: validate csum_start only if CHECKSUM_PARTIAL

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Netdev <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Ido Schimmel <idosch@idosch.org>,
	chouhan.shreyansh630@gmail.com
Subject: Re: [PATCH net] ip_gre: validate csum_start only if CHECKSUM_PARTIAL
Date: Fri, 3 Sep 2021 15:38:17 -0400	[thread overview]
Message-ID: <CA+FuTSeHAd4ouwYd9tL2FHa1YdB3aLznOTnAJt+PShnr+Zd7yw@mail.gmail.com> (raw)
In-Reply-To: <CAKgT0Uf6YrDtvEfL02-P7A3Q_V32MWZ-tV7B=xtkY0ZzxEo9yg@mail.gmail.com>

> > > The issue is drivers with NETIF_F_HW_CSUM would be expecting a
> > > reasonable csum_start and csum_offset. If the hardware is only
> > > advertising that and we are expecting it to offload the checksum we
> > > should probably be doing some sort of validation on user derived
> > > inputs to make sure that they aren't totally ridiculous as is the case
> > > here where the original issue was that the csum_start wasn't even in
> > > the packet data.
> > >
> > > Would it maybe make sense to look at reverting the earlier fixes and
> > > instead updating skb_partial_csum_set so that we cannot write a
> > > csum_start value that is less than the start of skb->data? That way we
> > > are addressing this at the source instead of allowing the garbage data
> > > to propagate further down the stack and having to drop it at the
> > > driver level which is going to have us playing whack a mole trying to
> > > fix it where it pops up. It seems like we are already validating the
> > > upper bounds for these values in that function so why not validate the
> > > lower bounds as well?
> >
> > skb_partial_csum_set verifies that csum_start is within bounds
> > at the time it is called. The offset passed from userspace is
> > an unsigned int added to skb_headroom(skb), so >= skb->data.
> >
> > The issue is with ipgre_xmit calling skb_pull. Only then does it
> > become out of bounds. From what I saw, pulling headers like
> > that is very rare in the transmit path. Indeed, I did not see it in
> > any other tunnels. We could instrument each of these directly,
> > but at non-trivial cost.
>
> So there are some positives and negatives to addressing this in
> ipgre_xmit. Specifically if we address it before the pull we could do
> some other quick checks to verify the offset. If the offset and start
> are both in the region to be pulled we could just drop the offload,

These cases would currently crash the kernel. They are so clearly bad
input that I would not even try to start accommodating them. Drop
them.

> whereas if the offset is stored somewhere in the unstripped data we
> could then drop the packet and count it as a drop without having to
> modify the frame via the skb_pull.

This is a broader issue that userspace can pass any csum_start as long
as it is within packet bounds. We could address it here specifically
for the GRE header. But that still leaves many potentially bad offsets
further in the packet in this case, and all the other cases. Checking
that specific header seems a bit arbitrary to me, and might actually
give false confidence.

We could certainly move the validation from gre_handle_offloads to
before skb_pull, to make it more obvious *why* the check exists.

> > The negative underflow and kernel crash is very specific to
> > lco_csum, which calculates the offset between csum_offset
> > and transport_offset. Standard checksum offset doesn't.
> >
> > One alternative fix, then, would be to add a negative overflow
> > check in lco_csum itself.
> > That only has three callers and unfortunately by the time we hit
> > that for GRE in gre_build_header, there no longer is a path to
> > cleanly dropping the packet and returning an error.
>
> Right. We could find the problem there but it doesn't solve the issue.
> The problem is the expectation that the offset exists in the area
> after the checksum for the tunnel header, not before.
>
> > I don't find this bad input whack-a-mole elegant either. But I
> > thought it was the least bad option..
>
> The part that has been confusing me is how the virtio_net_hdr could
> have been pointing as the IP or GRE headers since they shouldn't have
> been present on the frame provided by the userspace. I think I am
> starting to see now.
>
> So in the case of af_packet it looks like it is calling
> dev_hard_header which calls header_ops->create before doing the
> virtio_net_hdr_to_skb call. Do I have that right?

Mostly yes. That is the case for SOCK_DGRAM. For SOCK_RAW, the caller
is expected to have prepared these headers.

Note that for the ip_gre device to have header_ops, this will be
ipgre_header_ops, which .create member function comment starts with
"/* Nice toy. Unfortunately, useless in real life :-)". We recently
had a small series of fixes related to this special case and packet
sockets, such as commit fdafed459998 ("ip_gre: set
dev->hard_header_len and dev->needed_headroom properly"). This case of
a GRE device that receives packets with outer IP + GRE headers is out
of the ordinary.

> Maybe for those
> cases we need to look at adding an unsigned int argument to
> virtio_net_hdr_to_skb in which we could pass 0 for the unused case or
> dev->hard_header_len in the cases where we have something like
> af_packet that is transmitting over an ipgre tunnel. The general idea
> is to prevent these virtio_net_hdr_to_skb calls from pointing the
> csum_start into headers that userspace was not responsible for
> populating.

One issue with that is that dev->hard_header_len itself is imprecise
for protocols with variable length link layer headers. There, too, we
have had a variety of bug fixes in the past.

It also adds cost to every user of virtio_net_hdr, while we only know
one issue in a rare case of the IP_GRE device.