netdev.vger.kernel.org archive mirror
* Kernel leaks memory in ip6_dst_cache when suppress_prefix is present in ipv6 routing rules and a `fib` rule is present in ipv6 nftables rules
From: msizanoen @ 2021-10-26 14:24 UTC
  To: davem, yoshfuji, dsahern, kuba; +Cc: netdev, linux-kernel

The kernel leaks memory when a `fib` rule is present in the IPv6 nftables firewall rules and a suppress_prefixlength rule
is present in the IPv6 routing rules (used by certain tools such as wg-quick). In such scenarios, every incoming
IPv6 packet leaks an allocation in the ip6_dst_cache slab cache.

After some hours of `bpftrace`-ing and source code reading, I tracked down the issue to this commit:
	https://github.com/torvalds/linux/commit/ca7a03c4175366a92cee0ccc4fec0038c3266e26

The problem with that patch is that the generic args->flags always has FIB_LOOKUP_NOREF set[1][2] but the
IPv6-specific flag RT6_LOOKUP_F_DST_NOREF might not be specified, so fib6_rule_suppress does not
decrease the refcount when needed. This can be fixed by exposing the protocol-specific flags to the
protocol-specific `suppress` function and checking the protocol-specific `flags` argument for
RT6_LOOKUP_F_DST_NOREF instead of the generic FIB_LOOKUP_NOREF when decreasing the refcount.
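
To illustrate the mismatch described above, here is a minimal userspace sketch in C. It is a toy model and not the kernel code: the struct, the `toy_*` helper names, and the flag values are all assumptions made up for this illustration. It only shows that checking the always-set generic flag skips the release of a reference that the lookup actually took.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the flags named above; the values are illustrative only. */
#define TOY_FIB_LOOKUP_NOREF        0x1 /* generic flag, always set by the caller */
#define TOY_RT6_LOOKUP_F_DST_NOREF  0x2 /* IPv6-specific flag, set only sometimes */

/* Hypothetical stand-in for a dst entry; only the refcount is modelled. */
struct toy_dst {
    int refcnt;
};

/* Model of the lookup: takes a reference unless the IPv6 flag says not to. */
static void toy_lookup(struct toy_dst *dst, int rt6_flags)
{
    if (!(rt6_flags & TOY_RT6_LOOKUP_F_DST_NOREF))
        dst->refcnt++;
}

/* Buggy suppress: decides based on the generic flag, which is always set. */
static void toy_suppress_buggy(struct toy_dst *dst, int generic_flags)
{
    if (!(generic_flags & TOY_FIB_LOOKUP_NOREF))
        dst->refcnt--;
}

/* Fixed suppress: decides based on the IPv6-specific flag the lookup used. */
static void toy_suppress_fixed(struct toy_dst *dst, int rt6_flags)
{
    if (!(rt6_flags & TOY_RT6_LOOKUP_F_DST_NOREF))
        dst->refcnt--;
}

/* Returns the number of references left after one suppressed lookup. */
static int toy_leaked_refs(bool use_fix, int rt6_flags)
{
    struct toy_dst dst = { .refcnt = 0 };

    toy_lookup(&dst, rt6_flags);
    if (use_fix)
        toy_suppress_fixed(&dst, rt6_flags);
    else
        toy_suppress_buggy(&dst, TOY_FIB_LOOKUP_NOREF);
    return dst.refcnt;
}
```

In this model, `toy_leaked_refs(false, 0)` leaves one reference behind, which corresponds to one leaked ip6_dst_cache object per suppressed lookup, while `toy_leaked_refs(true, 0)` leaves none.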

How to reproduce:
- Add the following nftables rule to a prerouting chain: `meta nfproto ipv6 fib saddr . mark . iif oif missing drop`
- Run `sudo ip -6 rule add table main suppress_prefixlength 0`
- Watch `sudo slabtop -o | grep ip6_dst_cache` memory usage increase with every incoming ipv6 packet

Example patch: https://gist.github.com/msizanoen1/36a2853467a9bd34fadc5bb3783fde0f

[1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71
[2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99




* Re: Kernel leaks memory in ip6_dst_cache when suppress_prefix is present in ipv6 routing rules and a `fib` rule is present in ipv6 nftables rules
From: David Ahern @ 2021-10-29 23:53 UTC
  To: msizanoen, davem, yoshfuji, dsahern, kuba; +Cc: netdev, linux-kernel

On 10/26/21 8:24 AM, msizanoen wrote:
> The kernel leaks memory when a `fib` rule is present in the IPv6
> nftables firewall rules and a suppress_prefixlength rule is present in
> the IPv6 routing rules (used by certain tools such as wg-quick). In such
> scenarios, every incoming IPv6 packet leaks an allocation in the
> ip6_dst_cache slab cache.
> 
> After some hours of `bpftrace`-ing and source code reading, I tracked
> down the issue to this commit:
>     https://github.com/torvalds/linux/commit/ca7a03c4175366a92cee0ccc4fec0038c3266e26
> 
> 
> The problem with that patch is that the generic args->flags always has
> FIB_LOOKUP_NOREF set[1][2] but the IPv6-specific flag
> RT6_LOOKUP_F_DST_NOREF might not be specified, so fib6_rule_suppress
> does not decrease the refcount when needed. This can be fixed by
> exposing the protocol-specific flags to the protocol-specific `suppress`
> function and checking the protocol-specific `flags` argument for
> RT6_LOOKUP_F_DST_NOREF instead of the generic FIB_LOOKUP_NOREF when
> decreasing the refcount.
> 
> How to reproduce:
> - Add the following nftables rule to a prerouting chain: `meta nfproto
> ipv6 fib saddr . mark . iif oif missing drop`

exact command? I have not played with nftables. Do you have a stack
trace of where the dst reference is getting taken?


> - Run `sudo ip -6 rule add table main suppress_prefixlength 0`
> - Watch `sudo slabtop -o | grep ip6_dst_cache` memory usage increase
> with every incoming ipv6 packet
> 
> Example patch: https://gist.github.com/msizanoen1/36a2853467a9bd34fadc5bb3783fde0f
>
> [1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71
> [2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99



* Re: Kernel leaks memory in ip6_dst_cache when suppress_prefix is present in ipv6 routing rules and a `fib` rule is present in ipv6 nftables rules
From: msizanoen @ 2021-10-30  0:25 UTC
  To: David Ahern, davem, yoshfuji, dsahern, kuba; +Cc: netdev, linux-kernel

> exact command? I have not played with nftables.

sudo nft create table inet test
sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop

> Do you have a stack trace of where the dst reference is getting taken?

         ip6_dst_alloc+5
         ip6_create_rt_rcu+107
         ip6_pol_route_lookup+741
         fib6_rule_action+707
         fib_rules_lookup+342
         fib6_rule_lookup+150
         nft_fib6_eval+354
         nft_do_chain+339
         nft_do_chain_inet+123
         nf_hook_slow+63
         nf_hook_slow_list+129
         ip6_sublist_rcv+606
         ipv6_list_rcv+296
         __netif_receive_skb_list_core+489
         netif_receive_skb_list_internal+433
         napi_complete_done+111
         virtnet_poll+771
         __napi_poll+42
         net_rx_action+547
         __softirqentry_text_start+208
         __irq_exit_rcu+199
         common_interrupt+131
         asm_common_interrupt+30
         native_safe_halt+11
         default_idle+10
         default_idle_call+53
         do_idle+487
         cpu_startup_entry+25
         secondary_startup_64_no_verify+194

Collected using the following bpftrace script:

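// Record the kernel stack for every dst entry returned by ip6_dst_alloc.
kretfunc:ip6_dst_alloc { @[(uint64)retval] = kstack(); }
// Forget the entry once it is destroyed; stacks left in the map were never freed.
kfunc:ip6_dst_destroy { delete(@[(uint64)args->dst]); }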

