netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/2] net: do not modify the shared tunnel info when PMTU triggers an ICMP reply
@ 2021-03-25 15:35 Antoine Tenart
  2021-03-25 15:35 ` [PATCH net 1/2] vxlan: " Antoine Tenart
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Antoine Tenart @ 2021-03-25 15:35 UTC (permalink / raw)
  To: davem, kuba; +Cc: Antoine Tenart, echaudro, sbrivio, netdev

Hi,

The series fixes an issue were a shared ip_tunnel_info is modified when
PMTU triggers an ICMP reply in vxlan and geneve, making following
packets in that flow to have a wrong destination address if the flow
isn't updated. A detailled information is given in each of the two
commits.

This was tested manually with OVS and I ran the PTMU selftests with
kmemleak enabled (all OK, none was skipped).

Thanks!
Antoine

Antoine Tenart (2):
  vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP
    reply
  geneve: do not modify the shared tunnel info when PMTU triggers an
    ICMP reply

 drivers/net/geneve.c | 24 ++++++++++++++++++++----
 drivers/net/vxlan.c  | 18 ++++++++++++++----
 2 files changed, 34 insertions(+), 8 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2021-03-25 15:35 [PATCH net 0/2] net: do not modify the shared tunnel info when PMTU triggers an ICMP reply Antoine Tenart
@ 2021-03-25 15:35 ` Antoine Tenart
  2022-01-20  7:38   ` Vlad Buslov
  2021-03-25 15:35 ` [PATCH net 2/2] geneve: " Antoine Tenart
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Antoine Tenart @ 2021-03-25 15:35 UTC (permalink / raw)
  To: davem, kuba; +Cc: Antoine Tenart, echaudro, sbrivio, netdev

When the interface is part of a bridge or an Open vSwitch port and a
packet exceed a PMTU estimate, an ICMP reply is sent to the sender. When
using the external mode (collect metadata) the source and destination
addresses are reversed, so that Open vSwitch can match the packet
against an existing (reverse) flow.

But inverting the source and destination addresses in the shared
ip_tunnel_info will make following packets of the flow to use a wrong
destination address (packets will be tunnelled to itself), if the flow
isn't updated. Which happens with Open vSwitch, until the flow times
out.

Fixes this by uncloning the skb's ip_tunnel_info before inverting its
source and destination addresses, so that the modification will only be
made for the PTMU packet, not the following ones.

Fixes: fc68c99577cc ("vxlan: Support for PMTU discovery on directly bridged links")
Tested-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 drivers/net/vxlan.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 666dd201c3d5..53dbc67e8a34 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2725,12 +2725,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 			goto tx_error;
 		} else if (err) {
 			if (info) {
+				struct ip_tunnel_info *unclone;
 				struct in_addr src, dst;
 
+				unclone = skb_tunnel_info_unclone(skb);
+				if (unlikely(!unclone))
+					goto tx_error;
+
 				src = remote_ip.sin.sin_addr;
 				dst = local_ip.sin.sin_addr;
-				info->key.u.ipv4.src = src.s_addr;
-				info->key.u.ipv4.dst = dst.s_addr;
+				unclone->key.u.ipv4.src = src.s_addr;
+				unclone->key.u.ipv4.dst = dst.s_addr;
 			}
 			vxlan_encap_bypass(skb, vxlan, vxlan, vni, false);
 			dst_release(ndst);
@@ -2781,12 +2786,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 			goto tx_error;
 		} else if (err) {
 			if (info) {
+				struct ip_tunnel_info *unclone;
 				struct in6_addr src, dst;
 
+				unclone = skb_tunnel_info_unclone(skb);
+				if (unlikely(!unclone))
+					goto tx_error;
+
 				src = remote_ip.sin6.sin6_addr;
 				dst = local_ip.sin6.sin6_addr;
-				info->key.u.ipv6.src = src;
-				info->key.u.ipv6.dst = dst;
+				unclone->key.u.ipv6.src = src;
+				unclone->key.u.ipv6.dst = dst;
 			}
 
 			vxlan_encap_bypass(skb, vxlan, vxlan, vni, false);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 2/2] geneve: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2021-03-25 15:35 [PATCH net 0/2] net: do not modify the shared tunnel info when PMTU triggers an ICMP reply Antoine Tenart
  2021-03-25 15:35 ` [PATCH net 1/2] vxlan: " Antoine Tenart
@ 2021-03-25 15:35 ` Antoine Tenart
  2021-03-25 20:28 ` [PATCH net 0/2] net: " Stefano Brivio
  2021-03-26  0:40 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 14+ messages in thread
From: Antoine Tenart @ 2021-03-25 15:35 UTC (permalink / raw)
  To: davem, kuba; +Cc: Antoine Tenart, echaudro, sbrivio, netdev

When the interface is part of a bridge or an Open vSwitch port and a
packet exceed a PMTU estimate, an ICMP reply is sent to the sender. When
using the external mode (collect metadata) the source and destination
addresses are reversed, so that Open vSwitch can match the packet
against an existing (reverse) flow.

But inverting the source and destination addresses in the shared
ip_tunnel_info will make following packets of the flow to use a wrong
destination address (packets will be tunnelled to itself), if the flow
isn't updated. Which happens with Open vSwitch, until the flow times
out.

Fixes this by uncloning the skb's ip_tunnel_info before inverting its
source and destination addresses, so that the modification will only be
made for the PTMU packet, not the following ones.

Fixes: c1a800e88dbf ("geneve: Support for PMTU discovery on directly bridged links")
Tested-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 drivers/net/geneve.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 4ac0373326ef..d5b1e48e0c09 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -908,8 +908,16 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 
 		info = skb_tunnel_info(skb);
 		if (info) {
-			info->key.u.ipv4.dst = fl4.saddr;
-			info->key.u.ipv4.src = fl4.daddr;
+			struct ip_tunnel_info *unclone;
+
+			unclone = skb_tunnel_info_unclone(skb);
+			if (unlikely(!unclone)) {
+				dst_release(&rt->dst);
+				return -ENOMEM;
+			}
+
+			unclone->key.u.ipv4.dst = fl4.saddr;
+			unclone->key.u.ipv4.src = fl4.daddr;
 		}
 
 		if (!pskb_may_pull(skb, ETH_HLEN)) {
@@ -993,8 +1001,16 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 		struct ip_tunnel_info *info = skb_tunnel_info(skb);
 
 		if (info) {
-			info->key.u.ipv6.dst = fl6.saddr;
-			info->key.u.ipv6.src = fl6.daddr;
+			struct ip_tunnel_info *unclone;
+
+			unclone = skb_tunnel_info_unclone(skb);
+			if (unlikely(!unclone)) {
+				dst_release(dst);
+				return -ENOMEM;
+			}
+
+			unclone->key.u.ipv6.dst = fl6.saddr;
+			unclone->key.u.ipv6.src = fl6.daddr;
 		}
 
 		if (!pskb_may_pull(skb, ETH_HLEN)) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net 0/2] net: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2021-03-25 15:35 [PATCH net 0/2] net: do not modify the shared tunnel info when PMTU triggers an ICMP reply Antoine Tenart
  2021-03-25 15:35 ` [PATCH net 1/2] vxlan: " Antoine Tenart
  2021-03-25 15:35 ` [PATCH net 2/2] geneve: " Antoine Tenart
@ 2021-03-25 20:28 ` Stefano Brivio
  2021-03-26  0:40 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 14+ messages in thread
From: Stefano Brivio @ 2021-03-25 20:28 UTC (permalink / raw)
  To: Antoine Tenart; +Cc: davem, kuba, echaudro, netdev

On Thu, 25 Mar 2021 16:35:31 +0100
Antoine Tenart <atenart@kernel.org> wrote:

> Hi,
> 
> The series fixes an issue were a shared ip_tunnel_info is modified when
> PMTU triggers an ICMP reply in vxlan and geneve, making following
> packets in that flow to have a wrong destination address if the flow
> isn't updated. A detailled information is given in each of the two
> commits.
> 
> This was tested manually with OVS and I ran the PTMU selftests with
> kmemleak enabled (all OK, none was skipped).
> 
> Thanks!
> Antoine
> 
> Antoine Tenart (2):
>   vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP
>     reply
>   geneve: do not modify the shared tunnel info when PMTU triggers an
>     ICMP reply

For the series,

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>

Thanks for fixing this!

-- 
Stefano


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 0/2] net: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2021-03-25 15:35 [PATCH net 0/2] net: do not modify the shared tunnel info when PMTU triggers an ICMP reply Antoine Tenart
                   ` (2 preceding siblings ...)
  2021-03-25 20:28 ` [PATCH net 0/2] net: " Stefano Brivio
@ 2021-03-26  0:40 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 14+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-03-26  0:40 UTC (permalink / raw)
  To: Antoine Tenart; +Cc: davem, kuba, echaudro, sbrivio, netdev

Hello:

This series was applied to netdev/net.git (refs/heads/master):

On Thu, 25 Mar 2021 16:35:31 +0100 you wrote:
> Hi,
> 
> The series fixes an issue were a shared ip_tunnel_info is modified when
> PMTU triggers an ICMP reply in vxlan and geneve, making following
> packets in that flow to have a wrong destination address if the flow
> isn't updated. A detailled information is given in each of the two
> commits.
> 
> [...]

Here is the summary with links:
  - [net,1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
    https://git.kernel.org/netdev/net/c/30a93d2b7d5a
  - [net,2/2] geneve: do not modify the shared tunnel info when PMTU triggers an ICMP reply
    https://git.kernel.org/netdev/net/c/68c1a943ef37

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2021-03-25 15:35 ` [PATCH net 1/2] vxlan: " Antoine Tenart
@ 2022-01-20  7:38   ` Vlad Buslov
  2022-01-20 10:27     ` Antoine Tenart
  0 siblings, 1 reply; 14+ messages in thread
From: Vlad Buslov @ 2022-01-20  7:38 UTC (permalink / raw)
  To: Antoine Tenart; +Cc: davem, kuba, echaudro, sbrivio, netdev

Hello Antoine,

On Thu 25 Mar 2021 at 17:35, Antoine Tenart <atenart@kernel.org> wrote:
> When the interface is part of a bridge or an Open vSwitch port and a
> packet exceed a PMTU estimate, an ICMP reply is sent to the sender. When
> using the external mode (collect metadata) the source and destination
> addresses are reversed, so that Open vSwitch can match the packet
> against an existing (reverse) flow.
>
> But inverting the source and destination addresses in the shared
> ip_tunnel_info will make following packets of the flow to use a wrong
> destination address (packets will be tunnelled to itself), if the flow
> isn't updated. Which happens with Open vSwitch, until the flow times
> out.
>
> Fixes this by uncloning the skb's ip_tunnel_info before inverting its
> source and destination addresses, so that the modification will only be
> made for the PTMU packet, not the following ones.
>
> Fixes: fc68c99577cc ("vxlan: Support for PMTU discovery on directly bridged links")
> Tested-by: Eelco Chaudron <echaudro@redhat.com>
> Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
> Signed-off-by: Antoine Tenart <atenart@kernel.org>
> ---
>  drivers/net/vxlan.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 666dd201c3d5..53dbc67e8a34 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2725,12 +2725,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  			goto tx_error;
>  		} else if (err) {
>  			if (info) {
> +				struct ip_tunnel_info *unclone;
>  				struct in_addr src, dst;
>  
> +				unclone = skb_tunnel_info_unclone(skb);
> +				if (unlikely(!unclone))
> +					goto tx_error;
> +

We have been getting memleaks in one of our tests that point to this
code (test deletes vxlan device while running traffic redirected by OvS
TC at the same time):

unreferenced object 0xffff8882d0114200 (size 256):
  comm "softirq", pid 0, jiffies 4296140292 (age 1435.992s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 3b 85 84 ff ff ff ff  .........;......
    a1 26 b7 83 ff ff ff ff 00 00 00 00 00 00 00 00  .&..............
  backtrace:
    [<0000000097659d47>] metadata_dst_alloc+0x1f/0x470
    [<000000007571c30f>] tun_dst_unclone+0xee/0x360 [vxlan]
    [<00000000d2dcfd00>] vxlan_xmit_one+0x131d/0x2a00 [vxlan]
    [<00000000281572b6>] vxlan_xmit+0x8e6/0x4cd0 [vxlan]
    [<00000000d49d33fe>] dev_hard_start_xmit+0x1ba/0x710
    [<00000000eac444f5>] __dev_queue_xmit+0x17c5/0x25f0
    [<000000005fbd8585>] tcf_mirred_act+0xb1d/0xf70 [act_mirred]
    [<0000000064b6eb2d>] tcf_action_exec+0x10e/0x350
    [<00000000352821e8>] fl_classify+0x4e3/0x610 [cls_flower]
    [<0000000011d3f765>] tcf_classify+0x33d/0x800
    [<000000006c69b225>] __netif_receive_skb_core+0x18d6/0x2ae0
    [<00000000dd256fe3>] __netif_receive_skb_one_core+0xaf/0x180
    [<0000000065d43bd6>] process_backlog+0x2e3/0x710
    [<00000000964357ae>] __napi_poll+0x9f/0x560
    [<0000000059a93cf6>] net_rx_action+0x357/0xa60
    [<00000000766481bc>] __do_softirq+0x282/0x94e

Looking at the code the potential issue seems to be that
tun_dst_unclone() creates new metadata_dst instance with refcount==1,
increments the refcount with dst_hold() to value 2, then returns it.
This seems to imply that caller is expected to release one of the
references (second one if for skb), but none of the callers (including
original dev_fill_metadata_dst()) do that, so I guess I'm
misunderstanding something here.

Any tips or suggestions?

>  				src = remote_ip.sin.sin_addr;
>  				dst = local_ip.sin.sin_addr;
> -				info->key.u.ipv4.src = src.s_addr;
> -				info->key.u.ipv4.dst = dst.s_addr;
> +				unclone->key.u.ipv4.src = src.s_addr;
> +				unclone->key.u.ipv4.dst = dst.s_addr;
>  			}
>  			vxlan_encap_bypass(skb, vxlan, vxlan, vni, false);
>  			dst_release(ndst);
> @@ -2781,12 +2786,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  			goto tx_error;
>  		} else if (err) {
>  			if (info) {
> +				struct ip_tunnel_info *unclone;
>  				struct in6_addr src, dst;
>  
> +				unclone = skb_tunnel_info_unclone(skb);
> +				if (unlikely(!unclone))
> +					goto tx_error;
> +
>  				src = remote_ip.sin6.sin6_addr;
>  				dst = local_ip.sin6.sin6_addr;
> -				info->key.u.ipv6.src = src;
> -				info->key.u.ipv6.dst = dst;
> +				unclone->key.u.ipv6.src = src;
> +				unclone->key.u.ipv6.dst = dst;
>  			}
>  
>  			vxlan_encap_bypass(skb, vxlan, vxlan, vni, false);


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2022-01-20  7:38   ` Vlad Buslov
@ 2022-01-20 10:27     ` Antoine Tenart
  2022-01-20 12:58       ` Vlad Buslov
  0 siblings, 1 reply; 14+ messages in thread
From: Antoine Tenart @ 2022-01-20 10:27 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: davem, kuba, echaudro, sbrivio, netdev, pshelar

Hello Vlad,

Quoting Vlad Buslov (2022-01-20 08:38:05)
> On Thu 25 Mar 2021 at 17:35, Antoine Tenart <atenart@kernel.org> wrote:
> >
> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> > index 666dd201c3d5..53dbc67e8a34 100644
> > --- a/drivers/net/vxlan.c
> > +++ b/drivers/net/vxlan.c
> > @@ -2725,12 +2725,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
> >                       goto tx_error;
> >               } else if (err) {
> >                       if (info) {
> > +                             struct ip_tunnel_info *unclone;
> >                               struct in_addr src, dst;
> >  
> > +                             unclone = skb_tunnel_info_unclone(skb);
> > +                             if (unlikely(!unclone))
> > +                                     goto tx_error;
> > +
> 
> We have been getting memleaks in one of our tests that point to this
> code (test deletes vxlan device while running traffic redirected by OvS
> TC at the same time):
> 
> unreferenced object 0xffff8882d0114200 (size 256):
>   comm "softirq", pid 0, jiffies 4296140292 (age 1435.992s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 3b 85 84 ff ff ff ff  .........;......
>     a1 26 b7 83 ff ff ff ff 00 00 00 00 00 00 00 00  .&..............
>   backtrace:
>     [<0000000097659d47>] metadata_dst_alloc+0x1f/0x470
>     [<000000007571c30f>] tun_dst_unclone+0xee/0x360 [vxlan]
>     [<00000000d2dcfd00>] vxlan_xmit_one+0x131d/0x2a00 [vxlan]
>     [<00000000281572b6>] vxlan_xmit+0x8e6/0x4cd0 [vxlan]
>     [<00000000d49d33fe>] dev_hard_start_xmit+0x1ba/0x710
>     [<00000000eac444f5>] __dev_queue_xmit+0x17c5/0x25f0
>     [<000000005fbd8585>] tcf_mirred_act+0xb1d/0xf70 [act_mirred]
>     [<0000000064b6eb2d>] tcf_action_exec+0x10e/0x350
>     [<00000000352821e8>] fl_classify+0x4e3/0x610 [cls_flower]
>     [<0000000011d3f765>] tcf_classify+0x33d/0x800
>     [<000000006c69b225>] __netif_receive_skb_core+0x18d6/0x2ae0
>     [<00000000dd256fe3>] __netif_receive_skb_one_core+0xaf/0x180
>     [<0000000065d43bd6>] process_backlog+0x2e3/0x710
>     [<00000000964357ae>] __napi_poll+0x9f/0x560
>     [<0000000059a93cf6>] net_rx_action+0x357/0xa60
>     [<00000000766481bc>] __do_softirq+0x282/0x94e
> 
> Looking at the code the potential issue seems to be that
> tun_dst_unclone() creates new metadata_dst instance with refcount==1,
> increments the refcount with dst_hold() to value 2, then returns it.
> This seems to imply that caller is expected to release one of the
> references (second one if for skb), but none of the callers (including
> original dev_fill_metadata_dst()) do that, so I guess I'm
> misunderstanding something here.
> 
> Any tips or suggestions?

I'd say there is no need to increase the dst refcount here after calling
metadata_dst_alloc, as the metadata is local to the skb and the dst
refcount was already initialized to 1. This might be an issue with
commit fc4099f17240 ("openvswitch: Fix egress tunnel info."); I CCed
Pravin, he might recall if there was a reason to increase the refcount.

Thanks,
Antoine

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2022-01-20 10:27     ` Antoine Tenart
@ 2022-01-20 12:58       ` Vlad Buslov
  2022-01-28 17:01         ` Antoine Tenart
  0 siblings, 1 reply; 14+ messages in thread
From: Vlad Buslov @ 2022-01-20 12:58 UTC (permalink / raw)
  To: Antoine Tenart; +Cc: davem, kuba, echaudro, sbrivio, netdev, pshelar

On Thu 20 Jan 2022 at 12:27, Antoine Tenart <atenart@kernel.org> wrote:
> Hello Vlad,
>
> Quoting Vlad Buslov (2022-01-20 08:38:05)
>> On Thu 25 Mar 2021 at 17:35, Antoine Tenart <atenart@kernel.org> wrote:
>> >
>> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>> > index 666dd201c3d5..53dbc67e8a34 100644
>> > --- a/drivers/net/vxlan.c
>> > +++ b/drivers/net/vxlan.c
>> > @@ -2725,12 +2725,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>> >                       goto tx_error;
>> >               } else if (err) {
>> >                       if (info) {
>> > +                             struct ip_tunnel_info *unclone;
>> >                               struct in_addr src, dst;
>> >  
>> > +                             unclone = skb_tunnel_info_unclone(skb);
>> > +                             if (unlikely(!unclone))
>> > +                                     goto tx_error;
>> > +
>> 
>> We have been getting memleaks in one of our tests that point to this
>> code (test deletes vxlan device while running traffic redirected by OvS
>> TC at the same time):
>> 
>> unreferenced object 0xffff8882d0114200 (size 256):
>>   comm "softirq", pid 0, jiffies 4296140292 (age 1435.992s)
>>   hex dump (first 32 bytes):
>>     00 00 00 00 00 00 00 00 00 3b 85 84 ff ff ff ff  .........;......
>>     a1 26 b7 83 ff ff ff ff 00 00 00 00 00 00 00 00  .&..............
>>   backtrace:
>>     [<0000000097659d47>] metadata_dst_alloc+0x1f/0x470
>>     [<000000007571c30f>] tun_dst_unclone+0xee/0x360 [vxlan]
>>     [<00000000d2dcfd00>] vxlan_xmit_one+0x131d/0x2a00 [vxlan]
>>     [<00000000281572b6>] vxlan_xmit+0x8e6/0x4cd0 [vxlan]
>>     [<00000000d49d33fe>] dev_hard_start_xmit+0x1ba/0x710
>>     [<00000000eac444f5>] __dev_queue_xmit+0x17c5/0x25f0
>>     [<000000005fbd8585>] tcf_mirred_act+0xb1d/0xf70 [act_mirred]
>>     [<0000000064b6eb2d>] tcf_action_exec+0x10e/0x350
>>     [<00000000352821e8>] fl_classify+0x4e3/0x610 [cls_flower]
>>     [<0000000011d3f765>] tcf_classify+0x33d/0x800
>>     [<000000006c69b225>] __netif_receive_skb_core+0x18d6/0x2ae0
>>     [<00000000dd256fe3>] __netif_receive_skb_one_core+0xaf/0x180
>>     [<0000000065d43bd6>] process_backlog+0x2e3/0x710
>>     [<00000000964357ae>] __napi_poll+0x9f/0x560
>>     [<0000000059a93cf6>] net_rx_action+0x357/0xa60
>>     [<00000000766481bc>] __do_softirq+0x282/0x94e
>> 
>> Looking at the code the potential issue seems to be that
>> tun_dst_unclone() creates new metadata_dst instance with refcount==1,
>> increments the refcount with dst_hold() to value 2, then returns it.
>> This seems to imply that caller is expected to release one of the
>> references (second one if for skb), but none of the callers (including
>> original dev_fill_metadata_dst()) do that, so I guess I'm
>> misunderstanding something here.
>> 
>> Any tips or suggestions?
>
> I'd say there is no need to increase the dst refcount here after calling
> metadata_dst_alloc, as the metadata is local to the skb and the dst
> refcount was already initialized to 1. This might be an issue with
> commit fc4099f17240 ("openvswitch: Fix egress tunnel info."); I CCed
> Pravin, he might recall if there was a reason to increase the refcount.

I tried to remove the dst_hold(), but that caused underflows[0], so I
guess the current reference counting is required at least for some
use-cases.

[0]:

[  118.803011] ------------[ cut here ]------------                                    
[  118.803011] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803019] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803027] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803041] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803046] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803060] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803065] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803078] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803083] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803096] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803920] dst_release underflow                                                   
[  118.803937] WARNING: CPU: 4 PID: 0 at net/core/dst.c:173 dst_release+0x79/0x90                                                                                             
[  118.815961] Modules linked in: act_tunnel_key act_mirred act_skbedit veth vxlan ip6_udp_tunnel udp_tunnel act_gact cls_flower sch_ingress openvswitch nsh nf_conncount mlx5_ib mlx5_core mlxfw pci_hyperv_intf ptp pps_core nfsv3 nfs_acl xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_connt
rack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill overlay rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs sunrpc ib_core iTCO_wdt iTCO_vendor_support lpc_ich mfd_core virtio_net 
net_failover failover i2c_i801 i2c_smbus kvm_intel kvm pcspkr irqbypass crc32_pclmul ghash_clmulni_intel sch_fq_codel drm i2c_core ip_tables crc32c_intel serio_raw fuse [last unloaded: mlxfw]
[  118.829464] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.16.0+ #3                                                                                                           
[  118.830567] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014                                                                                                                                                                                                                              
[  118.832541] RIP: 0010:dst_release+0x79/0x90                                         
[  118.833372] Code: 04 e8 db 14 01 00 8b 4c 24 04 85 c0 74 c7 e9 4d 60 22 00 48 c7 c7 35 00 32 82 89 4c 24 04 c6 05 c5 47 e5 00 01 e8 6f c2 1b 00 <0f> 0b 8b 4c 24 04 eb cb 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40                                                                                                                                       
[  118.836566] RSP: 0018:ffffc90000180ee0 EFLAGS: 00010282                             
[  118.837546] RAX: 0000000000000000 RBX: ffff88813a139ab0 RCX: 0000000000000000                                                                                              
[  118.838912] RDX: 0000000000000102 RSI: ffffffff82286273 RDI: 00000000ffffffff                                                                                              
[  118.840278] RBP: 0000000000000004 R08: 0000000000000015 R09: ffffc90000180e78                                                                                              
[  118.841646] R10: ffffffff825c7000 R11: 0000000000000001 R12: ffff88811d3ab480                                                                                              
[  118.843008] R13: 0000000000000170 R14: 0000000000000000 R15: ffff8882f5a2e1b8                                                                                              
[  118.844371] FS:  0000000000000000(0000) GS:ffff8882f5a00000(0000) knlGS:0000000000000000                                                                                   
[  118.845968] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                                                              
[  118.847100] CR2: 00007fa5e5abd7e0 CR3: 0000000175408005 CR4: 0000000000770ee0                                                                                              
[  118.848461] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000                                                                                              
[  118.849834] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400                                                                                              
[  118.851198] PKRU: 55555554                                                          
[  118.851815] Call Trace:                                                             
[  118.852387]  <IRQ>                                                                  
[  118.852894]  dst_cache_destroy+0x33/0x60                                            
[  118.853728]  dst_destroy+0xaa/0xb0                                                  
[  118.854465]  rcu_core+0x2a3/0x960                                                   
[  118.855197]  __do_softirq+0xf0/0x2f9                                                
[  118.855976]  __irq_exit_rcu+0xcc/0x120                                              
[  118.856765]  sysvec_apic_timer_interrupt+0xa2/0xd0                                  
[  118.857746]  </IRQ>                                                                 
[  118.858270]  <TASK>                                                                 
[  118.858775]  asm_sysvec_apic_timer_interrupt+0x12/0x20                              
[  118.859801] RIP: 0010:native_safe_halt+0xb/0x10                                     
[  118.860731] Code: 7e ff ff ff 7f 5b c3 65 48 8b 04 25 c0 cb 01 00 f0 80 48 02 20 48 8b 00 a8 08 75 c3 eb 80 cc eb 07 0f 00 2d 8f f5 4f 00 fb f4 <c3> 0f 1f 40 00 eb 07 0f 00 2d 7f f5 4f 00 f4 c3 cc cc cc cc cc 0f                                                                                                                                       
[  118.864175] RSP: 0018:ffffc9000009fef0 EFLAGS: 00000206                             
[  118.865223] RAX: 0000000000027f6c RBX: 0000000000000004 RCX: 0000000000000000                                                                                              
[  118.866504] RDX: ffff88817c090d50 RSI: ffffffff82286273 RDI: ffffffff8225bf7f                                                                                              
[  118.867774] RBP: ffff8881009d2e80 R08: 0000000000027f6b R09: 0000000000000000                                                                                              
[  118.869051] R10: 00000000fffd3b17 R11: 0000000000000001 R12: 0000000000000000                                                                                              
[  118.870326] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000                                                                                              
[  118.871601]  default_idle+0xa/0x10                                                  
[  118.872300]  default_idle_call+0x33/0xe0                                            
[  118.873081]  do_idle+0x208/0x270                                                    
[  118.873741]  cpu_startup_entry+0x19/0x20                                            
[  118.874562]  secondary_startup_64_no_verify+0xc3/0xcb                               
[  118.875604]  </TASK>                                                                
[  118.876140] ---[ end trace dec5061c76371ce7 ]---   

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2022-01-20 12:58       ` Vlad Buslov
@ 2022-01-28 17:01         ` Antoine Tenart
  2022-01-31 11:26           ` Vlad Buslov
  0 siblings, 1 reply; 14+ messages in thread
From: Antoine Tenart @ 2022-01-28 17:01 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: davem, kuba, echaudro, sbrivio, netdev, pshelar

Hi Vlad,

Quoting Vlad Buslov (2022-01-20 13:58:18)
> On Thu 20 Jan 2022 at 12:27, Antoine Tenart <atenart@kernel.org> wrote:
> > Quoting Vlad Buslov (2022-01-20 08:38:05)
> >> 
> >> We have been getting memleaks in one of our tests that point to this
> >> code (test deletes vxlan device while running traffic redirected by OvS
> >> TC at the same time):
> >> 
> >> unreferenced object 0xffff8882d0114200 (size 256):
> >>     [<0000000097659d47>] metadata_dst_alloc+0x1f/0x470
> >>     [<000000007571c30f>] tun_dst_unclone+0xee/0x360 [vxlan]
> >>     [<00000000d2dcfd00>] vxlan_xmit_one+0x131d/0x2a00 [vxlan]

[...]

> >> Looking at the code the potential issue seems to be that
> >> tun_dst_unclone() creates new metadata_dst instance with refcount==1,
> >> increments the refcount with dst_hold() to value 2, then returns it.
> >> This seems to imply that caller is expected to release one of the
> >> references (second one if for skb), but none of the callers (including
> >> original dev_fill_metadata_dst()) do that, so I guess I'm
> >> misunderstanding something here.
> >> 
> >> Any tips or suggestions?
> >
> > I'd say there is no need to increase the dst refcount here after calling
> > metadata_dst_alloc, as the metadata is local to the skb and the dst
> > refcount was already initialized to 1. This might be an issue with
> > commit fc4099f17240 ("openvswitch: Fix egress tunnel info."); I CCed
> > Pravin, he might recall if there was a reason to increase the refcount.
> 
> I tried to remove the dst_hold(), but that caused underflows[0], so I
> guess the current reference counting is required at least for some
> use-cases.
> 
> [0]:
> 
> [  118.803011] dst_release: dst:000000001fc13e61 refcnt:-2                             

[...]

I finally had some time to look at this. Does the diff below fix your
issue?

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 14efa0ded75d..90a7a4daea9c 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
 static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
 {
        struct metadata_dst *md_dst = skb_metadata_dst(skb);
-       int md_size;
        struct metadata_dst *new_md;
+       int md_size, ret;
 
        if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
                return ERR_PTR(-EINVAL);
@@ -123,8 +123,15 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
 
        memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
               sizeof(struct ip_tunnel_info) + md_size);
+#ifdef CONFIG_DST_CACHE
+       ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC);
+       if (ret) {
+               metadata_dst_free(new_md);
+               return ERR_PTR(ret);
+       }
+#endif
+
        skb_dst_drop(skb);
-       dst_hold(&new_md->dst);
        skb_dst_set(skb, &new_md->dst);
        return new_md;
 }

Antoine

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2022-01-28 17:01         ` Antoine Tenart
@ 2022-01-31 11:26           ` Vlad Buslov
  2022-01-31 13:26             ` Antoine Tenart
  0 siblings, 1 reply; 14+ messages in thread
From: Vlad Buslov @ 2022-01-31 11:26 UTC (permalink / raw)
  To: Antoine Tenart; +Cc: davem, kuba, echaudro, sbrivio, netdev, pshelar

On Fri 28 Jan 2022 at 19:01, Antoine Tenart <atenart@kernel.org> wrote:
> Hi Vlad,
>
> Quoting Vlad Buslov (2022-01-20 13:58:18)
>> On Thu 20 Jan 2022 at 12:27, Antoine Tenart <atenart@kernel.org> wrote:
>> > Quoting Vlad Buslov (2022-01-20 08:38:05)
>> >> 
>> >> We have been getting memleaks in one of our tests that point to this
>> >> code (test deletes vxlan device while running traffic redirected by OvS
>> >> TC at the same time):
>> >> 
>> >> unreferenced object 0xffff8882d0114200 (size 256):
>> >>     [<0000000097659d47>] metadata_dst_alloc+0x1f/0x470
>> >>     [<000000007571c30f>] tun_dst_unclone+0xee/0x360 [vxlan]
>> >>     [<00000000d2dcfd00>] vxlan_xmit_one+0x131d/0x2a00 [vxlan]
>
> [...]
>
>> >> Looking at the code the potential issue seems to be that
>> >> tun_dst_unclone() creates new metadata_dst instance with refcount==1,
>> >> increments the refcount with dst_hold() to value 2, then returns it.
>> >> This seems to imply that caller is expected to release one of the
>> >> references (second one if for skb), but none of the callers (including
>> >> original dev_fill_metadata_dst()) do that, so I guess I'm
>> >> misunderstanding something here.
>> >> 
>> >> Any tips or suggestions?
>> >
>> > I'd say there is no need to increase the dst refcount here after calling
>> > metadata_dst_alloc, as the metadata is local to the skb and the dst
>> > refcount was already initialized to 1. This might be an issue with
>> > commit fc4099f17240 ("openvswitch: Fix egress tunnel info."); I CCed
>> > Pravin, he might recall if there was a reason to increase the refcount.
>> 
>> I tried to remove the dst_hold(), but that caused underflows[0], so I
>> guess the current reference counting is required at least for some
>> use-cases.
>> 
>> [0]:
>> 
>> [  118.803011] dst_release: dst:000000001fc13e61 refcnt:-2                             
>
> [...]
>
> I finally had some time to look at this. Does the diff below fix your
> issue?

Yes, with the patch applied I'm no longer able to reproduce memory leak.
Thanks for fixing this!

>
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index 14efa0ded75d..90a7a4daea9c 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
>  static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>  {
>         struct metadata_dst *md_dst = skb_metadata_dst(skb);
> -       int md_size;
>         struct metadata_dst *new_md;
> +       int md_size, ret;
>  
>         if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
>                 return ERR_PTR(-EINVAL);
> @@ -123,8 +123,15 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>  
>         memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
>                sizeof(struct ip_tunnel_info) + md_size);
> +#ifdef CONFIG_DST_CACHE
> +       ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC);
> +       if (ret) {
> +               metadata_dst_free(new_md);
> +               return ERR_PTR(ret);
> +       }
> +#endif
> +
>         skb_dst_drop(skb);
> -       dst_hold(&new_md->dst);
>         skb_dst_set(skb, &new_md->dst);
>         return new_md;
>  }
>
> Antoine


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2022-01-31 11:26           ` Vlad Buslov
@ 2022-01-31 13:26             ` Antoine Tenart
  2022-01-31 14:04               ` Stefano Brivio
  2022-01-31 17:55               ` Vlad Buslov
  0 siblings, 2 replies; 14+ messages in thread
From: Antoine Tenart @ 2022-01-31 13:26 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: davem, kuba, echaudro, sbrivio, netdev, pshelar

Quoting Vlad Buslov (2022-01-31 12:26:47)
> On Fri 28 Jan 2022 at 19:01, Antoine Tenart <atenart@kernel.org> wrote:
> >
> > I finally had some time to look at this. Does the diff below fix your
> > issue?
> 
> Yes, with the patch applied I'm no longer able to reproduce memory leak.
> Thanks for fixing this!

Thanks for testing. I'll send a formal patch, can I add your Tested-by?

Also, do you know how to trigger the following code path in OVS
https://elixir.bootlin.com/linux/latest/source/net/openvswitch/actions.c#L944
? Would be good (not required) to test it, to ensure the fix doesn't
break it.

Thanks,
Antoine

> > diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> > index 14efa0ded75d..90a7a4daea9c 100644
> > --- a/include/net/dst_metadata.h
> > +++ b/include/net/dst_metadata.h
> > @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
> >  static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
> >  {
> >         struct metadata_dst *md_dst = skb_metadata_dst(skb);
> > -       int md_size;
> >         struct metadata_dst *new_md;
> > +       int md_size, ret;
> >  
> >         if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
> >                 return ERR_PTR(-EINVAL);
> > @@ -123,8 +123,15 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
> >  
> >         memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> >                sizeof(struct ip_tunnel_info) + md_size);
> > +#ifdef CONFIG_DST_CACHE
> > +       ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC);
> > +       if (ret) {
> > +               metadata_dst_free(new_md);
> > +               return ERR_PTR(ret);
> > +       }
> > +#endif
> > +
> >         skb_dst_drop(skb);
> > -       dst_hold(&new_md->dst);
> >         skb_dst_set(skb, &new_md->dst);
> >         return new_md;
> >  }

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2022-01-31 13:26             ` Antoine Tenart
@ 2022-01-31 14:04               ` Stefano Brivio
  2022-01-31 14:42                 ` Antoine Tenart
  2022-01-31 17:55               ` Vlad Buslov
  1 sibling, 1 reply; 14+ messages in thread
From: Stefano Brivio @ 2022-01-31 14:04 UTC (permalink / raw)
  To: Antoine Tenart; +Cc: Vlad Buslov, davem, kuba, echaudro, netdev, pshelar

Hi Antoine,

On Mon, 31 Jan 2022 14:26:47 +0100
Antoine Tenart <atenart@kernel.org> wrote:

> Quoting Vlad Buslov (2022-01-31 12:26:47)
> > On Fri 28 Jan 2022 at 19:01, Antoine Tenart <atenart@kernel.org> wrote:  
> > >
> > > I finally had some time to look at this. Does the diff below fix your
> > > issue?  
> > 
> > Yes, with the patch applied I'm no longer able to reproduce memory leak.
> > Thanks for fixing this!  
> 
> Thanks for testing. I'll send a formal patch, can I add your Tested-by?
> 
> Also, do you know how to trigger the following code path in OVS
> https://elixir.bootlin.com/linux/latest/source/net/openvswitch/actions.c#L944

I guess the selftests pmtu_ipv{4,6}_ovs_vxlan{4,6}_exception and
pmtu_ipv{4,6}_ovs_geneve{4,6}_exception from net/pmtu.sh:

	https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/net/pmtu.sh?id=ece1278a9b81bdfc088f087f8372a072b7010956#n81

should trigger that path once or twice per test, but I haven't tried
recently.

-- 
Stefano


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2022-01-31 14:04               ` Stefano Brivio
@ 2022-01-31 14:42                 ` Antoine Tenart
  0 siblings, 0 replies; 14+ messages in thread
From: Antoine Tenart @ 2022-01-31 14:42 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Vlad Buslov, davem, kuba, echaudro, netdev, pshelar

Hi Stefano,

Quoting Stefano Brivio (2022-01-31 15:04:18)
> On Mon, 31 Jan 2022 14:26:47 +0100
> Antoine Tenart <atenart@kernel.org> wrote:
> > Quoting Vlad Buslov (2022-01-31 12:26:47)
> > > On Fri 28 Jan 2022 at 19:01, Antoine Tenart <atenart@kernel.org> wrote:  
> > > >
> > > > I finally had some time to look at this. Does the diff below fix your
> > > > issue?  
> > > 
> > > Yes, with the patch applied I'm no longer able to reproduce memory leak.
> > > Thanks for fixing this!  
> > 
> > Thanks for testing. I'll send a formal patch, can I add your Tested-by?
> > 
> > Also, do you know how to trigger the following code path in OVS
> > https://elixir.bootlin.com/linux/latest/source/net/openvswitch/actions.c#L944
> 
> I guess the selftests pmtu_ipv{4,6}_ovs_vxlan{4,6}_exception and
> pmtu_ipv{4,6}_ovs_geneve{4,6}_exception from net/pmtu.sh:
> 
>         https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/net/pmtu.sh?id=ece1278a9b81bdfc088f087f8372a072b7010956#n81
> 
> should trigger that path once or twice per test, but I haven't tried
> recently.

Thanks for the suggestion! I did run all 8 ptmu_*_ovs_* tests, they all
passed but didn't trigger a call to dev_fill_metadata_dst in
net/openvswitch/actions.c.

To be sure there wasn't a misunderstanding: I did test the PTMU code
path in Geneve/VXLAN (while one of the endpoint is an OVS port); but the
net/openvswitch/actions.c code path is something different, used to
retrieve tunnel egress info. I don't know when/how this is used by OVS.

Thanks,
Antoine

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
  2022-01-31 13:26             ` Antoine Tenart
  2022-01-31 14:04               ` Stefano Brivio
@ 2022-01-31 17:55               ` Vlad Buslov
  1 sibling, 0 replies; 14+ messages in thread
From: Vlad Buslov @ 2022-01-31 17:55 UTC (permalink / raw)
  To: Antoine Tenart; +Cc: davem, kuba, echaudro, sbrivio, netdev, pshelar

On Mon 31 Jan 2022 at 15:26, Antoine Tenart <atenart@kernel.org> wrote:
> Quoting Vlad Buslov (2022-01-31 12:26:47)
>> On Fri 28 Jan 2022 at 19:01, Antoine Tenart <atenart@kernel.org> wrote:
>> >
>> > I finally had some time to look at this. Does the diff below fix your
>> > issue?
>> 
>> Yes, with the patch applied I'm no longer able to reproduce memory leak.
>> Thanks for fixing this!
>
> Thanks for testing. I'll send a formal patch, can I add your Tested-by?

Sure!

Reported-by: Vlad Buslov <vladbu@nvidia.com>
Tested-by: Vlad Buslov <vladbu@nvidia.com>

>
> Also, do you know how to trigger the following code path in OVS
> https://elixir.bootlin.com/linux/latest/source/net/openvswitch/actions.c#L944
> ? Would be good (not required) to test it, to ensure the fix doesn't
> break it.

Sorry, I don't. We mostly concentrate on testing hardware
offload-specific code paths (e.g. TC).

>
> Thanks,
> Antoine
>
>> > diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
>> > index 14efa0ded75d..90a7a4daea9c 100644
>> > --- a/include/net/dst_metadata.h
>> > +++ b/include/net/dst_metadata.h
>> > @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size)
>> >  static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>> >  {
>> >         struct metadata_dst *md_dst = skb_metadata_dst(skb);
>> > -       int md_size;
>> >         struct metadata_dst *new_md;
>> > +       int md_size, ret;
>> >  
>> >         if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
>> >                 return ERR_PTR(-EINVAL);
>> > @@ -123,8 +123,15 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>> >  
>> >         memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
>> >                sizeof(struct ip_tunnel_info) + md_size);
>> > +#ifdef CONFIG_DST_CACHE
>> > +       ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC);
>> > +       if (ret) {
>> > +               metadata_dst_free(new_md);
>> > +               return ERR_PTR(ret);
>> > +       }
>> > +#endif
>> > +
>> >         skb_dst_drop(skb);
>> > -       dst_hold(&new_md->dst);
>> >         skb_dst_set(skb, &new_md->dst);
>> >         return new_md;
>> >  }


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-01-31 17:55 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-25 15:35 [PATCH net 0/2] net: do not modify the shared tunnel info when PMTU triggers an ICMP reply Antoine Tenart
2021-03-25 15:35 ` [PATCH net 1/2] vxlan: " Antoine Tenart
2022-01-20  7:38   ` Vlad Buslov
2022-01-20 10:27     ` Antoine Tenart
2022-01-20 12:58       ` Vlad Buslov
2022-01-28 17:01         ` Antoine Tenart
2022-01-31 11:26           ` Vlad Buslov
2022-01-31 13:26             ` Antoine Tenart
2022-01-31 14:04               ` Stefano Brivio
2022-01-31 14:42                 ` Antoine Tenart
2022-01-31 17:55               ` Vlad Buslov
2021-03-25 15:35 ` [PATCH net 2/2] geneve: " Antoine Tenart
2021-03-25 20:28 ` [PATCH net 0/2] net: " Stefano Brivio
2021-03-26  0:40 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).