* [PATCH] net: take care of bonding in build_skb_flow_key (v3)
@ 2016-01-20  5:32 Wengang Wang
  2016-01-20  6:24 ` zhuyj
  2016-01-20 15:18 ` Sabrina Dubroca
  0 siblings, 2 replies; 13+ messages in thread
From: Wengang Wang @ 2016-01-20  5:32 UTC (permalink / raw)
  To: netdev; +Cc: wen.gang.wang, jay.vosburgh

In a bonding setup, we determine the fragment size from the MTU and
PMTU associated with the bonding master. If the slave finds that a
fragment is too big, it drops the fragment and calls ip_rt_update_pmtu(),
passing _skb_ and _pmtu_, to try to update the path MTU.
The problem is that the device ip_rt_update_pmtu() actually tries to
update is the slave (skb->dev), not the master. Since no PMTU change
happens on the master, the fragment size for later packets doesn't
change, so all later fragments/packets are dropped too.

The fix is to let build_skb_flow_key() translate the device index
from the bonding slave to its master. That makes the master the
target device whose PMTU ip_rt_update_pmtu() updates.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 net/ipv4/route.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 85f184e..c59fb0d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -523,10 +523,21 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb,
 			       const struct sock *sk)
 {
 	const struct iphdr *iph = ip_hdr(skb);
-	int oif = skb->dev->ifindex;
+	struct net_device *master = NULL;
 	u8 tos = RT_TOS(iph->tos);
 	u8 prot = iph->protocol;
 	u32 mark = skb->mark;
+	int oif;
+
+	if (skb->dev->flags & IFF_SLAVE) {
+		rtnl_lock();
+		master = netdev_master_upper_dev_get(skb->dev);
+		rtnl_unlock();
+	}
+	if (master)
+		oif = master->ifindex;
+	else
+		oif = skb->dev->ifindex;
 
 	__build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0);
 }
-- 
2.1.0
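
The failure mode in the description above can be modelled in userspace. This is only a sketch under loud assumptions: `struct pmtu_dev`, `struct model_pmtu_table` and every `model_*` function are hypothetical stand-ins invented for illustration, and the flat table indexed by ifindex is a simplification of how the kernel actually stores learned path MTUs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a network device; not struct net_device. */
struct pmtu_dev {
	int ifindex;
	struct pmtu_dev *master;	/* NULL unless enslaved to a bond */
};

#define MODEL_MAX_IFINDEX 8

/* Simplified PMTU store: one entry per interface index (an assumption). */
struct model_pmtu_table {
	unsigned int pmtu[MODEL_MAX_IFINDEX];
};

/* Record a smaller path MTU against the oif used in the flow key. */
static void model_update_pmtu(struct model_pmtu_table *t, int oif,
			      unsigned int mtu)
{
	assert(oif >= 0 && oif < MODEL_MAX_IFINDEX);
	if (mtu < t->pmtu[oif])
		t->pmtu[oif] = mtu;
}

/* Fragment size for traffic sent through the bond comes from the master's entry. */
static unsigned int model_frag_size(const struct model_pmtu_table *t,
				    const struct pmtu_dev *bond)
{
	return t->pmtu[bond->ifindex];
}

/* Before the patch: the key uses the device the skb sits on (the slave). */
static int model_oif_unpatched(const struct pmtu_dev *dev)
{
	return dev->ifindex;
}

/* Intent of the patch: use the master's index when the device is enslaved. */
static int model_oif_patched(const struct pmtu_dev *dev)
{
	return dev->master ? dev->master->ifindex : dev->ifindex;
}
```

With the unpatched key, an update triggered on the slave lands on the slave's entry, so the fragment size read from the master never shrinks. With the patched key, the same update reaches the master's entry.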

* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
  2016-01-20  5:32 [PATCH] net: take care of bonding in build_skb_flow_key (v3) Wengang Wang
@ 2016-01-20  6:24 ` zhuyj
  2016-01-20  6:29   ` zhuyj
  2016-01-20  7:38   ` Wengang Wang
  2016-01-20 15:18 ` Sabrina Dubroca
  1 sibling, 2 replies; 13+ messages in thread
From: zhuyj @ 2016-01-20  6:24 UTC (permalink / raw)
  To: Wengang Wang, netdev; +Cc: jay.vosburgh

On 01/20/2016 01:32 PM, Wengang Wang wrote:
> In a bonding setting, we determines fragment size according to MTU and
> PMTU associated to the bonding master. If the slave finds the fragment
> size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
> passing _skb_ and _pmtu_, trying to update the path MTU.
> Problem is that the target device that function ip_rt_update_pmtu actually
> tries to update is the slave (skb->dev), not the master. Thus since no
> PMTU change happens on master, the fragment size for later packets doesn't
> change so all later fragments/packets are dropped too.
>
> The fix is letting build_skb_flow_key() take care of the transition of
> device index from bonding slave to the master. That makes the master become
> the target device that ip_rt_update_pmtu tries to update PMTU to.
>
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>   net/ipv4/route.c | 13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 85f184e..c59fb0d 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -523,10 +523,21 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb,
>   			       const struct sock *sk)
>   {
>   	const struct iphdr *iph = ip_hdr(skb);
> -	int oif = skb->dev->ifindex;
> +	struct net_device *master = NULL;
>   	u8 tos = RT_TOS(iph->tos);
>   	u8 prot = iph->protocol;
>   	u32 mark = skb->mark;
> +	int oif;
> +
> +	if (skb->dev->flags & IFF_SLAVE) {
> +		rtnl_lock();
> +		master = netdev_master_upper_dev_get(skb->dev);
> +		rtnl_unlock();
update_pmtu is called very frequently. Is it appropriate to take
rtnl_lock here?
If rtnl_lock is taken this often, other callers may have little
chance to acquire it.

Best Regards!
Zhu Yanjun
> +	}
> +	if (master)
> +		oif = master->ifindex;
> +	else
> +		oif = skb->dev->ifindex;
>   
>   	__build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0);
>   }

* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
  2016-01-20  6:24 ` zhuyj
@ 2016-01-20  6:29   ` zhuyj
  2016-01-20  6:32     ` zhuyj
  2016-01-20  7:38   ` Wengang Wang
  1 sibling, 1 reply; 13+ messages in thread
From: zhuyj @ 2016-01-20  6:29 UTC (permalink / raw)
  To: Wengang Wang, netdev; +Cc: jay.vosburgh

On 01/20/2016 02:24 PM, zhuyj wrote:
> On 01/20/2016 01:32 PM, Wengang Wang wrote:
>> In a bonding setting, we determines fragment size according to MTU and
>> PMTU associated to the bonding master. If the slave finds the fragment
>> size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
>> passing _skb_ and _pmtu_, trying to update the path MTU.
>> Problem is that the target device that function ip_rt_update_pmtu 
>> actually
>> tries to update is the slave (skb->dev), not the master. Thus since no
>> PMTU change happens on master, the fragment size for later packets 
>> doesn't
>> change so all later fragments/packets are dropped too.
>>
>> The fix is letting build_skb_flow_key() take care of the transition of
>> device index from bonding slave to the master. That makes the master 
>> become
>> the target device that ip_rt_update_pmtu tries to update PMTU to.
>>
>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>> ---
>>   net/ipv4/route.c | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index 85f184e..c59fb0d 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -523,10 +523,21 @@ static void build_skb_flow_key(struct flowi4 
>> *fl4, const struct sk_buff *skb,
>>                      const struct sock *sk)
>>   {
>>       const struct iphdr *iph = ip_hdr(skb);
>> -    int oif = skb->dev->ifindex;
>> +    struct net_device *master = NULL;
>>       u8 tos = RT_TOS(iph->tos);
>>       u8 prot = iph->protocol;
>>       u32 mark = skb->mark;
>> +    int oif;
>> +
>> +    if (skb->dev->flags & IFF_SLAVE) {
>> +        rtnl_lock();
>> +        master = netdev_master_upper_dev_get(skb->dev);
>> +        rtnl_unlock();
> update_pmtu is called very frequently. Is it appropriate to use 
> rtnl_lock here?
> That is, rtnl_lock is called frequently. Maybe other functions have 
> little chance to call rtnl_lock.

Maybe netdev_upper_get_next_dev_rcu would be a better choice? I am not sure.

>
> Best Regards!
> Zhu Yanjun
>> +    }
>> +    if (master)
>> +        oif = master->ifindex;
>> +    else
>> +        oif = skb->dev->ifindex;
>>         __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0);
>>   }
>

* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
  2016-01-20  6:29   ` zhuyj
@ 2016-01-20  6:32     ` zhuyj
  0 siblings, 0 replies; 13+ messages in thread
From: zhuyj @ 2016-01-20  6:32 UTC (permalink / raw)
  To: Wengang Wang, netdev; +Cc: jay.vosburgh

On 01/20/2016 02:29 PM, zhuyj wrote:
> On 01/20/2016 02:24 PM, zhuyj wrote:
>> On 01/20/2016 01:32 PM, Wengang Wang wrote:
>>> In a bonding setting, we determines fragment size according to MTU and
>>> PMTU associated to the bonding master. If the slave finds the fragment
>>> size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
>>> passing _skb_ and _pmtu_, trying to update the path MTU.
>>> Problem is that the target device that function ip_rt_update_pmtu 
>>> actually
>>> tries to update is the slave (skb->dev), not the master. Thus since no
>>> PMTU change happens on master, the fragment size for later packets 
>>> doesn't
>>> change so all later fragments/packets are dropped too.
>>>
>>> The fix is letting build_skb_flow_key() take care of the transition of
>>> device index from bonding slave to the master. That makes the master 
>>> become
>>> the target device that ip_rt_update_pmtu tries to update PMTU to.
>>>
>>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>>> ---
>>>   net/ipv4/route.c | 13 ++++++++++++-
>>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>>> index 85f184e..c59fb0d 100644
>>> --- a/net/ipv4/route.c
>>> +++ b/net/ipv4/route.c
>>> @@ -523,10 +523,21 @@ static void build_skb_flow_key(struct flowi4 
>>> *fl4, const struct sk_buff *skb,
>>>                      const struct sock *sk)
>>>   {
>>>       const struct iphdr *iph = ip_hdr(skb);
>>> -    int oif = skb->dev->ifindex;
>>> +    struct net_device *master = NULL;
>>>       u8 tos = RT_TOS(iph->tos);
>>>       u8 prot = iph->protocol;
>>>       u32 mark = skb->mark;
>>> +    int oif;
>>> +
>>> +    if (skb->dev->flags & IFF_SLAVE) {
>>> +        rtnl_lock();
>>> +        master = netdev_master_upper_dev_get(skb->dev);
>>> +        rtnl_unlock();
>> update_pmtu is called very frequently. Is it appropriate to use 
>> rtnl_lock here?
>> That is, rtnl_lock is called frequently. Maybe other functions have 
>> little chance to call rtnl_lock.
>
> Maybe this function netdev_master_upper_dev_get_rcu is better? I am 
> not sure.

Maybe netdev_master_upper_dev_get_rcu would be a better choice? I am not
sure.

>
>>
>> Best Regards!
>> Zhu Yanjun
>>> +    }
>>> +    if (master)
>>> +        oif = master->ifindex;
>>> +    else
>>> +        oif = skb->dev->ifindex;
>>>         __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0);
>>>   }
>>
>

* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
  2016-01-20  6:24 ` zhuyj
  2016-01-20  6:29   ` zhuyj
@ 2016-01-20  7:38   ` Wengang Wang
  2016-01-20  7:54     ` zhuyj
  1 sibling, 1 reply; 13+ messages in thread
From: Wengang Wang @ 2016-01-20  7:38 UTC (permalink / raw)
  To: zhuyj, netdev; +Cc: jay.vosburgh



On 2016-01-20 14:24, zhuyj wrote:
> On 01/20/2016 01:32 PM, Wengang Wang wrote:
>> In a bonding setting, we determines fragment size according to MTU and
>> PMTU associated to the bonding master. If the slave finds the fragment
>> size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
>> passing _skb_ and _pmtu_, trying to update the path MTU.
>> Problem is that the target device that function ip_rt_update_pmtu 
>> actually
>> tries to update is the slave (skb->dev), not the master. Thus since no
>> PMTU change happens on master, the fragment size for later packets 
>> doesn't
>> change so all later fragments/packets are dropped too.
>>
>> The fix is letting build_skb_flow_key() take care of the transition of
>> device index from bonding slave to the master. That makes the master 
>> become
>> the target device that ip_rt_update_pmtu tries to update PMTU to.
>>
>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>> ---
>>   net/ipv4/route.c | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index 85f184e..c59fb0d 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -523,10 +523,21 @@ static void build_skb_flow_key(struct flowi4 
>> *fl4, const struct sk_buff *skb,
>>                      const struct sock *sk)
>>   {
>>       const struct iphdr *iph = ip_hdr(skb);
>> -    int oif = skb->dev->ifindex;
>> +    struct net_device *master = NULL;
>>       u8 tos = RT_TOS(iph->tos);
>>       u8 prot = iph->protocol;
>>       u32 mark = skb->mark;
>> +    int oif;
>> +
>> +    if (skb->dev->flags & IFF_SLAVE) {
>> +        rtnl_lock();
>> +        master = netdev_master_upper_dev_get(skb->dev);
>> +        rtnl_unlock();
> update_pmtu is called very frequently. Is it appropriate to use 
> rtnl_lock here?
By "very frequently", how frequently do you expect it to be called? And
what situation can cause that?
In my case, update_pmtu is called only once.

thanks,
wengang

> That is, rtnl_lock is called frequently. Maybe other functions have 
> little chance to call rtnl_lock.
>
> Best Regards!
> Zhu Yanjun
>> +    }
>> +    if (master)
>> +        oif = master->ifindex;
>> +    else
>> +        oif = skb->dev->ifindex;
>>         __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0);
>>   }
>

* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
  2016-01-20  7:38   ` Wengang Wang
@ 2016-01-20  7:54     ` zhuyj
  2016-01-20  9:47       ` Wengang Wang
  0 siblings, 1 reply; 13+ messages in thread
From: zhuyj @ 2016-01-20  7:54 UTC (permalink / raw)
  To: Wengang Wang, netdev; +Cc: jay.vosburgh

On 01/20/2016 03:38 PM, Wengang Wang wrote:
>
>
> On 2016-01-20 14:24, zhuyj wrote:
>> On 01/20/2016 01:32 PM, Wengang Wang wrote:
>>> In a bonding setting, we determines fragment size according to MTU and
>>> PMTU associated to the bonding master. If the slave finds the fragment
>>> size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
>>> passing _skb_ and _pmtu_, trying to update the path MTU.
>>> Problem is that the target device that function ip_rt_update_pmtu 
>>> actually
>>> tries to update is the slave (skb->dev), not the master. Thus since no
>>> PMTU change happens on master, the fragment size for later packets 
>>> doesn't
>>> change so all later fragments/packets are dropped too.
>>>
>>> The fix is letting build_skb_flow_key() take care of the transition of
>>> device index from bonding slave to the master. That makes the master 
>>> become
>>> the target device that ip_rt_update_pmtu tries to update PMTU to.
>>>
>>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>>> ---
>>>   net/ipv4/route.c | 13 ++++++++++++-
>>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>>> index 85f184e..c59fb0d 100644
>>> --- a/net/ipv4/route.c
>>> +++ b/net/ipv4/route.c
>>> @@ -523,10 +523,21 @@ static void build_skb_flow_key(struct flowi4 
>>> *fl4, const struct sk_buff *skb,
>>>                      const struct sock *sk)
>>>   {
>>>       const struct iphdr *iph = ip_hdr(skb);
>>> -    int oif = skb->dev->ifindex;
>>> +    struct net_device *master = NULL;
>>>       u8 tos = RT_TOS(iph->tos);
>>>       u8 prot = iph->protocol;
>>>       u32 mark = skb->mark;
>>> +    int oif;
>>> +
>>> +    if (skb->dev->flags & IFF_SLAVE) {
>>> +        rtnl_lock();
>>> +        master = netdev_master_upper_dev_get(skb->dev);
>>> +        rtnl_unlock();
>> update_pmtu is called very frequently. Is it appropriate to use 
>> rtnl_lock here?
> By "very frequently", how frequently it is expected? And what 
> situation can cause that?
> For my case, the update_pmtu is called only once.
ip_tunnel_xmit

Zhu Yanjun

>
> thanks,
> wengang
>
>> That is, rtnl_lock is called frequently. Maybe other functions have 
>> little chance to call rtnl_lock.
>>
>> Best Regards!
>> Zhu Yanjun
>>> +    }
>>> +    if (master)
>>> +        oif = master->ifindex;
>>> +    else
>>> +        oif = skb->dev->ifindex;
>>>         __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0);
>>>   }
>>
>

* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
  2016-01-20  7:54     ` zhuyj
@ 2016-01-20  9:47       ` Wengang Wang
  2016-01-20  9:56         ` zhuyj
  0 siblings, 1 reply; 13+ messages in thread
From: Wengang Wang @ 2016-01-20  9:47 UTC (permalink / raw)
  To: zhuyj, netdev; +Cc: jay.vosburgh



On 2016-01-20 15:54, zhuyj wrote:
> On 01/20/2016 03:38 PM, Wengang Wang wrote:
>>
>>
>> On 2016-01-20 14:24, zhuyj wrote:
>>> On 01/20/2016 01:32 PM, Wengang Wang wrote:
>>>> In a bonding setting, we determines fragment size according to MTU and
>>>> PMTU associated to the bonding master. If the slave finds the fragment
>>>> size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
>>>> passing _skb_ and _pmtu_, trying to update the path MTU.
>>>> Problem is that the target device that function ip_rt_update_pmtu 
>>>> actually
>>>> tries to update is the slave (skb->dev), not the master. Thus since no
>>>> PMTU change happens on master, the fragment size for later packets 
>>>> doesn't
>>>> change so all later fragments/packets are dropped too.
>>>>
>>>> The fix is letting build_skb_flow_key() take care of the transition of
>>>> device index from bonding slave to the master. That makes the 
>>>> master become
>>>> the target device that ip_rt_update_pmtu tries to update PMTU to.
>>>>
>>>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>>>> ---
>>>>   net/ipv4/route.c | 13 ++++++++++++-
>>>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>>>> index 85f184e..c59fb0d 100644
>>>> --- a/net/ipv4/route.c
>>>> +++ b/net/ipv4/route.c
>>>> @@ -523,10 +523,21 @@ static void build_skb_flow_key(struct flowi4 
>>>> *fl4, const struct sk_buff *skb,
>>>>                      const struct sock *sk)
>>>>   {
>>>>       const struct iphdr *iph = ip_hdr(skb);
>>>> -    int oif = skb->dev->ifindex;
>>>> +    struct net_device *master = NULL;
>>>>       u8 tos = RT_TOS(iph->tos);
>>>>       u8 prot = iph->protocol;
>>>>       u32 mark = skb->mark;
>>>> +    int oif;
>>>> +
>>>> +    if (skb->dev->flags & IFF_SLAVE) {
>>>> +        rtnl_lock();
>>>> +        master = netdev_master_upper_dev_get(skb->dev);
>>>> +        rtnl_unlock();
>>> update_pmtu is called very frequently. Is it appropriate to use 
>>> rtnl_lock here?
>> By "very frequently", how frequently it is expected? And what 
>> situation can cause that?
>> For my case, the update_pmtu is called only once.
> ip_tunnel_xmit
>
Can you please explain in more detail?

thanks,
wengang


> Zhu Yanjun
>
>>
>> thanks,
>> wengang
>>
>>> That is, rtnl_lock is called frequently. Maybe other functions have 
>>> little chance to call rtnl_lock.
>>>
>>> Best Regards!
>>> Zhu Yanjun
>>>> +    }
>>>> +    if (master)
>>>> +        oif = master->ifindex;
>>>> +    else
>>>> +        oif = skb->dev->ifindex;
>>>>         __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0);
>>>>   }
>>>
>>
>

* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
  2016-01-20  9:47       ` Wengang Wang
@ 2016-01-20  9:56         ` zhuyj
  2016-01-21  2:40           ` Wengang Wang
  0 siblings, 1 reply; 13+ messages in thread
From: zhuyj @ 2016-01-20  9:56 UTC (permalink / raw)
  To: Wengang Wang, netdev; +Cc: jay.vosburgh

On 01/20/2016 05:47 PM, Wengang Wang wrote:
>
>
> On 2016-01-20 15:54, zhuyj wrote:
>> On 01/20/2016 03:38 PM, Wengang Wang wrote:
>>>
>>>
>>> On 2016-01-20 14:24, zhuyj wrote:
>>>> On 01/20/2016 01:32 PM, Wengang Wang wrote:
>>>>> In a bonding setting, we determines fragment size according to MTU 
>>>>> and
>>>>> PMTU associated to the bonding master. If the slave finds the 
>>>>> fragment
>>>>> size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
>>>>> passing _skb_ and _pmtu_, trying to update the path MTU.
>>>>> Problem is that the target device that function ip_rt_update_pmtu 
>>>>> actually
>>>>> tries to update is the slave (skb->dev), not the master. Thus 
>>>>> since no
>>>>> PMTU change happens on master, the fragment size for later packets 
>>>>> doesn't
>>>>> change so all later fragments/packets are dropped too.
>>>>>
>>>>> The fix is letting build_skb_flow_key() take care of the 
>>>>> transition of
>>>>> device index from bonding slave to the master. That makes the 
>>>>> master become
>>>>> the target device that ip_rt_update_pmtu tries to update PMTU to.
>>>>>
>>>>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>>>>> ---
>>>>>   net/ipv4/route.c | 13 ++++++++++++-
>>>>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>>>>> index 85f184e..c59fb0d 100644
>>>>> --- a/net/ipv4/route.c
>>>>> +++ b/net/ipv4/route.c
>>>>> @@ -523,10 +523,21 @@ static void build_skb_flow_key(struct flowi4 
>>>>> *fl4, const struct sk_buff *skb,
>>>>>                      const struct sock *sk)
>>>>>   {
>>>>>       const struct iphdr *iph = ip_hdr(skb);
>>>>> -    int oif = skb->dev->ifindex;
>>>>> +    struct net_device *master = NULL;
>>>>>       u8 tos = RT_TOS(iph->tos);
>>>>>       u8 prot = iph->protocol;
>>>>>       u32 mark = skb->mark;
>>>>> +    int oif;
>>>>> +
>>>>> +    if (skb->dev->flags & IFF_SLAVE) {
>>>>> +        rtnl_lock();
>>>>> +        master = netdev_master_upper_dev_get(skb->dev);
>>>>> +        rtnl_unlock();
>>>> update_pmtu is called very frequently. Is it appropriate to use 
>>>> rtnl_lock here?
>>> By "very frequently", how frequently it is expected? And what 
>>> situation can cause that?
>>> For my case, the update_pmtu is called only once.
>> ip_tunnel_xmit
>>
> Can you please explain with more details?

dev_queue_xmit->ipip_tunnel_xmit->ip_tunnel_xmit->tnl_update_pmtu-> 
skb_dst(skb)->ops->update_pmtu

>
> thanks,
> wengang
>
>
>> Zhu Yanjun
>>
>>>
>>> thanks,
>>> wengang
>>>
>>>> That is, rtnl_lock is called frequently. Maybe other functions have 
>>>> little chance to call rtnl_lock.
>>>>
>>>> Best Regards!
>>>> Zhu Yanjun
>>>>> +    }
>>>>> +    if (master)
>>>>> +        oif = master->ifindex;
>>>>> +    else
>>>>> +        oif = skb->dev->ifindex;
>>>>>         __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0);
>>>>>   }
>>>>
>>>
>>
>

* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
  2016-01-20  5:32 [PATCH] net: take care of bonding in build_skb_flow_key (v3) Wengang Wang
  2016-01-20  6:24 ` zhuyj
@ 2016-01-20 15:18 ` Sabrina Dubroca
  2016-01-21  5:15   ` Wengang Wang
  1 sibling, 1 reply; 13+ messages in thread
From: Sabrina Dubroca @ 2016-01-20 15:18 UTC (permalink / raw)
  To: Wengang Wang; +Cc: netdev, jay.vosburgh

2016-01-20, 13:32:13 +0800, Wengang Wang wrote:
> In a bonding setting, we determines fragment size according to MTU and
> PMTU associated to the bonding master. If the slave finds the fragment
> size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
> passing _skb_ and _pmtu_, trying to update the path MTU.
> Problem is that the target device that function ip_rt_update_pmtu actually
> tries to update is the slave (skb->dev), not the master. Thus since no
> PMTU change happens on master, the fragment size for later packets doesn't
> change so all later fragments/packets are dropped too.
> 
> The fix is letting build_skb_flow_key() take care of the transition of
> device index from bonding slave to the master. That makes the master become
> the target device that ip_rt_update_pmtu tries to update PMTU to.
> 
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>  net/ipv4/route.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 85f184e..c59fb0d 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -523,10 +523,21 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb,
>  			       const struct sock *sk)
>  {
>  	const struct iphdr *iph = ip_hdr(skb);
> -	int oif = skb->dev->ifindex;
> +	struct net_device *master = NULL;
>  	u8 tos = RT_TOS(iph->tos);
>  	u8 prot = iph->protocol;
>  	u32 mark = skb->mark;
> +	int oif;
> +
> +	if (skb->dev->flags & IFF_SLAVE) {

Maybe use netif_is_bond_slave here instead, since you have this
problem with bonding slaves?
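
To illustrate the difference: netif_is_bond_slave() is narrower than a bare IFF_SLAVE test because it also requires the bonding private flag. The sketch below is a userspace model only; `struct flag_dev`, the `MODEL_*` macros and the `model_*` helpers are hypothetical names, and the `MODEL_IFF_BONDING` bit value is illustrative rather than copied from any kernel header.

```c
#include <assert.h>
#include <stdbool.h>

/* MODEL_IFF_SLAVE mirrors the uapi IFF_SLAVE value; MODEL_IFF_BONDING is an
 * illustrative bit chosen for this sketch, not the kernel's definition. */
#define MODEL_IFF_SLAVE		0x800u
#define MODEL_IFF_BONDING	(1u << 2)

/* Hypothetical stand-in carrying only the two flag words that matter here. */
struct flag_dev {
	unsigned int flags;
	unsigned int priv_flags;
};

/* The test used in the patch: true for any device marked as a slave. */
static bool model_is_any_slave(const struct flag_dev *dev)
{
	assert(dev);
	return (dev->flags & MODEL_IFF_SLAVE) != 0;
}

/* The narrower test: a slave that also belongs to the bonding driver. */
static bool model_is_bond_slave(const struct flag_dev *dev)
{
	assert(dev);
	return (dev->flags & MODEL_IFF_SLAVE) &&
	       (dev->priv_flags & MODEL_IFF_BONDING);
}
```

A device that carries the slave flag without the bonding flag passes the first check but not the second, which is the case the narrower helper is meant to exclude.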


> +		rtnl_lock();
> +		master = netdev_master_upper_dev_get(skb->dev);
> +		rtnl_unlock();
> +	}

As zhuyj said, this is called from dev_queue_xmit, so you cannot take
rtnl_lock here.

> +	if (master)
> +		oif = master->ifindex;

You cannot dereference master after you release the rtnl lock.

So it would probably be best to use netdev_master_upper_dev_get_rcu,
as zhuyj suggested earlier, and make sure that you only use the result
between rcu_read_lock()/rcu_read_unlock():

    rcu_read_lock();
    master = netdev_master_upper_dev_get_rcu(skb->dev);
    if (master)
        oif = master->ifindex;
    rcu_read_unlock();
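
Note that the fragment above leaves oif unset when there is no master, so a complete version would probably need a default. A minimal userspace sketch of that shape follows, assuming the default is the device's own ifindex as in the original code. Everything named `model_*` or `sketch_*` is a hypothetical stand-in: the lock functions only count nesting, and `model_master_get_rcu()` merely mimics the role of netdev_master_upper_dev_get_rcu().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for rcu_read_lock()/rcu_read_unlock(): they only track nesting. */
static int model_rcu_depth;
static void model_rcu_read_lock(void)   { model_rcu_depth++; }
static void model_rcu_read_unlock(void) { model_rcu_depth--; }

/* Hypothetical stand-in for a network device; not struct net_device. */
struct sketch_dev {
	int ifindex;
	struct sketch_dev *master;	/* NULL unless enslaved */
};

/* Mimics the RCU lookup: asserts that it runs inside a read-side section. */
static struct sketch_dev *model_master_get_rcu(const struct sketch_dev *dev)
{
	assert(model_rcu_depth > 0);
	return dev->master;
}

static int model_pick_oif(const struct sketch_dev *dev)
{
	const struct sketch_dev *master;
	int oif = dev->ifindex;		/* default: the device itself */

	model_rcu_read_lock();
	master = model_master_get_rcu(dev);
	if (master)
		oif = master->ifindex;	/* copy the value while protected */
	model_rcu_read_unlock();

	return oif;			/* only the copied int leaves the section */
}
```

The point of the shape is that the master pointer is never dereferenced after the unlock; only the integer copied inside the section is used afterwards.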


Thanks,

-- 
Sabrina

* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
  2016-01-20  9:56         ` zhuyj
@ 2016-01-21  2:40           ` Wengang Wang
  2016-01-21  4:05             ` Jay Vosburgh
  0 siblings, 1 reply; 13+ messages in thread
From: Wengang Wang @ 2016-01-21  2:40 UTC (permalink / raw)
  To: zhuyj, netdev; +Cc: jay.vosburgh



On 2016-01-20 17:56, zhuyj wrote:
> On 01/20/2016 05:47 PM, Wengang Wang wrote:
>>
>>
>> On 2016-01-20 15:54, zhuyj wrote:
>>> On 01/20/2016 03:38 PM, Wengang Wang wrote:
>>>>
>>>>
>>>> On 2016-01-20 14:24, zhuyj wrote:
>>>>> On 01/20/2016 01:32 PM, Wengang Wang wrote:
>>>>>> In a bonding setting, we determines fragment size according to 
>>>>>> MTU and
>>>>>> PMTU associated to the bonding master. If the slave finds the 
>>>>>> fragment
>>>>>> size is too big, it drops the fragment and calls 
>>>>>> ip_rt_update_pmtu(),
>>>>>> passing _skb_ and _pmtu_, trying to update the path MTU.
>>>>>> Problem is that the target device that function ip_rt_update_pmtu 
>>>>>> actually
>>>>>> tries to update is the slave (skb->dev), not the master. Thus 
>>>>>> since no
>>>>>> PMTU change happens on master, the fragment size for later 
>>>>>> packets doesn't
>>>>>> change so all later fragments/packets are dropped too.
>>>>>>
>>>>>> The fix is letting build_skb_flow_key() take care of the 
>>>>>> transition of
>>>>>> device index from bonding slave to the master. That makes the 
>>>>>> master become
>>>>>> the target device that ip_rt_update_pmtu tries to update PMTU to.
>>>>>>
>>>>>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>>>>>> ---
>>>>>>   net/ipv4/route.c | 13 ++++++++++++-
>>>>>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>>>>>> index 85f184e..c59fb0d 100644
>>>>>> --- a/net/ipv4/route.c
>>>>>> +++ b/net/ipv4/route.c
>>>>>> @@ -523,10 +523,21 @@ static void build_skb_flow_key(struct 
>>>>>> flowi4 *fl4, const struct sk_buff *skb,
>>>>>>                      const struct sock *sk)
>>>>>>   {
>>>>>>       const struct iphdr *iph = ip_hdr(skb);
>>>>>> -    int oif = skb->dev->ifindex;
>>>>>> +    struct net_device *master = NULL;
>>>>>>       u8 tos = RT_TOS(iph->tos);
>>>>>>       u8 prot = iph->protocol;
>>>>>>       u32 mark = skb->mark;
>>>>>> +    int oif;
>>>>>> +
>>>>>> +    if (skb->dev->flags & IFF_SLAVE) {
>>>>>> +        rtnl_lock();
>>>>>> +        master = netdev_master_upper_dev_get(skb->dev);
>>>>>> +        rtnl_unlock();
>>>>> update_pmtu is called very frequently. Is it appropriate to use 
>>>>> rtnl_lock here?
>>>> By "very frequently", how frequently it is expected? And what 
>>>> situation can cause that?
>>>> For my case, the update_pmtu is called only once.
>>> ip_tunnel_xmit
>>>
>> Can you please explain with more details?
>
> dev_queue_xmit->ipip_tunnel_xmit->ip_tunnel_xmit->tnl_update_pmtu-> 
> skb_dst(skb)->ops->update_pmtu
For ipip, yes, it seems update_pmtu is called inline on every call of
queue_xmit. Do you know whether ipip + bonding is a reasonable configuration?
Any other comments or suggestions?

thanks,
wengang

* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
  2016-01-21  2:40           ` Wengang Wang
@ 2016-01-21  4:05             ` Jay Vosburgh
  2016-01-21  5:17               ` Wengang Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Jay Vosburgh @ 2016-01-21  4:05 UTC (permalink / raw)
  To: Wengang Wang; +Cc: zhuyj, netdev

Wengang Wang <wen.gang.wang@oracle.com> wrote:
[...]
>For ipip,  yes seems update_pmtu is called in line for each call of
>queue_xmit.  Do you know if it's a good configuration for ipip + bonding?

	Yes, it is.

>Other's comment and suggestion?

	I agree with Sabrina Dubroca <sd@queasysnail.net>'s suggestions
from yesterday.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
  2016-01-20 15:18 ` Sabrina Dubroca
@ 2016-01-21  5:15   ` Wengang Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Wengang Wang @ 2016-01-21  5:15 UTC (permalink / raw)
  To: Sabrina Dubroca; +Cc: netdev, jay.vosburgh



On 2016-01-20 23:18, Sabrina Dubroca wrote:
> 2016-01-20, 13:32:13 +0800, Wengang Wang wrote:
>> In a bonding setting, we determines fragment size according to MTU and
>> PMTU associated to the bonding master. If the slave finds the fragment
>> size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
>> passing _skb_ and _pmtu_, trying to update the path MTU.
>> Problem is that the target device that function ip_rt_update_pmtu actually
>> tries to update is the slave (skb->dev), not the master. Thus since no
>> PMTU change happens on master, the fragment size for later packets doesn't
>> change so all later fragments/packets are dropped too.
>>
>> The fix is letting build_skb_flow_key() take care of the transition of
>> device index from bonding slave to the master. That makes the master become
>> the target device that ip_rt_update_pmtu tries to update PMTU to.
>>
>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>> ---
>>   net/ipv4/route.c | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index 85f184e..c59fb0d 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -523,10 +523,21 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb,
>>   			       const struct sock *sk)
>>   {
>>   	const struct iphdr *iph = ip_hdr(skb);
>> -	int oif = skb->dev->ifindex;
>> +	struct net_device *master = NULL;
>>   	u8 tos = RT_TOS(iph->tos);
>>   	u8 prot = iph->protocol;
>>   	u32 mark = skb->mark;
>> +	int oif;
>> +
>> +	if (skb->dev->flags & IFF_SLAVE) {
> Maybe use netif_is_bond_slave here instead, since you have this
> problem with bonding slaves?
>
>
>> +		rtnl_lock();
>> +		master = netdev_master_upper_dev_get(skb->dev);
>> +		rtnl_unlock();
>> +	}
> As zhuyj said, this is called from dev_queue_xmit, so you cannot take
> rtnl_lock here.
>
>> +	if (master)
>> +		oif = master->ifindex;
> You cannot dereference master after you release the rtnl lock.
>
> So it would probably be best to use netdev_master_upper_dev_get_rcu,
> as zhuyj suggested earlier, and make sure that you only use the result
> between rcu_read_lock()/rcu_read_unlock():
>
>      rcu_read_lock();
>      master = netdev_master_upper_dev_get_rcu(skb->dev);
>      if (master)
>          oif = master->ifindex;
>      rcu_read_unlock();
>
OK, thanks for the advice.

thanks,
wengang

> Thanks,
>

* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
  2016-01-21  4:05             ` Jay Vosburgh
@ 2016-01-21  5:17               ` Wengang Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Wengang Wang @ 2016-01-21  5:17 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: zhuyj, netdev



On 2016-01-21 12:05, Jay Vosburgh wrote:
> Wengang Wang <wen.gang.wang@oracle.com> wrote:
> [...]
>> For ipip,  yes seems update_pmtu is called in line for each call of
>> queue_xmit.  Do you know if it's a good configuration for ipip + bonding?
> 	Yes, it is.
>
>> Other's comment and suggestion?
> 	I agree with Sabrina Dubroca <sd@queasysnail.net>'s suggestions
> from yesterday.

Thank you! I will follow that approach.

thanks,
wengang
> 	-J
>
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com
