All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fan Du <fan.du@windriver.com>
To: Vlad Yasevich <vyasevich@gmail.com>
Cc: <nhorman@tuxdriver.com>, <nicolas.dichtel@6wind.com>,
	<davem@davemloft.net>, <netdev@vger.kernel.org>
Subject: Re: [RFC PATCH] sctp: Don't lookup dst if transport dst is still valid
Date: Wed, 17 Jul 2013 15:04:29 +0800	[thread overview]
Message-ID: <51E641FD.4010908@windriver.com> (raw)
In-Reply-To: <51E14652.7090001@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 13032 bytes --]



On 2013年07月13日 20:21, Vlad Yasevich wrote:
> On 07/10/2013 01:26 AM, Fan Du wrote:
>>
>>
>> On 2013年07月09日 23:11, Vlad Yasevich wrote:
>>> On 07/05/2013 10:09 AM, Vlad Yasevich wrote:
>>>> On 07/03/2013 10:33 PM, Fan Du wrote:
>>>>>
>>>>>
>>>>> On 2013年07月03日 21:23, Vlad Yasevich wrote:
>>>>>> On 07/02/2013 10:18 PM, Fan Du wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2013年07月02日 22:29, Vlad Yasevich wrote:
>>>>>>>> On 07/02/2013 02:39 AM, Fan Du wrote:
>>>>>>>>> When sctp sits on IPv6, sctp_transport_dst_check pass cookie as
>>>>>>>>> ZERO,
>>>>>>>>> as a result ip6_dst_check always fail out. This behaviour makes
>>>>>>>>> transport->dst useless, because every sctp_packet_transmit must
>>>>>>>>> look
>>>>>>>>> for valid dst(Is this what supposed to be?)
>>>>>>>>>
>>>>>>>>> One aggressive way is to call rt_genid_bump which invalid all
>>>>>>>>> dst to
>>>>>>>>> make new dst for transport, apparently it also hurts others.
>>>>>>>>> I'm sure this may not be the best for all, so any commnets?
>>>>>>>>>
>>>>>>>>> Signed-off-by: Fan Du <fan.du@windriver.com>
>>>>>>>>> ---
>>>>>>>>> include/net/sctp/sctp.h | 18 ++++++++++++------
>>>>>>>>> net/sctp/ipv6.c | 2 ++
>>>>>>>>> 2 files changed, 14 insertions(+), 6 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
>>>>>>>>> index cd89510..f05af01 100644
>>>>>>>>> --- a/include/net/sctp/sctp.h
>>>>>>>>> +++ b/include/net/sctp/sctp.h
>>>>>>>>> @@ -719,14 +719,20 @@ static inline void sctp_v4_map_v6(union
>>>>>>>>> sctp_addr *addr)
>>>>>>>>> addr->v6.sin6_addr.s6_addr32[2] = htonl(0x0000ffff);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> -/* The cookie is always 0 since this is how it's used in the
>>>>>>>>> - * pmtu code.
>>>>>>>>> - */
>>>>>>>>> +/* Set cookie with the right one for IPv6 and zero for others */
>>>>>>>>> static inline struct dst_entry *sctp_transport_dst_check(struct
>>>>>>>>> sctp_transport *t)
>>>>>>>>> {
>>>>>>>>> - if (t->dst && !dst_check(t->dst, 0)) {
>>>>>>>>> - dst_release(t->dst);
>>>>>>>>> - t->dst = NULL;
>>>>>>>>> +
>>>>>>>>> + if (t->dst) {
>>>>>>>>> + struct rt6_info *rt = (struct rt6_info *)t->dst;
>>>>>>>>> + u32 cookie = 0;
>>>>>>>>> +
>>>>>>>>> + if ((t->af_specific->sa_family == AF_INET6) && rt->rt6i_node)
>>>>>>>>> + cookie = rt->rt6i_node->fn_sernum;
>>>>>>>>> + if (!dst_check(t->dst, cookie)) {
>>>>>>>>> + dst_release(t->dst);
>>>>>>>>> + t->dst = NULL;
>>>>>>>>> + }
>>>>>>>>> }
>>>>>>>>
>>>>>>>> I think it would be better if we stored the dst_cookie in the
>>>>>>>> transport structure and initialized it at lookup time. If you do
>>>>>>>> that,
>>>>>>>> then if the route table changes, we'd correctly detect it without
>>>>>>>> artificially bumping rt_genid (and hurting ipv4).
>>>>>>>
>>>>>>> Hi Vlad/Neil
>>>>>>>
>>>>>>> Is this what you mean?
>>>>>>
>>>>>> Yes, exactly.
>>>>>>
>>>>>
>>>>> Hi Vlad
>>>>>
>>>>> I thinks twice about below patch, this is actually a chicken-egg issue.
>>>>> Look below scenario:
>>>>> (1) The first time we push packet through a transport, dst_cookie is 0,
>>>>> so sctp_transport_dst_check also pass cookie as 0, then return dst
>>>>> as NULL.
>>>>> Then we lookup dst by sctp_transport_route, and in there we
>>>>> initiate dst_cookie
>>>>> with rt->rt6i_node->fn_sernum
>>>>>
>>>>> (2) Then the next time we push packet through this transport again,
>>>>> we pass dst_cookie(rt->rt6i_node->fn_sernum) to ip6_dst_check, and
>>>>> return valid dst without bothering to lookup dst again.
>>>>
>>>> No, if the route was removed rt6i_node will be NULL, and NULL will be
>>>> returned from ip6_dst_check(). If the route still exists then we'll
>>>> compare the serial number with a cookie.
>>>>
>>>>>
>>>>> BUT, suppose when deleting the source address of this dst after
>>>>> transport->dst_cookie
>>>>> has been well initialized. transport->dst_cookie still holds
>>>>> rt->rt6i_node->fn_sernum,
>>>>> meaning ip6_dst_check will return valid dst, which it shouldn't in this
>>>>> case, the
>>>>> result will be association ABORT.
>>>>
>>>> No, removing the address cause the route for that prefix to be removed
>>>> as well. This will set rt6i_node to NULL.
>>>>
>>>>>
>>>>> Other way is invalid all transport->dst which using the deleting
>>>>> address
>>>>> as source address
>>>>> without bumping gen_id, problem is the traverse times depends
>>>>> heavily on
>>>>> transport number,
>>>>> and also need to take account locking issue it will introduce.
>>>>>
>>>>> >
>>>>> > No, you are not missing anything. IPv4 doesn't use the cookie and
>>>>> always seems to pass it as 0.
>>>>> >
>>>>> > Yes, ipv4 will bump the gen_id thus invalidating all routes (there
>>>>> has been disagreement about it).
>>>>> > IPv6 doesn't do that. In ipv6, when the addresses are added or
>>>>> removed, routes are also added or removed and
>>>>> > any time the route is added it will have a new serial number. So, you
>>>>> don't have to disturb ipv4 cache when ipv6 routing info changes.
>>>>>
>>>>> Thank you very much for your explanation!
>>>>>
>>>>> IPv6 don't bump gen_id, when adding/deleting address, and tag an serial
>>>>> number with each route.
>>>>> Doing this way loose the semantic of dst_check, because SCTP depends no
>>>>> dst_check fulfill its
>>>>> duty to actually check whether the holding dst is still valid, well
>>>>> most
>>>>> other Layer 4 protocol
>>>>> simply rely on ip6_route_output/ip6_dst_lookup_flow to grab dst every
>>>>> time sending data out.
>>>>
>>>> Look at how other protocols (tcp, dccp) do this. It is sufficient to
>>>> cache the route serial number into the dst_cookie at the time the route
>>>> was lookup-up and cached. Then the cookie is passed to dst_check to
>>>> validate the route.
>>>
>>>
>>> Hi Fan
>>>
>>> Have you tried the updated patch you sent? Based on what the tcp/udp
>>> code is doing, the updated patch should work correctly. If it does,
>>> can you re-post with attribution/sign-off
>>>
>>
>> Hello Vlad and Neil
>>
>> I don't know whether my test procedure has drawbacks or something else.
>> It seems the updated patch does not works well, but my first patch is ok
>> under below configuration.
>>
>> Host A:
>> ifconfig eth1 inet6 add 2001::803/64
>> ifconfig eth1 inet6 add 2001::804/64
>> sctp_darn -H 2001::803 -B 2001::804 -P 5001 -l
>>
>> Host B:
>> ifconfig eth1 inet6 add 2001::805/64
>> ifconfig eth1 inet6 add 2001::806/64
>> sctp_darn -H 2001::805 -B 2001::806 -P 5002 -h 2001::803 -p
>
> Hi Fan
>
> Could you try using different prefixes. I think we are running into some routing issue.
>
> For example, configure 2001:1::803/64 and 2001:2::804/64. I think that's going to make this work (it seems to for me).

Hello Vlad

By using different prefix as below configuration

      Server           Client
2001:1::803/64  <-> 2001:1::805/64
2001:2::804/64  <-> 2001:2::806/64

After association setup, heartbeat path is different than using same prefix
as before. Yes, delete 2001:2::804/64 won't cause association ABORT. That's
because, delete 2001:2::804/64 will also make the prefix route got deleted as
well, which in turn invalidate rt destinate to 2001:2::806/64. So subsequent
HEART_BEAT checks from 2001:2::804/64 -> 2001:2::806/64 will use ip6_null_entry
as rt, which actually discard packet. See sctp_ipv6_route_2prefix.jpeg for
route table in this configuration.

As is shown in sctp_ipv6_route_same_prefix.jpeg for using same prefix configuration,
Delete 2001::804/64 won't cause prefix route deleted as well as rt in (4) destinate
to 2001::806 with source address as 2001::804/64. That's because 2001::803/64 is
still alive, which make onlink=1 in ipv6_del_addr, this is where the substantial
difference between same prefix configuration and different prefix configuration :)
So packet are still transmitted out to 2001::806 with source address as 2001::804/64.

Current IPv6 route table implementation cannot prudently rudely traverse all the node
below the prefix route to mark new fn_sernum, that's is way too much time-consuming,
neither could we delete all the node below the prefix route, same reason as before.
As you putted earlier, bump_genid is not allowed for IPv6, I think there isn't a better
way here to enforce SCTP multi-home so far than the patch I posted here:
http://marc.info/?l=linux-netdev&m=137359898232255&w=2

Or should I only repost dst_cookie style patch in here ?
http://marc.info/?l=linux-netdev&m=137281805012583&w=2

Thanks for reading this long story...


> -vlad
>> 5001 -s
>>
>> hbinterval set to 2 seconds, after setup the association, primary
>> address: 2001::804 <--> 2001::806
>>
>>
>> Step 1:
>>
>> After adding in IPv6 address to an interface, we populate
>> struct inet6_ifaddr *ifp->rt->rt6i_node, this rt6i_node is stored in a
>> radix tree.
>>
>> addrconf_add_ifaddr
>> ->inet6_addr_add
>> ->ipv6_add_addr <-- Do DAD checking afterward.
>> ->addrconf_prefix_route
>> ->ip6_route_add
>> ->__ip6_ins_rt
>> ->fib6_add
>> ->fib6_add_1 <-- Create struct fib6_node *fn
>> ->fib6_add_rt2node
>> ->rt->rt6i_node = fn; (*1*)
>>
>> Step 2:
>>
>> In host A ,for the packet destinate for 2001::806 using source address
>> 2001::804,
>> In sctp_v6_get_dst, let's print the rt, rt->rt6i_node, and
>> rt->rt6i_node->fn_sernum
>> after ip6_dst_lookup_flow got valid dst. The problem is the
>> transport->dst->rt6i_node
>> we have looked for is *not* the rt6i_node in (*1*), I'm not drunk here.....
>> So in Step 3, let's be naughty: ifconfig eth1 inet6 del 2001::804/64
>>
>>
>> Step 3:
>>
>> Then we delete the IPv6 address, certainly this struct inet6_ifaddr
>> *ifp->rt
>> will be delete as well.
>>
>> addrconf_del_ifaddr
>> ->inet6_addr_del
>> ->ipv6_del_addr
>> ->ipv6_ifa_notify /RTM_DELADDR
>> ->ip6_del_rt
>> ->__ip6_del_rt
>> ->fib6_del
>> ->fib6_del_route
>> -> rt->rt6i_node = NULL; (*2*) here we undo the
>> operation in (*1*),
>> but rt6i_node in Step2 is
>> untouched!
>>
>> Step 4:
>> Then we do the dst checking, for sctp_tranport_dst_check it looks like
>> nothing in Step3
>> has consequence on it. So transport->dst is still valid, let's ship the
>> packet out
>> using 2001::804. The real world test result is ASSOCIATION ABORT after a
>> while.
>> As far as I can tell, rt in Step 2 got deleted *after* the ASSOCIATION
>> ABORT.
>>
>> sctp_tranport_dst_check
>> ->dst_check
>> -> ip6_dst_check
>>
>>
>> Root node
>> *
>> / \
>> * <-- fn node we are using.
>> / \
>> *
>> / \
>> *
>> / \
>> * <--- Add a new fn node, on the path from the root node down to
>> here, each fn
>> now has new fn_sernum, which force dst_check return NULL
>> for lookup again.
>> Then previously save fn_sernum will take effect now. But
>> when delete address
>> only the ifp->rt->rt6i_node is set to NULL.
>>
>>
>>> Thanks
>>> -vlad
>>>
>>>>
>>>> -vlad
>>>>>
>>>>> So please pronounce a final judgment.
>>>>>
>>>>>> -vlad
>>>>>>
>>>>>>>
>>>>>>> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
>>>>>>> index cd89510..0a646a5 100644
>>>>>>> --- a/include/net/sctp/sctp.h
>>>>>>> +++ b/include/net/sctp/sctp.h
>>>>>>> @@ -724,7 +724,7 @@ static inline void sctp_v4_map_v6(union sctp_addr
>>>>>>> *addr)
>>>>>>> */
>>>>>>> static inline struct dst_entry *sctp_transport_dst_check(struct
>>>>>>> sctp_transport *t)
>>>>>>> {
>>>>>>> - if (t->dst && !dst_check(t->dst, 0)) {
>>>>>>> + if (t->dst && !dst_check(t->dst, t->dst_cookie)) {
>>>>>>> dst_release(t->dst);
>>>>>>> t->dst = NULL;
>>>>>>> }
>>>>>>> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
>>>>>>> index 1bd4c41..cafdd19 100644
>>>>>>> --- a/include/net/sctp/structs.h
>>>>>>> +++ b/include/net/sctp/structs.h
>>>>>>> @@ -946,6 +946,8 @@ struct sctp_transport {
>>>>>>> __u64 hb_nonce;
>>>>>>>
>>>>>>> struct rcu_head rcu;
>>>>>>> +
>>>>>>> + u32 dst_cookie;
>>>>>>> };
>>>>>>>
>>>>>>> struct sctp_transport *sctp_transport_new(struct net *, const union
>>>>>>> sctp_addr *,
>>>>>>> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
>>>>>>> index 8ee553b..82a420f 100644
>>>>>>> --- a/net/sctp/ipv6.c
>>>>>>> +++ b/net/sctp/ipv6.c
>>>>>>> @@ -350,6 +350,7 @@ out:
>>>>>>> struct rt6_info *rt;
>>>>>>> rt = (struct rt6_info *)dst;
>>>>>>> t->dst = dst;
>>>>>>> + t->dst_cookie = rt->rt6i_node ? rt->rt6i_node->fn_sernum
>>>>>>> : 0;
>>>>>>> SCTP_DEBUG_PRINTK("rt6_dst:%pI6 rt6_src:%pI6\n",
>>>>>>> &rt->rt6i_dst.addr, &fl6->saddr);
>>>>>>> } else {
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -vlad
>>>>>>>>
>>>>>>>>>
>>>>>>>>> return t->dst;
>>>>>>>>> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
>>>>>>>>> index 8ee553b..cfae77e 100644
>>>>>>>>> --- a/net/sctp/ipv6.c
>>>>>>>>> +++ b/net/sctp/ipv6.c
>>>>>>>>> @@ -137,6 +137,8 @@ static int sctp_inet6addr_event(struct
>>>>>>>>> notifier_block *this, unsigned long ev,
>>>>>>>>> break;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> + /* invalid all transport dst forcing to look up new dst */
>>>>>>>>> + rt_genid_bump(net);
>>>>>>>>> return NOTIFY_DONE;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>

-- 
浮沉随浪只记今朝笑

--fan

[-- Attachment #2: sctp_ipv6_route_same_prefix.jpeg --]
[-- Type: image/jpeg, Size: 71731 bytes --]

[-- Attachment #3: sctp_ipv6_route_2prefix.jpeg --]
[-- Type: image/jpeg, Size: 79537 bytes --]

  reply	other threads:[~2013-07-17  7:04 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-02  6:39 [RFC PATCH] sctp: Don't lookup dst if transport dst is still valid Fan Du
2013-07-02 14:29 ` Vlad Yasevich
2013-07-02 15:55   ` Neil Horman
2013-07-03  2:39     ` Fan Du
2013-07-03 13:48       ` Vlad Yasevich
2013-07-02 19:47   ` David Miller
2013-07-03  2:18   ` Fan Du
2013-07-03 13:23     ` Vlad Yasevich
2013-07-03 14:11       ` Vlad Yasevich
2013-07-04  2:33       ` Fan Du
2013-07-05 13:03         ` Neil Horman
2013-07-09  7:11           ` Fan Du
2013-07-09 11:38             ` Neil Horman
2013-07-05 14:09         ` Vlad Yasevich
2013-07-09 15:11           ` Vlad Yasevich
2013-07-10  5:26             ` Fan Du
2013-07-12 11:19               ` Neil Horman
2013-07-16  9:13                 ` Fan Du
2013-07-13 12:21               ` Vlad Yasevich
2013-07-17  7:04                 ` Fan Du [this message]
2013-07-12  3:15             ` Fan Du
2013-07-12 22:58               ` David Miller
2013-07-13 12:18                 ` Vlad Yasevich
2013-07-16  9:58                 ` Fan Du

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51E641FD.4010908@windriver.com \
    --to=fan.du@windriver.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=nhorman@tuxdriver.com \
    --cc=nicolas.dichtel@6wind.com \
    --cc=vyasevich@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.