From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Ahern
Subject: Re: [PATCH net-next 1/4] ipv6: Calculate hash thresholds for IPv6 nexthops
Date: Thu, 3 May 2018 19:13:36 -0600
References: <20180109144028.30133-1-idosch@mellanox.com>
 <20180109144028.30133-2-idosch@mellanox.com>
 <5550c628-5014-427b-60c9-71cf80462723@gmail.com>
 <20180502172106.GA12986@splinter>
 <20180502175244.GA14587@splinter>
 <20180502185310.GA31998@splinter>
 <017f58ee-8655-60d1-19aa-a6276c639065@gmail.com>
 <20180502190401.GA470@splinter>
 <1525294137082.85951@alliedtelesis.co.nz>
 <84f1a89f-20f0-a559-8a1b-da1400794f29@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: Eric Dumazet, "davem@davemloft.net", Ido Schimmel,
 "netdev@vger.kernel.org", "roopa@cumulusnetworks.com",
 "nikolay@cumulusnetworks.com", "pch@ordbogen.com", "jkbs@redhat.com",
 "yoshfuji@linux-ipv6.org", "mlxsw@mellanox.com"
To: Thomas Winter, Ido Schimmel
Received: from mail-pf0-f179.google.com ([209.85.192.179]:40603 "EHLO
 mail-pf0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751190AbeEDBNk (ORCPT ); Thu, 3 May 2018 21:13:40 -0400
Received: by mail-pf0-f179.google.com with SMTP id f189so16110214pfa.7
 for ; Thu, 03 May 2018 18:13:39 -0700 (PDT)
In-Reply-To: <84f1a89f-20f0-a559-8a1b-da1400794f29@gmail.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 5/2/18 2:56 PM, David Ahern wrote:
> On 5/2/18 2:48 PM, Thomas Winter wrote:
>> Should I look at reworking this? It would be great to have these ECMP routes for other purposes.
>
> Looking at my IPv6 bug list this change is on it -- allowing ECMP routes
> to have a device only hop.
>
> Let me take a look at it at the same time as a few other bugs.
>

I see the problem: the multipath code for IPv6 tries to be helpful and
auto-determine that a new route can be appended to an existing one --
basically adding another nexthop if the route already exists.
What it should be doing is requiring NLM_F_APPEND to modify an existing
route. If the same prefix and metric comes down and neither APPEND nor
REPLACE is set, it should fail with EEXIST rather than consolidating the
routes into an ECMP route.

Fixing it to do the right thing will break existing userspace, but as it
stands it prevents dev-only nexthops (no gateway), and a replace with a
REJECT route ends up adding another route. E.g.,

    ip -6 ro replace unreachable 2001:db8:104::/64

leaves the existing route and adds a new entry which can never be hit:

$ ip -6 ro ls
...
2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024 pref medium
unreachable 2001:db8:104::/64 dev lo metric 1024 pref medium
...
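
To make the proposed rule concrete, here is a minimal sketch of the flag
check described above. It is not the kernel's code: route_add_check() and
enum route_action are hypothetical names made up for illustration, the
-ENOENT branch for a missing route and the choice to test REPLACE before
APPEND are assumptions, and the flag values are copied from
<linux/netlink.h> so the snippet builds on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined in <linux/netlink.h>; repeated here (guarded)
 * so the sketch compiles without the kernel uapi headers. */
#ifndef NLM_F_REPLACE
#define NLM_F_REPLACE 0x100
#define NLM_F_EXCL    0x200
#define NLM_F_CREATE  0x400
#define NLM_F_APPEND  0x800
#endif

/* Hypothetical result type: what the add path would do with the request. */
enum route_action {
	ROUTE_NONE,            /* request rejected, nothing changes */
	ROUTE_CREATE,          /* install a new route */
	ROUTE_REPLACE,         /* replace the existing route */
	ROUTE_APPEND_NEXTHOP,  /* add a nexthop to the existing route (ECMP) */
};

/*
 * Hypothetical helper. 'exists' says whether a route with the same prefix
 * and metric is already installed; 'nlflags' are the NLM_F_* flags of the
 * request. Returns 0 and sets *action, or a negative errno.
 */
int route_add_check(bool exists, unsigned int nlflags,
		    enum route_action *action)
{
	assert(action != NULL);
	*action = ROUTE_NONE;

	if (!exists) {
		/* Assumption: creating a route requires NLM_F_CREATE. */
		if (!(nlflags & NLM_F_CREATE))
			return -ENOENT;
		*action = ROUTE_CREATE;
		return 0;
	}

	/* Assumption: REPLACE wins if both REPLACE and APPEND are set. */
	if (nlflags & NLM_F_REPLACE) {
		*action = ROUTE_REPLACE;
		return 0;
	}

	/* Only an explicit APPEND may turn the route into a multipath one. */
	if (nlflags & NLM_F_APPEND) {
		*action = ROUTE_APPEND_NEXTHOP;
		return 0;
	}

	/* Same prefix and metric, neither APPEND nor REPLACE: refuse. */
	return -EEXIST;
}
```

Under this rule a request for an existing prefix and metric that carries
only NLM_F_CREATE (or NLM_F_CREATE | NLM_F_EXCL, which is what
"ip route add" sends as far as I know) gets -EEXIST instead of silently
becoming another nexthop, while NLM_F_CREATE | NLM_F_REPLACE always
results in a replace, which is what the unreachable example above needs.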