All of lore.kernel.org
 help / color / mirror / Atom feed
* [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
@ 2016-07-29  4:18 John Stultz
  2016-07-29  4:20 ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: John Stultz @ 2016-07-29  4:18 UTC (permalink / raw)
  To: Mateusz Bajorski
  Cc: David Ahern, David S. Miller, lkml, Guodong Xu, Dmitry Shmidt

After moving my HiKey tree to pre-v4.8-rc, I noticed when using
Android that I was getting routing errors after toggling networking on
and off (or entering suspend). Wifi associated, but I got some
rounting errors in the logcat the connection manager wouldn't detect a
valid network.

Not being able to figure out exactly what was going wrong from the
userspace side, I bisected (manually rebasing a 70 patch stack each
step :P) it down and it seems that the commit: 153380ec4b9b
("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule") is causing
the problem.

Reverting that patch seems to make things work again.

I'm no networking guru, but I'm happy to help debug this further if
folks can walk me through it a bit.

thanks
-john

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-07-29  4:18 [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace John Stultz
@ 2016-07-29  4:20 ` David Miller
  2016-07-29 14:12   ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2016-07-29  4:20 UTC (permalink / raw)
  To: john.stultz; +Cc: mateusz.bajorski, dsa, linux-kernel, guodong.xu, dimitrysh

From: John Stultz <john.stultz@linaro.org>
Date: Thu, 28 Jul 2016 21:18:16 -0700

> After moving my HiKey tree to pre-v4.8-rc, I noticed when using
> Android that I was getting routing errors after toggling networking on
> and off (or entering suspend). Wifi associated, but I got some
> rounting errors in the logcat the connection manager wouldn't detect a
> valid network.
> 
> Not being able to figure out exactly what was going wrong from the
> userspace side, I bisected (manually rebasing a 70 patch stack each
> step :P) it down and it seems that the commit: 153380ec4b9b
> ("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule") is causing
> the problem.
> 
> Reverting that patch seems to make things work again.
> 
> I'm no networking guru, but I'm happy to help debug this further if
> folks can walk me through it a bit.

It simply sounds like Android's userspace is specifying NLM_F_EXCL when
it shouldn't be during FIB rule netlink operations.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-07-29  4:20 ` David Miller
@ 2016-07-29 14:12   ` David Ahern
  2016-07-29 16:57     ` John Stultz
  0 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2016-07-29 14:12 UTC (permalink / raw)
  To: David Miller, john.stultz
  Cc: mateusz.bajorski, linux-kernel, guodong.xu, dimitrysh

On 7/28/16 10:20 PM, David Miller wrote:
> From: John Stultz <john.stultz@linaro.org>
> Date: Thu, 28 Jul 2016 21:18:16 -0700
>
>> After moving my HiKey tree to pre-v4.8-rc, I noticed when using
>> Android that I was getting routing errors after toggling networking on
>> and off (or entering suspend). Wifi associated, but I got some
>> rounting errors in the logcat the connection manager wouldn't detect a
>> valid network.
>>
>> Not being able to figure out exactly what was going wrong from the
>> userspace side, I bisected (manually rebasing a 70 patch stack each
>> step :P) it down and it seems that the commit: 153380ec4b9b
>> ("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule") is causing
>> the problem.
>>
>> Reverting that patch seems to make things work again.
>>
>> I'm no networking guru, but I'm happy to help debug this further if
>> folks can walk me through it a bit.
>
> It simply sounds like Android's userspace is specifying NLM_F_EXCL when
> it shouldn't be during FIB rule netlink operations.
>

I take Android userspace inserts the same rule multiple times? (ip rule ls)

If so and multiple components expect to manage their own 'copy' of the 
rule they will need to remove the NLM_F_EXCL flag.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-07-29 14:12   ` David Ahern
@ 2016-07-29 16:57     ` John Stultz
  2016-07-29 19:10       ` David Ahern
  2016-08-01  1:42       ` Lorenzo Colitti
  0 siblings, 2 replies; 15+ messages in thread
From: John Stultz @ 2016-07-29 16:57 UTC (permalink / raw)
  To: David Ahern
  Cc: David Miller, Mateusz Bajorski, lkml, Guodong Xu, Dmitry Shmidt,
	Chih-Hung Hsieh, Eric Caruso, Lorenzo Colitti

On Fri, Jul 29, 2016 at 7:12 AM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 7/28/16 10:20 PM, David Miller wrote:
>>
>> From: John Stultz <john.stultz@linaro.org>
>> Date: Thu, 28 Jul 2016 21:18:16 -0700
>>
>>> After moving my HiKey tree to pre-v4.8-rc, I noticed when using
>>> Android that I was getting routing errors after toggling networking on
>>> and off (or entering suspend). Wifi associated, but I got some
>>> rounting errors in the logcat the connection manager wouldn't detect a
>>> valid network.
>>>
>>> Not being able to figure out exactly what was going wrong from the
>>> userspace side, I bisected (manually rebasing a 70 patch stack each
>>> step :P) it down and it seems that the commit: 153380ec4b9b
>>> ("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule") is causing
>>> the problem.
>>>
>>> Reverting that patch seems to make things work again.
>>>
>>> I'm no networking guru, but I'm happy to help debug this further if
>>> folks can walk me through it a bit.
>>
>>
>> It simply sounds like Android's userspace is specifying NLM_F_EXCL when
>> it shouldn't be during FIB rule netlink operations.
>>
>
> I take Android userspace inserts the same rule multiple times? (ip rule ls)

With the patch reverted, and the system working,  I see:

# ip rule ls
0:      from all lookup local
10000:  from all fwmark 0xc0000/0xd0000 lookup legacy_system
13000:  from all fwmark 0x10063/0x1ffff lookup local_network
13000:  from all fwmark 0x10065/0x1ffff lookup wlan0
14000:  from all oif wlan0 lookup wlan0
14000:  from all oif wlan0 lookup wlan0
15000:  from all fwmark 0x0/0x10000 lookup legacy_system
16000:  from all fwmark 0x0/0x10000 lookup legacy_network
17000:  from all fwmark 0x0/0x10000 lookup local_network
19000:  from all fwmark 0x64/0x1ffff lookup wlan0
19000:  from all fwmark 0x65/0x1ffff lookup wlan0
22000:  from all fwmark 0x0/0xffff lookup wlan0
32000:  from all unreachable

With the patch applied, and after toggling wifi, when I see the problem:

# ip rule ls
0:      from all lookup local
10000:  from all fwmark 0xc0000/0xd0000 lookup legacy_system
13000:  from all fwmark 0x10063/0x1ffff lookup local_network
13000:  from all fwmark 0x10065/0x1ffff lookup wlan0
14000:  from all oif wlan0 lookup wlan0
15000:  from all fwmark 0x0/0x10000 lookup legacy_system
16000:  from all fwmark 0x0/0x10000 lookup legacy_network
17000:  from all fwmark 0x0/0x10000 lookup local_network
19000:  from all fwmark 0x64/0x1ffff lookup wlan0
32000:  from all unreachable


> If so and multiple components expect to manage their own 'copy' of the rule
> they will need to remove the NLM_F_EXCL flag.

Adding more networky Android folks to the CC.

thanks
-john

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-07-29 16:57     ` John Stultz
@ 2016-07-29 19:10       ` David Ahern
  2016-08-01  1:42       ` Lorenzo Colitti
  1 sibling, 0 replies; 15+ messages in thread
From: David Ahern @ 2016-07-29 19:10 UTC (permalink / raw)
  To: John Stultz
  Cc: David Miller, Mateusz Bajorski, lkml, Guodong Xu, Dmitry Shmidt,
	Chih-Hung Hsieh, Eric Caruso, Lorenzo Colitti

On 7/29/16 10:57 AM, John Stultz wrote:
>> I take Android userspace inserts the same rule multiple times? (ip rule ls)
>
> With the patch reverted, and the system working,  I see:
>
> # ip rule ls
> 0:      from all lookup local
> 10000:  from all fwmark 0xc0000/0xd0000 lookup legacy_system
> 13000:  from all fwmark 0x10063/0x1ffff lookup local_network
> 13000:  from all fwmark 0x10065/0x1ffff lookup wlan0
> 14000:  from all oif wlan0 lookup wlan0
> 14000:  from all oif wlan0 lookup wlan0

duplicate rules.

> 15000:  from all fwmark 0x0/0x10000 lookup legacy_system
> 16000:  from all fwmark 0x0/0x10000 lookup legacy_network
> 17000:  from all fwmark 0x0/0x10000 lookup local_network
> 19000:  from all fwmark 0x64/0x1ffff lookup wlan0
> 19000:  from all fwmark 0x65/0x1ffff lookup wlan0
> 22000:  from all fwmark 0x0/0xffff lookup wlan0
> 32000:  from all unreachable
>
> With the patch applied, and after toggling wifi, when I see the problem:
>
> # ip rule ls
> 0:      from all lookup local
> 10000:  from all fwmark 0xc0000/0xd0000 lookup legacy_system
> 13000:  from all fwmark 0x10063/0x1ffff lookup local_network
> 13000:  from all fwmark 0x10065/0x1ffff lookup wlan0
> 14000:  from all oif wlan0 lookup wlan0

The failure would be happening on the insertion of the second wlan0 rule.

> 15000:  from all fwmark 0x0/0x10000 lookup legacy_system
> 16000:  from all fwmark 0x0/0x10000 lookup legacy_network
> 17000:  from all fwmark 0x0/0x10000 lookup local_network
> 19000:  from all fwmark 0x64/0x1ffff lookup wlan0
> 32000:  from all unreachable
>
>
>> If so and multiple components expect to manage their own 'copy' of the rule
>> they will need to remove the NLM_F_EXCL flag.
>
> Adding more networky Android folks to the CC.
>
> thanks
> -john
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-07-29 16:57     ` John Stultz
  2016-07-29 19:10       ` David Ahern
@ 2016-08-01  1:42       ` Lorenzo Colitti
  2016-08-02 16:37         ` John Stultz
  1 sibling, 1 reply; 15+ messages in thread
From: Lorenzo Colitti @ 2016-08-01  1:42 UTC (permalink / raw)
  To: John Stultz
  Cc: David Ahern, David Miller, Mateusz Bajorski, lkml, Guodong Xu,
	Dmitry Shmidt, Chih-Hung Hsieh, Eric Caruso

On Sat, Jul 30, 2016 at 1:57 AM, John Stultz <john.stultz@linaro.org> wrote:
>
> With the patch reverted, and the system working,  I see:
>
> # ip rule ls
> 0:      from all lookup local
> 10000:  from all fwmark 0xc0000/0xd0000 lookup legacy_system
> 13000:  from all fwmark 0x10063/0x1ffff lookup local_network
> 13000:  from all fwmark 0x10065/0x1ffff lookup wlan0
> 14000:  from all oif wlan0 lookup wlan0
> 14000:  from all oif wlan0 lookup wlan0
> 15000:  from all fwmark 0x0/0x10000 lookup legacy_system
> 16000:  from all fwmark 0x0/0x10000 lookup legacy_network
> 17000:  from all fwmark 0x0/0x10000 lookup local_network
> 19000:  from all fwmark 0x64/0x1ffff lookup wlan0
> 19000:  from all fwmark 0x65/0x1ffff lookup wlan0
> 22000:  from all fwmark 0x0/0xffff lookup wlan0
> 32000:  from all unreachable


This is not correct, you're missing "uidrange 0-0" qualifiers on some
of the rules.

Does the kernel pass the networking unit tests at
https://source.android.com/devices/tech/config/kernel_network_tests.html
? If not, the Android network stack will not work correctly.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-08-01  1:42       ` Lorenzo Colitti
@ 2016-08-02 16:37         ` John Stultz
  2016-08-02 17:03           ` John Stultz
  0 siblings, 1 reply; 15+ messages in thread
From: John Stultz @ 2016-08-02 16:37 UTC (permalink / raw)
  To: Lorenzo Colitti
  Cc: David Ahern, David Miller, Mateusz Bajorski, lkml, Guodong Xu,
	Dmitry Shmidt, Chih-Hung Hsieh, Eric Caruso

On Sun, Jul 31, 2016 at 6:42 PM, Lorenzo Colitti <lorenzo@google.com> wrote:
> On Sat, Jul 30, 2016 at 1:57 AM, John Stultz <john.stultz@linaro.org> wrote:
>>
>> With the patch reverted, and the system working,  I see:
>>
>> # ip rule ls
>> 0:      from all lookup local
>> 10000:  from all fwmark 0xc0000/0xd0000 lookup legacy_system
>> 13000:  from all fwmark 0x10063/0x1ffff lookup local_network
>> 13000:  from all fwmark 0x10065/0x1ffff lookup wlan0
>> 14000:  from all oif wlan0 lookup wlan0
>> 14000:  from all oif wlan0 lookup wlan0
>> 15000:  from all fwmark 0x0/0x10000 lookup legacy_system
>> 16000:  from all fwmark 0x0/0x10000 lookup legacy_network
>> 17000:  from all fwmark 0x0/0x10000 lookup local_network
>> 19000:  from all fwmark 0x64/0x1ffff lookup wlan0
>> 19000:  from all fwmark 0x65/0x1ffff lookup wlan0
>> 22000:  from all fwmark 0x0/0xffff lookup wlan0
>> 32000:  from all unreachable
>
>
> This is not correct, you're missing "uidrange 0-0" qualifiers on some
> of the rules.
>
> Does the kernel pass the networking unit tests at
> https://source.android.com/devices/tech/config/kernel_network_tests.html
> ? If not, the Android network stack will not work correctly.

So I looked into getting the tests above to run (had to get a UML fix
for recent kernels).

Against vanilla v4.7 (plus that one UML fix), all the tests pass.

Against linus/master (plus that one UML fix), I see 7 failures, which
all look very similar:

Traceback (most recent call last):
  File "./multinetwork_test.py", line 28, in <module>
    import multinetwork_base
  File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/multinetwork_base.py",
line 84, in <module>
    HAVE_UID_ROUTING = HaveUidRouting()
  File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/multinetwork_base.py",
line 78, in HaveUidRouting
    iproute.IPRoute().UidRangeRule(6, False, 1000, 2000, 100, 10000)
  File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/iproute.py",
line 380, in UidRangeRule
    return self._Rule(version, is_add, RTN_UNICAST, table, nlattr, priority)
  File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/iproute.py",
line 348, in _Rule
    self._SendNlRequest(command, rtmsg)
  File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/iproute.py",
line 318, in _SendNlRequest
    super(IPRoute, self)._SendNlRequest(command, data, flags)
  File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/netlink.py",
line 183, in _SendNlRequest
    self._ExpectAck()
  File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/netlink.py",
line 170, in _ExpectAck
    self._ParseAck(response)
  File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/netlink.py",
line 164, in _ParseAck
    raise IOError(error, os.strerror(-error))
IOError: [Errno -2] No such file or directory


Interestingly, reverting 153380ec4b9b ("fib_rules: Added NLM_F_EXCL
support to fib_nl_newrule"), does not seem to fix it, and I get the
same errors.

thanks
-john

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-08-02 16:37         ` John Stultz
@ 2016-08-02 17:03           ` John Stultz
  2016-08-02 17:09             ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: John Stultz @ 2016-08-02 17:03 UTC (permalink / raw)
  To: Lorenzo Colitti
  Cc: David Ahern, David Miller, Mateusz Bajorski, lkml, Guodong Xu,
	Dmitry Shmidt, Chih-Hung Hsieh, Eric Caruso

On Tue, Aug 2, 2016 at 9:37 AM, John Stultz <john.stultz@linaro.org> wrote:
> On Sun, Jul 31, 2016 at 6:42 PM, Lorenzo Colitti <lorenzo@google.com> wrote:
>> On Sat, Jul 30, 2016 at 1:57 AM, John Stultz <john.stultz@linaro.org> wrote:
>>>
>>> With the patch reverted, and the system working,  I see:
>>>
>>> # ip rule ls
>>> 0:      from all lookup local
>>> 10000:  from all fwmark 0xc0000/0xd0000 lookup legacy_system
>>> 13000:  from all fwmark 0x10063/0x1ffff lookup local_network
>>> 13000:  from all fwmark 0x10065/0x1ffff lookup wlan0
>>> 14000:  from all oif wlan0 lookup wlan0
>>> 14000:  from all oif wlan0 lookup wlan0
>>> 15000:  from all fwmark 0x0/0x10000 lookup legacy_system
>>> 16000:  from all fwmark 0x0/0x10000 lookup legacy_network
>>> 17000:  from all fwmark 0x0/0x10000 lookup local_network
>>> 19000:  from all fwmark 0x64/0x1ffff lookup wlan0
>>> 19000:  from all fwmark 0x65/0x1ffff lookup wlan0
>>> 22000:  from all fwmark 0x0/0xffff lookup wlan0
>>> 32000:  from all unreachable
>>
>>
>> This is not correct, you're missing "uidrange 0-0" qualifiers on some
>> of the rules.
>>
>> Does the kernel pass the networking unit tests at
>> https://source.android.com/devices/tech/config/kernel_network_tests.html
>> ? If not, the Android network stack will not work correctly.
>
> So I looked into getting the tests above to run (had to get a UML fix
> for recent kernels).
>
> Against vanilla v4.7 (plus that one UML fix), all the tests pass.
>
> Against linus/master (plus that one UML fix), I see 7 failures, which
> all look very similar:
>
> Traceback (most recent call last):
>   File "./multinetwork_test.py", line 28, in <module>
>     import multinetwork_base
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/multinetwork_base.py",
> line 84, in <module>
>     HAVE_UID_ROUTING = HaveUidRouting()
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/multinetwork_base.py",
> line 78, in HaveUidRouting
>     iproute.IPRoute().UidRangeRule(6, False, 1000, 2000, 100, 10000)
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/iproute.py",
> line 380, in UidRangeRule
>     return self._Rule(version, is_add, RTN_UNICAST, table, nlattr, priority)
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/iproute.py",
> line 348, in _Rule
>     self._SendNlRequest(command, rtmsg)
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/iproute.py",
> line 318, in _SendNlRequest
>     super(IPRoute, self)._SendNlRequest(command, data, flags)
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/netlink.py",
> line 183, in _SendNlRequest
>     self._ExpectAck()
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/netlink.py",
> line 170, in _ExpectAck
>     self._ParseAck(response)
>   File "/host/home/jstultz/projects/android/hikey/kernel/tests/net/test/netlink.py",
> line 164, in _ParseAck
>     raise IOError(error, os.strerror(-error))
> IOError: [Errno -2] No such file or directory
>
>
> Interestingly, reverting 153380ec4b9b ("fib_rules: Added NLM_F_EXCL
> support to fib_nl_newrule"), does not seem to fix it, and I get the
> same errors.


So bisecting between v4.7 and linus/HEAD with the test above, it seems like:
96c63fa7393d ("net: Add l3mdev rule")  is what breaks the tests.

The l3mdev rule patch is a bit tangled with the fib_rules one, but if
I revert both of those, the only thing that fails is the
./neighbour_test.py (which I need to dig further into). But those two
changes seem to be connected to the regression I'm seeing with
Android.

thanks
-john

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-08-02 17:03           ` John Stultz
@ 2016-08-02 17:09             ` David Ahern
  2016-08-02 17:51               ` John Stultz
  0 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2016-08-02 17:09 UTC (permalink / raw)
  To: John Stultz, Lorenzo Colitti
  Cc: David Miller, Mateusz Bajorski, lkml, Guodong Xu, Dmitry Shmidt,
	Chih-Hung Hsieh, Eric Caruso

On 8/2/16 11:03 AM, John Stultz wrote:
> So bisecting between v4.7 and linus/HEAD with the test above, it seems like:
> 96c63fa7393d ("net: Add l3mdev rule")  is what breaks the tests.
>
> The l3mdev rule patch is a bit tangled with the fib_rules one, but if
> I revert both of those, the only thing that fails is the
> ./neighbour_test.py (which I need to dig further into). But those two
> changes seem to be connected to the regression I'm seeing with
> Android.
That is surprising since the l3mdev rule should not exist on Android 
unless it has created a VRF.

Does Android have custom FRA types in <linux/fib_rules.h>? Perhaps there 
is a collision on attribute number?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-08-02 17:09             ` David Ahern
@ 2016-08-02 17:51               ` John Stultz
  2016-08-02 18:00                 ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: John Stultz @ 2016-08-02 17:51 UTC (permalink / raw)
  To: David Ahern
  Cc: Lorenzo Colitti, David Miller, Mateusz Bajorski, lkml,
	Guodong Xu, Dmitry Shmidt, Chih-Hung Hsieh, Eric Caruso,
	Rom Lemarchand

On Tue, Aug 2, 2016 at 10:09 AM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 8/2/16 11:03 AM, John Stultz wrote:
>>
>> So bisecting between v4.7 and linus/HEAD with the test above, it seems
>> like:
>> 96c63fa7393d ("net: Add l3mdev rule")  is what breaks the tests.
>>
>> The l3mdev rule patch is a bit tangled with the fib_rules one, but if
>> I revert both of those, the only thing that fails is the
>> ./neighbour_test.py (which I need to dig further into). But those two
>> changes seem to be connected to the regression I'm seeing with
>> Android.
>
> That is surprising since the l3mdev rule should not exist on Android unless
> it has created a VRF.
>
> Does Android have custom FRA types in <linux/fib_rules.h>? Perhaps there is
> a collision on attribute number?

Sigh.

Yea, it looks like they do in their tree w/ their uid based routing:
https://android.googlesource.com/kernel/common.git/+/fd2cf795f3ab193752781be7372949ac1780d0ed%5E%21/

index 96161b8..ce19c5b 100644
--- a/include/uapi/linux/fib_rules.h
+++ b/include/uapi/linux/fib_rules.h
@@ -49,6 +49,8 @@ enum {
        FRA_TABLE,      /* Extended table id */
        FRA_FWMASK,     /* mask for netfilter mark */
        FRA_OIFNAME,
+       FRA_UID_START,  /* UID range */
+       FRA_UID_END,
        __FRA_MAX
 };

Without that change, networking would work with upstream kernels, but
now that new valid ids are upstream, their userspace is getting
confused.

Apologies for raising this as a regression.

Lorenzo/Rom: Fyi, you've got another upstream feature collision to work out.

thanks
-john

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-08-02 17:51               ` John Stultz
@ 2016-08-02 18:00                 ` David Miller
  2016-08-02 18:22                   ` John Stultz
  2016-08-03  0:58                   ` Lorenzo Colitti
  0 siblings, 2 replies; 15+ messages in thread
From: David Miller @ 2016-08-02 18:00 UTC (permalink / raw)
  To: john.stultz
  Cc: dsa, lorenzo, mateusz.bajorski, linux-kernel, guodong.xu,
	dimitrysh, chh, ejcaruso, romlem

From: John Stultz <john.stultz@linaro.org>
Date: Tue, 2 Aug 2016 10:51:26 -0700

> Yea, it looks like they do in their tree w/ their uid based routing:
> https://android.googlesource.com/kernel/common.git/+/fd2cf795f3ab193752781be7372949ac1780d0ed%5E%21/
> 
> index 96161b8..ce19c5b 100644
> --- a/include/uapi/linux/fib_rules.h
> +++ b/include/uapi/linux/fib_rules.h
> @@ -49,6 +49,8 @@ enum {
>         FRA_TABLE,      /* Extended table id */
>         FRA_FWMASK,     /* mask for netfilter mark */
>         FRA_OIFNAME,
> +       FRA_UID_START,  /* UID range */
> +       FRA_UID_END,
>         __FRA_MAX
>  };
...
> Lorenzo/Rom: Fyi, you've got another upstream feature collision to work out.

It is very difficult for us to take Android networking bug reports
against upstream seriously as long as these kind of situations
continue to exist.

Just FYI...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-08-02 18:00                 ` David Miller
@ 2016-08-02 18:22                   ` John Stultz
  2016-08-02 18:28                     ` David Miller
  2016-08-03  0:58                   ` Lorenzo Colitti
  1 sibling, 1 reply; 15+ messages in thread
From: John Stultz @ 2016-08-02 18:22 UTC (permalink / raw)
  To: David Miller
  Cc: David Ahern, Lorenzo Colitti, Mateusz Bajorski, lkml, Guodong Xu,
	Dmitry Shmidt, Chih-hung Hsieh, Eric Caruso, Rom Lemarchand

On Tue, Aug 2, 2016 at 11:00 AM, David Miller <davem@davemloft.net> wrote:
> From: John Stultz <john.stultz@linaro.org>
> Date: Tue, 2 Aug 2016 10:51:26 -0700
>
>> Yea, it looks like they do in their tree w/ their uid based routing:
>> https://android.googlesource.com/kernel/common.git/+/fd2cf795f3ab193752781be7372949ac1780d0ed%5E%21/
>>
>> index 96161b8..ce19c5b 100644
>> --- a/include/uapi/linux/fib_rules.h
>> +++ b/include/uapi/linux/fib_rules.h
>> @@ -49,6 +49,8 @@ enum {
>>         FRA_TABLE,      /* Extended table id */
>>         FRA_FWMASK,     /* mask for netfilter mark */
>>         FRA_OIFNAME,
>> +       FRA_UID_START,  /* UID range */
>> +       FRA_UID_END,
>>         __FRA_MAX
>>  };
> ...
>> Lorenzo/Rom: Fyi, you've got another upstream feature collision to work out.
>
> It is very difficult for us to take Android networking bug reports
> against upstream seriously as long as these kind of situations
> continue to exist.
>
> Just FYI...

Very much agreed, its frustrating.

Part of my efforts trying to run Android environments against mainline
kernels (including reporting apparent regressions) is to try to
improve interactions with the upstream community since if there is a
real regression (like we've seen with cgroup locking performance
recently), learning about it a year or two later when vendors start
using a kernel isn't super helpful.  And I feel like I've got a number
of real issues with this approach recently (asix and wlcore driver
regressions, iptables alignment issue on arm, etc).

But trying to filter out these sorts of issues where a lot of testing
can work w/o the android features, until the collision occurs, isn't
trivial w/o being a domain expert.  So, again, my apologies.
Generating noise like this is the *opposite* of what I'm trying to do.

thanks
-john

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-08-02 18:22                   ` John Stultz
@ 2016-08-02 18:28                     ` David Miller
  0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2016-08-02 18:28 UTC (permalink / raw)
  To: john.stultz
  Cc: dsa, lorenzo, mateusz.bajorski, linux-kernel, guodong.xu,
	dimitrysh, chh, ejcaruso, romlem

From: John Stultz <john.stultz@linaro.org>
Date: Tue, 2 Aug 2016 11:22:08 -0700

> Generating noise like this is the *opposite* of what I'm trying to
> do.

Understood.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-08-02 18:00                 ` David Miller
  2016-08-02 18:22                   ` John Stultz
@ 2016-08-03  0:58                   ` Lorenzo Colitti
  2016-08-03  1:04                     ` David Miller
  1 sibling, 1 reply; 15+ messages in thread
From: Lorenzo Colitti @ 2016-08-03  0:58 UTC (permalink / raw)
  To: David Miller
  Cc: John Stultz, David Ahern, Mateusz Bajorski, lkml, Guodong Xu,
	Dmitry Shmidt, Chih-hung Hsieh, Eric Caruso, Rom Lemarchand

On Wed, Aug 3, 2016 at 3:00 AM, David Miller <davem@davemloft.net> wrote:
> > index 96161b8..ce19c5b 100644
> > --- a/include/uapi/linux/fib_rules.h
> > +++ b/include/uapi/linux/fib_rules.h
> > @@ -49,6 +49,8 @@ enum {
> >         FRA_TABLE,      /* Extended table id */
> >         FRA_FWMASK,     /* mask for netfilter mark */
> >         FRA_OIFNAME,
> > +       FRA_UID_START,  /* UID range */
> > +       FRA_UID_END,
> >         __FRA_MAX
> >  };
> ...
> > Lorenzo/Rom: Fyi, you've got another upstream feature collision to work out.
>
> It is very difficult for us to take Android networking bug reports
> against upstream seriously as long as these kind of situations
> continue to exist.

I'm to blame for this one. The specific issue here is the RTA_xxx enum
- Android code assumes RTA_UID = 18, but the MPLS change that John
pointed out added RTA_VIA = 18. A bad cherry-pick to android-4.4 put
RTA_UID after RTA_VIA.

The problem is that the UID routing patches were never accepted
upstream, and because FRA_xxx and RTA_xxx are enums, there's no safe
way to add a new value out of tree like there would be with other
identifiers such as setsockopt values.

I'm going to make another attempt to upstream the UID routing code
soon. If that doesn't make it, then we could attempt to reserve a
couple RTA_xxx and FRA_xxx values for local use.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace
  2016-08-03  0:58                   ` Lorenzo Colitti
@ 2016-08-03  1:04                     ` David Miller
  0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2016-08-03  1:04 UTC (permalink / raw)
  To: lorenzo
  Cc: john.stultz, dsa, mateusz.bajorski, linux-kernel, guodong.xu,
	dimitrysh, chh, ejcaruso, romlem

From: Lorenzo Colitti <lorenzo@google.com>
Date: Wed, 3 Aug 2016 09:58:09 +0900

> The problem is that the UID routing patches were never accepted
> upstream, and because FRA_xxx and RTA_xxx are enums, there's no safe
> way to add a new value out of tree like there would be with other
> identifiers such as setsockopt values.
> 
> I'm going to make another attempt to upstream the UID routing code
> soon. If that doesn't make it, then we could attempt to reserve a
> couple RTA_xxx and FRA_xxx values for local use.

If upstream rejects the feature, it's because you'll be given a
suggested alternative implementation in order to achieve your
goals.

In that respect, it is inappropriate to expect that we will "reserve"
netlink ID space for such a feature that is rejected in that manner
upstream.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-08-03  1:13 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-29  4:18 [Regression?] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule breaks Android userspace John Stultz
2016-07-29  4:20 ` David Miller
2016-07-29 14:12   ` David Ahern
2016-07-29 16:57     ` John Stultz
2016-07-29 19:10       ` David Ahern
2016-08-01  1:42       ` Lorenzo Colitti
2016-08-02 16:37         ` John Stultz
2016-08-02 17:03           ` John Stultz
2016-08-02 17:09             ` David Ahern
2016-08-02 17:51               ` John Stultz
2016-08-02 18:00                 ` David Miller
2016-08-02 18:22                   ` John Stultz
2016-08-02 18:28                     ` David Miller
2016-08-03  0:58                   ` Lorenzo Colitti
2016-08-03  1:04                     ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.