* [PATCH net] net: fix pos incrementment in ipv6_route_seq_next
@ 2020-10-13 0:09 Yonghong Song
2020-10-13 6:56 ` Vasily Averin
0 siblings, 1 reply; 3+ messages in thread
From: Yonghong Song @ 2020-10-13 0:09 UTC (permalink / raw)
To: bpf, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Vasily Averin,
Andrii Nakryiko
Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
tried to fix the issue where seq_file pos is not increased
if a NULL element is returned with seq_ops->next(). See bug
https://bugzilla.kernel.org/show_bug.cgi?id=206283
The commit effectively does:
- increase pos for all seq_ops->start()
- increase pos for all seq_ops->next()
For ipv6_route, increasing pos for all seq_ops->next() is correct.
But increasing pos for seq_ops->start() is not correct
since pos is used to determine how many items to skip during
seq_ops->start():
iter->skip = *pos;
seq_ops->start() just fetches the *current* pos item.
The item can be skipped only after seq_ops->show() which essentially
is the beginning of seq_ops->next().
For example, I have 7 ipv6 route entries,
root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
0+1 records in
0+1 records out
1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
root@arch-fb-vm1:~/net-next
In the above, I specify buffer size 4096, so all records can be returned
to user space with a single trip to the kernel.
If I use buffer size 128, since each record size is 149, internally
kernel seq_read() will read 149 into its internal buffer and return the data
to user space in two read() syscalls. Then user read() syscall will trigger
next seq_ops->start(). Since the current implementation increased pos even
for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first
record is #1.
root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
4+1 records in
4+1 records out
600 bytes copied, 0.00127758 s, 470 kB/s
To fix the problem, do not increase pos for seq_ops->start() and the
above `dd` command with `bs=128` will show correct result.
Fixes: 4fc427e05158 ("ipv6_route_seq_next should increase position index")
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
net/ipv6/ip6_fib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 141c0a4c569a..5aac5094bc41 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -2582,10 +2582,10 @@ static void *ipv6_route_seq_next(struct seq_file *seq, void *v, loff_t *pos)
struct net *net = seq_file_net(seq);
struct ipv6_route_iter *iter = seq->private;
- ++(*pos);
if (!v)
goto iter_table;
+ ++(*pos);
n = rcu_dereference_bh(((struct fib6_info *)v)->fib6_next);
if (n)
return n;
--
2.24.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH net] net: fix pos incrementment in ipv6_route_seq_next
2020-10-13 0:09 [PATCH net] net: fix pos incrementment in ipv6_route_seq_next Yonghong Song
@ 2020-10-13 6:56 ` Vasily Averin
2020-10-13 7:13 ` Yonghong Song
0 siblings, 1 reply; 3+ messages in thread
From: Vasily Averin @ 2020-10-13 6:56 UTC (permalink / raw)
To: Yonghong Song, bpf, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Andrii Nakryiko
Dear Yonghong Song,
thank you for reporting the problem.
As far as I understand the problem here is that pos is incremented in .start function.
I do not,like an idea to avoid increment in ipv6_route_seq_next()
however ipv6_route_seq_start can provide fake argument instead.
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -2618,8 +2618,9 @@ static void *ipv6_route_seq_start(struct seq_file *seq, loff_t *pos)
iter->skip = *pos;
if (iter->tbl) {
+ loff_t p;
ipv6_route_seq_setup_walk(iter, net);
- return ipv6_route_seq_next(seq, NULL, pos);
+ return ipv6_route_seq_next(seq, NULL, &p);
} else {
return NULL;
}
In this case patch subject should be changed accordingly.
Thank you,
Vasily Averin
On 10/13/20 3:09 AM, Yonghong Song wrote:
> Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
> tried to fix the issue where seq_file pos is not increased
> if a NULL element is returned with seq_ops->next(). See bug
> https://bugzilla.kernel.org/show_bug.cgi?id=206283
> The commit effectively does:
> - increase pos for all seq_ops->start()
> - increase pos for all seq_ops->next()
>
> For ipv6_route, increasing pos for all seq_ops->next() is correct.
> But increasing pos for seq_ops->start() is not correct
> since pos is used to determine how many items to skip during
> seq_ops->start():
> iter->skip = *pos;
> seq_ops->start() just fetches the *current* pos item.
> The item can be skipped only after seq_ops->show() which essentially
> is the beginning of seq_ops->next().
>
> For example, I have 7 ipv6 route entries,
> root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
> 00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0
> fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0
> 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
> 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
> fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0
> ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001 eth0
> 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
> 0+1 records in
> 0+1 records out
> 1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
> root@arch-fb-vm1:~/net-next
>
> In the above, I specify buffer size 4096, so all records can be returned
> to user space with a single trip to the kernel.
>
> If I use buffer size 128, since each record size is 149, internally
> kernel seq_read() will read 149 into its internal buffer and return the data
> to user space in two read() syscalls. Then user read() syscall will trigger
> next seq_ops->start(). Since the current implementation increased pos even
> for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first
> record is #1.
>
> root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
> 00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0
> 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
> fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0
> 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
> 4+1 records in
> 4+1 records out
> 600 bytes copied, 0.00127758 s, 470 kB/s
>
> To fix the problem, do not increase pos for seq_ops->start() and the
> above `dd` command with `bs=128` will show correct result.
>
> Fixes: 4fc427e05158 ("ipv6_route_seq_next should increase position index")
> Cc: Vasily Averin <vvs@virtuozzo.com>
> Cc: Andrii Nakryiko <andriin@fb.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
> net/ipv6/ip6_fib.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 141c0a4c569a..5aac5094bc41 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -2582,10 +2582,10 @@ static void *ipv6_route_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> struct net *net = seq_file_net(seq);
> struct ipv6_route_iter *iter = seq->private;
>
> - ++(*pos);
> if (!v)
> goto iter_table;
>
> + ++(*pos);
> n = rcu_dereference_bh(((struct fib6_info *)v)->fib6_next);
> if (n)
> return n;
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH net] net: fix pos incrementment in ipv6_route_seq_next
2020-10-13 6:56 ` Vasily Averin
@ 2020-10-13 7:13 ` Yonghong Song
0 siblings, 0 replies; 3+ messages in thread
From: Yonghong Song @ 2020-10-13 7:13 UTC (permalink / raw)
To: Vasily Averin, bpf, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Andrii Nakryiko
On 10/12/20 11:56 PM, Vasily Averin wrote:
> Dear Yonghong Song,
> thank you for reporting the problem.
> As far as I understand the problem here is that pos is incremented in .start function.
Yes.
>
> I do not,like an idea to avoid increment in ipv6_route_seq_next()
> however ipv6_route_seq_start can provide fake argument instead.
>
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -2618,8 +2618,9 @@ static void *ipv6_route_seq_start(struct seq_file *seq, loff_t *pos)
> iter->skip = *pos;
>
> if (iter->tbl) {
> + loff_t p;
> ipv6_route_seq_setup_walk(iter, net);
> - return ipv6_route_seq_next(seq, NULL, pos);
> + return ipv6_route_seq_next(seq, NULL, &p);
> } else {
> return NULL;
> }
This should work too.
I am fine with the change. I will wait until tomorrow for
additional comments, if any, about this fake point approach vs.
my approach, before sending v2.
>
> In this case patch subject should be changed accordingly.
>
> Thank you,
> Vasily Averin
>
> On 10/13/20 3:09 AM, Yonghong Song wrote:
>> Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
>> tried to fix the issue where seq_file pos is not increased
>> if a NULL element is returned with seq_ops->next(). See bug
>> https://bugzilla.kernel.org/show_bug.cgi?id=206283
>> The commit effectively does:
>> - increase pos for all seq_ops->start()
>> - increase pos for all seq_ops->next()
>>
>> For ipv6_route, increasing pos for all seq_ops->next() is correct.
>> But increasing pos for seq_ops->start() is not correct
>> since pos is used to determine how many items to skip during
>> seq_ops->start():
>> iter->skip = *pos;
>> seq_ops->start() just fetches the *current* pos item.
>> The item can be skipped only after seq_ops->show() which essentially
>> is the beginning of seq_ops->next().
>>
>> For example, I have 7 ipv6 route entries,
>> root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
>> 00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0
>> fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0
>> 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
>> 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
>> fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0
>> ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001 eth0
>> 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
>> 0+1 records in
>> 0+1 records out
>> 1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
>> root@arch-fb-vm1:~/net-next
>>
>> In the above, I specify buffer size 4096, so all records can be returned
>> to user space with a single trip to the kernel.
>>
>> If I use buffer size 128, since each record size is 149, internally
>> kernel seq_read() will read 149 into its internal buffer and return the data
>> to user space in two read() syscalls. Then user read() syscall will trigger
>> next seq_ops->start(). Since the current implementation increased pos even
>> for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first
>> record is #1.
>>
>> root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
>> 00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0
>> 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
>> fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0
>> 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
>> 4+1 records in
>> 4+1 records out
>> 600 bytes copied, 0.00127758 s, 470 kB/s
>>
>> To fix the problem, do not increase pos for seq_ops->start() and the
>> above `dd` command with `bs=128` will show correct result.
>>
>> Fixes: 4fc427e05158 ("ipv6_route_seq_next should increase position index")
>> Cc: Vasily Averin <vvs@virtuozzo.com>
>> Cc: Andrii Nakryiko <andriin@fb.com>
>> Cc: Alexei Starovoitov <ast@kernel.org>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>> net/ipv6/ip6_fib.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
>> index 141c0a4c569a..5aac5094bc41 100644
>> --- a/net/ipv6/ip6_fib.c
>> +++ b/net/ipv6/ip6_fib.c
>> @@ -2582,10 +2582,10 @@ static void *ipv6_route_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>> struct net *net = seq_file_net(seq);
>> struct ipv6_route_iter *iter = seq->private;
>>
>> - ++(*pos);
>> if (!v)
>> goto iter_table;
>>
>> + ++(*pos);
>> n = rcu_dereference_bh(((struct fib6_info *)v)->fib6_next);
>> if (n)
>> return n;
>>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2020-10-13 7:14 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-13 0:09 [PATCH net] net: fix pos incrementment in ipv6_route_seq_next Yonghong Song
2020-10-13 6:56 ` Vasily Averin
2020-10-13 7:13 ` Yonghong Song
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).