BPF Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH net v3] net: fix pos incrementment in ipv6_route_seq_next
@ 2020-10-14 14:46 Yonghong Song
  2020-10-14 17:29 ` Martin KaFai Lau
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Yonghong Song @ 2020-10-14 14:46 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Andrii Nakryiko, Vasily Averin

Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
tried to fix the issue where seq_file pos is not increased
if a NULL element is returned with seq_ops->next(). See bug
  https://bugzilla.kernel.org/show_bug.cgi?id=206283
The commit effectively does:
  - increase pos for all seq_ops->start()
  - increase pos for all seq_ops->next()

For ipv6_route, increasing pos for all seq_ops->next() is correct.
But increasing pos for seq_ops->start() is not correct
since pos is used to determine how many items to skip during
seq_ops->start():
  iter->skip = *pos;
seq_ops->start() just fetches the *current* pos item.
The item can be skipped only after seq_ops->show() which essentially
is the beginning of seq_ops->next().

For example, I have 7 ipv6 route entries,
  root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
  00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001     eth0
  fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001     eth0
  00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
  00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001       lo
  fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
  ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001     eth0
  00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
  0+1 records in
  0+1 records out
  1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
  root@arch-fb-vm1:~/net-next

In the above, I specify buffer size 4096, so all records can be returned
to user space with a single trip to the kernel.

If I use buffer size 128, since each record size is 149, internally
kernel seq_read() will read 149 into its internal buffer and return the data
to user space in two read() syscalls. Then user read() syscall will trigger
next seq_ops->start(). Since the current implementation increased pos even
for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first
record is #1.

  root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
  00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001     eth0
  00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
  fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
  00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
4+1 records in
4+1 records out
600 bytes copied, 0.00127758 s, 470 kB/s

To fix the problem, create a fake pos pointer so seq_ops->start()
won't actually increase seq_file pos. With this fix, the
above `dd` command with `bs=128` will show correct result.

Fixes: 4fc427e05158 ("ipv6_route_seq_next should increase position index")
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Suggested-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 net/ipv6/ip6_fib.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Changelog:
 v2 -> v3:
  - initialize local variable "p" to avoid potential syzbot complaint. (Eric)
 v1 -> v2:
  - instead of push increment of *pos in ipv6_route_seq_next() for
    seq_ops->next() only. Add a face pos pointer in seq_ops->start()
    and use it when calling ipv6_route_seq_next().

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 141c0a4c569a..605cdd38a919 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -2622,8 +2622,10 @@ static void *ipv6_route_seq_start(struct seq_file *seq, loff_t *pos)
 	iter->skip = *pos;
 
 	if (iter->tbl) {
+		loff_t p = 0;
+
 		ipv6_route_seq_setup_walk(iter, net);
-		return ipv6_route_seq_next(seq, NULL, pos);
+		return ipv6_route_seq_next(seq, NULL, &p);
 	} else {
 		return NULL;
 	}
-- 
2.24.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net v3] net: fix pos incrementment in ipv6_route_seq_next
  2020-10-14 14:46 [PATCH net v3] net: fix pos incrementment in ipv6_route_seq_next Yonghong Song
@ 2020-10-14 17:29 ` Martin KaFai Lau
  2020-10-14 23:14 ` Andrii Nakryiko
  2020-10-15 17:23 ` Jakub Kicinski
  2 siblings, 0 replies; 5+ messages in thread
From: Martin KaFai Lau @ 2020-10-14 17:29 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Andrii Nakryiko, Vasily Averin

On Wed, Oct 14, 2020 at 07:46:12AM -0700, Yonghong Song wrote:
> Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
> tried to fix the issue where seq_file pos is not increased
> if a NULL element is returned with seq_ops->next(). See bug
>   https://bugzilla.kernel.org/show_bug.cgi?id=206283
> The commit effectively does:
>   - increase pos for all seq_ops->start()
>   - increase pos for all seq_ops->next()
> 
> For ipv6_route, increasing pos for all seq_ops->next() is correct.
> But increasing pos for seq_ops->start() is not correct
> since pos is used to determine how many items to skip during
> seq_ops->start():
>   iter->skip = *pos;
> seq_ops->start() just fetches the *current* pos item.
> The item can be skipped only after seq_ops->show() which essentially
> is the beginning of seq_ops->next().
> 
> For example, I have 7 ipv6 route entries,
>   root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
>   00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001     eth0
>   fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001     eth0
>   00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
>   00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001       lo
>   fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
>   ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001     eth0
>   00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
>   0+1 records in
>   0+1 records out
>   1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
>   root@arch-fb-vm1:~/net-next
> 
> In the above, I specify buffer size 4096, so all records can be returned
> to user space with a single trip to the kernel.
> 
> If I use buffer size 128, since each record size is 149, internally
> kernel seq_read() will read 149 into its internal buffer and return the data
> to user space in two read() syscalls. Then user read() syscall will trigger
> next seq_ops->start(). Since the current implementation increased pos even
> for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first
> record is #1.
> 
>   root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
>   00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001     eth0
>   00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
>   fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
>   00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
> 4+1 records in
> 4+1 records out
> 600 bytes copied, 0.00127758 s, 470 kB/s
> 
> To fix the problem, create a fake pos pointer so seq_ops->start()
> won't actually increase seq_file pos. With this fix, the
> above `dd` command with `bs=128` will show correct result.
> 
> Fixes: 4fc427e05158 ("ipv6_route_seq_next should increase position index")
> Cc: Andrii Nakryiko <andriin@fb.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Suggested-by: Vasily Averin <vvs@virtuozzo.com>
> Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  net/ipv6/ip6_fib.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> Changelog:
>  v2 -> v3:
>   - initialize local variable "p" to avoid potential syzbot complaint. (Eric)
Acked-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net v3] net: fix pos incrementment in ipv6_route_seq_next
  2020-10-14 14:46 [PATCH net v3] net: fix pos incrementment in ipv6_route_seq_next Yonghong Song
  2020-10-14 17:29 ` Martin KaFai Lau
@ 2020-10-14 23:14 ` Andrii Nakryiko
  2020-10-14 23:49   ` Yonghong Song
  2020-10-15 17:23 ` Jakub Kicinski
  2 siblings, 1 reply; 5+ messages in thread
From: Andrii Nakryiko @ 2020-10-14 23:14 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Andrii Nakryiko, Vasily Averin

On Wed, Oct 14, 2020 at 2:53 PM Yonghong Song <yhs@fb.com> wrote:
>
> Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
> tried to fix the issue where seq_file pos is not increased
> if a NULL element is returned with seq_ops->next(). See bug
>   https://bugzilla.kernel.org/show_bug.cgi?id=206283
> The commit effectively does:
>   - increase pos for all seq_ops->start()
>   - increase pos for all seq_ops->next()
>
> For ipv6_route, increasing pos for all seq_ops->next() is correct.
> But increasing pos for seq_ops->start() is not correct
> since pos is used to determine how many items to skip during
> seq_ops->start():
>   iter->skip = *pos;
> seq_ops->start() just fetches the *current* pos item.
> The item can be skipped only after seq_ops->show() which essentially
> is the beginning of seq_ops->next().
>
> For example, I have 7 ipv6 route entries,
>   root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
>   00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001     eth0
>   fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001     eth0
>   00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
>   00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001       lo
>   fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
>   ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001     eth0
>   00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
>   0+1 records in
>   0+1 records out
>   1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
>   root@arch-fb-vm1:~/net-next
>
> In the above, I specify buffer size 4096, so all records can be returned
> to user space with a single trip to the kernel.
>
> If I use buffer size 128, since each record size is 149, internally
> kernel seq_read() will read 149 into its internal buffer and return the data
> to user space in two read() syscalls. Then user read() syscall will trigger
> next seq_ops->start(). Since the current implementation increased pos even
> for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first
> record is #1.
>
>   root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128

Did you test with non-zero skip= parameter as well (to force lseek)?
To make sure we don't break the scenario that original fix tried to
fix.

If that works:

Acked-by: Andrii Nakryiko <andrii@kernel.org>

[...]

> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 141c0a4c569a..605cdd38a919 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -2622,8 +2622,10 @@ static void *ipv6_route_seq_start(struct seq_file *seq, loff_t *pos)
>         iter->skip = *pos;
>
>         if (iter->tbl) {
> +               loff_t p = 0;
> +
>                 ipv6_route_seq_setup_walk(iter, net);
> -               return ipv6_route_seq_next(seq, NULL, pos);
> +               return ipv6_route_seq_next(seq, NULL, &p);

nit: comment here wouldn't hurt for the next guy stumbling upon this
code and wondering why we ignore p afterwards

>         } else {
>                 return NULL;
>         }
> --
> 2.24.1
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net v3] net: fix pos incrementment in ipv6_route_seq_next
  2020-10-14 23:14 ` Andrii Nakryiko
@ 2020-10-14 23:49   ` Yonghong Song
  0 siblings, 0 replies; 5+ messages in thread
From: Yonghong Song @ 2020-10-14 23:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Andrii Nakryiko, Vasily Averin



On 10/14/20 4:14 PM, Andrii Nakryiko wrote:
> On Wed, Oct 14, 2020 at 2:53 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
>> tried to fix the issue where seq_file pos is not increased
>> if a NULL element is returned with seq_ops->next(). See bug
>>    https://bugzilla.kernel.org/show_bug.cgi?id=206283
>> The commit effectively does:
>>    - increase pos for all seq_ops->start()
>>    - increase pos for all seq_ops->next()
>>
>> For ipv6_route, increasing pos for all seq_ops->next() is correct.
>> But increasing pos for seq_ops->start() is not correct
>> since pos is used to determine how many items to skip during
>> seq_ops->start():
>>    iter->skip = *pos;
>> seq_ops->start() just fetches the *current* pos item.
>> The item can be skipped only after seq_ops->show() which essentially
>> is the beginning of seq_ops->next().
>>
>> For example, I have 7 ipv6 route entries,
>>    root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096
>>    00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001     eth0
>>    fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001     eth0
>>    00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
>>    00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001       lo
>>    fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
>>    ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001     eth0
>>    00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
>>    0+1 records in
>>    0+1 records out
>>    1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s
>>    root@arch-fb-vm1:~/net-next
>>
>> In the above, I specify buffer size 4096, so all records can be returned
>> to user space with a single trip to the kernel.
>>
>> If I use buffer size 128, since each record size is 149, internally
>> kernel seq_read() will read 149 into its internal buffer and return the data
>> to user space in two read() syscalls. Then user read() syscall will trigger
>> next seq_ops->start(). Since the current implementation increased pos even
>> for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first
>> record is #1.
>>
>>    root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128
> 
> Did you test with non-zero skip= parameter as well (to force lseek)?
> To make sure we don't break the scenario that original fix tried to
> fix.

I did with skip=1 and it won't show the last line any more. And I
did not really change that logic (increasing pos even when returning 
NULL for seq_ops->next()).

> 
> If that works:
> 
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> 
> [...]
> 
>> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
>> index 141c0a4c569a..605cdd38a919 100644
>> --- a/net/ipv6/ip6_fib.c
>> +++ b/net/ipv6/ip6_fib.c
>> @@ -2622,8 +2622,10 @@ static void *ipv6_route_seq_start(struct seq_file *seq, loff_t *pos)
>>          iter->skip = *pos;
>>
>>          if (iter->tbl) {
>> +               loff_t p = 0;
>> +
>>                  ipv6_route_seq_setup_walk(iter, net);
>> -               return ipv6_route_seq_next(seq, NULL, pos);
>> +               return ipv6_route_seq_next(seq, NULL, &p);
> 
> nit: comment here wouldn't hurt for the next guy stumbling upon this
> code and wondering why we ignore p afterwards

Typically you won't increase pos from seq_ops->start(). So I think
we are fine here without comments.

> 
>>          } else {
>>                  return NULL;
>>          }
>> --
>> 2.24.1
>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net v3] net: fix pos incrementment in ipv6_route_seq_next
  2020-10-14 14:46 [PATCH net v3] net: fix pos incrementment in ipv6_route_seq_next Yonghong Song
  2020-10-14 17:29 ` Martin KaFai Lau
  2020-10-14 23:14 ` Andrii Nakryiko
@ 2020-10-15 17:23 ` Jakub Kicinski
  2 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2020-10-15 17:23 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Andrii Nakryiko, Vasily Averin

On Wed, 14 Oct 2020 07:46:12 -0700 Yonghong Song wrote:
> Commit 4fc427e05158 ("ipv6_route_seq_next should increase position index")
> tried to fix the issue where seq_file pos is not increased
> if a NULL element is returned with seq_ops->next(). See bug
>   https://bugzilla.kernel.org/show_bug.cgi?id=206283
> The commit effectively does:
>   - increase pos for all seq_ops->start()
>   - increase pos for all seq_ops->next()
> 
> For ipv6_route, increasing pos for all seq_ops->next() is correct.
> But increasing pos for seq_ops->start() is not correct
> since pos is used to determine how many items to skip during
> seq_ops->start():
>   iter->skip = *pos;
> seq_ops->start() just fetches the *current* pos item.
> The item can be skipped only after seq_ops->show() which essentially
> is the beginning of seq_ops->next().

Applied, queued for stable, thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-14 14:46 [PATCH net v3] net: fix pos incrementment in ipv6_route_seq_next Yonghong Song
2020-10-14 17:29 ` Martin KaFai Lau
2020-10-14 23:14 ` Andrii Nakryiko
2020-10-14 23:49   ` Yonghong Song
2020-10-15 17:23 ` Jakub Kicinski

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
		bpf@vger.kernel.org
	public-inbox-index bpf

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.bpf


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git