From: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
To: <kafai@fb.com>
Cc: <kuniyu@amazon.co.jp>, <ast@kernel.org>, <bpf@vger.kernel.org>,
<daniel@iogearbox.net>, <edumazet@google.com>,
<kernel-team@fb.com>, <ncardwell@google.com>,
<netdev@vger.kernel.org>, <ycheng@google.com>, <yhs@fb.com>
Subject: Re: [PATCH v2 bpf-next 1/8] tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos
Date: Fri, 23 Jul 2021 00:08:10 +0900
Message-ID: <20210722150810.74315-1-kuniyu@amazon.co.jp>
In-Reply-To: <20210722141637.68161-1-kuniyu@amazon.co.jp>
From: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Date: Thu, 22 Jul 2021 23:16:37 +0900
> From: Martin KaFai Lau <kafai@fb.com>
> Date: Thu, 1 Jul 2021 13:05:41 -0700
> > st->bucket stores the current bucket number.
> > st->offset stores the offset within this bucket that is the sk to be
> > seq_show(). Thus, st->offset only makes sense within the same
> > st->bucket.
> >
> > These two variables are an optimization for the common no-lseek case.
> > When resuming the seq_file iteration (i.e. seq_start()),
> > tcp_seek_last_pos() tries to continue from the st->offset
> > at bucket st->bucket.
> >
> > However, it is possible that the bucket pointed by st->bucket
> > has changed and st->offset may end up skipping the whole st->bucket
> > without finding a sk. In this case, tcp_seek_last_pos() currently
> > continues to satisfy the offset condition in the next (and incorrect)
> > bucket. Instead, regardless of the offset value, the first sk of the
> > next bucket should be returned. Thus, "bucket == st->bucket" check is
> > added to tcp_seek_last_pos().
> >
> > The chance of hitting this is small and the issue is a decade old,
> > so targeting for the next tree.
>
> Multiple read()s or lseek()+read() can call tcp_seek_last_pos().
>
> IIUC, the problem happens when the sockets placed before the last shown
> socket in the list are closed between some read()s or lseek() and read().
>
> I think there is still a case where bucket is valid but offset is invalid:
>
> listening_hash[1] -> sk1 -> sk2 -> sk3 -> nulls
> listening_hash[2] -> sk4 -> sk5 -> nulls
>
> read(/proc/net/tcp)
> end up with sk2
>
> close(sk1)
>
> listening_hash[1] -> sk2 -> sk3 -> nulls
> listening_hash[2] -> sk4 -> sk5 -> nulls
>
> read(/proc/net/tcp) (resume)
> offset = 2
>
> listening_get_next() returns sk2
>
> while (offset--)
> 1st loop listening_get_next() returns sk3 (bucket == st->bucket)
> 2nd loop listening_get_next() returns sk4 (bucket != st->bucket)
>
> show() starts from sk4
>
> sk3 is skipped, although it should be shown.
Sorry, this example is wrong.
We can handle this case properly by testing bucket != st->bucket.

In the case below, however, we cannot tell whether the offset is still
valid by testing the bucket:

listening_hash[1] -> sk1 -> sk2 -> sk3 -> sk4 -> nulls

read(/proc/net/tcp)
end up with sk2

close(sk1)

listening_hash[1] -> sk2 -> sk3 -> sk4 -> nulls

read(/proc/net/tcp) (resume)
offset = 2

listening_get_first() returns sk2

while (offset--)
1st loop listening_get_next() returns sk3 (bucket == st->bucket)
2nd loop listening_get_next() returns sk4 (bucket == st->bucket)

show() starts from sk4

sk3 is skipped, although it should be shown.
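
To make the scenario above easier to follow, here is a small userspace
sketch of the resume loop. It is not the kernel code: struct fake_sk,
fake_get_next(), fake_seek_last_pos() and demo_resume_after_close() are
hypothetical names, and the chain is a plain singly linked list instead of
the nulls list in listening_hash[]. It only models the
"bucket == st->bucket" test in the while loop.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 2

/* Hypothetical stand-in for a socket on a listening hash chain. */
struct fake_sk {
	int id;
	struct fake_sk *next;
};

/*
 * Simplified model of listening_get_next(): return the sk after cur,
 * moving *bucket forward when the current chain is exhausted.
 * cur == NULL means "first sk at or after *bucket".
 */
static struct fake_sk *fake_get_next(struct fake_sk *heads[NBUCKETS],
				     int *bucket, struct fake_sk *cur)
{
	if (cur) {
		if (cur->next)
			return cur->next;
		(*bucket)++;	/* end of chain: continue in the next bucket */
	}
	for (; *bucket < NBUCKETS; (*bucket)++)
		if (heads[*bucket])
			return heads[*bucket];
	return NULL;
}

/*
 * Simplified model of the resume loop with the bucket check.
 * Returns the id that show() would start from, or -1 at the end.
 * *bucket_changed reports whether the walk left st_bucket.
 */
static int fake_seek_last_pos(struct fake_sk *heads[NBUCKETS],
			      int st_bucket, int st_offset,
			      int *bucket_changed)
{
	int bucket = st_bucket;
	int offset = st_offset;
	struct fake_sk *rc;

	assert(st_bucket >= 0 && st_bucket < NBUCKETS);

	rc = fake_get_next(heads, &bucket, NULL);
	while (offset-- && rc && bucket == st_bucket)
		rc = fake_get_next(heads, &bucket, rc);

	*bucket_changed = (bucket != st_bucket);
	return rc ? rc->id : -1;
}

/* State after close(sk1): bucket 0 holds sk2 -> sk3 -> sk4. */
static int demo_resume_after_close(int *bucket_changed)
{
	struct fake_sk sk4 = { 4, NULL };
	struct fake_sk sk3 = { 3, &sk4 };
	struct fake_sk sk2 = { 2, &sk3 };
	struct fake_sk *heads[NBUCKETS] = { &sk2, NULL };

	/* offset = 2 was saved while sk1 was still first in the chain. */
	return fake_seek_last_pos(heads, 0, 2, bucket_changed);
}
```

In this model demo_resume_after_close() returns 4 and leaves
*bucket_changed at 0: the walk never leaves the bucket, so the bucket test
has nothing to detect and sk3 is passed over.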
>
> In listening_get_next(), we can check if we passed through sk2, but this
> does not work well if sk2 itself is closed... then there is no way to
> check whether the offset is valid or not.
>
> Handling this may be too much though, what do you think?
>
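
As a rough illustration of the "check if we passed through sk2" idea
quoted above, the sketch below walks a single chain looking for the sk
that was shown last. Everything here is an assumption for illustration:
struct fake_sk, resume_after_last_shown() and the integer id used to
recognise the last shown sk are hypothetical, and the kernel would need
some stable identity for that comparison, which this sketch does not
address.

```c
#include <stddef.h>

/* Hypothetical stand-in for a socket on a listening hash chain. */
struct fake_sk {
	int id;
	struct fake_sk *next;
};

enum resume_result {
	RESUME_FOUND,	/* last shown sk is still on the chain */
	RESUME_UNKNOWN,	/* last shown sk is gone: position cannot be checked */
};

/*
 * Walk one bucket chain looking for the sk that was shown last.
 * On RESUME_FOUND, *next_out is the sk right after it (NULL if it was
 * the tail). On RESUME_UNKNOWN, *next_out is set to NULL.
 */
static enum resume_result resume_after_last_shown(struct fake_sk *head,
						  int last_shown_id,
						  struct fake_sk **next_out)
{
	struct fake_sk *sk;

	for (sk = head; sk; sk = sk->next) {
		if (sk->id == last_shown_id) {
			*next_out = sk->next;
			return RESUME_FOUND;
		}
	}
	*next_out = NULL;
	return RESUME_UNKNOWN;
}

/* sk1 closed, sk2 (last shown) still listening: sk2 -> sk3 -> sk4. */
static int demo_last_shown_alive(void)
{
	struct fake_sk sk4 = { 4, NULL };
	struct fake_sk sk3 = { 3, &sk4 };
	struct fake_sk sk2 = { 2, &sk3 };
	struct fake_sk *next;

	if (resume_after_last_shown(&sk2, 2, &next) != RESUME_FOUND)
		return -1;
	return next ? next->id : 0;
}

/* sk1 and sk2 both closed: sk3 -> sk4, and sk2 can no longer be found. */
static int demo_last_shown_closed(void)
{
	struct fake_sk sk4 = { 4, NULL };
	struct fake_sk sk3 = { 3, &sk4 };
	struct fake_sk *next;

	if (resume_after_last_shown(&sk3, 2, &next) != RESUME_FOUND)
		return -1;
	return next ? next->id : 0;
}
```

With sk2 still on the chain the sketch resumes at sk3, so
demo_last_shown_alive() returns 3. Once sk2 is closed as well,
demo_last_shown_closed() returns -1, which is the case where the saved
offset cannot be validated.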