Re: [PATCH v2 bpf-next 1/8] tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos

From: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
To: <kafai@fb.com>
Cc: <kuniyu@amazon.co.jp>, <ast@kernel.org>, <bpf@vger.kernel.org>,
	<daniel@iogearbox.net>, <edumazet@google.com>,
	<kernel-team@fb.com>, <ncardwell@google.com>,
	<netdev@vger.kernel.org>, <ycheng@google.com>, <yhs@fb.com>
Subject: Re: [PATCH v2 bpf-next 1/8] tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos
Date: Fri, 23 Jul 2021 00:08:10 +0900
Message-ID: <20210722150810.74315-1-kuniyu@amazon.co.jp>
In-Reply-To: <20210722141637.68161-1-kuniyu@amazon.co.jp>

From:   Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Date:   Thu, 22 Jul 2021 23:16:37 +0900
> From:   Martin KaFai Lau <kafai@fb.com>
> Date:   Thu, 1 Jul 2021 13:05:41 -0700
> > st->bucket stores the current bucket number.
> > st->offset stores the offset within this bucket that is the sk to be
> > seq_show().  Thus, st->offset only makes sense within the same
> > st->bucket.
> > 
> > These two variables are an optimization for the common no-lseek case.
> > When resuming the seq_file iteration (i.e. seq_start()),
> > tcp_seek_last_pos() tries to continue from the st->offset
> > at bucket st->bucket.
> > 
> > However, it is possible that the bucket pointed to by st->bucket
> > has changed and st->offset may end up skipping the whole st->bucket
> > without finding a sk.  In this case, tcp_seek_last_pos() currently
> > continues to satisfy the offset condition in the next (and incorrect)
> > bucket.  Instead, regardless of the offset value, the first sk of the
> > next bucket should be returned.  Thus, "bucket == st->bucket" check is
> > added to tcp_seek_last_pos().
> > 
> > The chance of hitting this is small and the issue is a decade old,
> > so this targets the next tree.
> 
> Multiple read()s or lseek()+read() can call tcp_seek_last_pos().
> 
> IIUC, the problem happens when the sockets placed before the last shown
> socket in the list are closed between some read()s or lseek() and read().
> 
> I think there is still a case where bucket is valid but offset is invalid:
> 
>   listening_hash[1] -> sk1 -> sk2 -> sk3 -> nulls
>   listening_hash[2] -> sk4 -> sk5 -> nulls
> 
>   read(/proc/net/tcp)
>     end up with sk2
> 
>   close(sk1)
> 
>   listening_hash[1] -> sk2 -> sk3 -> nulls
>   listening_hash[2] -> sk4 -> sk5 -> nulls
> 
>   read(/proc/net/tcp) (resume)
>     offset = 2
> 
>     listening_get_next() returns sk2
> 
>     while (offset--)
>       1st loop listening_get_next() returns sk3 (bucket == st->bucket)
>       2nd loop listening_get_next() returns sk4 (bucket != st->bucket)
> 
>     show() starts from sk4
> 
>     only sk3 is skipped, but it should be shown.

Sorry, this example is wrong.
That case can be handled properly by testing bucket != st->bucket.
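For reference, a minimal user-space model of the resume loop with the patch's "bucket == st->bucket" check. It assumes a fixed-size table of integer socket ids in place of the kernel's hlist chains and locks; model_get_first(), model_get_next() and model_seek_last_pos() are hypothetical stand-ins for listening_get_first(), listening_get_next() and tcp_seek_last_pos(), not kernel code. It covers the case the patch description targets: st->bucket shrinks so that st->offset runs past its end.

```c
#define NBUCKETS 4
#define MAXSK 8

/* Hypothetical model of the listening hash: sk[b][i] is the i-th
 * socket id in bucket b and len[b] is that bucket's chain length. */
struct lhash {
	int sk[NBUCKETS][MAXSK];
	int len[NBUCKETS];
};

/* Hypothetical stand-in for st->bucket and st->offset. */
struct iter {
	int bucket;
	int offset;
};

/* Loosely modelled on listening_get_first(): first socket at or after
 * st->bucket, with st->offset reset to 0.  Returns -1 if none is left. */
static int model_get_first(const struct lhash *h, struct iter *st)
{
	st->offset = 0;
	for (; st->bucket < NBUCKETS; st->bucket++)
		if (h->len[st->bucket] > 0)
			return h->sk[st->bucket][0];
	return -1;
}

/* Loosely modelled on listening_get_next(): next socket in the same
 * bucket, otherwise the head of the next non-empty bucket. */
static int model_get_next(const struct lhash *h, struct iter *st)
{
	if (++st->offset < h->len[st->bucket])
		return h->sk[st->bucket][st->offset];
	st->bucket++;
	return model_get_first(h, st);
}

/* Resume from the saved (bucket, offset); check_bucket toggles the
 * "bucket == st->bucket" test added by the patch. */
static int model_seek_last_pos(const struct lhash *h, struct iter *st,
			       int check_bucket)
{
	int bucket = st->bucket;
	int offset = st->offset;
	int rc = model_get_first(h, st);

	while (offset-- && rc != -1 &&
	       (!check_bucket || bucket == st->bucket))
		rc = model_get_next(h, st);
	return rc;
}
```

With listening_hash[1] reduced to sk1 (sk2 and sk3 closed) and listening_hash[2] = sk4 -> sk5 -> sk6, resuming with bucket 1 and offset 2 returns sk4 with the check and sk5 without it, so the unchecked loop skips sk4.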

In the following case, however, testing the bucket cannot tell whether
the offset is still valid:

  listening_hash[1] -> sk1 -> sk2 -> sk3 -> sk4 -> nulls

  read(/proc/net/tcp)
    end up with sk2

  close(sk1)

  listening_hash[1] -> sk2 -> sk3 -> sk4 -> nulls

  read(/proc/net/tcp) (resume)
    offset = 2

    listening_get_first() returns sk2

    while (offset--)
      1st loop listening_get_next() returns sk3 (bucket == st->bucket)
      2nd loop listening_get_next() returns sk4 (bucket == st->bucket)

    show() starts from sk4

    only sk3 is skipped, but it should be shown.
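A minimal sketch of this case, assuming a single chain of integer socket ids instead of the real listening_hash[1]; resume_in_bucket() is a hypothetical helper that replays "get the first sk, then step offset times". Because every step stays in the same bucket, a bucket comparison never ends the loop early.

```c
/* Hypothetical model of one hash chain: chain[0..len-1] holds socket
 * ids in list order.  Returns the id the resumed read() shows first,
 * or -1 if the offset runs off the end of the chain. */
static int resume_in_bucket(const int *chain, int len, int offset)
{
	int pos = 0;	/* like listening_get_first(): head of the chain */

	if (len == 0)
		return -1;
	/* Each step stays in this bucket, so "bucket == st->bucket"
	 * would always hold here. */
	while (offset-- && pos < len)
		pos++;	/* like listening_get_next() */
	return pos < len ? chain[pos] : -1;
}
```

Without the close, resume_in_bucket() on sk1 -> sk2 -> sk3 -> sk4 with offset 2 lands on sk3 as intended; after close(sk1), the same offset on sk2 -> sk3 -> sk4 lands on sk4.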

> 
> In listening_get_next(), we could check whether we passed through sk2,
> but this does not work if sk2 itself is closed; then there is no way to
> check whether the offset is valid.
> 
> Handling this may be too much, though. What do you think?
>
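The check suggested above can be sketched as follows, assuming the iterator remembered the id of the last shown socket in addition to st->offset. resume_after() and the integer ids are hypothetical and are not part of the kernel's iterator state; the sketch only shows why the approach has no anchor once that socket is closed.

```c
/* Hypothetical alternative: continue right after the last shown socket.
 * Returns 1 and stores the next id (or -1 at the end of the chain) in
 * *next if last_sk is still in the chain.  Returns 0 if last_sk was
 * closed, in which case the chain alone cannot tell where to resume. */
static int resume_after(const int *chain, int len, int last_sk, int *next)
{
	for (int i = 0; i < len; i++) {
		if (chain[i] == last_sk) {
			*next = i + 1 < len ? chain[i + 1] : -1;
			return 1;
		}
	}
	return 0;
}
```

After close(sk1), the chain sk2 -> sk3 -> sk4 still contains sk2, so resume_after() continues at sk3; after close(sk2) instead, the chain sk1 -> sk3 -> sk4 has no sk2 and the function returns 0.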