All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Iurii Zaikin <yzaikin@google.com>,
	Kuniyuki Iwashima <kuni1840@gmail.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>,
	Satoru Moriya <satoru.moriya@hds.com>
Subject: Re: [PATCH v1 net 11/16] net: Fix a data-race around sysctl_mem.
Date: Wed, 6 Jul 2022 09:27:11 -0400	[thread overview]
Message-ID: <20220706092711.28ce57e6@gandalf.local.home> (raw)
In-Reply-To: <20220706091707.07251fd9@gandalf.local.home>

On Wed, 6 Jul 2022 09:17:07 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 5 Jul 2022 22:21:25 -0700
> Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> 
> > --- a/include/trace/events/sock.h
> > +++ b/include/trace/events/sock.h
> > @@ -122,9 +122,9 @@ TRACE_EVENT(sock_exceed_buf_limit,
> >  
> >  	TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
> >  		__entry->name,
> > -		__entry->sysctl_mem[0],
> > -		__entry->sysctl_mem[1],
> > -		__entry->sysctl_mem[2],
> > +		READ_ONCE(__entry->sysctl_mem[0]),
> > +		READ_ONCE(__entry->sysctl_mem[1]),
> > +		READ_ONCE(__entry->sysctl_mem[2]),  
> 
> This is not reading anything to do with sysctl. It's reading the content of
> what was recorded in the ring buffer.
> 
> That is, the READ_ONCE() here is not necessary, and if anything will break
> user space parsing, as this is exported to user space to tell it how to
> read the binary format in the ring buffer.

I take that back. Looking at the actual trace event, it is pointing to
sysctl memory, which is a major bug.

TRACE_EVENT(sock_exceed_buf_limit,

        TP_PROTO(struct sock *sk, struct proto *prot, long allocated, int kind),

        TP_ARGS(sk, prot, allocated, kind),

        TP_STRUCT__entry(
                __array(char, name, 32)
                __field(long *, sysctl_mem)

sysctl_mem is a pointer.

                __field(long, allocated)
                __field(int, sysctl_rmem)
                __field(int, rmem_alloc)
                __field(int, sysctl_wmem)
                __field(int, wmem_alloc)
                __field(int, wmem_queued)
                __field(int, kind)
        ),

        TP_fast_assign(
                strncpy(__entry->name, prot->name, 32);

                __entry->sysctl_mem = prot->sysctl_mem;


They save the pointer **IN THE RING BUFFER**!!!

                __entry->allocated = allocated;
                __entry->sysctl_rmem = sk_get_rmem0(sk, prot);
                __entry->rmem_alloc = atomic_read(&sk->sk_rmem_alloc);
                __entry->sysctl_wmem = sk_get_wmem0(sk, prot);
                __entry->wmem_alloc = refcount_read(&sk->sk_wmem_alloc);
                __entry->wmem_queued = READ_ONCE(sk->sk_wmem_queued);
                __entry->kind = kind;
        ),

        TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
                __entry->name,
                __entry->sysctl_mem[0],
                __entry->sysctl_mem[1],
                __entry->sysctl_mem[2],

They are now reading a stale pointer, which can be read at any time. That
is, you get the information of what is in sysctl_mem at the time the ring
buffer is read (which is useless from user space), and not at the time of
the event.

Thanks for pointing this out. This needs to be fixed.

-- Steve


                __entry->allocated,
                __entry->sysctl_rmem,
                __entry->rmem_alloc,
                __entry->sysctl_wmem,
                __entry->wmem_alloc,
                __entry->wmem_queued,
                show_skmem_kind_names(__entry->kind)
        )

  reply	other threads:[~2022-07-06 13:27 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-06  5:21 [PATCH v1 net 00/16] sysctl: Fix data-races around ipv4_table Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 01/16] sysctl: Clean up proc_handler definitions Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 02/16] sysctl: Add proc_dobool_lockless() Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 03/16] sysctl: Add proc_dointvec_lockless() Kuniyuki Iwashima
2022-07-06  7:00   ` Eric Dumazet
2022-07-06 16:15     ` Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 04/16] sysctl: Add proc_douintvec_lockless() Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 05/16] sysctl: Add proc_dointvec_minmax_lockless() Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 06/16] sysctl: Add proc_douintvec_minmax_lockless() Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 07/16] sysctl: Add proc_doulongvec_minmax_lockless() Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 08/16] sysctl: Add proc_dointvec_jiffies_lockless() Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 09/16] tcp: Fix a data-race around sysctl_tcp_max_orphans Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 10/16] inetpeer: Fix data-races around sysctl Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 11/16] net: Fix a data-race around sysctl_mem Kuniyuki Iwashima
2022-07-06 13:17   ` Steven Rostedt
2022-07-06 13:27     ` Steven Rostedt [this message]
2022-07-06 16:27       ` Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 12/16] tcp: Mark sysctl_tcp_low_latency obsolete Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 13/16] cipso: Fix a data-race around cipso_v4_cache_bucketsize Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 14/16] cipso: Fix data-races around boolean sysctl Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 15/16] icmp: Fix data-races around sysctl Kuniyuki Iwashima
2022-07-06  5:21 ` [PATCH v1 net 16/16] ipv4: Fix a data-race around sysctl_fib_sync_mem Kuniyuki Iwashima

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220706092711.28ce57e6@gandalf.local.home \
    --to=rostedt@goodmis.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=keescook@chromium.org \
    --cc=kuba@kernel.org \
    --cc=kuni1840@gmail.com \
    --cc=kuniyu@amazon.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=satoru.moriya@hds.com \
    --cc=yzaikin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.