From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.3 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A35E9C83000 for ; Wed, 29 Apr 2020 23:33:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 883C62072A for ; Wed, 29 Apr 2020 23:33:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726950AbgD2Xdu (ORCPT ); Wed, 29 Apr 2020 19:33:50 -0400 Received: from www62.your-server.de ([213.133.104.62]:33030 "EHLO www62.your-server.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726935AbgD2Xdt (ORCPT ); Wed, 29 Apr 2020 19:33:49 -0400 Received: from sslproxy05.your-server.de ([78.46.172.2]) by www62.your-server.de with esmtpsa (TLSv1.2:DHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89_1) (envelope-from ) id 1jTwD4-0000YI-UR; Thu, 30 Apr 2020 01:33:47 +0200 Received: from [178.195.186.98] (helo=pc-9.home) by sslproxy05.your-server.de with esmtpsa (TLSv1.3:TLS_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1jTwD4-0003Te-LV; Thu, 30 Apr 2020 01:33:46 +0200 Subject: Re: [PATCH v8 bpf-next 1/3] bpf: sharing bpf runtime stats with BPF_ENABLE_STATS To: Song Liu , bpf@vger.kernel.org, netdev@vger.kernel.org Cc: ast@kernel.org, kernel-team@fb.com References: <20200429064543.634465-1-songliubraving@fb.com> <20200429064543.634465-2-songliubraving@fb.com> From: Daniel Borkmann Message-ID: <77523dab-bfd0-45d4-0d03-26a07bb6483e@iogearbox.net> Date: Thu, 30 Apr 2020 01:33:45 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2 MIME-Version: 1.0 In-Reply-To: <20200429064543.634465-2-songliubraving@fb.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Authenticated-Sender: daniel@iogearbox.net X-Virus-Scanned: Clear (ClamAV 0.102.2/25797/Wed Apr 29 14:06:14 2020) Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org On 4/29/20 8:45 AM, Song Liu wrote: > Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats. > Typical userspace tools use kernel.bpf_stats_enabled as follows: > > 1. Enable kernel.bpf_stats_enabled; > 2. Check program run_time_ns; > 3. Sleep for the monitoring period; > 4. Check program run_time_ns again, calculate the difference; > 5. Disable kernel.bpf_stats_enabled. > > The problem with this approach is that only one userspace tool can toggle > this sysctl. If multiple tools toggle the sysctl at the same time, the > measurement may be inaccurate. > > To fix this problem while keep backward compatibility, introduce a new > bpf command BPF_ENABLE_STATS. On success, this command enables stats and > returns a valid fd. BPF_ENABLE_STATS takes argument "type". Currently, > only one type, BPF_STATS_RUN_TIME, is supported. We can extend the > command to support other types of stats in the future. > > With BPF_ENABLE_STATS, user space tool would have the following flow: > > 1. Get a fd with BPF_ENABLE_STATS, and make sure it is valid; > 2. Check program run_time_ns; > 3. Sleep for the monitoring period; > 4. Check program run_time_ns again, calculate the difference; > 5. Close the fd. > > Signed-off-by: Song Liu [...] > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index d23c04cbe14f..8691b2cc550d 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -3872,6 +3872,60 @@ static int bpf_link_get_fd_by_id(const union bpf_attr *attr) > return fd; > } > > +DEFINE_MUTEX(bpf_stats_enabled_mutex); > + > +static int bpf_stats_release(struct inode *inode, struct file *file) > +{ > + mutex_lock(&bpf_stats_enabled_mutex); > + static_key_slow_dec(&bpf_stats_enabled_key.key); > + mutex_unlock(&bpf_stats_enabled_mutex); > + return 0; > +} > + > +static const struct file_operations bpf_stats_fops = { > + .release = bpf_stats_release, > +}; > + > +static int bpf_enable_runtime_stats(void) > +{ > + int fd; > + > + mutex_lock(&bpf_stats_enabled_mutex); > + > + /* Set a very high limit to avoid overflow */ > + if (static_key_count(&bpf_stats_enabled_key.key) > INT_MAX / 2) { > + mutex_unlock(&bpf_stats_enabled_mutex); > + return -EBUSY; > + } > + > + fd = anon_inode_getfd("bpf-stats", &bpf_stats_fops, NULL, 0); Missing O_CLOEXEC or intentional (if latter, I'd have expected a comment here though)? > + if (fd >= 0) > + static_key_slow_inc(&bpf_stats_enabled_key.key); > + > + mutex_unlock(&bpf_stats_enabled_mutex); > + return fd; > +} > + > +#define BPF_ENABLE_STATS_LAST_FIELD enable_stats.type > + [...] > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index e961286d0e14..af08ef0690cb 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -201,6 +201,40 @@ static int max_extfrag_threshold = 1000; > > #endif /* CONFIG_SYSCTL */ > > +#ifdef CONFIG_BPF_SYSCALL > +static int bpf_stats_handler(struct ctl_table *table, int write, > + void __user *buffer, size_t *lenp, > + loff_t *ppos) > +{ > + struct static_key *key = (struct static_key *)table->data; > + static int saved_val; > + int val, ret; > + struct ctl_table tmp = { > + .data = &val, > + .maxlen = sizeof(val), > + .mode = table->mode, > + .extra1 = SYSCTL_ZERO, > + .extra2 = SYSCTL_ONE, > + }; > + > + if (write && !capable(CAP_SYS_ADMIN)) > + return -EPERM; > + > + mutex_lock(&bpf_stats_enabled_mutex); > + val = saved_val; > + ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); > + if (write && !ret && val != saved_val) { > + if (val) > + static_key_slow_inc(key); > + else > + static_key_slow_dec(key); > + saved_val = val; > + } > + mutex_unlock(&bpf_stats_enabled_mutex); > + return ret; > +} nit: I wonder if most of the logic could have been shared with proc_do_static_key() here and only the mutex passed as an arg to the common helper? > +#endif > + > /* > * /proc/sys support > */ > @@ -2549,7 +2583,7 @@ static struct ctl_table kern_table[] = { > .data = &bpf_stats_enabled_key.key, > .maxlen = sizeof(bpf_stats_enabled_key), > .mode = 0644,