From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754593AbcKUMrP (ORCPT ); Mon, 21 Nov 2016 07:47:15 -0500
Received: from mail-ua0-f195.google.com ([209.85.217.195]:34199 "EHLO
	mail-ua0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754188AbcKUMrM (ORCPT );
	Mon, 21 Nov 2016 07:47:12 -0500
MIME-Version: 1.0
In-Reply-To: <2236FBA76BA1254E88B949DDB74E612B41C14BB4@IRSMSX102.ger.corp.intel.com>
References: <20161117085342.GB3142@twins.programming.kicks-ass.net>
	<20161117161937.GA46515@ast-mbp.thefacebook.com>
	<2236FBA76BA1254E88B949DDB74E612B41C14BB4@IRSMSX102.ger.corp.intel.com>
From: David Windsor
Date: Mon, 21 Nov 2016 07:47:09 -0500
Message-ID:
Subject: Re: [RFC][PATCH 2/7] kref: Add kref_read()
To: "Reshetova, Elena"
Cc: Alexei Starovoitov, Peter Zijlstra, Kees Cook, Greg KH, Will Deacon,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", LKML,
	Daniel Borkmann
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 18, 2016 at 12:33 PM, Reshetova, Elena wrote:
> On Thu, Nov 17, 2016 at 09:53:42AM +0100, Peter Zijlstra wrote:
>> On Wed, Nov 16, 2016 at 12:08:52PM -0800, Alexei Starovoitov wrote:
>>
>> > I prefer to avoid 'fixing' things that are not broken.
>> > Note, prog->aux->refcnt already has explicit checks for overflow.
>> > locked_vm is used for resource accounting and not refcnt, so I don't
>> > see issues there either.
>>
>> The idea is to use something along the lines of:
>>
>>
>> http://lkml.kernel.org/r/20161115104608.GH3142@twins.programming.kicks-ass.net
>>
>> for all refcounts in the kernel.
>
>>I understand the idea. I'm advocating to fix refcnts explicitly the way we did in bpf land instead of leaking memory, making processes unkillable and so on.
>>If refcnt can be bounds checked, it should be done that way, since it's a clean error path without odd side effects.
>>Therefore I'm against unconditionally applying refcount to all atomics.
>
>> Also note that your:
>>
>> struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i) {
>>         if (atomic_add_return(i, &prog->aux->refcnt) > BPF_MAX_REFCNT) {
>>                 atomic_sub(i, &prog->aux->refcnt);
>>                 return ERR_PTR(-EBUSY);
>>         }
>>         return prog;
>> }
>>
>> is actually broken in the face of an actual overflow. Suppose @i is
>> big enough to wrap refcnt into negative space.
>
>>'i' is not controlled by user. It's a number of nic hw queues and BPF_MAX_REFCNT is 32k, so above is always safe.
>
> If I understand your code right, you export the bpf_prog_add() and anyone is free to use it
> (some crazy buggy driver for example).
> Currently only drivers/net/ethernet/mellanox/mlx4/en_netdev.c uses it, but you should
> consider any externally exposed interface as an attack vector from security point of view.
> So, I would not claim that above construction is always safe since there is a way using API to
> supply "i" that would overflow.
>
> Next question is how to convert the above code sanely to refcount_t interface... Loop of inc(s)? Iikk...
>

By the way, there are several sites where the use of
atomic_t/atomic_wrap_t as a counter ventures beyond the standard (inc,
dec, add, sub, read, set) operations we're planning on implementing
for both refcount_t and stats_t.

While performing the conversion to stats_t, I've found usage of
atomic_xchg(), for instance.

From kernel/trace/trace_mmiotrace.c:123:

    unsigned long cnt = atomic_xchg(&dropped_count, 0);

stats_xchg() isn't anticipated to go into the stats_t API, and
dropped_count clearly appears to be a statistical counter, so we will
have to further audit this site to determine whether the atomicity of
the atomic_xchg() operation is truly necessary here.
If it is, we can either implement stats_xchg(), or use a combination
of locking, stats_read() and stats_set() to accomplish the same thing
as stats_xchg().

>
>
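
To make the two options above concrete, here is a rough userspace sketch
using C11 atomics and pthreads. It is not kernel code and not a proposed
API: stats_t, stats_read(), stats_set(), stats_inc(), both stats_xchg_*()
helpers, stats_lock and stats_xchg_demo() are hypothetical names used only
for illustration. It assumes the counter is a plain atomic long.

```c
/* Userspace illustration only; all stats_* names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
	atomic_long counter;
} stats_t;

static inline long stats_read(stats_t *s)
{
	return atomic_load(&s->counter);
}

static inline void stats_set(stats_t *s, long val)
{
	atomic_store(&s->counter, val);
}

static inline void stats_inc(stats_t *s)
{
	atomic_fetch_add(&s->counter, 1);
}

/*
 * Option 1: a real exchange. Reading the old value and storing the new
 * one is a single atomic step, so a concurrent increment is never lost.
 */
static inline long stats_xchg_atomic(stats_t *s, long val)
{
	return atomic_exchange(&s->counter, val);
}

/*
 * Option 2: read + set under a lock. This is only equivalent to option 1
 * if every writer of the counter takes the same lock. A lockless
 * stats_inc() landing between the read and the set is overwritten.
 */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;

static inline long stats_xchg_locked(stats_t *s, long val)
{
	long old;

	pthread_mutex_lock(&stats_lock);
	old = stats_read(s);
	stats_set(s, val);
	pthread_mutex_unlock(&stats_lock);
	return old;
}

/* Single-threaded sanity check: both variants hand back the old value. */
void stats_xchg_demo(void)
{
	stats_t dropped;

	atomic_init(&dropped.counter, 0);
	stats_inc(&dropped);
	stats_inc(&dropped);
	assert(stats_xchg_atomic(&dropped, 0) == 2);
	stats_inc(&dropped);
	assert(stats_xchg_locked(&dropped, 0) == 1);
	assert(stats_read(&dropped) == 0);
}
```

The sketch suggests why the audit matters: if the increments of a counter
like dropped_count stay lockless, the locked read + set variant can drop
counts that the exchange would have kept, so the two are interchangeable
only where losing an occasional increment is acceptable or all updaters
share the lock.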