From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751496AbdH3TTO (ORCPT ); Wed, 30 Aug 2017 15:19:14 -0400 Received: from mail-yw0-f178.google.com ([209.85.161.178]:32848 "EHLO mail-yw0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751435AbdH3TTN (ORCPT ); Wed, 30 Aug 2017 15:19:13 -0400 X-Google-Smtp-Source: ADKCNb57QAk98Ihu4ngWjmGQgzRmZdA/CeKW+NDTLFbpuGqp6ZO7P0Y2xa4FdAdmKDLDzomYVCNzWBcFt/p9YIC6DLQ= MIME-Version: 1.0 In-Reply-To: <1504115735.5852.11.camel@gmx.de> References: <1503996623.8323.20.camel@gmx.de> <1504025721.6024.25.camel@gmx.de> <1504030207.6560.0.camel@gmx.de> <1504069332.8352.3.camel@gmx.de> <1504113212.5852.6.camel@gmx.de> <1504115735.5852.11.camel@gmx.de> From: Kees Cook Date: Wed, 30 Aug 2017 12:19:11 -0700 X-Google-Sender-Auth: SMU0MIyeM2W38KKblRcQV8ZAsvw Message-ID: Subject: Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection To: Mike Galbraith Cc: LKML , Ingo Molnar , "Reshetova, Elena" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 30, 2017 at 10:55 AM, Mike Galbraith wrote: > On Wed, 2017-08-30 at 10:32 -0700, Kees Cook wrote: >> On Wed, Aug 30, 2017 at 10:13 AM, Mike Galbraith wrote: >> > On Wed, 2017-08-30 at 09:35 -0700, Kees Cook wrote: >> >> On Tue, Aug 29, 2017 at 10:02 PM, Mike Galbraith wrote: >> >> > On Tue, 2017-08-29 at 11:41 -0700, Kees Cook wrote: >> >> >> Can you also test with 14afee4b6092 ("net: convert sock.sk_wmem_alloc >> >> >> from atomic_t to refcount_t") reverted (instead of ARCH_HAS_REFCOUNT >> >> >> disabled)? >> >> > >> >> > Nogo. >> >> >> >> Thanks for checking! >> >> >> >> > [ 44.901930] WARNING: CPU: 5 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0 >> >> >> >> This is so odd if 14afee4b6092 is reverted. What is line 374 for you >> >> in net/netlink/af_netlink.c? >> > >> > 374 WARN_ON(atomic_read(&sk->sk_rmem_alloc)); >> > >> > That line is unchanged by 14afee4b6092. >> >> Uuuuhmm. Wow, now I'm really baffled. I thought you were getting the >> warn from the next line with the refcount usage... I will keep >> digging. Thanks! > > I just double checked freshly pulled tip (rapidly moving target), and > it's definitely nogo with CONFIG_ARCH_HAS_REFCOUNT=y and 14afee4b6092 > reverted. Okay, thanks for double-checking. I'd be curious to see if it would help to revert any of these too: fb5c2c17a556 net: convert packet_fanout.sk_ref from atomic_t to refcount_t b4217b82893c net: convert netlbl_lsm_cache.refcount from atomic_t to refcount_t c122e14df2d6 net: convert net.passive from atomic_t to refcount_t edcb691871b2 net: convert inet_frag_queue.refcnt from atomic_t to refcount_t 717d1e993ad8 net: convert fib_rule.refcnt from atomic_t to refcount_t 8c9814b97002 net: convert unix_address.refcnt from atomic_t to refcount_t 433cea4d9bbb net: convert netpoll_info.refcnt from atomic_t to refcount_t 7658b36f1b31 net: convert in_device.refcnt from atomic_t to refcount_t 8851ab526791 net: convert ip_mc_list.refcnt from atomic_t to refcount_t 41c6d650f653 net: convert sock.sk_refcnt from atomic_t to refcount_t 14afee4b6092 net: convert sock.sk_wmem_alloc from atomic_t to refcount_t 2638595afccf net: convert sk_buff_fclones.fclone_ref from atomic_t to refcount_t 633547973ffc net: convert sk_buff.users from atomic_t to refcount_t 53869cebce4b net: convert nf_bridge_info.use from atomic_t to refcount_t 6343944bc105 net: convert neigh_params.refcnt from atomic_t to refcount_t 9f23743017d1 net: convert neighbour.refcnt from atomic_t to refcount_t 1cc9a98b59ba net: convert inet_peer.refcnt from atomic_t to refcount_t What I find so strange about this is that it's tripping a warning in an atomic_t state, and works fine with FULL or "none". x86-refcount has almost identical behavior to "none" (though obviously, this seems like those differences must be the source of the bug), but FULL is much more aggressive. So some silent x86-refcount issue is causing a side-effect in af_netlink... Something very subtle is happening here. (So subtle I can't reproduce it yet, sadly...) -Kees -- Kees Cook Pixel Security