Subject: Re: [PATCH 4.19 016/133] cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
From: Yang Yingliang
Date: Thu, 13 Aug 2020 19:30:55 +0800
To: Greg Kroah-Hartman
CC: Cameron Berkenpas, Peter Geis, Lu Fengqi, Daniël Sonck, Zhang Qiang, Thomas Lamprecht, Daniel Borkmann, Zefan Li, Tejun Heo, Roman Gushchin, Cong Wang, David S. Miller
In-Reply-To: <20200720152804.513188610@linuxfoundation.org>
References: <20200720152803.732195882@linuxfoundation.org> <20200720152804.513188610@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 2020/7/20 23:36, Greg Kroah-Hartman wrote:
> From: Cong Wang
>
> [ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]
>
> When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
> copied, so the cgroup refcnt must be taken too. And, unlike the
> sk_alloc() path, sock_update_netprioidx() is not called here.
> Therefore, it is safe and necessary to grab the cgroup refcnt
> even when cgroup_sk_alloc is disabled.
>
> sk_clone_lock() is in BH context anyway, the in_interrupt()
> would terminate this function if called there. And for sk_alloc()
> skcd->val is always zero. So it's safe to factor out the code
> to make it more readable.
>
> The global variable 'cgroup_sk_alloc_disabled' is used to determine
> whether to take these reference counts. It is impossible to make
> the reference counting correct unless we save this bit of information
> in skcd->val. So, add a new bit there to record whether the socket
> has already taken the reference counts. This obviously relies on
> kmalloc() to align cgroup pointers to at least 4 bytes,
> ARCH_KMALLOC_MINALIGN is certainly larger than that.
>
> This bug seems to be introduced since the beginning, commit
> d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
> tried to fix it but not compeletely.
> It seems not easy to trigger until
> the recent commit 090e28b229af
> ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.
>
> Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
> Reported-by: Cameron Berkenpas
> Reported-by: Peter Geis
> Reported-by: Lu Fengqi
> Reported-by: Daniël Sonck
> Reported-by: Zhang Qiang
> Tested-by: Cameron Berkenpas
> Tested-by: Peter Geis
> Tested-by: Thomas Lamprecht
> Cc: Daniel Borkmann
> Cc: Zefan Li
> Cc: Tejun Heo
> Cc: Roman Gushchin
> Signed-off-by: Cong Wang
> Signed-off-by: David S. Miller
> Signed-off-by: Greg Kroah-Hartman
> ---

[...]

>
> +void cgroup_sk_clone(struct sock_cgroup_data *skcd)
> +{
> +	/* Socket clone path */
> +	if (skcd->val) {

Compared to the mainline patch, the *if (skcd->no_refcnt)* check is
missing here. Is this a mistake?

Thanks,
Yang

> +		/*
> +		 * We might be cloning a socket which is left in an empty
> +		 * cgroup and the cgroup might have already been rmdir'd.
> +		 * Don't use cgroup_get_live().
> +		 */
> +		cgroup_get(sock_cgroup_ptr(skcd));
> +	}
> +}
> +
>  void cgroup_sk_free(struct sock_cgroup_data *skcd)
>  {
> +	if (skcd->no_refcnt)
> +		return;
> +
>  	cgroup_put(sock_cgroup_ptr(skcd));
>  }
>
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1694,7 +1694,7 @@ struct sock *sk_clone_lock(const struct
>  	/* sk->sk_memcg will be populated at accept() time */
>  	newsk->sk_memcg = NULL;
>
> -	cgroup_sk_alloc(&newsk->sk_cgrp_data);
> +	cgroup_sk_clone(&newsk->sk_cgrp_data);
>
>  	rcu_read_lock();
>  	filter = rcu_dereference(sk->sk_filter);