Date: Tue, 7 Jun 2022 11:47:00 -0700
From: Tadeusz Struk <tadeusz.struk@linaro.org>
Subject: Re: [PATCH v2] cgroup: serialize css kill and release paths
To: Michal Koutný
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Christian Brauner,
 Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
 KP Singh, cgroups@vger.kernel.org, netdev@vger.kernel.org,
 bpf@vger.kernel.org, stable@vger.kernel.org,
 linux-kernel@vger.kernel.org,
 syzbot+e42ae441c3b10acf9e9d@syzkaller.appspotmail.com
References: <20220603173455.441537-1-tadeusz.struk@linaro.org>
 <20220603181321.443716-1-tadeusz.struk@linaro.org>
 <20220606123910.GF6928@blackbody.suse.cz>
In-Reply-To: <20220606123910.GF6928@blackbody.suse.cz>
On 6/6/22 05:39, Michal Koutný wrote:
> On Fri, Jun 03, 2022 at 11:13:21AM -0700, Tadeusz Struk wrote:
>> In such scenario the css_killed_work_fn will be en-queued via
>> cgroup_apply_control_disable(cgrp)->kill_css(css), and bail out to
>> cgroup_kn_unlock(). Then cgroup_kn_unlock() will call:
>> cgroup_put(cgrp)->css_put(&cgrp->self), which will try to enqueue
>> css_release_work_fn for the same css instance, causing a list_add
>> corruption bug, as can be seen in the syzkaller report [1].
>
> This hypothesis doesn't add up to me (I am sorry).
>
> The kill_css(css) would be a css associated with a subsys (css.ss !=
> NULL) whereas css_put(&cgrp->self) is a different css just for the
> cgroup (css.ss == NULL).

Yes, you are right. I couldn't figure out where the extra css_put() was
called from, and the only place that fit my theory was the
cgroup_kn_unlock() in cgroup_apply_control_disable(). After some more
debugging I can see that, as you said, cgrp->self is a different css.

The offending _put() is actually called by percpu_ref_kill_and_confirm(),
which not only calls the confirm_kill percpu_ref_func_t passed to it, but
also puts the base refcount itself.

Because cgroup_apply_control_disable() loops over the live descendants
and calls kill_css() on all the css'es, and css_killed_work_fn() also
loops and calls css_put() on all the parents, css_release() gets called
on the first parent prematurely, causing the BUG().

What I think should be done to balance the puts and gets is to call
css_get() on all the parents in kill_css():

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c1e1a5c34e77..3ca61325bc4e 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5527,6 +5527,8 @@ static void css_killed_ref_fn(struct percpu_ref *ref)
  */
 static void kill_css(struct cgroup_subsys_state *css)
 {
+	struct cgroup_subsys_state *_css = css;
+
 	lockdep_assert_held(&cgroup_mutex);
 
 	if (css->flags & CSS_DYING)
@@ -5541,10 +5543,13 @@ static void kill_css(struct cgroup_subsys_state *css)
 	css_clear_dir(css);
 
 	/*
-	 * Killing would put the base ref, but we need to keep it alive
-	 * until after ->css_offline().
+	 * Killing would put the base ref, but we need to keep it alive,
+	 * and all its parents, until after ->css_offline().
 	 */
-	css_get(css);
+	do {
+		css_get(_css);
+		_css = _css->parent;
+	} while (_css && atomic_read(&_css->online_cnt));
 
 	/*
 	 * cgroup core guarantees that, by the time ->css_offline() is

These gets will then be "reverted" by the css_put() calls in
css_killed_work_fn(). Please let me know if this makes sense to you.
I'm still testing it, but syzbot is very slow today.

-- 
Thanks,
Tadeusz
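
P.S. For anyone following along, below is a minimal userspace sketch of
the imbalance described above. The toy_* names and the struct are purely
illustrative, not the kernel API; it only shows why, when the kill
primitive itself puts the base reference the way
percpu_ref_kill_and_confirm() does, a balancing get has to be taken
before the kill and dropped again once the deferred work has run.

#include <stdio.h>

struct toy_ref {
	const char *name;
	int count;
};

static void toy_get(struct toy_ref *r)
{
	r->count++;
}

static void toy_put(struct toy_ref *r)
{
	/* stands in for css_put() -> css_release() at refcount zero */
	if (--r->count == 0)
		printf("released %s\n", r->name);
}

/*
 * Stands in for percpu_ref_kill_and_confirm(): it invokes the confirm
 * callback *and* drops the base reference itself.
 */
static void toy_kill_and_confirm(struct toy_ref *r,
				 void (*confirm)(struct toy_ref *))
{
	confirm(r);
	toy_put(r);	/* the implicit put discussed above */
}

static void toy_confirm(struct toy_ref *r)
{
	printf("confirm_kill for %s\n", r->name);
}

int main(void)
{
	struct toy_ref a = { "css-without-extra-get", 1 };
	struct toy_ref b = { "css-with-extra-get", 1 };

	/* Without a prior get, the implicit put releases the object at
	 * kill time, before any deferred offline-style work could run: */
	toy_kill_and_confirm(&a, toy_confirm);

	/* With the extra get (what kill_css() does), the object survives
	 * the kill; the later balancing put, like the one in
	 * css_killed_work_fn(), releases it at the right time: */
	toy_get(&b);
	toy_kill_and_confirm(&b, toy_confirm);
	toy_put(&b);
	return 0;
}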