Subject: Re: [PATCH RFC] hugetlb_cgroup: fix unbalanced css_put for shared mappings
To: Mike Kravetz
From: Miaohe Lin
Date: Tue, 9 Feb 2021 11:27:56 +0800
Message-ID: <1f683c18-6a22-b5a9-6352-2e7d956132bb@huawei.com>
In-Reply-To: <32100d84-8a26-2f8f-303f-52182ce72f52@oracle.com>
References: <20210123093111.60785-1-linmiaohe@huawei.com> <32100d84-8a26-2f8f-303f-52182ce72f52@oracle.com>

On 2021/2/9 3:52, Mike Kravetz wrote:
> On 1/23/21 1:31 AM, Miaohe Lin wrote:
>> The current implementation of hugetlb_cgroup for shared mappings could have
>> different behavior. Consider the following two scenarios:
>>
>> 1.Assume initial css reference count of hugetlb_cgroup is 1:
>>   1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference
>>       count is 2 associated with 1 file_region.
>>   1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference
>>       count is 3 associated with 2 file_region.
>>   1.3 coalesce_file_region will coalesce these two file_regions into one.
>>       So css reference count is 3 associated with 1 file_region now.
>>
>> 2.Assume initial css reference count of hugetlb_cgroup is 1 again:
>>   2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference
>>       count is 2 associated with 1 file_region.
>>
>> Therefore, we might have one file_region while holding one or more css
>> reference counts. This inconsistency could lead to unbalanced css_put().
>> If we do css_put one by one (e.g. hole punch case), scenario 2 would put
>> one more css reference. If we do css_put all together (e.g. truncate case),
>> scenario 1 will leak one css reference.
>
> Sorry for the delay in replying.  This is tricky code and I needed some quiet
> time to study it.
>

That's fine. I was trying to catch more buggy cases too.

> I agree that the issue described exists.  Can you describe what a user would
> see in the above imbalance scenarios?  What happens if we do one too many
> css_put calls?  What happens if we leak the reference and do not do the
> required number of css_puts?
>

The imbalanced css_get/css_put would leave a non-zero reference count when
we try to destroy the hugetlb cgroup. The hugetlb cgroup directory is
removed, but the associated resources are not freed. Under a really busy
workload, this might eventually result in OOM or in a failure to create a
new hugetlb cgroup.

> The code changes look correct.
>
> I just wish this code was not so complicated.  I think the private mapping
> case could be simplified to only take a single css_ref per reserve map.

Could you explain this more? It seems one reserve map already takes a single
css_ref. And a hugepage outside the reservation would take a single css_ref
too.

> However, for shared mappings we need to track each individual reservation
> which adds the complexity.  I can not think of a better way to do things.
>

I can't think of one either. And the fix might make the code more complex. :(

> Please update commit message with an explanation of what users might see
> because of this issue and resubmit as a patch.
>

Will do. Thanks.

> Thanks,
>

Many thanks for your reply. :)
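
P.S. To make the two quoted scenarios easier to follow, here is a toy
model of the reference counting in Python. It is only a sketch and not
kernel code: the class ToyReserveMap, its methods, and final_refs are
hypothetical names, and the teardown rules (one css_put per page for hole
punch, one css_put per file_region for truncate) are my simplified reading
of "one by one" and "all together" above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReserveMap:
    """Toy model of a shared-mapping reserve map (hypothetical, not kernel code)."""
    css_refs: int = 1  # initial css reference count assumed in both scenarios
    regions: list = field(default_factory=list)  # half-open (start, end) ranges

    def reserve(self, start, end):
        # Each reservation call adds one region and takes one css reference.
        self.regions.append((start, end))
        self.css_refs += 1
        self._coalesce()

    def _coalesce(self):
        # Merge adjacent regions; the references already taken are left as-is.
        merged = []
        for start, end in sorted(self.regions):
            if merged and merged[-1][1] == start:
                merged[-1] = (merged[-1][0], end)
            else:
                merged.append((start, end))
        self.regions = merged

    def punch_one_by_one(self):
        # Assumption: hole punch drops one reference per page removed.
        for start, end in self.regions:
            self.css_refs -= end - start
        self.regions = []

    def truncate_all(self):
        # Assumption: truncate drops one reference per remaining region.
        self.css_refs -= len(self.regions)
        self.regions = []


def final_refs(calls, teardown):
    """Replay reservation calls, run one teardown method, return the count left."""
    m = ToyReserveMap()
    for start, end in calls:
        m.reserve(start, end)
    getattr(m, teardown)()
    return m.css_refs


SCENARIO_1 = [(1, 2), (2, 3)]  # two calls, coalesced into one region
SCENARIO_2 = [(1, 3)]          # one call, one region

if __name__ == "__main__":
    for name, calls in (("scenario 1", SCENARIO_1), ("scenario 2", SCENARIO_2)):
        for teardown in ("punch_one_by_one", "truncate_all"):
            print(name, teardown, final_refs(calls, teardown))
```

A final count equal to the initial value of 1 means the gets and puts
balanced. In this model, scenario 1 with truncate ends above 1 (a leaked
reference) and scenario 2 with hole punch ends below 1 (one css_put too
many), which matches the imbalance described in the patch.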