From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73D9BC433DB for ; Fri, 8 Jan 2021 08:43:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CFE1E207C5 for ; Fri, 8 Jan 2021 08:43:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CFE1E207C5 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id F1FA18D016D; Fri, 8 Jan 2021 03:43:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id ECF888D0156; Fri, 8 Jan 2021 03:43:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DE6938D016D; Fri, 8 Jan 2021 03:43:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0174.hostedemail.com [216.40.44.174]) by kanga.kvack.org (Postfix) with ESMTP id CA5BF8D0156 for ; Fri, 8 Jan 2021 03:43:48 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 865F2181AC9C6 for ; Fri, 8 Jan 2021 08:43:48 +0000 (UTC) X-FDA: 77681969736.22.peace50_3804b2f274f1 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id 64B7318038E60 for ; Fri, 8 Jan 2021 08:43:48 +0000 (UTC) X-HE-Tag: peace50_3804b2f274f1 X-Filterd-Recvd-Size: 5554 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf39.hostedemail.com (Postfix) with ESMTP for ; Fri, 8 Jan 2021 08:43:47 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1610095426; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=+IIxN0LPQPAe8yS5/PAw+KxjC1DAdtWn6setESfa8RY=; b=uovtPFcM0yJKYnRKekCnL570Mz5vWQbkKVbxHnmzQWRft9A8fgkLbEOChRtt1REimI2IDL QAN7ZD93fIM0Zog9nVuy/UdSTneUwhAGOnR/2sbJ9Y0HZJ0EN24j9FDH+9JQuIUECYcVE+ aaa19VHBKS7oj9/zSYg/CS9QcVqCYuo= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 8D964ACAF; Fri, 8 Jan 2021 08:43:46 +0000 (UTC) Date: Fri, 8 Jan 2021 09:43:30 +0100 From: Michal Hocko To: Muchun Song Cc: Mike Kravetz , Andrew Morton , Naoya Horiguchi , Andi Kleen , Linux Memory Management List , LKML Subject: Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page Message-ID: <20210108084330.GW13207@dhcp22.suse.cz> References: <20210106165632.GT13207@dhcp22.suse.cz> <20210107084146.GD13207@dhcp22.suse.cz> <20210107111827.GG13207@dhcp22.suse.cz> <20210107123854.GJ13207@dhcp22.suse.cz> <20210107141130.GL13207@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu 07-01-21 23:11:22, Muchun Song wrote: > On Thu, Jan 7, 2021 at 10:11 PM Michal Hocko wrote: > > > > On Thu 07-01-21 20:59:33, Muchun Song wrote: > > > On Thu, Jan 7, 2021 at 8:38 PM Michal Hocko wrote: > > [...] > > > > Right. Can we simply back off in the dissolving path when ref count is > > > > 0 && PageHuge() if list_empty(page->lru)? Is there any other scenario > > > > when the all above is true and the page is not being freed? > > > > > > The list_empty(&page->lru) may always return false. > > > The page before freeing is on the active list > > > (hstate->hugepage_activelist).Then it is on the free list > > > after freeing. So list_empty(&page->lru) is always false. > > > > The point I was trying to make is that the page has to be enqueued when > > it is dissolved and freed. If the page is not enqueued then something > > racing. But then I have realized that this is not a great check to > > detect the race because pages are going to be released to buddy > > allocator and that will reuse page->lru again. So scratch that and sorry > > for the detour. > > > > But that made me think some more and one way to reliably detect the race > > should be PageHuge() check in the freeing path. This is what dissolve > > path does already. PageHuge becomes false during update_and_free_page() > > while holding the hugetlb_lock. So can we use that? > > It may make the thing complex. Apart from freeing it to the > buddy allocator, free_huge_page also does something else for > us. If we detect the race in the freeing path, if it is not a HugeTLB > page, the freeing path just returns. We also should move those > things to the dissolve path. Right? Not sure what you mean. Dissolving is a subset of the freeing path. It effectivelly only implements the update_and_free_page branch (aka free to buddy). It skips some of the existing steps because it believes it sees a freed page. But as you have pointed out this is racy and I strongly suspect it is simply wrong in those assumptions. E.g. hugetlb cgroup accounting can get wrong right? The more I think about it the more I think that dissolving path should simply have a common helper with __free_huge_page that would release the huge page to the allocator. The only thing that should be specific to the dissolving path is HWpoison handling. It shouldn't be playing with accounting and what not. Btw the HWpoison handling is suspicious as well, a lost race would mean this doesn't happen. But maybe there is some fixup handled later on... > But I find a tricky problem to solve. See free_huge_page(). > If we are in non-task context, we should schedule a work > to free the page. We reuse the page->mapping. If the page > is already freed by the dissolve path. We should not touch > the page->mapping. So we need to check PageHuge(). > The check and llist_add() should be protected by > hugetlb_lock. But we cannot do that. Right? If dissolve > happens after it is linked to the list. We also should > remove it from the list (hpage_freelist). It seems to make > the thing more complex. I am not sure I follow you here but yes PageHuge under hugetlb_lock should be the reliable way to check for the race. I am not sure why we really need to care about mapping or other state. -- Michal Hocko SUSE Labs