From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.3 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C6C78C433E6 for ; Tue, 16 Feb 2021 08:15:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 97A5664D9A for ; Tue, 16 Feb 2021 08:15:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229803AbhBPIO4 (ORCPT ); Tue, 16 Feb 2021 03:14:56 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:59430 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229764AbhBPIOx (ORCPT ); Tue, 16 Feb 2021 03:14:53 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613463206; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mrBI3gmDGE4IcqlEPbbLaR/5KOLeuQUWrbl5wiK6Nv8=; b=Pf0V6udfHXXGDunQ80myyXhHP0Hcn//ApuGHWaGGGxenr8T76TvDLHnCuNbyukz//1DQD5 zGHjuL9Y501bp28DSl+dfs1RMD7a6Uxw9ZSBO3eU56t7mcDs7FXdfO3og4pjyYLtmdwzFc rfjMYbl8YuN3FaufPuJZzw7GrFToNv0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-576-7mnurP3rNQGA5ySur9Yqbw-1; Tue, 16 Feb 2021 03:13:22 -0500 X-MC-Unique: 7mnurP3rNQGA5ySur9Yqbw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 02480189CD2E; Tue, 16 Feb 2021 08:13:18 +0000 (UTC) Received: from [10.36.114.70] (ovpn-114-70.ams2.redhat.com [10.36.114.70]) by smtp.corp.redhat.com (Postfix) with ESMTP id F0FB57216B; Tue, 16 Feb 2021 08:13:09 +0000 (UTC) Subject: Re: [External] Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page To: Michal Hocko , Muchun Song Cc: Jonathan Corbet , Mike Kravetz , Thomas Gleixner , mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, Peter Zijlstra , viro@zeniv.linux.org.uk, Andrew Morton , paulmck@kernel.org, mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com, Randy Dunlap , oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry , David Rientjes , Matthew Wilcox , Oscar Salvador , "Song Bao Hua (Barry Song)" , =?UTF-8?B?SE9SSUdVQ0hJIE5BT1lBKOWggOWPoyDnm7TkuZ8p?= , Joao Martins , Xiongchun duan , linux-doc@vger.kernel.org, LKML , Linux Memory Management List , linux-fsdevel References: From: David Hildenbrand Organization: Red Hat GmbH Message-ID: <4f8664fb-0d65-b7d6-39d6-2ce5fc86623a@redhat.com> Date: Tue, 16 Feb 2021 09:13:09 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On 15.02.21 20:02, Michal Hocko wrote: > On Tue 16-02-21 01:48:29, Muchun Song wrote: >> On Tue, Feb 16, 2021 at 12:28 AM Michal Hocko wrote: >>> >>> On Mon 15-02-21 23:36:49, Muchun Song wrote: >>> [...] >>>>> There shouldn't be any real reason why the memory allocation for >>>>> vmemmaps, or handling vmemmap in general, has to be done from within the >>>>> hugetlb lock and therefore requiring a non-sleeping semantic. All that >>>>> can be deferred to a more relaxed context. If you want to make a >>>> >>>> Yeah, you are right. We can put the freeing hugetlb routine to a >>>> workqueue. Just like I do in the previous version (before v13) patch. >>>> I will pick up these patches. >>> >>> I haven't seen your v13 and I will unlikely have time to revisit that >>> version. I just wanted to point out that the actual allocation doesn't >>> have to happen from under the spinlock. There are multiple ways to go >>> around that. Dropping the lock would be one of them. Preallocation >>> before the spin lock is taken is another. WQ is certainly an option but >>> I would take it as the last resort when other paths are not feasible. >>> >> >> "Dropping the lock" and "Preallocation before the spin lock" can limit >> the context of put_page to non-atomic context. I am not sure if there >> is a page puted somewhere under an atomic context. e.g. compaction. >> I am not an expert on this. > > Then do a due research or ask for a help from the MM community. Do > not just try to go around harder problems and somehow duct tape a > solution. I am sorry for sounding harsh here but this is a repetitive > pattern. > > Now to the merit. put_page can indeed be called from all sorts of > contexts. And it might be indeed impossible to guarantee that hugetlb > pages are never freed up from an atomic context. Requiring that would be > even hard to maintain longterm. There are ways around that, I believe, > though. > > The most simple one that I can think of right now would be using > in_atomic() rather than in_task() check free_huge_page. IIRC recent > changes would allow in_atomic to be reliable also on !PREEMPT kernels > (via RCU tree, not sure where this stands right now). That would make > __free_huge_page always run in a non-atomic context which sounds like an > easy enough solution. > Another way would be to keep a pool of ready pages to use in case of > GFP_NOWAIT allocation fails and have means to keep that pool replenished > when needed. Would it be feasible to reused parts of the freed page in > the worst case? As already discussed, this is only possible when the huge page does not reside on ZONE_MOVABLE/CMA. In addition, we can no longer form a huge page at that memory location ever. -- Thanks, David / dhildenb