From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.3 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4631CC2BBCF for ; Wed, 16 Dec 2020 22:52:16 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C386423716 for ; Wed, 16 Dec 2020 22:52:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C386423716 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E9A9D6B0036; Wed, 16 Dec 2020 17:52:14 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E24286B005D; Wed, 16 Dec 2020 17:52:14 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CECA56B0068; Wed, 16 Dec 2020 17:52:14 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id B5A656B0036 for ; Wed, 16 Dec 2020 17:52:14 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 7D44F8249980 for ; Wed, 16 Dec 2020 22:52:14 +0000 (UTC) X-FDA: 77600645388.11.cork02_0e08de32742f Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin11.hostedemail.com (Postfix) with ESMTP id 59161180F8B80 for ; Wed, 16 Dec 2020 22:52:14 +0000 (UTC) X-HE-Tag: cork02_0e08de32742f X-Filterd-Recvd-Size: 6833 Received: from aserp2120.oracle.com (aserp2120.oracle.com [141.146.126.78]) by imf45.hostedemail.com (Postfix) with ESMTP for ; Wed, 16 Dec 2020 22:52:13 +0000 (UTC) Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0BGMnwJl082082; Wed, 16 Dec 2020 22:51:53 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : to : cc : references : from : message-id : date : mime-version : in-reply-to : content-type : content-transfer-encoding; s=corp-2020-01-29; bh=GWzfUJIU936fkGAX1MiiiDp/YR1yirB8QOBvoZtTBBk=; b=dSY3YZuNxSjZi3z5/7a8Jh1pLGdp8Chp+wAXG5EJLzjWD3Ptn8q96F2v50DN8jNyWFrw 2KRG5b21NK4g6KXCs/mpjGUriJt1z46DuXFoozsyqJEE9lv4cesPuNIOTLa8Grwp4vJi tbx3N8vYeBsj86AUsq+IN7fy40Eo7ruGsWYhdK+FR5UZ5+KtIcNmjVOfE5uJGKzS/1jR VrkEuV3vp92CtJns7Lo0cnscysFXPPqQL2oNQvi2FiReuskxmBmioOW9mFWipJYB9KJ4 OIcO1hf7RhFUn3/AW4KbQzcTbcBOZfdjcx34MopbTKo9M2XIMfBwEd252XgGxrz2DKB+ qA== Received: from aserp3030.oracle.com (aserp3030.oracle.com [141.146.126.71]) by aserp2120.oracle.com with ESMTP id 35cntmas5d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Wed, 16 Dec 2020 22:51:53 +0000 Received: from pps.filterd (aserp3030.oracle.com [127.0.0.1]) by aserp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0BGMjHZW062482; Wed, 16 Dec 2020 22:49:53 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserp3030.oracle.com with ESMTP id 35d7eq52x2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 16 Dec 2020 22:49:53 +0000 Received: from abhmp0003.oracle.com (abhmp0003.oracle.com [141.146.116.9]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0BGMndJP001074; Wed, 16 Dec 2020 22:49:39 GMT Received: from [192.168.2.112] (/50.38.35.18) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 16 Dec 2020 14:49:38 -0800 Subject: Re: [PATCH v9 03/11] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page To: Oscar Salvador Cc: Muchun Song , corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, paulmck@kernel.org, mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com, rdunlap@infradead.org, oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de, almasrymina@google.com, rientjes@google.com, willy@infradead.org, mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com, duanxiongchun@bytedance.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org References: <20201213154534.54826-1-songmuchun@bytedance.com> <20201213154534.54826-4-songmuchun@bytedance.com> <5936a766-505a-eab0-42a6-59aab2585880@oracle.com> <20201216222549.GC3207@localhost.localdomain> From: Mike Kravetz Message-ID: <49f6a0f1-c6fa-4642-2db0-69f090e8a392@oracle.com> Date: Wed, 16 Dec 2020 14:49:36 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.1 MIME-Version: 1.0 In-Reply-To: <20201216222549.GC3207@localhost.localdomain> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9837 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 spamscore=0 bulkscore=0 suspectscore=0 adultscore=0 mlxscore=0 mlxlogscore=999 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012160141 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9837 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 phishscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 malwarescore=0 suspectscore=0 mlxlogscore=999 impostorscore=0 priorityscore=1501 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012160142 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 12/16/20 2:25 PM, Oscar Salvador wrote: > On Wed, Dec 16, 2020 at 02:08:30PM -0800, Mike Kravetz wrote: >>> + * vmemmap_rmap_walk - walk vmemmap page table >>> + >>> +static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr, >>> + unsigned long end, struct vmemmap_rmap_walk *walk) >>> +{ >>> + pte_t *pte; >>> + >>> + pte = pte_offset_kernel(pmd, addr); >>> + do { >>> + BUG_ON(pte_none(*pte)); >>> + >>> + if (!walk->reuse) >>> + walk->reuse = pte_page(pte[VMEMMAP_TAIL_PAGE_REUSE]); >> >> It may be just me, but I don't like the pte[-1] here. It certainly does work >> as designed because we want to remap all pages in the range to the page before >> the range (at offset -1). But, we do not really validate this 'reuse' page. >> There is the BUG_ON(pte_none(*pte)) as a sanity check, but we do nothing similar >> for pte[-1]. Based on the usage for HugeTLB pages, we can be confident that >> pte[-1] is actually a pte. In discussions with Oscar, you mentioned another >> possible use for these routines. > > Without giving it much of a thought, I guess we could duplicate the > BUG_ON for the pte outside the loop, and add a new one for pte[-1]. > Also, since walk->reuse seems to not change once it is set, we can take > it outside the loop? e.g: > > pte *pte; > > pte = pte_offset_kernel(pmd, addr); > BUG_ON(pte_none(*pte)); > BUG_ON(pte_none(pte[VMEMMAP_TAIL_PAGE_REUSE])); > walk->reuse = pte_page(pte[VMEMMAP_TAIL_PAGE_REUSE]); > do { > .... > } while... > > Or I am not sure whether we want to keep it inside the loop in case > future cases change walk->reuse during the operation. > But to be honest, I do not think it is realistic of all future possible > uses of this, so I would rather keep it simple for now. I was thinking about possibly passing the 'reuse' address as another parameter to vmemmap_remap_reuse(). We could add this addr to the vmemmap_rmap_walk struct and set walk->reuse when we get to the pte for that address. Of course this would imply that the addr would need to be part of the range. Ideally, we would walk the page table to get to the reuse page. My concern was not explicitly about adding the BUG_ON. In more general use, *pte could be the first entry on a pte page. And, then pte[-1] may not even be a pte. Again, I don't think this matters for the current HugeTLB use case. Just a little concerned if code is put to use for other purposes. -- Mike Kravetz