Message-ID: <20571829-9d3d-0b48-817c-b6b15565f651@redhat.com>
Date: Wed, 2 Feb 2022 09:14:19 +0100
To: Mike Kravetz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Naoya Horiguchi, Axel Rasmussen, Mina Almasry, Michal Hocko,
 Peter Xu, Andrea Arcangeli, Shuah Khan, Andrew Morton
References: <20220202014034.182008-1-mike.kravetz@oracle.com>
 <20220202014034.182008-2-mike.kravetz@oracle.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH v2 1/3] mm: enable MADV_DONTNEED for hugetlb mappings
In-Reply-To: <20220202014034.182008-2-mike.kravetz@oracle.com>

On 02.02.22 02:40, Mike Kravetz wrote:
> MADV_DONTNEED is currently disabled for hugetlb mappings.  This
> certainly makes sense in shared file mappings as the pagecache maintains
> a reference to the page and it will never be freed.  However, it could
> be useful to unmap and free pages in private mappings.
>
> The only thing preventing MADV_DONTNEED from working on hugetlb mappings
> is a check in can_madv_lru_vma().  To allow support for hugetlb mappings
> create and use a new routine madvise_dontneed_free_valid_vma() that will
> allow hugetlb mappings.  Also, before calling zap_page_range in the
> DONTNEED case align start and size to huge page size for hugetlb vmas.
> madvise only requires PAGE_SIZE alignment, but the hugetlb unmap routine
> requires huge page size alignment.
>
> Signed-off-by: Mike Kravetz
> ---
>  mm/madvise.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 5604064df464..7ae891e030a4 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -796,10 +796,30 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
>  static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
>  					unsigned long start, unsigned long end)
>  {
> +	/*
> +	 * start and size (end - start) must be huge page size aligned
> +	 * for hugetlb vmas.
> +	 */
> +	if (is_vm_hugetlb_page(vma)) {
> +		struct hstate *h = hstate_vma(vma);
> +
> +		start = ALIGN_DOWN(start, huge_page_size(h));
> +		end = ALIGN(end, huge_page_size(h));

So you effectively extend the range silently. IIUC, if someone zaps a
4k range, you would implicitly zap a whole 2M page and effectively zero
out more data than requested.

Looking at do_madvise(), we:
(1) reject start addresses that are not page-aligned
(2) shrink lengths that are not page-aligned and refuse if it turns 0

The man page documents (1) but doesn't really document (2).

Naturally, I'd have assumed that we apply the same logic to huge page
sizes and document it in the man page accordingly.

Why did you decide to extend the range? I'd assume MADV_REMOVE behaves
like FALLOC_FL_PUNCH_HOLE:
  "Within the specified range, partial filesystem blocks are zeroed, and
   whole filesystem blocks are removed from the file.  After a
   successful call, subsequent reads from this range will return
   zeros."
So we don't "discard more than requested".

I see the following possible alternatives:
(a) Fail if the range is not aligned
-> Clear semantics
(b) Fail if the start is not aligned, shrink the end if required
-> Same rules as for PAGE_SIZE
(c) Zero out the requested part
-> Same semantics as FALLOC_FL_PUNCH_HOLE.

My preference would be (a), properly documenting it in the man page.

-- 
Thanks,

David / dhildenb
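
To make the difference concrete, here is a rough userspace sketch (not kernel code) of how the effective range comes out under the patch as posted and under alternatives (a) and (b). Everything in it is hypothetical and only for illustration: the 2M HPAGE_SIZE, the struct, the helper names and the stand-in alignment macros. Alternative (b) is modelled on the wording in this mail, not on what do_madvise() actually does, and (c) is left out because it changes what happens to the data, not the range.

```c
#include <stdbool.h>

/* Assumption: a 2M huge page; real code would use huge_page_size(h). */
#define HPAGE_SIZE		(2UL << 20)

/* Userspace stand-ins for ALIGN_DOWN()/ALIGN(); power-of-two sizes only. */
#define ALIGN_DOWN_UL(x, a)	((x) & ~((a) - 1))
#define ALIGN_UP_UL(x, a)	(((x) + (a) - 1) & ~((a) - 1))
#define IS_ALIGNED_UL(x, a)	(((x) & ((a) - 1)) == 0)

/* Hypothetical result type: the range that would be zapped, or !ok. */
struct range {
	unsigned long start;
	unsigned long end;
	bool ok;
};

/* Patch as posted: silently grow the range to huge page boundaries. */
static struct range range_extend(unsigned long start, unsigned long end)
{
	struct range r = {
		.start = ALIGN_DOWN_UL(start, HPAGE_SIZE),
		.end = ALIGN_UP_UL(end, HPAGE_SIZE),
		.ok = true,
	};
	return r;
}

/* (a): refuse unless both start and end are huge page aligned. */
static struct range range_strict(unsigned long start, unsigned long end)
{
	struct range r = {
		.start = start,
		.end = end,
		.ok = IS_ALIGNED_UL(start, HPAGE_SIZE) &&
		      IS_ALIGNED_UL(end, HPAGE_SIZE),
	};
	return r;
}

/* (b): refuse an unaligned start, shrink the end, refuse if nothing is left. */
static struct range range_shrink(unsigned long start, unsigned long end)
{
	struct range r = { .start = start, .end = end, .ok = false };

	if (!IS_ALIGNED_UL(start, HPAGE_SIZE))
		return r;
	r.end = ALIGN_DOWN_UL(end, HPAGE_SIZE);
	r.ok = r.end > r.start;
	return r;
}
```

With a 4k request inside one huge page, say [0x201000, 0x202000), range_extend() yields [0x200000, 0x400000), i.e. the whole 2M page, while range_strict() and range_shrink() both refuse the request.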