From: David Hildenbrand
To: Qi Zheng, akpm@linux-foundation.org, tglx@linutronix.de, hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, songmuchun@bytedance.com
Organization: Red Hat
Subject: Re: [PATCH 0/7] Free user PTE page table pages
Date: Mon, 19 Jul 2021 13:28:38 +0200
Message-ID: <089e710c-fb06-e731-6d50-7858d6b9ecdf@redhat.com>
In-Reply-To: <5ce5fb25-df1d-b807-8807-595b8a7bfc63@redhat.com>

On 19.07.21 09:34, David Hildenbrand wrote:
> On 18.07.21 06:30, Qi Zheng wrote:
>> Hi,
>>
>> This patch series aims to free user PTE page table pages when all PTE entries
>> are empty.
>>
>> The beginning of this story is that some malloc libraries (e.g. jemalloc or
>> tcmalloc) usually allocate large amounts of virtual address space with mmap()
>> and do not unmap it. They use madvise(MADV_DONTNEED) to free physical memory
>> when they want to. But madvise() does not free the page tables, so many page
>> tables can accumulate when the process touches an enormous virtual address space.
>
> ... did you see that I am actually looking into this?
>
> https://lkml.kernel.org/r/bae8b967-c206-819d-774c-f57b94c4b362@redhat.com
>
> and have already spent a significant time on it as part of my research,
> which is *really* unfortunate and makes me quite frustrated at the
> beginning of the week already ...
>
> Ripping out page tables is quite difficult, as we have to stop all page
> table walkers from touching them, including fast_gup, rmap and page
> faults. This usually involves taking the mmap lock in write mode. My
> approach does page table reclaim asynchronously from another thread and
> does not rely on reference counts.

FWIW, I had a quick peek, and I like the simple approach of using
reference counting, although it seems to come at a price. By using
pte_alloc_get_map_lock() instead of pte_alloc_map_lock(), we can handle
quite a few cases easily.

There are cases where we might see reuse immediately after discarding
memory (especially with virtio-balloon free page reporting), in which
case it's suboptimal to free the page tables right away instead of
waiting a bit to see whether there is reuse. However, the performance
impact seems to be comparatively small.

I do wonder whether the 1% overhead you're seeing actually comes from
allocating/freeing page tables or from the reference count handling on
some hot paths.
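To make the reference-counting approach discussed above concrete, here is a minimal single-threaded userspace model, not kernel code. It assumes a scheme in which each PTE table's count is the number of walkers currently holding it plus one per populated entry, so the table is freed the moment that count drops to zero. All names (`struct pte_table`, `pte_table_get()`, `fault_in()`, `zap_entry()`, ...) are hypothetical and only loosely mirror the role of `pte_alloc_get_map_lock()` in the series; synchronization against fast_gup, rmap walks and concurrent faults is deliberately not modeled.

```c
#include <stdlib.h>

#define PTRS_PER_PTE 512 /* assumption: 4 KiB pages, 8-byte entries */

/* Hypothetical model of one PTE page table page. */
struct pte_table {
    unsigned long entries[PTRS_PER_PTE]; /* 0 stands in for pte_none */
    unsigned int refcount;               /* walker refs + populated entries */
};

/* Hypothetical model of the PMD entry that points at the PTE table. */
struct pmd_slot {
    struct pte_table *pte;
};

static unsigned long tables_freed; /* how many tables were released */

/* Like a "get" in the fault path: allocate on demand, then pin the table. */
static struct pte_table *pte_table_get(struct pmd_slot *pmd)
{
    if (!pmd->pte) {
        pmd->pte = calloc(1, sizeof(*pmd->pte));
        if (!pmd->pte)
            return NULL;
    }
    pmd->pte->refcount++;
    return pmd->pte;
}

/* Pin the table only if it already exists (zap paths must not allocate). */
static struct pte_table *pte_table_tryget(struct pmd_slot *pmd)
{
    if (!pmd->pte)
        return NULL;
    pmd->pte->refcount++;
    return pmd->pte;
}

/* Drop one reference; the last one unhooks and frees the table at once. */
static void pte_table_put(struct pmd_slot *pmd)
{
    struct pte_table *pte = pmd->pte;

    if (--pte->refcount == 0) {
        pmd->pte = NULL; /* unhook from the PMD before freeing */
        free(pte);
        tables_freed++;
    }
}

/* Page-fault model: every populated entry holds one reference. */
static int fault_in(struct pmd_slot *pmd, unsigned int idx, unsigned long pfn)
{
    struct pte_table *pte;

    if (idx >= PTRS_PER_PTE || pfn == 0)
        return -1;
    pte = pte_table_get(pmd);
    if (!pte)
        return -1;
    if (pte->entries[idx] == 0) {
        pte->entries[idx] = pfn;
        pte->refcount++;
    }
    pte_table_put(pmd); /* drop the walker reference */
    return 0;
}

/* MADV_DONTNEED model: clearing the last entry frees the table immediately. */
static void zap_entry(struct pmd_slot *pmd, unsigned int idx)
{
    struct pte_table *pte;

    if (idx >= PTRS_PER_PTE)
        return;
    pte = pte_table_tryget(pmd);
    if (!pte)
        return; /* no table, nothing mapped */
    if (pte->entries[idx] != 0) {
        pte->entries[idx] = 0;
        pte->refcount--; /* cannot reach 0: we still hold a walker ref */
    }
    pte_table_put(pmd);
}
```

In this model, a fault that arrives right after the last entry was zapped has to allocate a fresh table again (the reuse case mentioned above), and every fault and zap pays for extra count updates. In the kernel those updates would be atomic operations on hot paths, which is one plausible source of the overhead being asked about.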
I'm primarily looking into asynchronous reclaim, because it makes some
sense to only reclaim (and pay the cost) when there is a real need to
reclaim memory, similar to our shrinker infrastructure.

-- 
Thanks,

David / dhildenb
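The asynchronous, shrinker-like alternative described above can be sketched the same way. In this hypothetical userspace model, zapping entries never frees a table; empty tables are only counted, and a separate reclaim pass, loosely following the count/scan split of the kernel's shrinker callbacks (`count_objects`/`scan_objects`), frees them when asked to under memory pressure. A table refilled before that pass runs is simply reused. All names here are illustrative assumptions, and the hard part (excluding every page table walker while a table is removed) is only noted in a comment.

```c
#include <stdlib.h>

#define PTRS_PER_PTE 512 /* assumption: 4 KiB pages, 8-byte entries */
#define NR_PMDS 8        /* hypothetical: a toy address space of 8 tables */

/* Hypothetical PTE table: no reference count, just a populated-entry count. */
struct pte_table {
    unsigned long entries[PTRS_PER_PTE]; /* 0 stands in for pte_none */
    unsigned int nr_present;
};

static struct pte_table *pmds[NR_PMDS]; /* toy PMD level */
static unsigned long allocations;        /* PTE tables allocated so far */

/* Page-fault model: allocate the table on demand and populate one entry. */
static int fault_in(unsigned int pmd_idx, unsigned int idx, unsigned long pfn)
{
    struct pte_table *pte;

    if (pmd_idx >= NR_PMDS || idx >= PTRS_PER_PTE || pfn == 0)
        return -1;
    if (!pmds[pmd_idx]) {
        pmds[pmd_idx] = calloc(1, sizeof(*pmds[pmd_idx]));
        if (!pmds[pmd_idx])
            return -1;
        allocations++;
    }
    pte = pmds[pmd_idx];
    if (pte->entries[idx] == 0) {
        pte->entries[idx] = pfn;
        pte->nr_present++;
    }
    return 0;
}

/* MADV_DONTNEED model: clear the entry, but never free the table here. */
static void zap_entry(unsigned int pmd_idx, unsigned int idx)
{
    struct pte_table *pte;

    if (pmd_idx >= NR_PMDS || idx >= PTRS_PER_PTE || !pmds[pmd_idx])
        return;
    pte = pmds[pmd_idx];
    if (pte->entries[idx] != 0) {
        pte->entries[idx] = 0;
        pte->nr_present--;
    }
}

/* "count" step: how many empty tables could be reclaimed right now. */
static unsigned long count_empty_tables(void)
{
    unsigned long n = 0;

    for (unsigned int i = 0; i < NR_PMDS; i++)
        if (pmds[i] && pmds[i]->nr_present == 0)
            n++;
    return n;
}

/*
 * "scan" step, run only under memory pressure: free up to nr_to_scan empty
 * tables. A real implementation would first have to exclude every page
 * table walker (e.g. by holding the mmap lock in write mode).
 */
static unsigned long scan_empty_tables(unsigned long nr_to_scan)
{
    unsigned long freed = 0;

    for (unsigned int i = 0; i < NR_PMDS && freed < nr_to_scan; i++) {
        if (pmds[i] && pmds[i]->nr_present == 0) {
            free(pmds[i]);
            pmds[i] = NULL;
            freed++;
        }
    }
    return freed;
}
```

Whether deferring pays off depends on how often ranges are refilled between reclaim passes; the model only shows that a refill before the scan avoids a free-and-reallocate cycle, at the price of keeping empty tables around until memory is actually needed.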