From: Qi Zheng <zhengqi.arch@bytedance.com>
To: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: akpm@linux-foundation.org, tglx@linutronix.de,
kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com,
david@redhat.com, jgg@nvidia.com, tj@kernel.org,
dennis@kernel.org, ming.lei@redhat.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, songmuchun@bytedance.com,
zhouchengming@bytedance.com
Subject: Re: [RFC PATCH 18/18] Documentation: add document for pte_ref
Date: Sat, 30 Apr 2022 21:32:01 +0800 [thread overview]
Message-ID: <8e4d1e8a-6bf7-df07-2cc6-01d840db2757@bytedance.com> (raw)
In-Reply-To: <Ym03Z7FlgcCpwXCi@debian.me>
On 2022/4/30 9:19 PM, Bagas Sanjaya wrote:
> Hi Qi,
>
> On Fri, Apr 29, 2022 at 09:35:52PM +0800, Qi Zheng wrote:
>> +Now in order to pursue high performance, applications mostly use some
>> +high-performance user-mode memory allocators, such as jemalloc or tcmalloc.
>> +These memory allocators use madvise(MADV_DONTNEED or MADV_FREE) to release
>> +physical memory for the following reasons::
>> +
>> + First of all, we should hold as few write locks of mmap_lock as possible,
>> + since the mmap_lock semaphore has long been a contention point in the
>> + memory management subsystem. The mmap()/munmap() hold the write lock, and
>> + the madvise(MADV_DONTNEED or MADV_FREE) hold the read lock, so using
>> + madvise() instead of munmap() to released physical memory can reduce the
>> + competition of the mmap_lock.
>> +
>> + Secondly, after using madvise() to release physical memory, there is no
>> + need to build vma and allocate page tables again when accessing the same
>> + virtual address again, which can also save some time.
>> +
>
> I think we can use enumerated list, like below:
Thanks for your review, LGTM, will do.
>
> -- >8 --
>
> diff --git a/Documentation/vm/pte_ref.rst b/Documentation/vm/pte_ref.rst
> index 0ac1e5a408d7c6..67b18e74fcb367 100644
> --- a/Documentation/vm/pte_ref.rst
> +++ b/Documentation/vm/pte_ref.rst
> @@ -10,18 +10,18 @@ Preface
> Now in order to pursue high performance, applications mostly use some
> high-performance user-mode memory allocators, such as jemalloc or tcmalloc.
> These memory allocators use madvise(MADV_DONTNEED or MADV_FREE) to release
> -physical memory for the following reasons::
> -
> - First of all, we should hold as few write locks of mmap_lock as possible,
> - since the mmap_lock semaphore has long been a contention point in the
> - memory management subsystem. The mmap()/munmap() hold the write lock, and
> - the madvise(MADV_DONTNEED or MADV_FREE) hold the read lock, so using
> - madvise() instead of munmap() to released physical memory can reduce the
> - competition of the mmap_lock.
> -
> - Secondly, after using madvise() to release physical memory, there is no
> - need to build vma and allocate page tables again when accessing the same
> - virtual address again, which can also save some time.
> +physical memory for the following reasons:
> +
> +1. We should hold as few write locks of mmap_lock as possible,
> + since the mmap_lock semaphore has long been a contention point in the
> + memory management subsystem. The mmap()/munmap() hold the write lock, and
> + the madvise(MADV_DONTNEED or MADV_FREE) hold the read lock, so using
> + madvise() instead of munmap() to released physical memory can reduce the
> + competition of the mmap_lock.
> +
> +2. After using madvise() to release physical memory, there is no
> + need to build vma and allocate page tables again when accessing the same
> + virtual address again, which can also save some time.
>
> The following is the largest user PTE page table memory that can be
> allocated by a single user process in a 32-bit and a 64-bit system.
>
>> +The following is the largest user PTE page table memory that can be
>> +allocated by a single user process in a 32-bit and a 64-bit system.
>> +
>
> We can say "assuming 4K page size" here,
>
>> ++---------------------------+--------+---------+
>> +| | 32-bit | 64-bit |
>> ++===========================+========+=========+
>> +| user PTE page table pages | 3 MiB | 512 GiB |
>> ++---------------------------+--------+---------+
>> +| user PMD page table pages | 3 KiB | 1 GiB |
>> ++---------------------------+--------+---------+
>> +
>> +(for 32-bit, take 3G user address space, 4K page size as an example;
>> + for 64-bit, take 48-bit address width, 4K page size as an example.)
>> +
>
> ... instead of here.
will do.
>
>> +There is also a lock-less scenario(such as fast GUP). Fortunately, we don't need
>> +to do any additional operations to ensure that the system is in order. Take fast
>> +GUP as an example::
>> +
>> + thread A thread B
>> + fast GUP madvise(MADV_DONTNEED)
>> + ======== ======================
>> +
>> + get_user_pages_fast_only()
>> + --> local_irq_save();
>> + call_rcu(pte_free_rcu)
>> + gup_pgd_range();
>> + local_irq_restore();
>> + /* do pte_free_rcu() */
>> +
>
> I see whitespace warning circa do pte_free_rcu() line above when
> applying this series.
will fix.
Thanks,
Qi
>
> Thanks.
>
--
Thanks,
Qi
next prev parent reply other threads:[~2022-04-30 13:32 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-29 13:35 [RFC PATCH 00/18] Try to free user PTE page table pages Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 01/18] x86/mm/encrypt: add the missing pte_unmap() call Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 02/18] percpu_ref: make ref stable after percpu_ref_switch_to_atomic_sync() returns Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 03/18] percpu_ref: make percpu_ref_switch_lock per percpu_ref Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 04/18] mm: convert to use ptep_clear() in pte_clear_not_present_full() Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 05/18] mm: split the related definitions of pte_offset_map_lock() into pgtable.h Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 06/18] mm: introduce CONFIG_FREE_USER_PTE Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 07/18] mm: add pte_to_page() helper Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 08/18] mm: introduce percpu_ref for user PTE page table page Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 09/18] pte_ref: add pte_tryget() and {__,}pte_put() helper Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 10/18] mm: add pte_tryget_map{_lock}() helper Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 11/18] mm: convert to use pte_tryget_map_lock() Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 12/18] mm: convert to use pte_tryget_map() Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 13/18] mm: add try_to_free_user_pte() helper Qi Zheng
2022-04-30 13:35 ` Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 14/18] mm: use try_to_free_user_pte() in MADV_DONTNEED case Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 15/18] mm: use try_to_free_user_pte() in MADV_FREE case Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 16/18] pte_ref: add track_pte_{set, clear}() helper Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 17/18] x86/mm: add x86_64 support for pte_ref Qi Zheng
2022-04-29 13:35 ` [RFC PATCH 18/18] Documentation: add document " Qi Zheng
2022-04-30 13:19 ` Bagas Sanjaya
2022-04-30 13:32 ` Qi Zheng [this message]
2022-05-17 8:30 ` [RFC PATCH 00/18] Try to free user PTE page table pages Qi Zheng
2022-05-18 14:51 ` David Hildenbrand
2022-05-18 14:56 ` Matthew Wilcox
2022-05-19 4:03 ` Qi Zheng
2022-05-19 3:58 ` Qi Zheng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8e4d1e8a-6bf7-df07-2cc6-01d840db2757@bytedance.com \
--to=zhengqi.arch@bytedance.com \
--cc=akpm@linux-foundation.org \
--cc=bagasdotme@gmail.com \
--cc=david@redhat.com \
--cc=dennis@kernel.org \
--cc=jgg@nvidia.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mika.penttila@nextfour.com \
--cc=ming.lei@redhat.com \
--cc=songmuchun@bytedance.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=zhouchengming@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).