linux-fsdevel.vger.kernel.org archive mirror
From: Paul Gofman <pgofman@codeweavers.com>
To: "Michał Mirosław" <emmir@google.com>,
	"Muhammad Usama Anjum" <usama.anjum@collabora.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Michał Mirosław" <mirq-linux@rere.qmqm.pl>,
	"Andrei Vagin" <avagin@gmail.com>,
	"Danylo Mocherniuk" <mdanylo@google.com>,
	"Alex Sierra" <alex.sierra@amd.com>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Axel Rasmussen" <axelrasmussen@google.com>,
	"Christian Brauner" <brauner@kernel.org>,
	"Cyrill Gorcunov" <gorcunov@gmail.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Greg KH" <gregkh@linuxfoundation.org>,
	"Gustavo A . R . Silva" <gustavoars@kernel.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Nadav Amit" <namit@vmware.com>,
	"Pasha Tatashin" <pasha.tatashin@soleen.com>,
	"Peter Xu" <peterx@redhat.com>, "Shuah Khan" <shuah@kernel.org>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Yang Shi" <shy828301@gmail.com>,
	"Yun Zhou" <yun.zhou@windriver.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	kernel@collabora.com
Subject: Re: [v3] fs/proc/task_mmu: Implement IOCTL for efficient page table scanning
Date: Wed, 26 Jul 2023 17:06:02 -0600
Message-ID: <94c6b665-bbc2-5030-f9b1-d933791008b8@codeweavers.com>
In-Reply-To: <CABb0KFGQ_HbD+MNwKCcE+6D50XhJxpx0M0dRiC-EVwEXPv+4XA@mail.gmail.com>

Hello Michał,

     I looked into this from the Wine point of view and did a bit of
testing, so I will try to answer the question quoted below.

     Without Windows large pages, I think the only way to make this work
correctly is to disable THP with madvise(MADV_NOHUGEPAGE) on the memory
ranges allocated with MEM_WRITE_WATCH: memory changes must be not only
reported but also tracked with 4k page granularity, as Windows
applications expect.
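
     To illustrate the idea, here is a minimal sketch of mapping such a
range with THP disabled. The helper name alloc_watched_range is made up
for this example and is not Wine code; it only assumes the standard
mmap()/madvise() calls.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not Wine code): map an anonymous read/write range
 * and ask the kernel not to back it with transparent huge pages.
 * Returns NULL on failure.
 */
void *alloc_watched_range(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /*
     * MADV_NOHUGEPAGE is expected to fail with EINVAL on kernels built
     * without THP support; there is nothing to disable in that case, so
     * it is not treated as an error here.
     */
    if (madvise(p, size, MADV_NOHUGEPAGE) != 0 && errno != EINVAL) {
        munmap(p, size);
        return NULL;
    }
    return p;
}
```

     The madvise() call is made right after mmap(), before the range is
first touched, so that no huge page can already be backing it.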

     Currently we don't implement MEM_LARGE_PAGES flag support in Wine
(although of course we might want to do that in the future). On Windows,
using this flag requires special permissions and implies more than just
using huge pages under the hood; in particular, it also locks the pages
in memory. I'd expect that support to be extended in Windows in some way
in the future, though. WRT write watches, the range is watched with
large page granularity. The GetWriteWatch lpdwGranularity output
parameter returns the "large page minimum" value (as returned by
GetLargePageMinimum), and the returned addresses correspond to those
large pages. I suppose that to implement that on top of Linux huge pages
we'd need a way to control huge page allocation in the first place,
i.e., a way to enforce the specified huge page size for the memory
ranges being mapped. Without that, I am afraid the only way to implement
this correctly is to still disable THP on the range and only adjust our
API output so that it matches what is expected.
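
     As a rough sketch of what "adjusting the API output" could mean:
the range is still tracked with 4k granularity, and the dirty addresses
are collapsed to large page boundaries before being returned. The
function name and the assumption that the input is sorted are mine, for
illustration only; this is not existing Wine code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not Wine code). 'dirty' holds 'count' addresses
 * of dirty 4k pages in ascending order. The base address of each large
 * page containing at least one dirty 4k page is written to 'out' (which
 * must have room for 'count' entries). Returns the number of entries
 * written. 'large_page' must be a power of two.
 */
size_t coarsen_dirty_pages(const uintptr_t *dirty, size_t count,
                           uintptr_t large_page, uintptr_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        uintptr_t base = dirty[i] & ~(large_page - 1);

        /* Input is sorted, so repeats of a large page are adjacent. */
        if (n == 0 || out[n - 1] != base)
            out[n++] = base;
    }
    return n;
}
```

     The caller would then report large_page as the granularity, which
on Windows is the value returned by GetLargePageMinimum.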

     Not related to the question, and regardless of Wine and the Windows
API: the current way of dealing with THP in the API design doesn't look
very straightforward to me, in the sense that transparent huge pages
will appear not so transparent when it comes to dirty page tracking. If
I understand correctly, an application which allocated a reasonably big
memory area and didn't use madvise(MADV_NOHUGEPAGE) might end up with
the whole range being a single page that gets dirtied as a whole, which
may well defeat the app's optimization based on tracking changed memory.
Not that I know an ideal way out of this; maybe it is a matter of having
THP disabled by default on watched ranges, or of clearly warning about
this caveat in the documentation?

Regards,
     Paul.


On 7/26/23 15:10, Michał Mirosław wrote:
>
>>> 3. BTW, One of the uses is the GetWriteWatch and I wonder how it
>>> behaves on HugeTLB (MEM_LARGE_PAGES allocation)? Shouldn't it return a
>>> list of huge pages and write *lpdwGranularity = HPAGE_SIZE?
>> Wine/Proton doesn't use hugetlb by default. Hugetlb isn't enabled by
>> default on Debian as well. For GetWriteWatch() we don't care about the
>> hugetlb at all. We have added hugetlb's implementation to complete the
>> feature and leave out something.
> How is GetWriteWatch() working when passed a VirtualAlloc(...,
> MEM_LARGE_PAGES|MEM_WRITE_WATCH...)-allocated range? Does it still
> report 4K pages?
> This is only a problem when using max_pages: a hugetlb range might
> need counting and reporting huge pages and not 4K parts.
>
> Best Regards
> Michał Mirosław



