linux-kernel.vger.kernel.org archive mirror
From: "Zi Yan" <zi.yan@cs.rutgers.edu>
To: "Peter Xu" <peterx@redhat.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-kernel@vger.kernel.org,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Naoya Horiguchi" <n-horiguchi@ah.jp.nec.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	"Konstantin Khlebnikov" <khlebnikov@yandex-team.ru>,
	"Souptick Joarder" <jrdr.linux@gmail.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH] mm: hugepage: mark splitted page dirty when needed
Date: Wed, 05 Sep 2018 08:49:20 -0400
Message-ID: <BB56C67D-BDA0-4C14-B787-77504EC989C6@cs.rutgers.edu>
In-Reply-To: <20180905073037.GA23021@xz-x1>

On 5 Sep 2018, at 3:30, Peter Xu wrote:

> On Tue, Sep 04, 2018 at 10:00:28AM -0400, Zi Yan wrote:
>> On 4 Sep 2018, at 4:01, Kirill A. Shutemov wrote:
>>
>>> On Tue, Sep 04, 2018 at 03:55:10PM +0800, Peter Xu wrote:
>>>> When splitting a huge page, we should set all small pages as dirty if
>>>> the original huge page has the dirty bit set before.  Otherwise we'll
>>>> lose the original dirty bit.
>>>
>>> We don't lose it. It got transferred to the struct page flags:
>>>
>>> 	if (pmd_dirty(old_pmd))
>>> 		SetPageDirty(page);
>>>
>>
>> Plus, when split_huge_page_to_list() splits a THP, its subroutine __split_huge_page()
>> propagates the dirty bit in the head page flag to all subpages in __split_huge_page_tail().
>
> Hi, Kirill, Zi,
>
> Thanks for your responses!
>
> Though in my test the huge page seems to be split not by
> split_huge_page_to_list() but by explicit calls to
> change_protection().  The stack looks like this (again, this is a
> customized kernel, and I added an explicit dump_stack() there):
>
>   kernel:  dump_stack+0x5c/0x7b
>   kernel:  __split_huge_pmd+0x192/0xdc0
>   kernel:  ? update_load_avg+0x8b/0x550
>   kernel:  ? update_load_avg+0x8b/0x550
>   kernel:  ? account_entity_enqueue+0xc5/0xf0
>   kernel:  ? enqueue_entity+0x112/0x650
>   kernel:  change_protection+0x3a2/0xab0
>   kernel:  mwriteprotect_range+0xdd/0x110
>   kernel:  userfaultfd_ioctl+0x50b/0x1210
>   kernel:  ? do_futex+0x2cf/0xb20
>   kernel:  ? tty_write+0x1d2/0x2f0
>   kernel:  ? do_vfs_ioctl+0x9f/0x610
>   kernel:  do_vfs_ioctl+0x9f/0x610
>   kernel:  ? __x64_sys_futex+0x88/0x180
>   kernel:  ksys_ioctl+0x70/0x80
>   kernel:  __x64_sys_ioctl+0x16/0x20
>   kernel:  do_syscall_64+0x55/0x150
>   kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> At that moment, userspace is sending a UFFDIO_WRITEPROTECT ioctl
> to the kernel, which is handled by mwriteprotect_range().  In case
> you'd like to refer to the kernel source, it's basically this one from
> Andrea's tree (with very trivial changes):
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git userfault
>
> So... do we have two separate paths for splitting huge pages?
>
> Another (possibly very naive) question is: could any of you give me a
> hint about how the page dirty bit is finally applied to the PTEs?  These
> two dirty flags have confused me for a few days already (SetPageDirty(),
> which sets the page dirty flag, and pte_mkdirty(), which sets the dirty
> bit in the real PTEs).

change_protection() only splits a PMD entry into multiple PTEs; it does not
split the physical compound page, so my answer does not apply to your case.
It is unclear how the dirty bit makes your QEMU get a SIGBUS. I think you
need to describe your problem in more detail.

AFAIK, the PageDirty bit is never applied back to any PTE. So in your case,
a kernel function that reports a page's dirty state may be checking only
the PTE's dirty bit and not the dirty bit in the struct page flags, which
might give a wrong answer.
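
To make the distinction between the two dirty flags concrete, here is a
minimal user-space sketch in C. It is a toy model, not kernel code: every
name in it (toy_page, toy_pte, toy_pmd, toy_split_pmd(), and the two
checkers) is hypothetical and exists only for illustration. It assumes the
behaviour Kirill quoted, where the PMD dirty bit is moved to the page flag
on a PMD split, and treats "also mark the new PTEs dirty" as a switch,
which is roughly what the patch under discussion proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PTES_PER_PMD 512	/* 2 MiB / 4 KiB, as on x86-64 */

/* Hypothetical stand-ins; none of these are real kernel types. */
struct toy_page { bool page_dirty; };	/* models the struct page dirty flag */
struct toy_pte  { struct toy_page *page; bool dirty; };
struct toy_pmd  { struct toy_page *head; bool dirty; bool huge; };

/*
 * Split a huge PMD mapping into PTE mappings. Only the mapping is split;
 * the compound page stays intact, so every PTE still refers to it.
 */
static void toy_split_pmd(struct toy_pmd *pmd, struct toy_pte *ptes,
			  bool propagate_to_ptes)
{
	assert(pmd->huge);

	/* Models: if (pmd_dirty(old_pmd)) SetPageDirty(page); */
	if (pmd->dirty)
		pmd->head->page_dirty = true;

	for (size_t i = 0; i < TOY_PTES_PER_PMD; i++) {
		ptes[i].page = pmd->head;
		/* Assumption: PTEs inherit the bit only when asked to. */
		ptes[i].dirty = propagate_to_ptes && pmd->dirty;
	}

	pmd->huge = false;
	pmd->dirty = false;
}

/* A reporter that looks at the PTE bit only. */
static bool toy_dirty_pte_only(const struct toy_pte *pte)
{
	return pte->dirty;
}

/* A reporter that also consults the page flag. */
static bool toy_dirty_either(const struct toy_pte *pte)
{
	return pte->dirty || pte->page->page_dirty;
}
```

In this model, splitting a dirty PMD with propagate_to_ptes set to false
leaves the page flag set while every PTE is clean, so toy_dirty_pte_only()
reports "clean" and toy_dirty_either() reports "dirty". That is the kind
of mismatch I suspect, though I have not confirmed which kernel path, if
any, behaves this way in your setup.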


--
Best Regards,
Yan Zi
