From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Minchan Kim <minchan@kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Vineet Gupta <vgupta@synopsys.com>,
	Russell King <linux@armlinux.org.uk>,
	Will Deacon <will.deacon@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Ralf Baechle <ralf@linux-mips.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-arch@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCHv2 3/3] mm: Use updated pmdp_invalidate() inteface to track dirty/accessed bits
Date: Fri, 16 Jun 2017 16:19:08 +0300	[thread overview]
Message-ID: <20170616131908.3rxtm2w73gdfex4a@node.shutemov.name> (raw)
In-Reply-To: <20170616030250.GA27637@bbox>

On Fri, Jun 16, 2017 at 12:02:50PM +0900, Minchan Kim wrote:
> Hello,
> 
> On Thu, Jun 15, 2017 at 05:52:24PM +0300, Kirill A. Shutemov wrote:
> > This patch uses the modified pmdp_invalidate(), which returns the previous
> > value of the pmd, to transfer the dirty and accessed bits.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  fs/proc/task_mmu.c |  8 ++++----
> >  mm/huge_memory.c   | 29 ++++++++++++-----------------
> >  2 files changed, 16 insertions(+), 21 deletions(-)
> > 
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index f0c8b33d99b1..f2fc1ef5bba2 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -906,13 +906,13 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
> >  static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
> >  		unsigned long addr, pmd_t *pmdp)
> >  {
> > -	pmd_t pmd = *pmdp;
> > +	pmd_t old, pmd = *pmdp;
> >  
> >  	/* See comment in change_huge_pmd() */
> > -	pmdp_invalidate(vma, addr, pmdp);
> > -	if (pmd_dirty(*pmdp))
> > +	old = pmdp_invalidate(vma, addr, pmdp);
> > +	if (pmd_dirty(old))
> >  		pmd = pmd_mkdirty(pmd);
> > -	if (pmd_young(*pmdp))
> > +	if (pmd_young(old))
> >  		pmd = pmd_mkyoung(pmd);
> >  
> >  	pmd = pmd_wrprotect(pmd);
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index a84909cf20d3..0433e73531bf 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1777,17 +1777,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> >  	 * pmdp_invalidate() is required to make sure we don't miss
> >  	 * dirty/young flags set by hardware.
> >  	 */
> > -	entry = *pmd;
> > -	pmdp_invalidate(vma, addr, pmd);
> > -
> > -	/*
> > -	 * Recover dirty/young flags.  It relies on pmdp_invalidate to not
> > -	 * corrupt them.
> > -	 */
> > -	if (pmd_dirty(*pmd))
> > -		entry = pmd_mkdirty(entry);
> > -	if (pmd_young(*pmd))
> > -		entry = pmd_mkyoung(entry);
> > +	entry = pmdp_invalidate(vma, addr, pmd);
> >  
> >  	entry = pmd_modify(entry, newprot);
> >  	if (preserve_write)
> > @@ -1927,8 +1917,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >  	struct mm_struct *mm = vma->vm_mm;
> >  	struct page *page;
> >  	pgtable_t pgtable;
> > -	pmd_t _pmd;
> > -	bool young, write, dirty, soft_dirty;
> > +	pmd_t old, _pmd;
> > +	bool young, write, soft_dirty;
> >  	unsigned long addr;
> >  	int i;
> >  
> > @@ -1965,7 +1955,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >  	page_ref_add(page, HPAGE_PMD_NR - 1);
> >  	write = pmd_write(*pmd);
> >  	young = pmd_young(*pmd);
> > -	dirty = pmd_dirty(*pmd);
> >  	soft_dirty = pmd_soft_dirty(*pmd);
> >  
> >  	pmdp_huge_split_prepare(vma, haddr, pmd);
> > @@ -1995,8 +1984,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >  			if (soft_dirty)
> >  				entry = pte_mksoft_dirty(entry);
> >  		}
> > -		if (dirty)
> > -			SetPageDirty(page + i);
> >  		pte = pte_offset_map(&_pmd, addr);
> >  		BUG_ON(!pte_none(*pte));
> >  		set_pte_at(mm, addr, pte, entry);
> > @@ -2045,7 +2032,15 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >  	 * and finally we write the non-huge version of the pmd entry with
> >  	 * pmd_populate.
> >  	 */
> > -	pmdp_invalidate(vma, haddr, pmd);
> > +	old = pmdp_invalidate(vma, haddr, pmd);
> > +
> > +	/*
> > +	 * Transfer the dirty bit using the value returned by pmdp_invalidate()
> > +	 * to be sure we don't race with a CPU that can set the bit under us.
> > +	 */
> > +	if (pmd_dirty(old))
> > +		SetPageDirty(page);
> > +
> 
> Looking at this, it seems that without this patch MADV_FREE has been broken,
> because the dirty bit is checked too early and can be lost. Right?
> If so, isn't it a candidate for -stable?

Actually, I don't see how MADV_FREE is supposed to work here: vmscan splits
THP on reclaim, and split_huge_page() would set the dirty bit unconditionally,
so MADV_FREE seems to have no effect on THP.

Or have I missed anything?
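
For reference, a minimal sketch (not from the patch itself; the function names
collect_bits_old()/collect_bits_new() are made up for illustration, and
locking, TLB flushing and the callers are omitted) of the pattern the series
replaces versus the one it introduces, using the same kernel helpers as the
diff above:

#include <linux/mm.h>

/*
 * Illustration only, not code from this series.
 *
 * Old pattern: re-read *pmdp after the invalidation and rely on
 * pmdp_invalidate() not corrupting the dirty/accessed bits.  Where the
 * invalidation is not a single atomic update of the entry, a hardware
 * dirty/accessed update in that window can be lost.
 */
static pmd_t collect_bits_old(struct vm_area_struct *vma,
			      unsigned long addr, pmd_t *pmdp)
{
	pmd_t entry = *pmdp;

	pmdp_invalidate(vma, addr, pmdp);
	if (pmd_dirty(*pmdp))
		entry = pmd_mkdirty(entry);
	if (pmd_young(*pmdp))
		entry = pmd_mkyoung(entry);
	return entry;
}

/*
 * New pattern: the updated pmdp_invalidate() returns the pmd value it
 * replaced, read atomically with the invalidation, so bits set by
 * hardware up to that point cannot be missed.
 */
static pmd_t collect_bits_new(struct vm_area_struct *vma,
			      unsigned long addr, pmd_t *pmdp)
{
	pmd_t old, entry = *pmdp;

	old = pmdp_invalidate(vma, addr, pmdp);
	if (pmd_dirty(old))
		entry = pmd_mkdirty(entry);
	if (pmd_young(old))
		entry = pmd_mkyoung(entry);
	return entry;
}

This mirrors what the diff does in clear_soft_dirty_pmd() and
__split_huge_pmd_locked(): decisions such as SetPageDirty() are made from the
value the invalidation actually removed, not from a later re-read of the page
table entry.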

-- 
 Kirill A. Shutemov

Thread overview: 88+ messages
2017-06-15 14:52 [HELP-NEEDED, PATCHv2 0/3] Do not loose dirty bit on THP pages Kirill A. Shutemov
2017-06-15 14:52 ` [PATCHv2 1/3] x86/mm: Provide pmdp_establish() helper Kirill A. Shutemov
2017-06-16 13:36   ` Andrea Arcangeli
2017-06-19 12:46     ` Kirill A. Shutemov
2017-06-19  5:48   ` Martin Schwidefsky
2017-06-19 12:48     ` Kirill A. Shutemov
2017-06-19 13:04       ` Martin Schwidefsky
2017-06-19 15:22   ` Catalin Marinas
2017-06-19 16:00     ` Kirill A. Shutemov
2017-06-19 17:09       ` Catalin Marinas
2017-06-19 21:52         ` Kirill A. Shutemov
2017-06-20 15:54           ` Catalin Marinas
2017-06-21  9:53             ` Kirill A. Shutemov
2017-06-21 10:40               ` Catalin Marinas
2017-06-21 11:27               ` Catalin Marinas
2017-06-21 12:04                 ` Kirill A. Shutemov
2017-06-21 15:49                 ` Vineet Gupta
2017-06-21 17:15                   ` Kirill A. Shutemov
2017-06-21 17:20                     ` Vineet Gupta
2017-06-21 17:52                       ` Kirill A. Shutemov
2017-06-19 17:11   ` Nadav Amit
2017-06-19 21:57     ` Kirill A. Shutemov
2017-06-15 14:52 ` [PATCHv2 2/3] mm: Do not loose dirty and access bits in pmdp_invalidate() Kirill A. Shutemov
2017-06-15 22:44   ` kbuild test robot
2017-06-16 13:40   ` Andrea Arcangeli
2017-06-19 13:29     ` Kirill A. Shutemov
2017-06-15 14:52 ` [PATCHv2 3/3] mm: Use updated pmdp_invalidate() inteface to track dirty/accessed bits Kirill A. Shutemov
2017-06-15 21:54   ` kbuild test robot
2017-06-15 23:02   ` kbuild test robot
2017-06-16  3:02   ` Minchan Kim
2017-06-16 13:19     ` Kirill A. Shutemov [this message]
2017-06-16 13:52       ` Minchan Kim
2017-06-16 14:27         ` Andrea Arcangeli
2017-06-16 14:53           ` Minchan Kim
2017-06-19 14:03             ` Kirill A. Shutemov
2017-06-20  2:52               ` Minchan Kim
2017-06-20  9:57                 ` Minchan Kim
2017-06-16 11:31   ` Aneesh Kumar K.V
2017-06-16 13:21     ` Kirill A. Shutemov
2017-06-16 15:57       ` Aneesh Kumar K.V
