From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hugh@veritas.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [patch] mm: dirty page tracking race fix
Date: Mon, 18 Aug 2008 09:49:06 +0200 [thread overview]
Message-ID: <1219045746.10800.261.camel@twins> (raw)
In-Reply-To: <20080818053821.GA3011@wotan.suse.de>
On Mon, 2008-08-18 at 07:38 +0200, Nick Piggin wrote:
> There is a race with dirty page accounting where a page may not properly
> be accounted for.
>
> clear_page_dirty_for_io() calls page_mkclean; then TestClearPageDirty.
>
> page_mkclean walks the rmaps for that page, and for each one it cleans and
> write protects the pte if it was dirty. It uses page_check_address to find the
> pte. That function has a shortcut to avoid the ptl if the pte is not
> present. Unfortunately, the pte can be switched to not-present then back to
> present by other code while holding the page table lock -- this should not
> be a signal for page_mkclean to ignore that pte, because it may be dirty.
>
> For example, powerpc64's set_pte_at will clear a previously present pte before
> setting it to the desired value. There may also be other code in core mm or
> in arch which do similar things.
>
> The consequence of the bug is loss of data integrity due to msync, and loss
> of dirty page accounting accuracy. XIP's __xip_unmap could easily also be
> unreliable (depending on the exact XIP locking scheme), which can lead to data
> corruption.
>
> Fix this by having an option to always take ptl to check the pte in
> page_check_address.
>
> It's possible to retain this optimization for page_referenced and
> try_to_unmap.
s/synch/sync/ ?
we use sync all over the kernel to mean synchonous, so why are you
inventing a new shorthand?
Anyway, still
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> Index: linux-2.6/include/linux/rmap.h
> ===================================================================
> --- linux-2.6.orig/include/linux/rmap.h
> +++ linux-2.6/include/linux/rmap.h
> @@ -102,7 +102,7 @@ int try_to_unmap(struct page *, int igno
> * Called from mm/filemap_xip.c to unmap empty zero page
> */
> pte_t *page_check_address(struct page *, struct mm_struct *,
> - unsigned long, spinlock_t **);
> + unsigned long, spinlock_t **, int);
>
> /*
> * Used by swapoff to help locate where page is expected in vma.
> Index: linux-2.6/mm/filemap_xip.c
> ===================================================================
> --- linux-2.6.orig/mm/filemap_xip.c
> +++ linux-2.6/mm/filemap_xip.c
> @@ -185,7 +185,7 @@ __xip_unmap (struct address_space * mapp
> address = vma->vm_start +
> ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
> BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> - pte = page_check_address(page, mm, address, &ptl);
> + pte = page_check_address(page, mm, address, &ptl, 1);
> if (pte) {
> /* Nuke the page table entry. */
> flush_cache_page(vma, address, pte_pfn(*pte));
> Index: linux-2.6/mm/rmap.c
> ===================================================================
> --- linux-2.6.orig/mm/rmap.c
> +++ linux-2.6/mm/rmap.c
> @@ -224,10 +224,14 @@ unsigned long page_address_in_vma(struct
> /*
> * Check that @page is mapped at @address into @mm.
> *
> + * If @synch is false, page_check_address may perform a racy check to avoid
> + * the page table lock when the pte is not present (helpful when reclaiming
> + * highly shared pages).
> + *
> * On success returns with pte mapped and locked.
> */
> pte_t *page_check_address(struct page *page, struct mm_struct *mm,
> - unsigned long address, spinlock_t **ptlp)
> + unsigned long address, spinlock_t **ptlp, int synch)
> {
> pgd_t *pgd;
> pud_t *pud;
> @@ -249,7 +253,7 @@ pte_t *page_check_address(struct page *p
>
> pte = pte_offset_map(pmd, address);
> /* Make a quick check before getting the lock */
> - if (!pte_present(*pte)) {
> + if (!synch && !pte_present(*pte)) {
> pte_unmap(pte);
> return NULL;
> }
> @@ -281,7 +285,7 @@ static int page_referenced_one(struct pa
> if (address == -EFAULT)
> goto out;
>
> - pte = page_check_address(page, mm, address, &ptl);
> + pte = page_check_address(page, mm, address, &ptl, 0);
> if (!pte)
> goto out;
>
> @@ -450,7 +454,7 @@ static int page_mkclean_one(struct page
> if (address == -EFAULT)
> goto out;
>
> - pte = page_check_address(page, mm, address, &ptl);
> + pte = page_check_address(page, mm, address, &ptl, 1);
> if (!pte)
> goto out;
>
> @@ -697,7 +701,7 @@ static int try_to_unmap_one(struct page
> if (address == -EFAULT)
> goto out;
>
> - pte = page_check_address(page, mm, address, &ptl);
> + pte = page_check_address(page, mm, address, &ptl, 0);
> if (!pte)
> goto out;
>
next prev parent reply other threads:[~2008-08-18 7:49 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-18 5:38 [patch] mm: dirty page tracking race fix Nick Piggin
2008-08-18 5:44 ` [patch] mm: xip fix fault vs sparse page invalidate race Nick Piggin
2008-08-18 6:03 ` [patch] mm: xip/ext2 fix block allocation race Nick Piggin
2008-08-18 10:51 ` Carsten Otte
2008-08-18 10:50 ` [patch] mm: xip fix fault vs sparse page invalidate race Carsten Otte
2008-08-18 7:49 ` Peter Zijlstra [this message]
2008-08-18 8:03 ` [patch] mm: dirty page tracking race fix Nick Piggin
2008-08-18 8:07 ` Peter Zijlstra
2008-08-18 8:12 ` Nick Piggin
2008-08-19 9:11 ` Andrew Morton
2008-08-19 10:19 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1219045746.10800.261.camel@twins \
--to=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=hugh@veritas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).