All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Huang\, Ying" <ying.huang@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Minchan Kim <minchan@kernel.org>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Dave Hansen <dave.hansen@intel.com>,
	Chen Liqin <liqin.linux@gmail.com>,
	Russell King <linux@armlinux.org.uk>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	"James E.J. Bottomley" <jejb@parisc-linux.org>,
	Guan Xuetao <gxt@mprc.pku.edu.cn>,
	"David S. Miller" <davem@davemloft.net>,
	Chris Zankel <chris@zankel.net>,
	Vineet Gupta <vgupta@synopsys.com>,
	Ley Foon Tan <lftan@altera.com>,
	Ralf Baechle <ralf@linux-mips.org>,
	Andi Kleen <ak@linux.intel.com>
Cc: linux-mm@kvack.org,
	"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: The usage of page_mapping() in architecture code
Date: Sun, 11 Feb 2018 14:47:42 +0800	[thread overview]
Message-ID: <87o9kwxbsh.fsf@yhuang-dev.intel.com> (raw)
In-Reply-To: <87vaf4xbz8.fsf@yhuang-dev.intel.com> (Ying Huang's message of "Sun, 11 Feb 2018 14:43:39 +0800")

Sorry for bothering, forget to Cc LKML in the original email.

Hi, All,

To optimize the scalability of swap cache, it is made more dynamic
than before.  That is, after being swapped off, the address space of
the swap device will be freed too.  So the usage of page_mapping()
need to be audited to make sure the address space of the swap device
will not be used after it is freed.  For most cases it is OK, because
to call page_mapping(), the page, page table, or LRU list will be
locked.  But I found at least one usage isn't safe.  When
page_mapping() is called in architecture specific code to flush dcache
or sync between dcache and icache.

The typical usage models are,


1) Check whether page_mapping() is NULL, which is safe

2) Call mapping_mapped() to check whether the backing file is mapped
   to user space.

3) Iterate all vmas via the interval tree (mapping->i_mmap) to flush dcache


2) and 3) isn't safe, because no lock to prevent swap device from
swapping off is held.  But I found the code is for file address space
only, not for swap cache.  For example, for flush_dcache_page() in
arch/parisc/kernel/cache.c,


void flush_dcache_page(struct page *page)
{
	struct address_space *mapping = page_mapping(page);
	struct vm_area_struct *mpnt;
	unsigned long offset;
	unsigned long addr, old_addr = 0;
	pgoff_t pgoff;

	if (mapping && !mapping_mapped(mapping)) {
		set_bit(PG_dcache_dirty, &page->flags);
		return;
	}

	flush_kernel_dcache_page(page);

	if (!mapping)
		return;

	pgoff = page->index;

	/* We have carefully arranged in arch_get_unmapped_area() that
	 * *any* mappings of a file are always congruently mapped (whether
	 * declared as MAP_PRIVATE or MAP_SHARED), so we only need
	 * to flush one address here for them all to become coherent */

	flush_dcache_mmap_lock(mapping);
	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
		addr = mpnt->vm_start + offset;

		/* The TLB is the engine of coherence on parisc: The
		 * CPU is entitled to speculate any page with a TLB
		 * mapping, so here we kill the mapping then flush the
		 * page along a special flush only alias mapping.
		 * This guarantees that the page is no-longer in the
		 * cache for any process and nor may it be
		 * speculatively read in (until the user or kernel
		 * specifically accesses it, of course) */

		flush_tlb_page(mpnt, addr);
		if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1))
				      != (addr & (SHM_COLOUR - 1))) {
			__flush_cache_page(mpnt, addr, page_to_phys(page));
			if (old_addr)
				printk(KERN_ERR "INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n", old_addr, addr, mpnt->vm_file);
			old_addr = addr;
		}
	}
	flush_dcache_mmap_unlock(mapping);
}


if page is an anonymous page in swap cache, "mapping &&
!mapping_mapped()" will be true, so we will delay flushing.  But if my
understanding of the code were correct, we should call
flush_kernel_dcache() because the kernel may access the page during
swapping in/out.

The code in other architectures follow the similar logic.  Would it be
better for page_mapping() here to return NULL for anonymous pages even
if they are in swap cache?  Of course we need to change the function
name.  page_file_mapping() appears a good name, but that has been used
already.  Any suggestion?

Is my understanding correct?  Could you help me on this?

Best Regards,
Huang, Ying

WARNING: multiple messages have this Message-ID (diff)
From: "Huang\, Ying" <ying.huang@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Minchan Kim <minchan@kernel.org>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Dave Hansen <dave.hansen@intel.com>,
	Chen Liqin <liqin.linux@gmail.com>,
	Russell King <linux@armlinux.org.uk>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	"James E.J. Bottomley" <jejb@parisc-linux.org>,
	Guan Xuetao <gxt@mprc.pku.edu.cn>,
	"David S. Miller" <davem@davemloft.net>,
	Chris Zankel <chris@zankel.net>,
	Vineet Gupta <vgupta@synopsys.com>,
	Ley Foon Tan <lftan@altera.com>,
	Ralf Baechle <ralf@linux-mips.org>,
	Andi Kleen <ak@linux.intel.com>
Cc: linux-mm@kvack.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: The usage of page_mapping() in architecture code
Date: Sun, 11 Feb 2018 14:47:42 +0800	[thread overview]
Message-ID: <87o9kwxbsh.fsf@yhuang-dev.intel.com> (raw)
In-Reply-To: <87vaf4xbz8.fsf@yhuang-dev.intel.com> (Ying Huang's message of "Sun, 11 Feb 2018 14:43:39 +0800")

Sorry for bothering, forget to Cc LKML in the original email.

Hi, All,

To optimize the scalability of swap cache, it is made more dynamic
than before.  That is, after being swapped off, the address space of
the swap device will be freed too.  So the usage of page_mapping()
need to be audited to make sure the address space of the swap device
will not be used after it is freed.  For most cases it is OK, because
to call page_mapping(), the page, page table, or LRU list will be
locked.  But I found at least one usage isn't safe.  When
page_mapping() is called in architecture specific code to flush dcache
or sync between dcache and icache.

The typical usage models are,


1) Check whether page_mapping() is NULL, which is safe

2) Call mapping_mapped() to check whether the backing file is mapped
   to user space.

3) Iterate all vmas via the interval tree (mapping->i_mmap) to flush dcache


2) and 3) isn't safe, because no lock to prevent swap device from
swapping off is held.  But I found the code is for file address space
only, not for swap cache.  For example, for flush_dcache_page() in
arch/parisc/kernel/cache.c,


void flush_dcache_page(struct page *page)
{
	struct address_space *mapping = page_mapping(page);
	struct vm_area_struct *mpnt;
	unsigned long offset;
	unsigned long addr, old_addr = 0;
	pgoff_t pgoff;

	if (mapping && !mapping_mapped(mapping)) {
		set_bit(PG_dcache_dirty, &page->flags);
		return;
	}

	flush_kernel_dcache_page(page);

	if (!mapping)
		return;

	pgoff = page->index;

	/* We have carefully arranged in arch_get_unmapped_area() that
	 * *any* mappings of a file are always congruently mapped (whether
	 * declared as MAP_PRIVATE or MAP_SHARED), so we only need
	 * to flush one address here for them all to become coherent */

	flush_dcache_mmap_lock(mapping);
	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
		addr = mpnt->vm_start + offset;

		/* The TLB is the engine of coherence on parisc: The
		 * CPU is entitled to speculate any page with a TLB
		 * mapping, so here we kill the mapping then flush the
		 * page along a special flush only alias mapping.
		 * This guarantees that the page is no-longer in the
		 * cache for any process and nor may it be
		 * speculatively read in (until the user or kernel
		 * specifically accesses it, of course) */

		flush_tlb_page(mpnt, addr);
		if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1))
				      != (addr & (SHM_COLOUR - 1))) {
			__flush_cache_page(mpnt, addr, page_to_phys(page));
			if (old_addr)
				printk(KERN_ERR "INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n", old_addr, addr, mpnt->vm_file);
			old_addr = addr;
		}
	}
	flush_dcache_mmap_unlock(mapping);
}


if page is an anonymous page in swap cache, "mapping &&
!mapping_mapped()" will be true, so we will delay flushing.  But if my
understanding of the code were correct, we should call
flush_kernel_dcache() because the kernel may access the page during
swapping in/out.

The code in other architectures follow the similar logic.  Would it be
better for page_mapping() here to return NULL for anonymous pages even
if they are in swap cache?  Of course we need to change the function
name.  page_file_mapping() appears a good name, but that has been used
already.  Any suggestion?

Is my understanding correct?  Could you help me on this?

Best Regards,
Huang, Ying

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2018-02-11  6:47 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-11  6:43 The usage of page_mapping() in architecture code Huang, Ying
2018-02-11  6:47 ` Huang, Ying [this message]
2018-02-11  6:47   ` Huang, Ying
2018-02-11  8:17 ` Russell King - ARM Linux
2018-02-11  8:39   ` Huang, Ying
2018-02-11 13:21     ` Russell King - ARM Linux

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o9kwxbsh.fsf@yhuang-dev.intel.com \
    --to=ying.huang@intel.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=chris@zankel.net \
    --cc=dave.hansen@intel.com \
    --cc=davem@davemloft.net \
    --cc=gxt@mprc.pku.edu.cn \
    --cc=hannes@cmpxchg.org \
    --cc=jejb@parisc-linux.org \
    --cc=lftan@altera.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@armlinux.org.uk \
    --cc=liqin.linux@gmail.com \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=ralf@linux-mips.org \
    --cc=vgupta@synopsys.com \
    --cc=ysato@users.sourceforge.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.