Re: [tip:x86/mm] x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation

From: Dan Williams @ 2017-04-20 21:46 UTC
  To: Catalin Marinas, aneesh.kumar, steve.capper, Thomas Gleixner,
	Peter Zijlstra, Linux Kernel Mailing List, Ingo Molnar,
	Andrew Morton, Kirill A. Shutemov, H. Peter Anvin, dave.hansen,
	Borislav Petkov, Rik van Riel, dann.frazier, Linus Torvalds,
	Michal Hocko
  Cc: linux-tip-commits

On Sat, Mar 18, 2017 at 2:52 AM, tip-bot for Kirill A. Shutemov
<tipbot@zytor.com> wrote:
> Commit-ID:  2947ba054a4dabbd82848728d765346886050029
> Gitweb:     http://git.kernel.org/tip/2947ba054a4dabbd82848728d765346886050029
> Author:     Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> AuthorDate: Fri, 17 Mar 2017 00:39:06 +0300
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Sat, 18 Mar 2017 09:48:03 +0100
>
> x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
>
> This patch provides all required callbacks required by the generic
> get_user_pages_fast() code and switches x86 over - and removes
> the platform specific implementation.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Dann Frazier <dann.frazier@canonical.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Steve Capper <steve.capper@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> Link: http://lkml.kernel.org/r/20170316213906.89528-1-kirill.shutemov@linux.intel.com
> [ Minor readability edits. ]
> Signed-off-by: Ingo Molnar <mingo@kernel.org>

I'm still trying to spot the bug, but bisect points to this patch as
the first commit at which my unit tests fail with the following
signature:

[   35.423841] WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
[   35.425328] percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic
[   35.425329] Modules linked in: ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_raw ip6table_security iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_raw iptable_security ebtable_filter ebtables
ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel nd_pmem(O) dax_pmem(O) nd_btt(O) dax(O) serio_raw
nfit(O) nd_e820(O) libnvdimm(O) tpm_tis tpm_tis_core
tpm nfit_test_iomap(O) nfsd nfs_acl
[   35.433683] CPU: 8 PID: 245 Comm: rcuos/29 Tainted: G           O    4.11.0-rc2+ #55
[   35.435538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[   35.437500] Call Trace:
[   35.438270]  dump_stack+0x86/0xc3
[   35.439156]  __warn+0xcb/0xf0
[   35.439995]  warn_slowpath_fmt+0x5f/0x80
[   35.440962]  ? rcu_nocb_kthread+0x27a/0x500
[   35.441957]  ? dax_pmem_percpu_exit+0x50/0x50 [dax_pmem]
[   35.443107]  percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
[   35.444251]  ? percpu_ref_exit+0x60/0x60
[   35.445206]  rcu_nocb_kthread+0x327/0x500
[   35.446186]  ? rcu_nocb_kthread+0x27a/0x500
[   35.447188]  kthread+0x10c/0x140
[   35.448058]  ? rcu_eqs_enter+0x50/0x50
[   35.448990]  ? kthread_create_on_node+0x60/0x60
[   35.450038]  ret_from_fork+0x31/0x40
[   35.450976] ---[ end trace eaa40898a09519b5 ]---
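
For anyone reading along who is less familiar with percpu_ref: while the ref is in percpu mode its count is kept as per-CPU deltas, and those deltas are only summed when the ref switches to atomic mode, which happens from an RCU callback (percpu_ref_switch_to_atomic_rcu() running in the rcuos kthread in the trace above). A single CPU's delta can legitimately be negative, so an unmatched put is not caught where it happens; it only shows up as a total <= 0 at fold time. Below is a simplified userspace model of that folding step. The names (struct model_ref, model_init(), model_get(), model_put(), model_switch_to_atomic()) are hypothetical and this is not the kernel implementation. It also assumes, based on a reading of lib/percpu-refcount.c, that the switch itself holds one reference, so a balanced get/put history should fold to at least 1.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_NR_CPUS 4

/*
 * Hypothetical userspace model of a per-CPU reference count. This is NOT
 * the kernel's struct percpu_ref; it only mimics the folding step that
 * percpu_ref_switch_to_atomic_rcu() warns about.
 */
struct model_ref {
	long percpu[MODEL_NR_CPUS]; /* per-CPU deltas while in percpu mode */
	long atomic;                /* shared count used after the switch */
};

/* Assumption: one reference is still held when the switch happens. */
static void model_init(struct model_ref *r)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		r->percpu[cpu] = 0;
	r->atomic = 1;
}

static void model_get(struct model_ref *r, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	r->percpu[cpu]++;
}

static void model_put(struct model_ref *r, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	r->percpu[cpu]--; /* may go negative on this CPU; only the sum matters */
}

/* Fold the per-CPU deltas into the shared count and warn on underflow. */
static long model_switch_to_atomic(struct model_ref *r)
{
	long sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		sum += r->percpu[cpu];
		r->percpu[cpu] = 0;
	}
	r->atomic += sum;
	if (r->atomic <= 0)
		fprintf(stderr, "model ref <= 0 (%ld) after switching to atomic\n",
			r->atomic);
	return r->atomic;
}
```

With one get on CPU 0 and one put on CPU 1, CPU 1's delta is -1 but the fold still returns 1. Adding a second, unmatched put makes the fold return 0 and print the warning, which is the same shape as the splat above: somewhere a page reference is being dropped against the dax_pmem percpu_ref without a matching get.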

This is similar to the backtrace we saw when we were not properly
handling pud faults, which was fixed by commit 220ced1676c4 ("mm: fix
get_user_pages() vs device-dax pud mappings").

I've found some missing _devmap checks in the generic
get_user_pages_fast() path, but adding them does not fix the regression:

diff --git a/mm/gup.c b/mm/gup.c
index 2559a3987de7..89156cd59cbc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1475,7 +1475,8 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
                if (pmd_none(pmd))
                        return 0;

-               if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
+               if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd)
+                                       || pmd_devmap(pmd))) {
                        /*
                         * NUMA hinting faults need to be handled in the GUP
                         * slowpath for accounting purposes and so that they
@@ -1516,7 +1517,7 @@ static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
                next = pud_addr_end(addr, end);
                if (pud_none(pud))
                        return 0;
-               if (unlikely(pud_huge(pud))) {
+               if (unlikely(pud_huge(pud) || pud_devmap(pud))) {
                        if (!gup_huge_pud(pud, pudp, addr, next, write,
                                          pages, nr))
                                return 0;


...more hunting tomorrow.
