Re: [lkp] [x86/mtrr] edfe63ec97: kernel BUG at arch/x86/mm/physaddr.c:79!

From: Toshi Kani @ 2016-04-01 19:44 UTC
  To: kernel test robot
  Cc: lkp, linux-kernel, Toshi Kani, Peter Zijlstra, Luis R. Rodriguez,
	Linus Torvalds, Juergen Gross, H. Peter Anvin, Denys Vlasenko,
	Brian Gerst, Borislav Petkov, Borislav Petkov, Andy Lutomirski,
	Andrew Morton, Thomas Gleixner, Ingo Molnar

On Fri, 2016-04-01 at 11:05 +0800, kernel test robot wrote:
> FYI, we noticed the below changes on
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/mm
> commit edfe63ec97ed8d4496225f7ba54c9ce4207c5431 ("x86/mtrr: Fix Xorg
> crashes in Qemu sessions")
> 
> 
> [   10.429879] hgafb: HGA card not detected.
> [   10.430521] hgafb: probe of hgafb.0 failed with error -22
> [   10.434199] ------------[ cut here ]------------
> [   10.434889] kernel BUG at arch/x86/mm/physaddr.c:79!
> [   10.435784] invalid opcode: 0000 [#1] DEBUG_PAGEALLOC 
> [   10.436627] CPU: 0 PID: 117 Comm: v86d Not tainted 4.6.0-rc1-00015-
> gedfe63e #1
> [   10.437696] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS Debian-1.8.2-1 04/01/2014
> [   10.438929] task: cf91d900 ti: cf8fa000 task.ti: cf8fa000
> [   10.439664] EIP: 0060:[<c1033290>] EFLAGS: 00010206 CPU: 0
> [   10.440409] EIP is at __phys_addr+0x80/0x90
> [   10.441022] EAX: 13fe0000 EBX: 13fe0000 ECX: 00000000 EDX: 13fe0000
> [   10.441975] ESI: 00000000 EDI: 00000000 EBP: cf8fbe4c ESP: cf8fbe48
> [   10.442804]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> [   10.443534] CR0: 80050033 CR2: 08063e48 CR3: 0f9f8f20 CR4: 000006b0
> [   10.444362] Stack:
> [   10.444772]  cf9e4dfc cf8fbe60 c1031eef 00000001 00001000 00000000
> cf8fbea8 c15952d1
> [   10.446322]  cf9e4dfc d3a23518 c10fce12 024080c0 024080c0 d2b05c80
> 00000000 00000000
> [   10.447870]  d15da220 cf9e4dd8 00001000 00001000 00000000 cf9ed790
> b7752000 cf9ed788
> [   10.449424] Call Trace:
> [   10.449877]  [<c1031eef>] phys_mem_access_prot_allowed+0xaf/0xf0
> [   10.450670]  [<c15952d1>] mmap_mem+0xa1/0x170
> [   10.451308]  [<c10fce12>] ? mmap_region+0x242/0x510
> [   10.451993]  [<c10fce9a>] mmap_region+0x2ca/0x510
> [   10.452657]  [<c10fd30d>] do_mmap+0x22d/0x300
> [   10.453313]  [<c10e7d74>] vm_mmap_pgoff+0x54/0x80
> [   10.453985]  [<c10fb211>] SyS_mmap_pgoff+0xa1/0x100
> [   10.454665]  [<c10013c3>] do_int80_syscall_32+0x63/0x150
> [   10.455396]  [<c1b2684e>] entry_INT80_32+0x36/0x36

In short, this is a bug in code that was previously (and unintentionally)
dead.

After commit edfe63ec97, PAT is now properly disabled when MTRRs are
disabled.  This brought the following dead code back to life on x86/32.

phys_mem_access_prot_allowed()
 :
#ifdef CONFIG_X86_32
        /*
         * On the PPro and successors, the MTRRs are used to set
         * memory types for physical addresses outside main memory,
         * so blindly setting UC or PWT on those pages is wrong.
         * For Pentiums and earlier, the surround logic should disable
         * caching for the high addresses through the KEN pin, but
         * we maintain the tradition of paranoia in this code.
         */
        if (!pat_enabled() &&
            !(boot_cpu_has(X86_FEATURE_MTRR) ||
              boot_cpu_has(X86_FEATURE_K6_MTRR) ||
              boot_cpu_has(X86_FEATURE_CYRIX_ARR) ||
              boot_cpu_has(X86_FEATURE_CENTAUR_MCR)) &&
            (pfn << PAGE_SHIFT) >= __pa(high_memory)) {
                pcm = _PAGE_CACHE_MODE_UC;
        }
#endif

When the system does not have much memory, 'high_memory' points to the
maximum memory address + 1, which is not backed by any memory.  When
CONFIG_DEBUG_VIRTUAL is also set, __pa() calls __phys_addr(), which in turn
calls slow_virt_to_phys() on high_memory.  Because high_memory does not
point to a valid memory address, this address is not mapped, and the
BUG_ON() at physaddr.c:79 fires.

This can be fixed by changing __pa(high_memory) in this check to either
__pa(high_memory - 1) or __pa_nodebug(high_memory).  Since the code does
not expect a valid virtual address for high_memory, I think using
__pa_nodebug() is appropriate here.  I am going to send a patch with this
change.
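
As a rough illustration only, here is a userspace toy model of why a
validating conversion trips on high_memory while the two alternatives do
not.  It is not kernel code: every toy_* name and both constants are
hypothetical, and the simple range test merely stands in for the
slow_virt_to_phys() comparison done under CONFIG_DEBUG_VIRTUAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration; not the kernel's definitions. */
#define TOY_PAGE_OFFSET 0xC0000000u  /* start of the direct mapping */
#define TOY_LOWMEM_SIZE 0x10000000u  /* bytes of RAM in the direct mapping */

/* One past the last mapped byte, like high_memory on a small-memory box. */
static const uint32_t toy_high_memory = TOY_PAGE_OFFSET + TOY_LOWMEM_SIZE;

/* Pure arithmetic, in the spirit of __pa_nodebug(): never validates. */
static uint32_t toy_pa_nodebug(uint32_t vaddr)
{
    return vaddr - TOY_PAGE_OFFSET;
}

/* Stand-in for the debug check: is vaddr inside the mapped range? */
static bool toy_vaddr_is_mapped(uint32_t vaddr)
{
    return vaddr >= TOY_PAGE_OFFSET && vaddr < toy_high_memory;
}

/* In the spirit of __pa() with debugging on: abort on an unmapped address. */
static uint32_t toy_pa_debug(uint32_t vaddr)
{
    assert(toy_vaddr_is_mapped(vaddr));
    return toy_pa_nodebug(vaddr);
}
```

In this model, toy_pa_debug(toy_high_memory) would abort because the
address is one past the mapped range, whereas
toy_pa_debug(toy_high_memory - 1) and toy_pa_nodebug(toy_high_memory) both
return normally.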

Note that the code should not use high_memory for this check.  I have a
separate patch for the /dev/mem driver that checks whether a target address
is backed by any memory (Ingo, any update on this one?).  I consider it an
enhancement, though, so I am not going to replace the high_memory check as
part of this bug fix.
https://lkml.org/lkml/2016/2/9/935
https://lkml.org/lkml/2016/2/17/493

Thanks,
-Toshi
