On 19/01/18 4:04 PM, Matthew Wilcox wrote:
On Thu, Jan 18, 2018 at 02:18:20PM -0800, Laura Abbott wrote:
On 01/18/2018 01:55 PM, Andrew Morton wrote:
[   24.647744] BUG: unable to handle kernel NULL pointer dereference at 00000008
[   24.647801] IP: __radix_tree_lookup+0x14/0xa0
[   24.647811] *pdpt = 00000000253d6027 *pde = 0000000000000000
[   24.647828] Oops: 0000 [#1] SMP
[   24.647842] CPU: 5 PID: 3600 Comm: java Not tainted 4.14.13-rh10-20180115190010.xenU.i386 #1
[   24.647855] task: e52518c0 task.stack: e4e7a000
[   24.647866] EIP: __radix_tree_lookup+0x14/0xa0
[   24.647876] EFLAGS: 00010286 CPU: 5
[   24.647884] EAX: 00000004 EBX: 00000007 ECX: 00000000 EDX: 00000000
[   24.647895] ESI: 00000000 EDI: 00000000 EBP: e4e7bdb8 ESP: e4e7bda0
[   24.647904]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[   24.647917] CR0: 80050033 CR2: 00000008 CR3: 25360000 CR4: 00002660
[   24.647930] Call Trace:
[   24.647942]  radix_tree_lookup_slot+0x13/0x30
[   24.647955]  find_get_entry+0x1d/0x120
[   24.647963]  pagecache_get_page+0x1f/0x230
[   24.647975]  lookup_swap_cache+0x42/0x140
[   24.647983]  swap_readahead_detect+0x66/0x2e0
[   24.647993]  do_swap_page+0x1fa/0x860
[   24.648010]  ? __raw_callee_save___pv_queued_spin_unlock+0x9/0x10
[   24.648026]  ? xen_pmd_val+0x10/0x20
[   24.648035]  handle_mm_fault+0x6f8/0x1020
[   24.648046]  __do_page_fault+0x18a/0x450
[   24.648055]  ? vmalloc_sync_all+0x250/0x250
[   24.648063]  do_page_fault+0x21/0x30
[   24.648074]  common_exception+0x45/0x4a
[   24.648082] EIP: 0xb76d873e
[   24.648088] EFLAGS: 00010206 CPU: 5
[   24.648096] EAX: 76a10000 EBX: 76a1cd14 ECX: 00000006 EDX: 00000006
[   24.648105] ESI: 00000040 EDI: b796c380 EBP: 77881008 ESP: 77880ff8
[   24.648115]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[   24.648124] Code: ff ff ff 00 47 03 e9 69 ff ff ff 8b 45 08 89 06 e9 1f ff ff ff 66 90 55 89 e5 57 89 d7 56 53 83 ec 0c 89 45 ec 89 4d e8 8b 45 ec <8b> 58 04 89 d8 83 e0 03 48 89 5d f0 75 64 89 d8 83 e0 fe 0f b6
[   24.648195] EIP: __radix_tree_lookup+0x14/0xa0 SS:ESP: 0069:e4e7bda0
[   24.648205] CR2: 0000000000000008
[   24.648273] ---[ end trace ed356e59f215ce07 ]---
Running that code through decodecode, I get:

   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	57                   	push   %edi
   4:	89 d7                	mov    %edx,%edi
   6:	56                   	push   %esi
   7:	53                   	push   %ebx
   8:	83 ec 0c             	sub    $0xc,%esp
   b:	89 45 ec             	mov    %eax,-0x14(%ebp)
   e:	89 4d e8             	mov    %ecx,-0x18(%ebp)
  11:	8b 45 ec             	mov    -0x14(%ebp),%eax
  14:*	8b 58 04             	mov    0x4(%eax),%ebx		<-- trapping instruction
  17:	89 d8                	mov    %ebx,%eax
  19:	83 e0 03             	and    $0x3,%eax

Which I think means it's looking at offset 4 from whichever argument
the x86 calling convention puts in register %eax.  Which I think is
argument 0?  Which is the radix tree root.  And that makes sense; we're
loading the root node from the radix tree root at offset 4.  The problem
is that %eax has the value 4 in it.  That would match 'page_tree'
being at offset 4 from the start of struct address_space.  So
find_get_page() got called with a NULL mapping, and so did
pagecache_get_page().

Which means I've tracked it back to:

        page = find_get_page(swap_address_space(entry), swp_offset(entry));

and swap_address_space() is returning NULL.  Has this machine run swapoff
recently, perhaps?
Swap was on.  Swap is small (127MB).  Swap had not been dipped into.

             total       used       free     shared    buffers     cached
Swap:          127          0        127

PS: I cannot recall seeing this issue on x86_64, just 32-bit.

PPS: reminder that this is on a Xen VM, which per https://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html#PVH-Guest-Specific-Options has "out of sync pagetables", if that is relevant (we do not set that option, and I am unsure what default is used).