On 30/03/2016 1:14 AM, Boris Ostrovsky wrote: > On 03/29/2016 04:56 AM, Steven Haigh wrote: >> >> Interestingly enough, this just happened again - but on a different >> virtual machine. I'm starting to wonder if this may have something to do >> with the uptime of the machine - as the system that this seems to happen >> to is always different. >> >> Destroying it and monitoring it again has so far come up blank. >> >> I've thrown the latest lot of kernel messages here: >> http://paste.fedoraproject.org/346802/59241532 > > Would be good to see full console log. The one that you posted starts > with an error so I wonder what was before that. Ok, so I had a virtual machine do this again today. Both vcpus went to 100% usage and essentially hung. I attached to the screen console that was connected via 'xl console' and copied the entire buffer to paste below: yum-cron[30740]: segfault at 1781ab8 ip 00007f2a7fcd282f sp 00007ffe8655fe90 error 5 in libpython2.7.so.1.0[7f2a7fbf5000+178000] swap_free: Bad swap file entry 2a2b7d5bb69515d8 BUG: Bad page map in process yum-cron pte:56fab76d2a2bb06a pmd:0309e067 addr:0000000001780000 vm_flags:00100073 anon_vma:ffff88007b974c08 mapping: (null) index:1780 file: (null) fault: (null) mmap: (null) readpage: (null) CPU: 0 PID: 30740 Comm: yum-cron Tainted: G B 4.4.6-4.el7xen.x86_64 #1 0000000000000000 ffff88004176bac0 ffffffff81323d17 0000000001780000 00003ffffffff000 ffff88004176bb08 ffffffff8117e574 ffffffff81193d6e 0000000000001780 ffff88000309ec00 0000000001780000 56fab76d2a2bb06a Call Trace: [] dump_stack+0x63/0x8c [] print_bad_pte+0x1e4/0x290 [] ? swap_info_get+0x7e/0xe0 [] unmap_single_vma+0x4ff/0x840 [] unmap_vmas+0x47/0x90 [] exit_mmap+0x98/0x150 [] mmput+0x47/0x100 [] do_exit+0x24e/0xad0 [] do_group_exit+0x3f/0xa0 [] get_signal+0x1c3/0x5e0 [] do_signal+0x28/0x630 [] ? printk+0x4d/0x4f [] ? vprintk_default+0x1f/0x30 [] ? bad_area_access_error+0x43/0x4a [] ? __do_page_fault+0x22c/0x3f0 [] exit_to_usermode_loop+0x4c/0x95 [] prepare_exit_to_usermode+0x18/0x20 [] retint_user+0x8/0x13 BUG: Bad page state in process yum-cron pfn:0f3bf page:ffffea00003cefc0 count:0 mapcount:7 mapping:ffff88000f3bf008 index:0xffff88000f3bf000 flags: 0x10000000000094(referenced|dirty|slab) page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set bad because of flags: flags: 0x80(slab) Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4 CPU: 0 PID: 30740 Comm: yum-cron Tainted: G B 4.4.6-4.el7xen.x86_64 #1 0000000000000000 ffff88004176b958 ffffffff81323d17 ffffea00003cefc0 ffffffff817ab348 ffff88004176b980 ffffffff811c1ab5 ffffea00003cefc0 0000000000000001 0000000000000000 ffff88004176b9d0 ffffffff81159584 Call Trace: [] dump_stack+0x63/0x8c [] bad_page.part.69+0xdf/0xfc [] free_pages_prepare+0x294/0x2a0 [] free_hot_cold_page+0x31/0x160 [] free_hot_cold_page_list+0x49/0xb0 [] release_pages+0xc5/0x260 [] free_pages_and_swap_cache+0x7d/0x90 [] tlb_flush_mmu_free+0x36/0x60 [] unmap_single_vma+0x664/0x840 [] unmap_vmas+0x47/0x90 [] exit_mmap+0x98/0x150 [] mmput+0x47/0x100 [] do_exit+0x24e/0xad0 [] do_group_exit+0x3f/0xa0 [] get_signal+0x1c3/0x5e0 [] do_signal+0x28/0x630 [] ? printk+0x4d/0x4f [] ? vprintk_default+0x1f/0x30 [] ? bad_area_access_error+0x43/0x4a [] ? __do_page_fault+0x22c/0x3f0 [] exit_to_usermode_loop+0x4c/0x95 [] prepare_exit_to_usermode+0x18/0x20 [] retint_user+0x8/0x13 BUG: Bad rss-counter state mm:ffff88007b99e4c0 idx:0 val:-1 BUG: Bad rss-counter state mm:ffff88007b99e4c0 idx:1 val:2 BUG: Bad rss-counter state mm:ffff88007b99e4c0 idx:2 val:-1 yum-cron[4197]: segfault at 32947fcb ip 00007ff0fa1bf8bd sp 00007ffdb1c54990 error 4 in libpython2.7.so.1.0[7ff0fa13a000+178000] BUG: unable to handle kernel paging request at ffff88010f3beffe IP: [] free_block+0x119/0x190 PGD 188b063 PUD 0 Oops: 0002 [#1] SMP Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4 CPU: 1 PID: 8519 Comm: kworker/1:2 Tainted: G B 4.4.6-4.el7xen.x86_64 #1 Workqueue: events cache_reap task: ffff8800346bf1c0 ti: ffff880051700000 task.ti: ffff880051700000 RIP: 0010:[] [] free_block+0x119/0x190 RSP: 0018:ffff880051703d40 EFLAGS: 00010082 RAX: ffffea00003cefc0 RBX: ffffea0000000000 RCX: ffff88000f3bf000 RDX: 00000000fffffffe RSI: ffff88007fd19c40 RDI: ffff88007d012100 RBP: ffff880051703d68 R08: ffff880051703d88 R09: 0000000000000006 R10: ffff88007d01a940 R11: 0000000080000000 R12: 000077ff80000000 R13: ffff88007fd19c58 R14: ffff88007d01a948 R15: ffff88007d01a968 FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88010f3beffe CR3: 000000007bc8d000 CR4: 0000000000040620 Stack: ffff88007fd19bf0 ffff880051703d88 ffff88007d01a940 000000000000000b ffff88007d012100 ffff880051703dc0 ffffffff811a871b 0000000b51703dd8 ffff88007fd19c00 ffff880051703d88 ffff880051703d88 ffff88007d012100 Call Trace: [] drain_array+0xab/0x130 [] cache_reap+0x6c/0x230 [] process_one_work+0x151/0x400 [] worker_thread+0x11a/0x460 [] ? __schedule+0x2bf/0x880 [] ? rescuer_thread+0x2f0/0x2f0 [] kthread+0xc9/0xe0 [] ? kthread_park+0x60/0x60 [] ret_from_fork+0x3f/0x70 [] ? kthread_park+0x60/0x60 Code: 41 89 d1 4c 0f af c9 0f b6 4f 1c 49 c1 e9 20 44 29 ca d3 ea 0f b6 4f 1d 41 01 d1 8b 50 18 41 d3 e9 48 8b 48 10 83 ea 01 89 50 18 <44> 88 0c 11 49 8b 52 38 48 83 c2 01 49 89 52 38 8b 48 18 85 c9 RIP [] free_block+0x119/0x190 RSP CR2: ffff88010f3beffe ---[ end trace 99d22a980c39d4db ]--- BUG: unable to handle kernel paging request at ffffffffffffffd8 IP: [] kthread_data+0x10/0x20 PGD 188d067 PUD 188f067 PMD 0 Oops: 0000 [#2] SMP Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4 CPU: 1 PID: 8519 Comm: kworker/1:2 Tainted: G B D 4.4.6-4.el7xen.x86_64 #1 task: ffff8800346bf1c0 ti: ffff880051700000 task.ti: ffff880051700000 RIP: 0010:[] [] kthread_data+0x10/0x20 RSP: 0018:ffff880051703a60 EFLAGS: 00010002 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001 RDX: ffffffff81bd4080 RSI: 0000000000000001 RDI: ffff8800346bf1c0 RBP: ffff880051703a60 R08: ffff88007c8e19d0 R09: 0000c88ed0b2b602 R10: 000000000000001f R11: 0000000000000000 R12: ffff8800346bf1c0 R13: 0000000000000000 R14: 00000000000163c0 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 000000007bc8d000 CR4: 0000000000040620 Stack: ffff880051703a78 ffffffff81094741 ffff88007fd163c0 ffff880051703ac0 ffffffff8165858f ffffffff810d2e27 ffff8800346bf1c0 ffff880051704000 ffff8800346bf510 0000000000000046 ffff8800346bf1c0 ffff88007d3facc0 Call Trace: [] wq_worker_sleeping+0x11/0x90 [] __schedule+0x3bf/0x880 [] ? call_rcu_sched+0x17/0x20 [] schedule+0x35/0x80 [] do_exit+0x659/0xad0 [] oops_end+0x9a/0xd0 [] no_context+0x10d/0x360 [] __bad_area_nosemaphore+0x109/0x210 [] bad_area_nosemaphore+0x13/0x20 [] __do_page_fault+0x80/0x3f0 [] do_page_fault+0x22/0x30 [] page_fault+0x28/0x30 [] ? free_block+0x119/0x190 [] drain_array+0xab/0x130 [] cache_reap+0x6c/0x230 [] process_one_work+0x151/0x400 [] worker_thread+0x11a/0x460 [] ? __schedule+0x2bf/0x880 [] ? rescuer_thread+0x2f0/0x2f0 [] kthread+0xc9/0xe0 [] ? kthread_park+0x60/0x60 [] ret_from_fork+0x3f/0x70 [] ? kthread_park+0x60/0x60 Code: ff ff eb 92 be 3a 02 00 00 48 c7 c7 0b 5e 7a 81 e8 e6 34 fe ff e9 d5 fe ff ff 90 66 66 66 66 90 55 48 8b 87 00 04 00 00 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 RIP [] kthread_data+0x10/0x20 RSP CR2: ffffffffffffffd8 ---[ end trace 99d22a980c39d4dc ]--- Fixing recursive fault but reboot is needed! There is nothing further logged on the console after, and any previous messages are concerning an NFS server. -- Steven Haigh Email: netwiz@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897