From: Steven Haigh <netwiz@crc.id.au>
To: Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	linux-kernel@vger.kernel.org
Subject: Re: 4.4: INFO: rcu_sched self-detected stall on CPU
Date: Sat, 2 Apr 2016 12:50:07 +1100
Message-ID: <56FF254F.3050103__38527.4805514657$1459561922$gmane$org@crc.id.au>
In-Reply-To: <56FA8DDD.7070406@oracle.com>



On 30/03/2016 1:14 AM, Boris Ostrovsky wrote:
> On 03/29/2016 04:56 AM, Steven Haigh wrote:
>>
>> Interestingly enough, this just happened again - but on a different
>> virtual machine. I'm starting to wonder if this may have something to do
>> with the uptime of the machine - as the system that this seems to happen
>> to is always different.
>>
>> Destroying it and monitoring it again has so far come up blank.
>>
>> I've thrown the latest lot of kernel messages here:
>>      http://paste.fedoraproject.org/346802/59241532
> 
> Would be good to see full console log. The one that you posted starts
> with an error so I wonder what was before that.

Ok, so I had a virtual machine do this again today. Both vcpus went to
100% usage and the guest essentially hung. I attached to the screen session
that was connected via 'xl console' and copied the entire buffer, pasted below:

yum-cron[30740]: segfault at 1781ab8 ip 00007f2a7fcd282f sp
00007ffe8655fe90 error 5 in libpython2.7.so.1.0[7f2a7fbf5000+178000]
swap_free: Bad swap file entry 2a2b7d5bb69515d8
BUG: Bad page map in process yum-cron  pte:56fab76d2a2bb06a pmd:0309e067
addr:0000000001780000 vm_flags:00100073 anon_vma:ffff88007b974c08
mapping:          (null) index:1780
file:          (null) fault:          (null) mmap:          (null)
readpage:          (null)
CPU: 0 PID: 30740 Comm: yum-cron Tainted: G    B
4.4.6-4.el7xen.x86_64 #1
 0000000000000000 ffff88004176bac0 ffffffff81323d17 0000000001780000
 00003ffffffff000 ffff88004176bb08 ffffffff8117e574 ffffffff81193d6e
 0000000000001780 ffff88000309ec00 0000000001780000 56fab76d2a2bb06a
Call Trace:
 [<ffffffff81323d17>] dump_stack+0x63/0x8c
 [<ffffffff8117e574>] print_bad_pte+0x1e4/0x290
 [<ffffffff81193d6e>] ? swap_info_get+0x7e/0xe0
 [<ffffffff8117fdef>] unmap_single_vma+0x4ff/0x840
 [<ffffffff81180837>] unmap_vmas+0x47/0x90
 [<ffffffff811891f8>] exit_mmap+0x98/0x150
 [<ffffffff8107a557>] mmput+0x47/0x100
 [<ffffffff8107f81e>] do_exit+0x24e/0xad0
 [<ffffffff8108011f>] do_group_exit+0x3f/0xa0
 [<ffffffff8108aa03>] get_signal+0x1c3/0x5e0
 [<ffffffff81017208>] do_signal+0x28/0x630
 [<ffffffff81152d7d>] ? printk+0x4d/0x4f
 [<ffffffff810c735f>] ? vprintk_default+0x1f/0x30
 [<ffffffff8107938c>] ? bad_area_access_error+0x43/0x4a
 [<ffffffff810603ac>] ? __do_page_fault+0x22c/0x3f0
 [<ffffffff8107808d>] exit_to_usermode_loop+0x4c/0x95
 [<ffffffff81003878>] prepare_exit_to_usermode+0x18/0x20
 [<ffffffff8165cae5>] retint_user+0x8/0x13
BUG: Bad page state in process yum-cron  pfn:0f3bf
page:ffffea00003cefc0 count:0 mapcount:7 mapping:ffff88000f3bf008
index:0xffff88000f3bf000
flags: 0x10000000000094(referenced|dirty|slab)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
flags: 0x80(slab)
Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache
x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64
lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss
nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel
xen_gntalloc xen_evtchn ipv6 autofs4
CPU: 0 PID: 30740 Comm: yum-cron Tainted: G    B
4.4.6-4.el7xen.x86_64 #1
 0000000000000000 ffff88004176b958 ffffffff81323d17 ffffea00003cefc0
 ffffffff817ab348 ffff88004176b980 ffffffff811c1ab5 ffffea00003cefc0
 0000000000000001 0000000000000000 ffff88004176b9d0 ffffffff81159584
Call Trace:
 [<ffffffff81323d17>] dump_stack+0x63/0x8c
 [<ffffffff811c1ab5>] bad_page.part.69+0xdf/0xfc
 [<ffffffff81159584>] free_pages_prepare+0x294/0x2a0
 [<ffffffff8115b6c1>] free_hot_cold_page+0x31/0x160
 [<ffffffff8115b839>] free_hot_cold_page_list+0x49/0xb0
 [<ffffffff811621b5>] release_pages+0xc5/0x260
 [<ffffffff8119380d>] free_pages_and_swap_cache+0x7d/0x90
 [<ffffffff8117ddd6>] tlb_flush_mmu_free+0x36/0x60
 [<ffffffff8117ff54>] unmap_single_vma+0x664/0x840
 [<ffffffff81180837>] unmap_vmas+0x47/0x90
 [<ffffffff811891f8>] exit_mmap+0x98/0x150
 [<ffffffff8107a557>] mmput+0x47/0x100
 [<ffffffff8107f81e>] do_exit+0x24e/0xad0
 [<ffffffff8108011f>] do_group_exit+0x3f/0xa0
 [<ffffffff8108aa03>] get_signal+0x1c3/0x5e0
 [<ffffffff81017208>] do_signal+0x28/0x630
 [<ffffffff81152d7d>] ? printk+0x4d/0x4f
 [<ffffffff810c735f>] ? vprintk_default+0x1f/0x30
 [<ffffffff8107938c>] ? bad_area_access_error+0x43/0x4a
 [<ffffffff810603ac>] ? __do_page_fault+0x22c/0x3f0
 [<ffffffff8107808d>] exit_to_usermode_loop+0x4c/0x95
 [<ffffffff81003878>] prepare_exit_to_usermode+0x18/0x20
 [<ffffffff8165cae5>] retint_user+0x8/0x13
BUG: Bad rss-counter state mm:ffff88007b99e4c0 idx:0 val:-1
BUG: Bad rss-counter state mm:ffff88007b99e4c0 idx:1 val:2
BUG: Bad rss-counter state mm:ffff88007b99e4c0 idx:2 val:-1
yum-cron[4197]: segfault at 32947fcb ip 00007ff0fa1bf8bd sp
00007ffdb1c54990 error 4 in libpython2.7.so.1.0[7ff0fa13a000+178000]
BUG: unable to handle kernel paging request at ffff88010f3beffe
IP: [<ffffffff811a7fb9>] free_block+0x119/0x190
PGD 188b063 PUD 0
Oops: 0002 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache
x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64
lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss
nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel
xen_gntalloc xen_evtchn ipv6 autofs4
CPU: 1 PID: 8519 Comm: kworker/1:2 Tainted: G    B
4.4.6-4.el7xen.x86_64 #1
Workqueue: events cache_reap
task: ffff8800346bf1c0 ti: ffff880051700000 task.ti: ffff880051700000
RIP: 0010:[<ffffffff811a7fb9>]  [<ffffffff811a7fb9>] free_block+0x119/0x190
RSP: 0018:ffff880051703d40  EFLAGS: 00010082
RAX: ffffea00003cefc0 RBX: ffffea0000000000 RCX: ffff88000f3bf000
RDX: 00000000fffffffe RSI: ffff88007fd19c40 RDI: ffff88007d012100
RBP: ffff880051703d68 R08: ffff880051703d88 R09: 0000000000000006
R10: ffff88007d01a940 R11: 0000000080000000 R12: 000077ff80000000
R13: ffff88007fd19c58 R14: ffff88007d01a948 R15: ffff88007d01a968
FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88010f3beffe CR3: 000000007bc8d000 CR4: 0000000000040620
Stack:
 ffff88007fd19bf0 ffff880051703d88 ffff88007d01a940 000000000000000b
 ffff88007d012100 ffff880051703dc0 ffffffff811a871b 0000000b51703dd8
 ffff88007fd19c00 ffff880051703d88 ffff880051703d88 ffff88007d012100
Call Trace:
 [<ffffffff811a871b>] drain_array+0xab/0x130
 [<ffffffff811a955c>] cache_reap+0x6c/0x230
 [<ffffffff81093831>] process_one_work+0x151/0x400
 [<ffffffff8109437a>] worker_thread+0x11a/0x460
 [<ffffffff8165848f>] ? __schedule+0x2bf/0x880
 [<ffffffff81094260>] ? rescuer_thread+0x2f0/0x2f0
 [<ffffffff810993d9>] kthread+0xc9/0xe0
 [<ffffffff81099310>] ? kthread_park+0x60/0x60
 [<ffffffff8165c38f>] ret_from_fork+0x3f/0x70
 [<ffffffff81099310>] ? kthread_park+0x60/0x60
Code: 41 89 d1 4c 0f af c9 0f b6 4f 1c 49 c1 e9 20 44 29 ca d3 ea 0f b6
4f 1d 41 01 d1 8b 50 18 41 d3 e9 48 8b 48 10 83 ea 01 89 50 18 <44> 88
0c 11 49 8b 52 38 48 83 c2 01 49 89 52 38 8b 48 18 85 c9
RIP  [<ffffffff811a7fb9>] free_block+0x119/0x190
 RSP <ffff880051703d40>
CR2: ffff88010f3beffe
---[ end trace 99d22a980c39d4db ]---
BUG: unable to handle kernel paging request at ffffffffffffffd8
IP: [<ffffffff81099b40>] kthread_data+0x10/0x20
PGD 188d067 PUD 188f067 PMD 0
Oops: 0000 [#2] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache
x86_pkg_temp_thermal coretemp crct10dif_pclmul aesni_intel aes_x86_64
lrw gf128mul glue_helper ablk_helper cryptd pcspkr nfsd auth_rpcgss
nfs_acl lockd grace sunrpc ip_tables xen_netfront crc32c_intel
xen_gntalloc xen_evtchn ipv6 autofs4
CPU: 1 PID: 8519 Comm: kworker/1:2 Tainted: G    B D
4.4.6-4.el7xen.x86_64 #1
task: ffff8800346bf1c0 ti: ffff880051700000 task.ti: ffff880051700000
RIP: 0010:[<ffffffff81099b40>]  [<ffffffff81099b40>] kthread_data+0x10/0x20
RSP: 0018:ffff880051703a60  EFLAGS: 00010002
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
RDX: ffffffff81bd4080 RSI: 0000000000000001 RDI: ffff8800346bf1c0
RBP: ffff880051703a60 R08: ffff88007c8e19d0 R09: 0000c88ed0b2b602
R10: 000000000000001f R11: 0000000000000000 R12: ffff8800346bf1c0
R13: 0000000000000000 R14: 00000000000163c0 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 000000007bc8d000 CR4: 0000000000040620
Stack:
 ffff880051703a78 ffffffff81094741 ffff88007fd163c0 ffff880051703ac0
 ffffffff8165858f ffffffff810d2e27 ffff8800346bf1c0 ffff880051704000
 ffff8800346bf510 0000000000000046 ffff8800346bf1c0 ffff88007d3facc0
Call Trace:
 [<ffffffff81094741>] wq_worker_sleeping+0x11/0x90
 [<ffffffff8165858f>] __schedule+0x3bf/0x880
 [<ffffffff810d2e27>] ? call_rcu_sched+0x17/0x20
 [<ffffffff81658a85>] schedule+0x35/0x80
 [<ffffffff8107fc29>] do_exit+0x659/0xad0
 [<ffffffff8101a60a>] oops_end+0x9a/0xd0
 [<ffffffff8105f7ad>] no_context+0x10d/0x360
 [<ffffffff8105fb09>] __bad_area_nosemaphore+0x109/0x210
 [<ffffffff8105fc23>] bad_area_nosemaphore+0x13/0x20
 [<ffffffff81060200>] __do_page_fault+0x80/0x3f0
 [<ffffffff81060592>] do_page_fault+0x22/0x30
 [<ffffffff8165e0e8>] page_fault+0x28/0x30
 [<ffffffff811a7fb9>] ? free_block+0x119/0x190
 [<ffffffff811a871b>] drain_array+0xab/0x130
 [<ffffffff811a955c>] cache_reap+0x6c/0x230
 [<ffffffff81093831>] process_one_work+0x151/0x400
 [<ffffffff8109437a>] worker_thread+0x11a/0x460
 [<ffffffff8165848f>] ? __schedule+0x2bf/0x880
 [<ffffffff81094260>] ? rescuer_thread+0x2f0/0x2f0
 [<ffffffff810993d9>] kthread+0xc9/0xe0
 [<ffffffff81099310>] ? kthread_park+0x60/0x60
 [<ffffffff8165c38f>] ret_from_fork+0x3f/0x70
 [<ffffffff81099310>] ? kthread_park+0x60/0x60
Code: ff ff eb 92 be 3a 02 00 00 48 c7 c7 0b 5e 7a 81 e8 e6 34 fe ff e9
d5 fe ff ff 90 66 66 66 66 90 55 48 8b 87 00 04 00 00 48 89 e5 <48> 8b
40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
RIP  [<ffffffff81099b40>] kthread_data+0x10/0x20
 RSP <ffff880051703a60>
CR2: ffffffffffffffd8
---[ end trace 99d22a980c39d4dc ]---
Fixing recursive fault but reboot is needed!

Nothing further is logged on the console after this, and the preceding
messages all concern an NFS server.
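
As an aside, a more reliable way to get the full console log that Boris
asked for (rather than scraping a screen buffer after the fact) would be
to let xenconsoled log each guest's console to disk. The following is only
a rough sketch; the sysconfig path and service names are assumptions that
vary by distro and Xen packaging:

  # Ask xenconsoled to log guest console output (stock xencommons sysconfig;
  # on other distros this may live in /etc/default/xencommons instead):
  echo 'XENCONSOLED_TRACE=guest' >> /etc/sysconfig/xencommons

  # Restart the console daemon so the setting takes effect (via the
  # xencommons service, or however the distro starts xenconsoled).

  # Each domU's console output should then be appended to:
  #   /var/log/xen/console/guest-<domain-name>.log

  # For a one-off capture of a single session instead:
  xl console <domain-name> | tee /tmp/domU-console.log

That way the next occurrence should leave a complete log from boot onwards,
including anything printed before the first error.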

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897




Thread overview: 14+ messages
2016-03-25  2:53 4.4: INFO: rcu_sched self-detected stall on CPU Steven Haigh
2016-03-25 12:23 ` Boris Ostrovsky
2016-03-25 14:05   ` Steven Haigh
2016-03-25 14:44     ` Boris Ostrovsky
2016-03-25 16:04       ` Steven Haigh
2016-03-25 16:20         ` Boris Ostrovsky
2016-03-25 21:07           ` Steven Haigh
2016-03-29  8:56             ` Steven Haigh
2016-03-29 14:14               ` Boris Ostrovsky
2016-03-29 17:44                 ` Steven Haigh
2016-03-29 18:04                   ` Steven Haigh
2016-03-30 13:44                     ` Boris Ostrovsky
2016-05-02 20:54                     ` gregkh
2016-04-02  1:50                 ` Steven Haigh [this message]
