From: Evgenii Shatokhin <eshatokhin@virtuozzo.com>
To: Bob Liu <bob.liu@oracle.com>
Cc: Juergen Gross <jgross@suse.com>,
Dario Faggioli <dario.faggioli@citrix.com>,
George Dunlap <George.Dunlap@citrix.com>,
xen-devel@lists.xen.org, David Vrabel <david.vrabel@citrix.com>,
Konstantin Khorenko <khorenko@virtuozzo.com>,
Roger Pau Monne <roger.paumonne@citrix.com>
Subject: Re: [BUG] kernel BUG at drivers/block/xen-blkfront.c:1711
Date: Wed, 10 Aug 2016 15:33:47 +0300
Message-ID: <57AB1F2B.5080403@virtuozzo.com>
In-Reply-To: <57877FCF.1040105@oracle.com>
On 14.07.2016 15:04, Bob Liu wrote:
>
> On 07/14/2016 07:49 PM, Evgenii Shatokhin wrote:
>> On 11.07.2016 15:04, Bob Liu wrote:
>>>
>>>
>>> On 07/11/2016 04:50 PM, Evgenii Shatokhin wrote:
>>>> On 06.06.2016 11:42, Dario Faggioli wrote:
>>>>> Just Cc-ing some Linux, block, and Xen on CentOS people...
>>>>>
>>>>
>>>> Ping.
>>>>
>>>> Any suggestions how to debug this or what might cause the problem?
>>>>
>>>> Obviously, we cannot control Xen on Amazon's servers. But perhaps there is something we can do on the kernel side?
>>>>
>>>>> On Mon, 2016-06-06 at 11:24 +0300, Evgenii Shatokhin wrote:
>>>>>> (Resending this bug report because the message I sent last week did
>>>>>> not
>>>>>> make it to the mailing list somehow.)
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> One of our users gets kernel panics from time to time when he tries
>>>>>> to
>>>>>> use his Amazon EC2 instance with CentOS7 x64 in it [1]. Kernel panic
>>>>>> happens within minutes from the moment the instance starts. The
>>>>>> problem
>>>>>> does not show up every time, however.
>>>>>>
>>>>>> The user first observed the problem with a custom kernel, but it was
>>>>>> found later that the stock kernel 3.10.0-327.18.2.el7.x86_64 from
>>>>>> CentOS7 was affected as well.
>>>
>>> Please try this patch:
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7b0767502b5db11cb1f0daef2d01f6d71b1192dc
>>>
>>> Regards,
>>> Bob
>>>
>>
>> Unfortunately, it did not help. The same BUG_ON() in blkfront_setup_indirect() still triggers in our kernel based on RHEL's 3.10.0-327.18.2, where I added the patch.
>>
>> As far as I can see, the patch makes sure the indirect pages are added to the list only if (!info->feature_persistent) holds. I suppose that condition does hold in our case and the pages are added to the list, because the BUG_ON() that triggers is here:
>>
>> if (!info->feature_persistent && info->max_indirect_segments) {
>> <...>
>> BUG_ON(!list_empty(&info->indirect_pages));
>> <...>
>> }
>>
>
> That's odd.
> Could you please try to reproduce this issue with a recent upstream kernel?
>
> Thanks,
> Bob
No luck with the upstream kernel 4.7.0 so far due to unrelated issues
(a bad initrd, I suppose, so the system does not even boot).
However, the problem was reproduced with the stable upstream kernel 3.14.74.
After the system had booted for the second time with this kernel, that BUG_ON
triggered:
kernel BUG at drivers/block/xen-blkfront.c:1701
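
For reference, a minimal debug sketch along these lines might show where the
leftover pages come from (untested, written against the 3.10/3.14 xen-blkfront
code, so field names may differ slightly; blkfront_dump_indirect_pages() is just
a hypothetical helper that would live in drivers/block/xen-blkfront.c and be
called right before the BUG_ON):

------------------------------------
/* Debug only: count whatever an earlier setup/teardown left on the list. */
static void blkfront_dump_indirect_pages(struct blkfront_info *info)
{
        struct page *indirect_page;
        unsigned int count = 0;

        list_for_each_entry(indirect_page, &info->indirect_pages, lru)
                count++;

        pr_warn("xen-blkfront: %u leftover indirect pages, feature_persistent=%d, max_indirect_segments=%u\n",
                count, info->feature_persistent,
                info->max_indirect_segments);
}
------------------------------------

That would at least tell us whether the pages are left over from an earlier call
to blkfront_setup_indirect() (e.g. after a reconnect) rather than from the
initial device setup.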
>
>> So the problem is still out there somewhere, it seems.
>>
>> Regards,
>> Evgenii
>>
>>>>>>
>>>>>> The part of the system log he was able to retrieve is attached. Here
>>>>>> is
>>>>>> the bug info, for convenience:
>>>>>>
>>>>>> ------------------------------------
>>>>>> [ 2.246912] kernel BUG at drivers/block/xen-blkfront.c:1711!
>>>>>> [ 2.246912] invalid opcode: 0000 [#1] SMP
>>>>>> [ 2.246912] Modules linked in: ata_generic pata_acpi
>>>>>> crct10dif_pclmul
>>>>>> crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel
>>>>>> xen_netfront xen_blkfront(+) aesni_intel lrw ata_piix gf128mul
>>>>>> glue_helper ablk_helper cryptd libata serio_raw floppy sunrpc
>>>>>> dm_mirror
>>>>>> dm_region_hash dm_log dm_mod scsi_transport_iscsi
>>>>>> [ 2.246912] CPU: 1 PID: 50 Comm: xenwatch Not tainted
>>>>>> 3.10.0-327.18.2.el7.x86_64 #1
>>>>>> [ 2.246912] Hardware name: Xen HVM domU, BIOS 4.2.amazon
>>>>>> 12/07/2015
>>>>>> [ 2.246912] task: ffff8800e9fcb980 ti: ffff8800e98bc000 task.ti:
>>>>>> ffff8800e98bc000
>>>>>> [ 2.246912] RIP: 0010:[<ffffffffa015584f>] [<ffffffffa015584f>]
>>>>>> blkfront_setup_indirect+0x41f/0x430 [xen_blkfront]
>>>>>> [ 2.246912] RSP: 0018:ffff8800e98bfcd0 EFLAGS: 00010283
>>>>>> [ 2.246912] RAX: ffff8800353e15c0 RBX: ffff8800e98c52c8 RCX:
>>>>>> 0000000000000020
>>>>>> [ 2.246912] RDX: ffff8800353e15b0 RSI: ffff8800e98c52b8 RDI:
>>>>>> ffff8800353e15d0
>>>>>> [ 2.246912] RBP: ffff8800e98bfd20 R08: ffff8800353e15b0 R09:
>>>>>> ffff8800eb403c00
>>>>>> [ 2.246912] R10: ffffffffa0155532 R11: ffffffffffffffe8 R12:
>>>>>> ffff8800e98c4000
>>>>>> [ 2.246912] R13: ffff8800e98c52b8 R14: 0000000000000020 R15:
>>>>>> ffff8800353e15c0
>>>>>> [ 2.246912] FS: 0000000000000000(0000) GS:ffff8800efc20000(0000)
>>>>>> knlGS:0000000000000000
>>>>>> [ 2.246912] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 2.246912] CR2: 00007f1b615ef000 CR3: 00000000e2b44000 CR4:
>>>>>> 00000000001406e0
>>>>>> [ 2.246912] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>>>>>> 0000000000000000
>>>>>> [ 2.246912] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>>>>>> 0000000000000400
>>>>>> [ 2.246912] Stack:
>>>>>> [ 2.246912] 0000000000000020 0000000000000001 00000020a0157217
>>>>>> 00000100e98bfdbc
>>>>>> [ 2.246912] 0000000027efa3ef ffff8800e98bfdbc ffff8800e98ce000
>>>>>> ffff8800e98c4000
>>>>>> [ 2.246912] ffff8800e98ce040 0000000000000001 ffff8800e98bfe08
>>>>>> ffffffffa0155d4c
>>>>>> [ 2.246912] Call Trace:
>>>>>> [ 2.246912] [<ffffffffa0155d4c>] blkback_changed+0x4ec/0xfc8
>>>>>> [xen_blkfront]
>>>>>> [ 2.246912] [<ffffffff813a6fd0>] ? xenbus_gather+0x170/0x190
>>>>>> [ 2.246912] [<ffffffff816322f5>] ? __slab_free+0x10e/0x277
>>>>>> [ 2.246912] [<ffffffff813a805d>]
>>>>>> xenbus_otherend_changed+0xad/0x110
>>>>>> [ 2.246912] [<ffffffff813a7257>] ? xenwatch_thread+0x77/0x180
>>>>>> [ 2.246912] [<ffffffff813a9ba3>] backend_changed+0x13/0x20
>>>>>> [ 2.246912] [<ffffffff813a7246>] xenwatch_thread+0x66/0x180
>>>>>> [ 2.246912] [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
>>>>>> [ 2.246912] [<ffffffff813a71e0>] ?
>>>>>> unregister_xenbus_watch+0x1f0/0x1f0
>>>>>> [ 2.246912] [<ffffffff810a5aef>] kthread+0xcf/0xe0
>>>>>> [ 2.246912] [<ffffffff810a5a20>] ?
>>>>>> kthread_create_on_node+0x140/0x140
>>>>>> [ 2.246912] [<ffffffff81646118>] ret_from_fork+0x58/0x90
>>>>>> [ 2.246912] [<ffffffff810a5a20>] ?
>>>>>> kthread_create_on_node+0x140/0x140
>>>>>> [ 2.246912] Code: e1 48 85 c0 75 ce 49 8d 84 24 40 01 00 00 48 89
>>>>>> 45
>>>>>> b8 e9 91 fd ff ff 4c 89 ff e8 8d ae 06 e1 e9 f2 fc ff ff 31 c0 e9 2e
>>>>>> fe
>>>>>> ff ff <0f> 0b e8 9a 57 f2 e0 0f 0b 0f 1f 84 00 00 00 00 00 0f 1f 44
>>>>>> 00
>>>>>> [ 2.246912] RIP [<ffffffffa015584f>]
>>>>>> blkfront_setup_indirect+0x41f/0x430 [xen_blkfront]
>>>>>> [ 2.246912] RSP <ffff8800e98bfcd0>
>>>>>> [ 2.491574] ---[ end trace 8a9b992812627c71 ]---
>>>>>> [ 2.495618] Kernel panic - not syncing: Fatal exception
>>>>>> ------------------------------------
>>>>>>
>>>>>> Xen version 4.2.
>>>>>>
>>>>>> EC2 instance type: c3.large with EBS magnetic storage, if that
>>>>>> matters.
>>>>>>
>>>>>> Here is the code where the BUG_ON triggers (drivers/block/xen-
>>>>>> blkfront.c):
>>>>>> ------------------------------------
>>>>>> if (!info->feature_persistent && info->max_indirect_segments) {
>>>>>> /*
>>>>>> * We are using indirect descriptors but not persistent
>>>>>> * grants, we need to allocate a set of pages that can be
>>>>>> * used for mapping indirect grefs
>>>>>> */
>>>>>> int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE;
>>>>>>
>>>>>> BUG_ON(!list_empty(&info->indirect_pages)); // << This one hits.
>>>>>> for (i = 0; i < num; i++) {
>>>>>> struct page *indirect_page = alloc_page(GFP_NOIO);
>>>>>> if (!indirect_page)
>>>>>> goto out_of_memory;
>>>>>> list_add(&indirect_page->lru, &info->indirect_pages);
>>>>>> }
>>>>>> }
>>>>>> ------------------------------------
>>>>>>
>>>>>> As we checked, the 'info->indirect_pages' list indeed contained
>>>>>> around 30 elements at that point.
>>>>>>
>>>>>> Any ideas what may cause this and how to fix it?
>>>>>>
>>>>>> If any other data are needed, please let me know.
>>>>>>
>>>>>> References:
>>>>>> [1] https://bugs.openvz.org/browse/OVZ-6718
> .
>