Just Cc-ing some Linux, block, and Xen on CentOS people...

On Mon, 2016-06-06 at 11:24 +0300, Evgenii Shatokhin wrote:
> (Resending this bug report because the message I sent last week did
> not make it to the mailing list somehow.)
>
> Hi,
>
> One of our users gets kernel panics from time to time when he tries
> to use his Amazon EC2 instance with CentOS7 x64 in it [1]. Kernel
> panic happens within minutes from the moment the instance starts.
> The problem does not show up every time, however.
>
> The user first observed the problem with a custom kernel, but it was
> found later that the stock kernel 3.10.0-327.18.2.el7.x86_64 from
> CentOS7 was affected as well.
>
> The part of the system log he was able to retrieve is attached. Here
> is the bug info, for convenience:
>
> ------------------------------------
> [    2.246912] kernel BUG at drivers/block/xen-blkfront.c:1711!
> [    2.246912] invalid opcode: 0000 [#1] SMP
> [    2.246912] Modules linked in: ata_generic pata_acpi
> crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel
> ghash_clmulni_intel xen_netfront xen_blkfront(+) aesni_intel lrw
> ata_piix gf128mul glue_helper ablk_helper cryptd libata serio_raw
> floppy sunrpc dm_mirror dm_region_hash dm_log dm_mod
> scsi_transport_iscsi
> [    2.246912] CPU: 1 PID: 50 Comm: xenwatch Not tainted
> 3.10.0-327.18.2.el7.x86_64 #1
> [    2.246912] Hardware name: Xen HVM domU, BIOS 4.2.amazon
> 12/07/2015
> [    2.246912] task: ffff8800e9fcb980 ti: ffff8800e98bc000 task.ti:
> ffff8800e98bc000
> [    2.246912] RIP: 0010:[]  []
> blkfront_setup_indirect+0x41f/0x430 [xen_blkfront]
> [    2.246912] RSP: 0018:ffff8800e98bfcd0  EFLAGS: 00010283
> [    2.246912] RAX: ffff8800353e15c0 RBX: ffff8800e98c52c8 RCX: 0000000000000020
> [    2.246912] RDX: ffff8800353e15b0 RSI: ffff8800e98c52b8 RDI: ffff8800353e15d0
> [    2.246912] RBP: ffff8800e98bfd20 R08: ffff8800353e15b0 R09: ffff8800eb403c00
> [    2.246912] R10: ffffffffa0155532 R11: ffffffffffffffe8 R12: ffff8800e98c4000
> [    2.246912] R13: ffff8800e98c52b8 R14: 0000000000000020 R15: ffff8800353e15c0
> [    2.246912] FS:  0000000000000000(0000) GS:ffff8800efc20000(0000) knlGS:0000000000000000
> [    2.246912] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.246912] CR2: 00007f1b615ef000 CR3: 00000000e2b44000 CR4: 00000000001406e0
> [    2.246912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    2.246912] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [    2.246912] Stack:
> [    2.246912]  0000000000000020 0000000000000001 00000020a0157217 00000100e98bfdbc
> [    2.246912]  0000000027efa3ef ffff8800e98bfdbc ffff8800e98ce000 ffff8800e98c4000
> [    2.246912]  ffff8800e98ce040 0000000000000001 ffff8800e98bfe08 ffffffffa0155d4c
> [    2.246912] Call Trace:
> [    2.246912]  [] blkback_changed+0x4ec/0xfc8 [xen_blkfront]
> [    2.246912]  [] ? xenbus_gather+0x170/0x190
> [    2.246912]  [] ? __slab_free+0x10e/0x277
> [    2.246912]  [] xenbus_otherend_changed+0xad/0x110
> [    2.246912]  [] ? xenwatch_thread+0x77/0x180
> [    2.246912]  [] backend_changed+0x13/0x20
> [    2.246912]  [] xenwatch_thread+0x66/0x180
> [    2.246912]  [] ? wake_up_atomic_t+0x30/0x30
> [    2.246912]  [] ? unregister_xenbus_watch+0x1f0/0x1f0
> [    2.246912]  [] kthread+0xcf/0xe0
> [    2.246912]  [] ? kthread_create_on_node+0x140/0x140
> [    2.246912]  [] ret_from_fork+0x58/0x90
> [    2.246912]  [] ? kthread_create_on_node+0x140/0x140
> [    2.246912] Code: e1 48 85 c0 75 ce 49 8d 84 24 40 01 00 00 48 89
> 45 b8 e9 91 fd ff ff 4c 89 ff e8 8d ae 06 e1 e9 f2 fc ff ff 31 c0 e9
> 2e fe ff ff <0f> 0b e8 9a 57 f2 e0 0f 0b 0f 1f 84 00 00 00 00 00 0f
> 1f 44 00
> [    2.246912] RIP  []  blkfront_setup_indirect+0x41f/0x430 [xen_blkfront]
> [    2.246912]  RSP
> [    2.491574] ---[ end trace 8a9b992812627c71 ]---
> [    2.495618] Kernel panic - not syncing: Fatal exception
> ------------------------------------
>
> Xen version 4.2.
>
> EC2 instance type: c3.large with EBS magnetic storage, if that
> matters.
>
> Here is the code where the BUG_ON triggers
> (drivers/block/xen-blkfront.c):
> ------------------------------------
> if (!info->feature_persistent && info->max_indirect_segments) {
>     /*
>      * We are using indirect descriptors but not persistent
>      * grants, we need to allocate a set of pages that can be
>      * used for mapping indirect grefs
>      */
>     int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE;
>
>     BUG_ON(!list_empty(&info->indirect_pages)); // << This one hits.
>     for (i = 0; i < num; i++) {
>         struct page *indirect_page = alloc_page(GFP_NOIO);
>         if (!indirect_page)
>             goto out_of_memory;
>         list_add(&indirect_page->lru, &info->indirect_pages);
>     }
> }
> ------------------------------------
>
> As we checked, the 'info->indirect_pages' list indeed contained
> around 30 elements at that point.
>
> Any ideas what may cause this and how to fix it?
>
> If any other data are needed, please let me know.
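
Just a guess from the Cc side, since the call trace goes through blkback_changed(): it looks like blkfront_setup_indirect() may be running a second time for the same device (e.g. on a reconnect) while the indirect pages allocated by an earlier run were never freed, which would leave the list non-empty and trip the BUG_ON. If that is what is happening, one direction - purely an untested sketch against the quoted 3.10 code, not a reviewed patch - would be to drain any leftover pages instead of (or before) asserting:

```c
/*
 * Untested sketch: if pages from a previous blkfront_setup_indirect()
 * run are still on info->indirect_pages, release them before the
 * allocation loop repopulates the list, rather than BUG()-ing.
 * (Alternatively, the teardown path could do this drain so the list
 * is guaranteed empty across reconnects.)
 */
if (!list_empty(&info->indirect_pages)) {
	struct page *indirect_page, *n;

	list_for_each_entry_safe(indirect_page, n,
				 &info->indirect_pages, lru) {
		list_del(&indirect_page->lru);
		__free_page(indirect_page);
	}
}
```

Again, this is only a hypothesis about the scenario; someone who knows the blkfront connect/reconnect state machine should confirm whether the list is genuinely expected to be empty here.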
>
> Regards,
> Evgenii
>
> References:
> [1] https://bugs.openvz.org/browse/OVZ-6718
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)