From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Anthony Wright <anthony@overnetdata.com>,
	Ian Campbell <Ian.Campbell@eu.citrix.com>
Cc: Todd Deshane <todd.deshane@xen.org>, xen-devel@lists.xensource.com
Subject: Re: phy disks and vifs timing out in DomU
Date: Fri, 29 Jul 2011 16:01:03 -0400
Message-ID: <20110729200103.GA25148@dumpdata.com>
In-Reply-To: <4E32FE8A.1060908@overnetdata.com>


[Ian, I copied you on this b/c of the netbk issue - read on]

> >>>>> On Thu, Jul 28, 2011 at 7:24 AM, Anthony Wright <anthony@overnetdata.com> wrote:
> >>>>>> I have a 32 bit 3.0 Dom0 kernel running Xen 4.1. I am trying to run a 32 bit PV DomU with two tap:aio disks, two phy disks & 1 vif. The two tap:aio disks are working fine, but the phy disks and the vif don't work and I get the following error messages from the DomU kernel during boot:
> >>>>>>
> >>>>>> [    1.783658] Using IPI No-Shortcut mode
> >>>>>> [   11.880061] XENBUS: Timeout connecting to device: device/vbd/51729 (state 3)
> >>>>>> [   11.880072] XENBUS: Timeout connecting to device: device/vbd/51745 (state 3)

Hm, which kernel version are these DomUs running? I wonder if this is related to
'feature-barrier', which is not supported with 3.0. Do you see anything in the DomU
about the disks or xen-blkfront? Can you run the guests with
'initcall_debug loglevel=8 debug' to see if blkfront is actually running on those disks?

Any idea where the source for those DomUs is? If it is an issue with 'feature-barrier',
it looks like the DomU can't handle that option not being visible, which it should.


> > What device does that correspond to (hint: run xl block-list or xm block-list)?
> >
> The output from block-list is:
> 
> Vdev  BE  handle state evt-ch ring-ref BE-path
> 51729 0   764    3     10     10       /local/domain/0/backend/vbd/764/51729
> 51745 0   764    3     11     11       /local/domain/0/backend/vbd/764/51745
> 51713 0   764    4     8      8        /local/domain/0/backend/qdisk/764/51713
> 51714 0   764    4     9      9        /local/domain/0/backend/qdisk/764/51714
> 
> The two vbds map to two LVM logical volumes in two different volume groups.

qdisk.. OK, so it does swap over to the QEMU internal AIO path. From the output it
looks like the ones that hang are the 'phy' types. Is that right?
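
For reference, the Vdev and state columns in the block-list output above can be
decoded mechanically. Below is a minimal Python sketch, assuming the usual xvd
numbering (major 202, 16 minors per disk) and the XenbusState values from Xen's
public io/xenbus.h. The helper name decode_vdev is made up for illustration and
is not part of any Xen tool.

```python
# Sketch only: decode the Vdev and state columns from the block-list output.
# Assumes xvd numbering (major 202, 16 minors per disk); decode_vdev is a
# hypothetical helper, not part of xl/xm.

XENBUS_STATES = {
    0: "Unknown",
    1: "Initialising",
    2: "InitWait",
    3: "Initialised",
    4: "Connected",
    5: "Closing",
    6: "Closed",
    7: "Reconfiguring",
    8: "Reconfigured",
}

XVD_MAJOR = 202
MINORS_PER_DISK = 16


def decode_vdev(vdev):
    """Return the xvd name for a virtual device number, e.g. 51713 -> 'xvda1'."""
    major, minor = vdev >> 8, vdev & 0xFF
    if major != XVD_MAJOR:
        raise ValueError("not an xvd device number: %d" % vdev)
    disk, partition = divmod(minor, MINORS_PER_DISK)
    name = "xvd" + chr(ord("a") + disk)
    return name + str(partition) if partition else name


# (vdev, state) pairs copied from the block-list output in this thread.
rows = [(51729, 3), (51745, 3), (51713, 4), (51714, 4)]
for vdev, state in rows:
    print(vdev, decode_vdev(vdev), XENBUS_STATES.get(state, "?"))
```

Under those assumptions 51729 and 51745 come out as xvdb1 and xvdc1, both sitting
in Initialised (3), while the two qdisk-backed devices are Connected (4).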

> 
> On 29/07/2011 17:06, Konrad Rzeszutek Wilk wrote:
> >> > I have installed virtually identical systems on two physical machines -
> >> > identical (and I mean identical) xen, dom0, domU with possibly a
> > md5sum match?
> Yes - md5sum match on all the key components, i.e. xen, dom0 kernel,
> 99.9% of the root filesystem, the domU kernel & 99.9% of the domU
> filesystem. Where there isn't a precise match is on some of the config
> files. I don't think these should have any effect, but I will have a go
> at mirroring the disks (I can't swap disks since one is SATA & the other
> IDE).
> 
> I also was having problems with the vif device, and got a kernel bug
> report that could potentially relate to it. I've attached two syslogs.

Yeah, that is bad. I actually see a similar issue if I forcibly kill the guests.
I haven't narrowed it down yet. You appear to be using 4.1, but not 4.1.1, right?

Can you describe to me how you get the netbk crash?

> 2011 Jul 29 07:02:10 kernel: [   33.242680] vbd vbd-1-51745: 1 mapping ring-ref 11 port 11                                                                
> 2011 Jul 29 07:02:10 kernel: [   33.253038] vif vif-1-0: vif1.0: failed to map tx ring. err=-12 status=-1                                                 
> 2011 Jul 29 07:02:10 kernel: [   33.253065] vif vif-1-0: 1 mapping shared-frames 768/769 port 12                                                          
> 2011 Jul 29 07:02:43 kernel: [   66.103514] vif vif-1-0: 2 reading script                                                                                 
> 2011 Jul 29 07:02:43 kernel: [   66.106265] br-internal: port 1(vif1.0) entering disabled state                                                           
> 2011 Jul 29 07:02:43 kernel: [   66.106309] libfcoe_device_notification: NETDEV_UNREGISTER vif1.0                                                         
> 2011 Jul 29 07:02:43 kernel: [   66.106333] br-internal: port 1(vif1.0) entering disabled state                                                           
> 2011 Jul 29 07:02:43 kernel: [   66.106372] br-internal: mixed no checksumming and other settings.                                                        
> 2011 Jul 29 07:02:43 kernel: [   66.114097] ------------[ cut here ]------------                                                                          
> 2011 Jul 29 07:02:43 kernel: [   66.114878] kernel BUG at mm/vmalloc.c:2164!                                                                              
> 2011 Jul 29 07:02:43 kernel: [   66.115058] invalid opcode: 0000 [#1] SMP                                                                                 
> 2011 Jul 29 07:02:43 kernel: [   66.115376] Modules linked in:                                                                                            
> 2011 Jul 29 07:02:43 kernel: [   66.115376]                                                                                                               
> 2011 Jul 29 07:02:43 kernel: [   66.115376] Pid: 20, comm: xenwatch Not tainted 3.0.0 #1 MSI MS-7309/MS-7309                                              
> 2011 Jul 29 07:02:43 kernel: [   66.115376] EIP: 0061:[<c0494bff>] EFLAGS: 00010203 CPU: 1                                                                
> 2011 Jul 29 07:02:43 kernel: [   66.115376] EIP is at free_vm_area+0xf/0x19                                                                               
> 2011 Jul 29 07:02:43 kernel: [   66.115376] EAX: 00000000 EBX: cf866480 ECX: 00000018 EDX: 00000000                                                       
> 2011 Jul 29 07:02:43 kernel: [   66.115376] ESI: cfa06800 EDI: d076c400 EBP: cfa06c00 ESP: d0ce7eb4                                                       
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069                                                                 
> 2011 Jul 29 07:02:43 kernel: [   66.115376] Process xenwatch (pid: 20, ti=d0ce6000 task=d0c55140 task.ti=d0ce6000)                                        
> 2011 Jul 29 07:02:43 kernel: [   66.115376] Stack:                                                                                                        
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  cfa06c00 c09e87aa fffc6e63 c0c4bd65 d0ce7ecc cfa06844 d0ce7ecc d0ce7ecc                                      
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  cfa06c00 cfa06800 d076c400 cfa06c94 c09eace0 d04cd380 00000000 fffffffe                                      
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  d0ce7f9c c061fe74 d04cd2e0 d076c420 d076c400 d0ce7f9c c09e9f8c d076c400                                      
> 2011 Jul 29 07:02:43 kernel: [   66.115376] Call Trace:                                                                                                   
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c09e87aa>] ? xen_netbk_unmap_frontend_rings+0xbf/0xd3                                                      
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c0c4bd65>] ? netdev_run_todo+0x1b7/0x1cc                                                                   
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c09eace0>] ? xenvif_disconnect+0xd0/0xe4                                                                   
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c061fe74>] ? xenbus_rm+0x37/0x3e                                                                           
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c09e9f8c>] ? netback_remove+0x40/0x5d                                                                      
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c062075d>] ? xenbus_dev_remove+0x2c/0x3d                                                                   
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c06620e6>] ? __device_release_driver+0x42/0x79                                                             
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c06621ac>] ? device_release_driver+0xf/0x17                                                                
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c0661818>] ? bus_remove_device+0x75/0x84                                                                   
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c0660693>] ? device_del+0xe6/0x125                                                                         
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c06606da>] ? device_unregister+0x8/0x10                                                                    
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c06205f0>] ? xenbus_dev_changed+0x71/0x129                                                                 
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c0405394>] ? check_events+0x8/0xc                                                                          
> 2011 Jul 29 07:02:43 kernel: [   66.115376]  [<c061f711>] ? xenwatch_thread+0xeb/0x113                                                                    
> 2011 Jul 29 07:02:43 kernel: [   66.129624]  [<c044792c>] ? wake_up_bit+0x53/0x53                                                                         
> 2011 Jul 29 07:02:43 kernel: [   66.129624]  [<c061f626>] ? xenbus_thread+0x1cc/0x1cc                                                                     
> 2011 Jul 29 07:02:43 kernel: [   66.129624]  [<c0447616>] ? kthread+0x63/0x68                                                                             
> 2011 Jul 29 07:02:43 kernel: [   66.129624]  [<c04475b3>] ? kthread_worker_fn+0x122/0x122                                                                 
> 2011 Jul 29 07:02:43 kernel: [   66.129624]  [<c0e0f036>] ? kernel_thread_helper+0x6/0x10                                                                 
> 2011 Jul 29 07:02:43 kernel: [   66.129624] Code: c1 00 00 00 01 89 f0 e8 a1 ff ff ff 81 6b 08 00 10 00 00 eb 02 31 db 89 d8 5b 5e c3 53 89 c3 8b 40 04 e8 9b ff ff ff 39 d8 74 04 <0f> 0b eb fe 5b e9 73 95 00 00 57 89 d7 56 31 f6 53 89 c3 eb 09                                                                 
> 2011 Jul 29 07:02:43 kernel: [   66.129624] EIP: [<c0494bff>] free_vm_area+0xf/0x19 SS:ESP 0069:d0ce7eb4                                                  
> 2011 Jul 29 07:02:43 kernel: [   66.129624] ---[ end trace 7bb110af96f32256 ]---                                                                          
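
The err=-12 in the "failed to map tx ring" line above is a negated Linux errno.
A small Python sketch for looking such codes up follows. It assumes the errno
table of the machine running the script matches the kernel's (true on Linux),
and describe_errno is a hypothetical helper name.

```python
import errno
import os


def describe_errno(err):
    """Return (symbolic name, message) for a kernel-style negative error code."""
    code = abs(err)
    # errno.errorcode maps numeric codes to names such as 'ENOMEM'.
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


# Value taken from the "failed to map tx ring. err=-12" log line.
name, message = describe_errno(-12)
print(name, "-", message)
```

On Linux, 12 is ENOMEM, so the ring mapping failed with an out-of-memory style
error before the later BUG in free_vm_area.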
