From: Mark Adams
Subject: Re: [Xen-devel] pci-passthrough in pvops causing offline raid
Date: Thu, 11 Nov 2010 17:38:50 +0000
Message-ID: <20101111173850.GA8756@campbell-lange.net>
References: <20101111102416.GA32457@campbell-lange.net> <20101111165340.GB30006@dumpdata.com>
In-Reply-To: <20101111165340.GB30006@dumpdata.com>
To: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xensource.com, xen-users@lists.xensource.com

On Thu, Nov 11, 2010 at 11:53:40AM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 11, 2010 at 10:24:17AM +0000, Mark Adams wrote:
> > Hi All,
> >
> > Running Xen 4.0.1-rc6, Debian squeeze 2.6.32-21.
> >
> > In a VoIP setup, I have forwarded the onboard NIC interfaces
> > through to domU using the following grub config:
> >
> > module /vmlinuz-2.6.32-5-xen-amd64 placeholder root=UUID=25c3ac79-6850-498d-afcf-ea42970e94fd ro quiet xen-pciback.permissive xen-pciback.hide=(02:00.0)(03:00.0) pci=resource_alignment=02:00.0;03:00.0
> >
> > I'm having a serious issue where the RAID card goes offline after an
> > indefinite period of time. Sometimes it runs fine for a week, other times
> > only 1 day before I get "offline device" errors. Rebooting the machine
> > fixes it straight away, and everything is back online.
> >
> > What in Xen pciback is causing the RAID card to go offline? The
> > only devices hidden are the two onboard NICs.
>
> You need to give more details. Is the RAID card a 3Ware? An LSI? Do you
> run with an IOMMU? When the RAID card goes offline, do you see a stop of
> IRQs going to the device? Are the IRQs for the RAID card sent to all of your
> CPUs or just a specific one?
> Are you pinning your guests to specific CPUs?
> Does the issue disappear if you don't pass through the NIC interfaces? If so, have
> you run this setup for "a week" to make sure?

It is an Areca 1220. I can't see anything when the device goes offline apart from:

[77324.264270] sd 0:0:0:1: rejecting I/O to offline device
[77334.005854] sd 0:0:0:0: rejecting I/O to offline device

Unfortunately nothing gets logged, because there is nothing left to write to. I'm not sure how else I can see the IRQs.

There is no pinning being done at all, and the machine ran OK for a few months before the pciback options were added.

Is my kernel module line above correct? Are the xen-pciback.permissive and resource_alignment options required?

Also, I am passing through the onboard NICs. Is this something that should be avoided, or is it OK to do?

> >
> > I know that this issue is with Xen, as I had this running on a different
> > server (same Xen setup) and it had the same issues, which I initially
> > thought were to do with the RAID card.
>
> So you never ran this setup on this kernel (2.6.32-5) without the Xen hypervisor?

No, it has always had the hypervisor, but it was running OK before the pciback options were added. This week it has happened approximately every 24 hours.

> >
> > Are there known issues in this kernel and Xen version with pciback? I'm
>
> No. It all works perfectly :-)
>
> > going to update to the current package versions this evening (4.0.1-1
> > and 2.6.32-27); however, I would appreciate it if anyone has any other insight
> > into this issue, or even just a note to say it is a bug that has been
> > fixed in current versions!
>
> Well, there were issues with the LSI cards having a hidden PCI device. But those
> are pretty obvious, as you can't even use the card correctly. There is also
> a problem with the 3Ware 9506 IDE card, which on my box stops sending IRQs
> on the IOAPIC it has been assigned (28) and instead uses another one (17).
> Not sure if this is just the PCI card using the wrong PCI interrupt pin on the
> card and it ends up poking the wrong IOAPIC.

Thanks,
Mark
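Konrad asks whether IRQs stop reaching the RAID card when it goes offline and whether they are spread across all CPUs or land on one; Mark is not sure how to check. One way is to compare two snapshots of /proc/interrupts in dom0. The following is a minimal Python sketch, not taken from the thread. It assumes the Areca driver's handler appears under the name `arcmsr` in /proc/interrupts (check your own output and adjust), and the helper names (`parse_interrupts`, `active_cpus`, `stalled_irqs`, `watch`) are hypothetical. The parser relies only on the per-CPU counter columns, so the interrupt-type column, which looks different under a Xen dom0 than on bare metal, does not matter.

```python
"""Hypothetical helper: watch /proc/interrupts for a stalled device IRQ."""
import time


def parse_interrupts(text):
    """Parse /proc/interrupts content into {irq_label: (per_cpu_counts, description)}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row looks like "CPU0 CPU1 ..."
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not an "IRQ: ..." row
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break  # rows such as ERR/MIS carry fewer counters
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def active_cpus(counts):
    """Indices of CPUs that have serviced this IRQ at least once."""
    return [cpu for cpu, n in enumerate(counts) if n > 0]


def stalled_irqs(before, after, device):
    """IRQ labels mentioning `device` whose total count did not grow between snapshots."""
    stalled = []
    for irq, (counts, desc) in after.items():
        if device in desc and irq in before and sum(counts) == sum(before[irq][0]):
            stalled.append(irq)
    return stalled


def watch(device="arcmsr", interval=10.0, path="/proc/interrupts"):
    """Take two snapshots `interval` seconds apart and print a short report.

    "arcmsr" is an assumed handler name for the Areca driver; adjust as needed.
    """
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    for irq, (counts, desc) in after.items():
        if device in desc:
            print(f"IRQ {irq}: total={sum(counts)} cpus={active_cpus(counts)} {desc}")
    print("stalled:", stalled_irqs(before, after, device) or "none")
```

Calling `watch()` periodically in dom0 while the array is under load would show which CPUs service the RAID card's IRQ and whether its counter freezes while I/O is still pending. A frozen counter would be consistent with the "stop of IRQs" Konrad describes, though it would not by itself identify the cause. Since the local disks become unusable once the card goes offline, the output is better sent to a serial console or a remote host than to a local file.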