From mboxrd@z Thu Jan  1 00:00:00 1970
From: Richie
Subject: Re: Re: [Xen-devel] pci-passthrough in pvops causing offline raid
Date: Thu, 11 Nov 2010 12:40:24 -0500
Message-ID: <4CDC2A88.9040703@triad.rr.com>
References: <20101111102416.GA32457@campbell-lange.net> <20101111165340.GB30006@dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <20101111165340.GB30006@dumpdata.com>
Sender: xen-users-bounces@lists.xensource.com
Errors-To: xen-users-bounces@lists.xensource.com
To: Mark Adams
Cc: xen-devel@lists.xensource.com, xen-users@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 11/11/2010 11:53 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 11, 2010 at 10:24:17AM +0000, Mark Adams wrote:
>
>> Hi All,
>>
>> Running xen 4.0.1-rc6, debian squeeze 2.6.32-21.
>>
>> In a VoIP setup, I have forwarded the onboard NIC interfaces
>> through to domU using the following grub config:
>>
>> module /vmlinuz-2.6.32-5-xen-amd64 placeholder root=UUID=25c3ac79-6850-498d-afcf-ea42970e94fd ro quiet xen-pciback.permissive xen-pciback.hide=(02:00.0)(03:00.0) pci=resource_alignment=02:00.0;03:00.0
>>
>> I'm having a serious issue where the raid card goes offline after an
>> indefinite period of time. Sometimes it runs fine for a week, other times
>> one day before I get "offline device" errors. Rebooting the machine fixes
>> it straight away, and everything is back online.
>>
>> What in the Xen pciback is causing the raid card to go offline? The
>> only devices hidden are the two onboard NICs.
>>
> You need to give more details. Is the RAID card a 3Ware? An LSI? Do you
> run with an IOMMU? When the RAID card goes offline, do you see a stop of
> IRQs going to the device? Are the IRQs for the RAID card sent to all of your
> CPUs or just a specific one? Are you pinning your guests to specific CPUs?
> Does the issue disappear if you don't pass through the NIC interfaces? If so,
> have you run this setup for "a week" to make sure?
>
>> I know that this issue is with Xen, as I had this running on a different
>> server (same xen setup) and it had the same issues, which I initially
>> thought were to do with the raid card.
>>
> So you never ran this setup on this kernel (2.6.32-5) without the Xen hypervisor?
>
>> Are there known issues in this kernel and xen version with pciback? I'm
>>
> No. It all works perfectly :-)
>
>> going to update to the current package versions this evening (4.0.1-1
>> and 2.6.32-27); however, I would appreciate it if anyone has any other
>> insight into this issue, or even just a note to say it is a bug that has
>> been fixed in current versions!
>>
> Well, there were issues with the LSI cards having a hidden PCI device. But those
> are pretty obvious, as you can't even use the card correctly. There is also
> a problem with the 3Ware 9506 IDE card - which on my box stops sending IRQs
> on the IOAPIC it has been assigned (28) and instead uses another one (17).
> Not sure if this is just the PCI card using the wrong PCI interrupt pin on the
> card, so that it ends up poking the wrong IOAPIC.
>
> Note: I have no idea whether this is related to your issue or whether my
> assessment is completely accurate.

I had an issue that I feel the debian squeeze kernel running under domU played a part in. My dom0 is 2.6.34.7, Xenified with Andrew Lyon's patches, and I'm running Xen 4.0.2-pre (xen testing). I pass through a PCI tuner card but have not considered that this could also contribute. Sometimes when I shut down the domU, upon halt I would start getting libata-style DRIVE_NOT_READY errors in my dom0. Either one drive would drop from my mdadm raid (which houses my LVM filesystems, including root for dom0 and domU), or perhaps both would drop and cause a panic. A reboot fixes everything, though a rebuild would then occur.
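For what it's worth, when a member does get kicked out, the state should be visible from dom0; a rough diagnostic sketch (/dev/md0 is an example device name, and the grep patterns are illustrative, adjust to your array and logs):

```shell
# Sketch: confirm whether an md member was dropped and find the
# libata errors that preceded it. /dev/md0 is an example; adjust.
cat /proc/mdstat 2>/dev/null || true   # a degraded mirror shows e.g. [2/1] [U_]
mdadm --detail /dev/md0 2>/dev/null | grep -E 'State :|faulty|removed' || true
dmesg 2>/dev/null | grep -iE 'DRIVE_NOT_READY|ata[0-9]' | tail -n 20 || true
```

Capturing that output before rebooting would make the report much more useful than my recollection here.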
I was not able to capture those errors in the few times it happened, but I have since switched my domU to a 2.6.31 pvops kernel from Jeremy's stable branch, and I have yet to reproduce the issue. I did note that it could take a number of days for the problem to manifest; so far I've tested a domU shutdown after 24 and 72 hours on the new kernel with no issues. My next test is at 7 days. I wish I had more information myself, but I don't. Regardless of the accuracy of this claim, I recommend trying other kernels to see if the problem persists.
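On Konrad's IRQ questions above: they can be checked from dom0 while the card is offline; a rough sketch (the driver names in the grep are examples, e.g. megasas/3w-xxxx/aacraid, match whatever your RAID driver registers in /proc/interrupts):

```shell
# Sketch: is the RAID card still receiving interrupts, and on which CPUs?
# The driver name shown in /proc/interrupts varies by card; adjust the pattern.
lspci | grep -i raid || true                              # locate the card
grep -iE 'megasas|3w-|aacraid' /proc/interrupts || true   # per-CPU IRQ counts
sleep 2
grep -iE 'megasas|3w-|aacraid' /proc/interrupts || true   # counts frozen across
                                                          # samples => IRQs stopped
```

If the counts stop increasing only on one CPU while others still climb, that would also answer whether the IRQs are spread across CPUs or pinned to one.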