From mboxrd@z Thu Jan 1 00:00:00 1970
From: Scott Garron
Subject: Re: Making snapshot of logical volumes handling HVM domU causes OOPS and instability
Date: Tue, 31 Aug 2010 04:16:09 -0400
Message-ID: <4C7CBA49.2030306@sce.pridelands.org>
References: <4C7864BB.1010808@sce.pridelands.org> <4C7BE1C6.5030602@goop.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "Xu, Dongxiao"
Cc: Jeremy Fitzhardinge, "xen-devel@lists.xensource.com", Daniel Stodden
List-Id: xen-devel@lists.xenproject.org

>> Scott Garron wrote:
>>> Another issue that comes up is that if I run the 2.6.32.18 pvops
>>> kernel for my Linux domUs, after a time (usually only about an
>>> hour or so), the network interfaces stop responding.

> Jeremy Fitzhardinge wrote:
>> That's a separate problem in netfront that appears to be a bug in
>> the "smartpoll" code. I think Dongxiao is looking into it.

On 8/31/2010 2:59 AM, Xu, Dongxiao wrote:
> Yes, I tried to reproduce these days, however I could catch it
> locally. I tried both netperf and ping for a long time, but the bug
> is not triggered. What workload are you using when met the bug?

I'd say that the whole machine is under moderate to high utilization
because it has 10 virtual machines running - three of which are Windows
2008 Servers as HVM guests. However, as far as the "load" goes, most of
the virtual machines are fairly idle and probably not under much stress,
overall. Just to give you an idea, we have a 10Mbit/s connection to the
Internet, and this server's physical network interface (all 10 of the
domUs' traffic, combined) usually accounts for less than 2Mbit/s of the
outbound traffic at any given point in the day.
Aside from Windows being Windows (the HVM guests are running graphical
desktops), I wouldn't say that any of them cause a high CPU load,
either. Database load is fairly low to moderate on guests running MySQL
and/or PostgreSQL. The only guest that seems to use more CPU and RAM is
one serving e-mail, and that's because it runs ClamAV and SpamAssassin.
That e-mail server was one that kept its network connectivity the
longest, though (after a few hours, it did stop responding, but that was
after some guests with lighter loads stopped responding).

One observation I made, which may just be coincidental but is at least
noteworthy, is that the virtual machines that are assigned less RAM seem
to lose connectivity more quickly than those with more RAM. The most
recent time that I was able to trigger the bug, the virtual machine that
lost connectivity was only assigned 384MB RAM, running 2.6.32.18. At the
time, the rest of my paravirtualized guests were running 2.6.31.14, and
they didn't experience the problem. I've previously triggered the bug in
multiple domUs that were running a more recent kernel (I think it was
2.6.32.17 - before I reverted to a netback-patched 2.6.31.14 kernel),
and the first ones to disappear from the network were ones that were
only assigned 256MB. Eventually, they all disappeared, though. The only
"load" on one of the first to disappear is an installation of bind9,
servicing about 50 domain names - none of which receive an abnormally
high hit count.

The first time I noticed the problem, I had started 7 paravirtualized
guests, of varying memory assignments. The moment I started the 8th
guest, an HVM Windows 2008 Server, the networking on all of the running
guests (the paravirt ones) stopped responding at the same time. That may
also be something to try/look at.
After a reboot, I avoided starting any of the HVM guests, and the
connectivity lasted a couple of hours on the 7 running paravirt guests,
but started disappearing one guest at a time, over the course of the
next few hours.

I didn't mention in my previous e-mail that in order to get networking
to work in a stable fashion in the 2.6.31.14 kernel (the one I reverted
to), I had to apply the patch mentioned here:

http://lists.xensource.com/archives/html/xen-devel/2010-05/msg01570.html

Otherwise, networking became unstable immediately at the time of guest
creation. That patch was already applied to the 2.6.32.18 kernel that is
giving me the eventual network loss problems, though.

More specifics about my configuration can be found here:

http://www.pridelands.org/~simba/hurricane-server.txt

--
Scott Garron