State of GPLPV tests

* State of GPLPV tests - 28.11.11
@ 2011-11-28 13:49 Andreas Kinzler
  2011-11-28 23:16 ` James Harper
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Kinzler @ 2011-11-28 13:49 UTC (permalink / raw)
  To: James Harper, xen-devel

Hello James,

I am still running tests 7 days a week on two test systems. Results are 
quite discouraging though. After experiencing crash after crash I wanted 
to test if the configuration I called "stable" (Xen 4.0.1, GPLPV 
0.11.0.213, dom0 kernel 2.6.32.18-pvops0-ak3) was stable indeed. But 
even that config crashed when running my torture test. It is stable on 
our production systems - running other workloads of course.

 > One thing I thought of... virtualisation gives an interesting
 > opportunity to exaggerate race conditions. If you have 8 vCPU's in a
 > DomU but only let one or two physical CPUs service those 8 vCPU's,then
 > it can give rise to race conditions which could only be rarely seen
 > (or never seen) in normal operation. It's awful for performance but
 > if you could try that and see if it gives rise to crashes a bit
 > more frequently it might help us track down the problem.

What exactly is the config you are talking about in terms of Xen/dom0 
command line? In terms of domU config files?

As always, I monitor your mercurial repo ;-) How would you see the 
relationship of commits 952+953 to our problem? 952 seems to affect LSO 
in some way since LsoV1TransmitComplete.TcpPayload is finally wrong 
(could it be negative since tx_length is smaller than the fixed 
tx_length?). What about 953?

One more thought: As mentioned earlier crashes often occurred after an 
uptime of 9-10 days and these crashes occurred too consistently to be a 
"by chance" event. In my torture tests I am NOT USING a Windows NTP 
service (I use the meinberg NTP daemon on Windows). But on production I 
do. Can you see any possible impact here?

Regards Andreas

^ permalink raw reply	[flat|nested] 5+ messages in thread