* Stability report GPLPV 0.11.0.308
@ 2011-09-05 10:13 Andreas Kinzler
  2011-09-05 10:31 ` James Harper
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Kinzler @ 2011-09-05 10:13 UTC (permalink / raw)
  To: James Harper, xen-devel

[-- Attachment #1: Type: text/plain, Size: 2003 bytes --]

Hello James,

I am doing quite rigorous torture tests with Xen and GPLPV. Let me first 
repeat the test setup:

Use Xen 4.1.1 and kernel 2.6.32.36 (commit ae333e9).
Configure 2 HVMs called VM1 and VM2 as follows (per HVM): 2 VCPUs, 2 
virtual disks, 1024 MB RAM, viridian=1
Install Windows 2008 R2 SP1; install everything twice, never clone. 
Install GPLPV, iometer 2006.07.27, prime95 26.6 x64, ActiveState Perl 
5.12.4 x64, wget for Windows and the attached perl script.

Run iometer with 2 workers, both on the same second virtual disk (separate 
from the system disk), queue depth 4 per worker, access specification "All in one".
Run prime95 torture test with "In-place large FFTs". On VM1 use the task 
manager to set affinity to VCPU2, on VM2 set affinity to VCPU1.
Run the perl script to fetch a good mix of some large (50-500 MB) and 
many small (some KB) files from a high performance FTP server on the LAN 
(I use vsftpd).

This generates quite some load as vmstat shows:
virt5620 ~ # vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  0  0      0 723408   6860  33860    0    0 82113 82132 22503 30252  2 12 84  0
  0  0      0 723408   6860  33860    0    0 80117 82913 23109 30776  1 13 83  0
  4  0      0 723408   6860  33860    0    0 92555 87013 28411 33283  2 12 84  0
  4  0      0 723408   6860  33860    0    0 82678 85775 26228 31739  1 13 83  0
  5  0      0 723408   6860  33860    0    0 82252 84837 24180 29723  1 14 82  0

With GPLPV 0.11.0.308 it worked perfectly and with very good performance 
for over 9 days but then when I wanted to monitor the status, I was no 
longer able to connect via remote desktop. When examining the file 
system of the HVMs I found that somehow even the prime95 processes had 
stopped.

Any ideas? Could c/s 948 make any difference? Network worked perfectly 
for 9 days, so I wonder whether the count from c/s 948 is used at all.

Regards Andreas


[-- Attachment #2: test-via-wget.pl --]
[-- Type: text/plain, Size: 725 bytes --]


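use strict;
use warnings;
use IPC::Run3;

# FTP server, credentials and pause (in seconds) between iterations
my $url      = "ftp://10.0.0.3";
my $ftpUser  = "ftpuser";
my $ftpPass  = "ftp";
my $sleepSec = 60;

# Recursively fetch everything from $url, discarding the data ("nul" is the
# Windows null device) and saving wget's output of the last run to files.
# Returns the child status ($?), which is 0 on success.
sub runWget
{
    my ($stdout, $stderr);
    my @args = ("wget", "-O", "nul", "-r", "-v",
                "--user=$ftpUser", "--password=$ftpPass",
                $url);
    IPC::Run3::run3(\@args, undef, \$stdout, \$stderr);
    my $r = $?;
    open(my $out, '>', "last-stdout") or die "last-stdout: $!";
    print $out $stdout;
    close $out;
    open(my $err, '>', "last-stderr") or die "last-stderr: $!";
    print $err $stderr;
    close $err;
    return $r;
}

my $iter = 1;

# Repeat the fetch until wget reports a failure
while (1)
{
    my $r = runWget();
    if ($r)
    {
        print("\nError!\n");
        exit 1;
    }
    print "Iteration #$iter completed\n";
    $iter++;
    sleep($sleepSec);
}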

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* RE: Stability report GPLPV 0.11.0.308
  2011-09-05 10:13 Stability report GPLPV 0.11.0.308 Andreas Kinzler
@ 2011-09-05 10:31 ` James Harper
  2011-09-05 13:58   ` Andreas Kinzler
  0 siblings, 1 reply; 21+ messages in thread
From: James Harper @ 2011-09-05 10:31 UTC (permalink / raw)
  To: Andreas Kinzler, xen-devel

> 
> With GPLPV 0.11.0.308 it worked perfectly and with very good performance
> for over 9 days but then when I wanted to monitor the status, I was no
> longer able to connect via remote desktop. When examining the file
> system of the HVMs I found that somehow even the prime95 processes had
> stopped.
> 
> Any ideas? Could c/s 948 make any difference? Network worked perfectly
> for 9 days, so I wonder whether the count from c/s 948 is used at all.
> 

There is some combination that I can't reproduce that seems to cause a
problem when that count isn't passed in correctly. So it is a bug, but
I'm not sure if it causes the problems you are seeing.

I can give you a link to a build with that fix applied if you want to
test further.

Thanks

James

* Re: Stability report GPLPV 0.11.0.308
  2011-09-05 10:31 ` James Harper
@ 2011-09-05 13:58   ` Andreas Kinzler
  2011-09-06  5:21     ` James Harper
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Kinzler @ 2011-09-05 13:58 UTC (permalink / raw)
  To: James Harper; +Cc: xen-devel

 >> With GPLPV 0.11.0.308 it worked perfectly and with very good performance
 >> for over 9 days but then when I wanted to monitor the status, I was no
 >> longer able to connect via remote desktop. When examining the file
 >> system of the HVMs I found that somehow even the prime95 processes
 >> had stopped.
 >> Any ideas? Could c/s 948 make any difference? Network worked perfectly
 >> for 9 days, so I wonder whether the count from c/s 948 is used at all.
 >There is some combination that I can't reproduce that seems to cause a
 >problem when that count isn't passed in correctly. So it is a bug, but
 >I'm not sure if it causes the problems you are seeing.

What exactly is the problem you encountered?

 >I can give you a link to a build with that fix applied if you want to
 >test further.

Thanks. I have plans to test the 0.11.0.312 version. I have a complete 
build system here and a kernel-mode enabled signing certificate so I can 
use that for my tests.

Regards Andreas

* RE: Stability report GPLPV 0.11.0.308
  2011-09-05 13:58   ` Andreas Kinzler
@ 2011-09-06  5:21     ` James Harper
  2011-09-12 21:39       ` Andreas Kinzler
  2011-09-19 11:35       ` Andreas Kinzler
  0 siblings, 2 replies; 21+ messages in thread
From: James Harper @ 2011-09-06  5:21 UTC (permalink / raw)
  To: Andreas Kinzler; +Cc: xen-devel

> 
>  >> With GPLPV 0.11.0.308 it worked perfectly and with very good performance
>  >> for over 9 days but then when I wanted to monitor the status, I was no
>  >> longer able to connect via remote desktop. When examining the file
>  >> system of the HVMs I found that somehow even the prime95 processes
>  >> had stopped.
>  >> Any ideas? Could c/s 948 make any difference? Network worked perfectly
>  >> for 9 days, so I wonder whether the count from c/s 948 is used at all.
>  >There is some combination that I can't reproduce that seems to cause a
>  >problem when that count isn't passed in correctly. So it is a bug, but
>  >I'm not sure if it causes the problems you are seeing.
> 
> What exactly is the problem you encountered?
> 
>  >I can give you a link to a build with that fix applied if you want to
>  >test further.
> 
> Thanks. I have plans to test the 0.11.0.312 version. I have a complete
> build system here and a kernel-mode enabled signing certificate so I can
> use that for my tests.
> 

I actually looked back through the changes and there is still a fix to
come - basically xennet allocates new buffers when it needs them, but
never frees them again so if there is a really big burst of traffic it
could end up taking all the available memory. That could cause the
problem you are seeing. I'm a bit busy with some other things at the
moment but I hope to have a fix by the end of the week.
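
A minimal sketch of the kind of fix described here, written in portable C rather than as NDIS driver code: buffers are still allocated on demand, but a returned buffer is released to the heap once the free list already holds a fixed number of entries, so a traffic burst cannot pin memory forever. The names `pool_t`, `pool_get`, `pool_put` and the cap `MAX_FREELIST` are assumptions made for illustration and are not taken from the GPLPV source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_BYTES 4096   /* assumed page size */
#define MAX_FREELIST    4      /* hypothetical cap, kept tiny for demonstration */

/* Hypothetical stand-in for a driver's receive page buffer. */
typedef struct page_buf {
    struct page_buf *next;
    void *data;
} page_buf_t;

typedef struct {
    page_buf_t *free_head;  /* recycled buffers ready for reuse */
    size_t free_count;      /* buffers currently on the free list */
    size_t live_count;      /* buffers allocated and not yet released to the heap */
} pool_t;

/* Take a buffer from the free list, or allocate a new one if the list is empty. */
static page_buf_t *pool_get(pool_t *p)
{
    page_buf_t *pb = p->free_head;
    if (pb) {
        p->free_head = pb->next;
        p->free_count--;
        return pb;
    }
    pb = malloc(sizeof(*pb));
    if (!pb)
        return NULL;
    pb->data = malloc(PAGE_SIZE_BYTES);
    if (!pb->data) {
        free(pb);
        return NULL;
    }
    pb->next = NULL;
    p->live_count++;
    return pb;
}

/* Return a buffer: keep it for reuse unless the free list is already full. */
static void pool_put(pool_t *p, page_buf_t *pb)
{
    assert(pb != NULL);
    if (p->free_count >= MAX_FREELIST) {
        /* Free list is full: release the memory instead of hoarding it. */
        free(pb->data);
        free(pb);
        p->live_count--;
        return;
    }
    pb->next = p->free_head;
    p->free_head = pb;
    p->free_count++;
}
```

With a zero-initialised `pool_t`, ten `pool_get` calls followed by ten `pool_put` calls leave only `MAX_FREELIST` buffers allocated; the others have been handed back to the heap.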

James

* Re: RE: Stability report GPLPV 0.11.0.308
  2011-09-06  5:21     ` James Harper
@ 2011-09-12 21:39       ` Andreas Kinzler
  2011-09-19 11:35       ` Andreas Kinzler
  1 sibling, 0 replies; 21+ messages in thread
From: Andreas Kinzler @ 2011-09-12 21:39 UTC (permalink / raw)
  To: James Harper, xen-devel

On 06.09.2011 07:21, James Harper wrote:

 > I actually looked back through the changes and there is still a fix to
 > come - basically xennet allocates new buffers when it needs them, but
 > never frees them again so if there is a really big burst of traffic it
 > could end up taking all the available memory. That could cause the
 > problem you are seeing. I'm a bit busy with some other things at the
 > moment but I hope to have a fix by the end of the week.

I am currently running the torture test on 0.11.0.312 and did not see
any increase in usage of kernel memory (uptime is over 7 days now with
quite extreme load on net/disk).
Anything new about your fix?

Regards Andreas

* Re: RE: Stability report GPLPV 0.11.0.308
  2011-09-06  5:21     ` James Harper
  2011-09-12 21:39       ` Andreas Kinzler
@ 2011-09-19 11:35       ` Andreas Kinzler
  2011-09-21 23:30         ` James Harper
  1 sibling, 1 reply; 21+ messages in thread
From: Andreas Kinzler @ 2011-09-19 11:35 UTC (permalink / raw)
  To: James Harper; +Cc: xen-devel

Hello James,

the torture test of GPLPV 0.11.0.312 failed (as 0.11.0.308 did). What 
really puzzles me is that the uptime was 9-10 days for both VMs (as in 
0.11.0.308). One could think that there is something about the uptime of 
9-10 days. There is no noticeable malfunction in dom0 and while the 
domUs were running they looked perfectly fine. Really odd.

Regards Andreas

* RE: RE: Stability report GPLPV 0.11.0.308
  2011-09-19 11:35       ` Andreas Kinzler
@ 2011-09-21 23:30         ` James Harper
  2011-09-22  9:49           ` Andreas Kinzler
  0 siblings, 1 reply; 21+ messages in thread
From: James Harper @ 2011-09-21 23:30 UTC (permalink / raw)
  To: Andreas Kinzler; +Cc: xen-devel

> Hello James,
> 
> the torture test of GPLPV 0.11.0.312 failed (as 0.11.0.308 did). What
> really puzzles me is that the uptime was 9-10 days for both VMs (as in
> 0.11.0.308). One could think that there is something about the uptime of
> 9-10 days. There is no noticeable malfunction in dom0 and while the
> domUs were running they looked perfectly fine. Really odd.
> 

I haven't tested it well, but the latest should be a bit more stable.
Could you try one of the following from
http://www.meadowcourt.org/private:
gplpv_Vista2008x64_0.11.0.265.msi
gplpv_2000_0.11.0.322_debug.msi
gplpv_XP_0.11.0.322_debug.msi
gplpv_2003x32_0.11.0.322_debug.msi
gplpv_2003x64_0.11.0.322_debug.msi
gplpv_Vista2008x32_0.11.0.322_debug.msi
gplpv_Vista2008x64_0.11.0.322_debug.msi
gplpv_2000_0.11.0.322.msi
gplpv_XP_0.11.0.322.msi
gplpv_2003x32_0.11.0.322.msi
gplpv_2003x64_0.11.0.322.msi
gplpv_Vista2008x32_0.11.0.322.msi
gplpv_Vista2008x64_0.11.0.322.msi

thanks

James

* Re: RE: Stability report GPLPV 0.11.0.308
  2011-09-21 23:30         ` James Harper
@ 2011-09-22  9:49           ` Andreas Kinzler
  2011-09-22 10:10             ` James Harper
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Kinzler @ 2011-09-22  9:49 UTC (permalink / raw)
  To: James Harper; +Cc: xen-devel

Hello James,

>  I haven't tested it well, but the latest should be a bit more stable.
>  Could you try one of the following from
>  http://www.meadowcourt.org/private

Gives me an HTTP 403. I'd really prefer to fetch it from your mercurial
repo since I don't want to use testsigning and use our real certificate
instead.

Regards Andreas

* RE: RE: Stability report GPLPV 0.11.0.308
  2011-09-22  9:49           ` Andreas Kinzler
@ 2011-09-22 10:10             ` James Harper
  2011-09-23 20:57               ` Andreas Kinzler
  0 siblings, 1 reply; 21+ messages in thread
From: James Harper @ 2011-09-22 10:10 UTC (permalink / raw)
  To: Andreas Kinzler; +Cc: xen-devel

> Hello James,
> 
> >  I haven't tested it well, but the latest should be a bit more stable.
> >  Could you try one of the following from
> > http://www.meadowcourt.org/private
> 
> Gives me an HTTP 403.

You need to append the filename to the url - no browsing on that
directory.

> I'd really prefer to fetch it from your mercurial repo
> since I don't want to use testsigning and use our real certificate
> instead.

My laptop broke and I haven't gotten mercurial set up yet. I'll push it
as soon as I can.

James

* Re: RE: Stability report GPLPV 0.11.0.308
  2011-09-22 10:10             ` James Harper
@ 2011-09-23 20:57               ` Andreas Kinzler
  2011-09-23 23:49                 ` James Harper
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Kinzler @ 2011-09-23 20:57 UTC (permalink / raw)
  To: James Harper; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 726 bytes --]

Hello James,

I did take a look at your commit 950 and I think there are 3 typos (see 
my patch). Anyway, I don't think that memory problems are causing the 
stability issues and (as some kind of "proof") I did not notice any 
increase in kernel memory usage during uptime of the VMs.

Actually I am not really sure if xennet is even the problem since in 
none of the crash scenarios was there anything in the Windows event 
log. In my last test I was able to enter my password via VNC (it did not 
log in though) - this should have written some entries to the security 
log, which it did not. So I assume that xenvbd was dead too (killed by 
xennet) or is actually the real reason for the stability problems.

Regards Andreas

[-- Attachment #2: diff.patch --]
[-- Type: text/plain, Size: 1244 bytes --]

--- x\xennet6_rx.c	2011-09-22 13:47:46.000000000 +0200
+++ xennet6_rx.c	2011-09-23 22:39:47.892796400 +0200
@@ -50,8 +50,8 @@
   pb->mdl = IoAllocateMdl(pb->virtual, PAGE_SIZE, FALSE, FALSE, NULL);
   if (!pb->mdl)
   {
-    NdisFreeMemory(pb->virtual, sizeof(shared_buffer_t), 0);
-    NdisFreeMemory(pb, PAGE_SIZE, 0);
+    NdisFreeMemory(pb->virtual, PAGE_SIZE, 0);
+    NdisFreeMemory(pb, sizeof(shared_buffer_t), 0);
     return NULL;
   }
   pb->gref = (grant_ref_t)xi->vectors.GntTbl_GrantAccess(xi->vectors.context, 0,
@@ -59,8 +59,8 @@
   if (pb->gref == INVALID_GRANT_REF)
   {
     IoFreeMdl(pb->mdl);
-    NdisFreeMemory(pb->virtual, sizeof(shared_buffer_t), 0);
-    NdisFreeMemory(pb, PAGE_SIZE, 0);
+    NdisFreeMemory(pb->virtual, PAGE_SIZE, 0);
+    NdisFreeMemory(pb, sizeof(shared_buffer_t), 0);
     return NULL;
   }
   MmBuildMdlForNonPagedPool(pb->mdl);
@@ -85,8 +85,8 @@
     if (xi->rx_pb_free > RX_MAX_PB_FREELIST)
     {
       IoFreeMdl(pb->mdl);
-      NdisFreeMemory(pb->virtual, sizeof(shared_buffer_t), 0);
-      NdisFreeMemory(pb, PAGE_SIZE, 0);
+      NdisFreeMemory(pb->virtual, PAGE_SIZE, 0);
+      NdisFreeMemory(pb, sizeof(shared_buffer_t), 0);
       return;
     }
     pb->mdl->ByteCount = PAGE_SIZE;
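
The swapped length arguments corrected by the patch can be modelled in portable C. The sketch below is an illustration under stated assumptions, not the NDIS implementation: `sized_alloc` and `sized_free` are hypothetical helpers that record each allocation's size and refuse a free whose length does not match, which is the kind of check that would have exposed the typos immediately. `demo_buffer_t` and `DEMO_PAGE_SIZE` are likewise stand-ins for `shared_buffer_t` and `PAGE_SIZE`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096   /* stand-in for PAGE_SIZE */

/* Stand-in for shared_buffer_t: a small descriptor that owns one page. */
typedef struct {
    void *page;
    unsigned ref;
} demo_buffer_t;

/* Header stored in front of every allocation so its size can be checked.
 * The extra union members only keep the user pointer suitably aligned. */
typedef union {
    size_t size;
    long double ld;
    long long ll;
    void *ptr;
} sized_hdr_t;

/* Allocate `size` bytes and remember the size (hypothetical helper). */
static void *sized_alloc(size_t size)
{
    sized_hdr_t *h = malloc(sizeof(*h) + size);
    if (!h)
        return NULL;
    h->size = size;
    return h + 1;
}

/* Free `ptr` only if `length` matches the recorded size.
 * Returns 0 on success, -1 on a mismatch (nothing is freed in that case). */
static int sized_free(void *ptr, size_t length)
{
    assert(ptr != NULL);
    sized_hdr_t *h = (sized_hdr_t *)ptr - 1;
    if (h->size != length)
        return -1;
    free(h);
    return 0;
}
```

Calling `sized_free(pb->page, sizeof(demo_buffer_t))` or `sized_free(pb, DEMO_PAGE_SIZE)` returns -1 here, mirroring the original ordering, while the corrected ordering from the patch succeeds.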

* RE: RE: Stability report GPLPV 0.11.0.308
  2011-09-23 20:57               ` Andreas Kinzler
@ 2011-09-23 23:49                 ` James Harper
  2011-09-26 14:44                   ` Andreas Kinzler
  0 siblings, 1 reply; 21+ messages in thread
From: James Harper @ 2011-09-23 23:49 UTC (permalink / raw)
  To: Andreas Kinzler; +Cc: xen-devel

> 
> I did take a look at your commit 950 and I think there are 3 typos (see
> my patch).

Thanks for that. I guess the length parameter isn't verified, or I
would have crashed...

> Anyway, I don't think that memory problems are causing the stability
> issues and (as some kind of "proof") I did not notice any increase in
> kernel memory usage during uptime of the VMs.
> 
> Actually I am not really sure if xennet is even the problem since in
> none of the crash scenarios was there anything in the Windows event log.
> In my last test I was able to enter my password via VNC (it did not log
> in though) - this should have written some entries to the security log,
> which it did not. So I assume that xenvbd was dead too (killed by
> xennet) or is actually the real reason for the stability problems.
> 

I'm just setting up a few new servers and on one of them a Windows
2008R2 machine hung and there were lots of xenvbd errors in the logs.
The underlying block device for that DomU is iSCSI and I assumed the
problem was there (I was doing a lot of testing on the SAN at the time)
but maybe not.

No evidence of problems in /var/log/xen/qemu-dm-<domu name>.log?

James

* Re: RE: Stability report GPLPV 0.11.0.308
  2011-09-23 23:49                 ` James Harper
@ 2011-09-26 14:44                   ` Andreas Kinzler
  2011-09-27  0:59                     ` James Harper
  2011-09-27  5:45                     ` James Harper
  0 siblings, 2 replies; 21+ messages in thread
From: Andreas Kinzler @ 2011-09-26 14:44 UTC (permalink / raw)
  To: James Harper; +Cc: xen-devel

On 24.09.2011 01:49, James Harper wrote:
> I'm just setting up a few new servers and on one of them a Windows
> 2008R2 machine hung and there were lots of xenvbd errors in the logs.
> The underlying block device for that DomU is iSCSI and I assumed the
> problem was there (I was doing a lot of testing on the SAN at the time)
> but maybe not.

Hmmm, do you have any more ideas? Is there anything I could do to help
you debug?

> No evidence of problems in /var/log/xen/qemu-dm-<domu name>.log?

I did not see anything special.

Regards Andreas

* RE: RE: Stability report GPLPV 0.11.0.308
  2011-09-26 14:44                   ` Andreas Kinzler
@ 2011-09-27  0:59                     ` James Harper
  2011-09-27  5:45                     ` James Harper
  1 sibling, 0 replies; 21+ messages in thread
From: James Harper @ 2011-09-27  0:59 UTC (permalink / raw)
  To: Andreas Kinzler; +Cc: xen-devel

> 
> On 24.09.2011 01:49, James Harper wrote:
> > I'm just setting up a few new servers and on one of them a Windows
> > 2008R2 machine hung and there were lots of xenvbd errors in the logs.
> > The underlying block device for that DomU is iSCSI and I assumed the
> > problem was there (I was doing a lot of testing on the SAN at the
> > time) but maybe not.
> 
> Hmmm, do you have any more ideas? Is there anything I could do to help
> you debug?

You could look for all the failure paths (eg can't allocate buffer etc)
and put debug prints there. I'm almost in a position to be able to start
testing a bit better now.
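
A minimal sketch of what such failure-path instrumentation could look like, in portable C with `fprintf` standing in for a kernel debug print. The macro `LOG_FAILURE`, the counter `fail_path_hits` and the wrapper `alloc_buffer` (with its `fail_now` switch for forcing the error path) are hypothetical names chosen for illustration; none of them come from the GPLPV source.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of failure paths taken so far; handy when no debugger is attached. */
static unsigned long fail_path_hits;

/* Report a failure path with its source location, then count it. */
#define LOG_FAILURE(what)                                            \
    do {                                                             \
        fprintf(stderr, "FAIL %s:%d (%s): %s\n",                     \
                __FILE__, __LINE__, __func__, (what));               \
        fail_path_hits++;                                            \
    } while (0)

/* Hypothetical allocation wrapper. `fail_now` forces the error path so the
 * logging itself can be exercised without exhausting memory. */
static void *alloc_buffer(size_t size, int fail_now)
{
    assert(size > 0);
    void *p = fail_now ? NULL : malloc(size);
    if (!p) {
        LOG_FAILURE("can't allocate buffer");
        return NULL;
    }
    return p;
}
```

Each early-return branch gets one `LOG_FAILURE` line, so a hang that follows a silent allocation failure leaves at least one message and a non-zero counter behind.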

James

* RE: RE: Stability report GPLPV 0.11.0.308
  2011-09-26 14:44                   ` Andreas Kinzler
  2011-09-27  0:59                     ` James Harper
@ 2011-09-27  5:45                     ` James Harper
  2011-09-30  9:17                       ` Andreas Kinzler
  1 sibling, 1 reply; 21+ messages in thread
From: James Harper @ 2011-09-27  5:45 UTC (permalink / raw)
  To: Andreas Kinzler; +Cc: xen-devel

> 
> On 24.09.2011 01:49, James Harper wrote:
> > I'm just setting up a few new servers and on one of them a Windows
> > 2008R2 machine hung and there were lots of xenvbd errors in the logs.
> > The underlying block device for that DomU is iSCSI and I assumed the
> > problem was there (I was doing a lot of testing on the SAN at the
> > time) but maybe not.
> 
> Hmmm, do you have any more ideas? Is there anything I could do to help
> you debug?
> 
> > No evidence of problems in /var/log/xen/qemu-dm-<domu name>.log?
> 
> I did not see anything special.
> 

I just had a crash under heavy testing:

12961573357740:
12961573357740: *** Assertion failed: pi.curr_mdl
12961573357740: ***   Source File:
c:\projects\win-pvdrivers.hg\xennet\xennet6_tx.c, line 308
12961573357740:

Can you have a look in your logs for anything like this? I'm curious as
to if we are chasing the same problem or a different one.

I haven't checked yet to make sure I am running the latest code on that
server so it could just be something stupid...

James

* Re: RE: Stability report GPLPV 0.11.0.308
  2011-09-27  5:45                     ` James Harper
@ 2011-09-30  9:17                       ` Andreas Kinzler
  2011-09-30 10:06                         ` James Harper
                                           ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Andreas Kinzler @ 2011-09-30  9:17 UTC (permalink / raw)
  To: James Harper; +Cc: xen-devel

Hello James,

> 12961573357740: *** Assertion failed: pi.curr_mdl
> 12961573357740: ***   Source File:
> c:\projects\win-pvdrivers.hg\xennet\xennet6_tx.c, line 308

I took this report about a problem you had with "tx" to modify my tests 
and to make them mostly tx-based while previously they were mostly 
rx-based. Tests are running for 2d 18h now - no problems so far.

I wanted to tell you about one interesting observation. In my tests I 
did two runs with modified xenvbd drivers. In run #1 I switched to the 
scsiport driver of 0.11.0.312 and this made one domU crash after one day, 
while with the storport version of 0.11.0.312 I always had more than 9 
days (as I reported earlier). In run #2 I forward-ported xenvbd from 
0.11.0.213 (which is totally stable on our systems) and again one domU 
crashed after one day. This is really interesting and leads me to two thoughts:

1) xennet has some problem, but still why does scsiport vs. storport 
make a difference then?
2) perhaps there is some new bug outside xennet and outside xenvbd (some 
infrastructure thing: event handling, PCI, ...) and this is the real reason.

> Can you have a look in your logs for anything like this? I'm curious as
> to if we are chasing the same problem or a different one.

I am not running kernel debugging so far (have played with it though). 
So I cannot say.

Regards Andreas

* RE: RE: Stability report GPLPV 0.11.0.308
  2011-09-30  9:17                       ` Andreas Kinzler
@ 2011-09-30 10:06                         ` James Harper
  2011-09-30 10:09                         ` James Harper
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: James Harper @ 2011-09-30 10:06 UTC (permalink / raw)
  To: Andreas Kinzler; +Cc: xen-devel

> 
> 1) xennet has some problem, but still why does scsiport vs. storport
> make a difference then?

If I'm running over the end of a buffer then anything goes...
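
One common way to make such an overrun visible is a guard region after the buffer. The following is only a sketch in portable C under stated assumptions: `guarded_alloc`, `guarded_check`, `CANARY_LEN` and `CANARY_BYTE` are hypothetical names, and nothing in this thread says the drivers use this technique.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CANARY_LEN  8      /* guard bytes placed after the usable area */
#define CANARY_BYTE 0xA5   /* arbitrary fill pattern */

/* Allocate `size` usable bytes followed by a guard region. */
static unsigned char *guarded_alloc(size_t size)
{
    unsigned char *p = malloc(size + CANARY_LEN);
    if (!p)
        return NULL;
    memset(p + size, CANARY_BYTE, CANARY_LEN);
    return p;
}

/* Returns 1 if the guard region is intact, 0 if something wrote past
 * the end of the usable area. */
static int guarded_check(const unsigned char *p, size_t size)
{
    assert(p != NULL);
    for (size_t i = 0; i < CANARY_LEN; i++) {
        if (p[size + i] != CANARY_BYTE)
            return 0;
    }
    return 1;
}
```

Checking the guard whenever a buffer is returned turns a silent overrun into a detectable event close to the code that caused it, instead of a corruption that surfaces later in an unrelated driver.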

> 2) perhaps there is some new bug outside xennet and outside xenvbd
> (some infrastructure thing: event handling, PCI, ...) and this is the
> real reason.

Could be.

> > Can you have a look in your logs for anything like this? I'm curious
> > as to if we are chasing the same problem or a different one.
> 
> I am not running kernel debugging so far (have played with it though).
> So I cannot say.
> 

You only need to be running the debug version of the drivers for this to
be logged.

Also are you running with the driver verifier enabled? That can help
catch bugs and also allows you to notice memory leaks a bit better.

One other thing to try is running with the checked build of windows.
I've used the checked kernel and hal under 2003 and it picked up one
error - it basically just checks parameters etc everywhere it can to
make sure you aren't doing anything you shouldn't. I seem to remember
the checked NDIS driver threw a fit though because xen was presenting 4
CPUs but the cpuid registers said it was a single cpu with 4 cores so
there are little things like that that can be troublesome with the
checked builds...

James

* RE: RE: Stability report GPLPV 0.11.0.308
  2011-09-30  9:17                       ` Andreas Kinzler
  2011-09-30 10:06                         ` James Harper
@ 2011-09-30 10:09                         ` James Harper
  2011-10-01 11:10                           ` Andreas Kinzler
  2011-09-30 10:17                         ` James Harper
       [not found]                         ` <AEC6C66638C05B468B556EA548C1A77D01E5E2E4@trantor>
  3 siblings, 1 reply; 21+ messages in thread
From: James Harper @ 2011-09-30 10:09 UTC (permalink / raw)
  To: Andreas Kinzler; +Cc: xen-devel

> 
> > 12961573357740: *** Assertion failed: pi.curr_mdl
> > 12961573357740: ***   Source File:
> > c:\projects\win-pvdrivers.hg\xennet\xennet6_tx.c, line 308
> 
> I took this report about a problem you had with "tx" to modify my tests
> and to make them mostly tx-based while previously they were mostly
> rx-based. Tests are running for 2d 18h now - no problems so far.
> 
> I wanted to tell you about one interesting observation. In my tests I
> did two runs with modified xenvbd drivers. In run #1 I switched to the
> scsiport driver of 0.11.0.312 and this made one domU crash after one day,
> while with the storport version of 0.11.0.312 I always had more than 9
> days (as I reported earlier). In run #2 I forward-ported xenvbd from
> 0.11.0.213 (which is totally stable on our systems) and again one domU
> crashed after one day. This is really interesting and leads me to two
> thoughts:
> 

Actually one other thing you could try is simply using the Windows 2003
version of the drivers. That uses ndis5 and scsiport instead of ndis6
and storport. If that worked we could try running with ndis5 + storport
and see if that works okay. As long as they are from the same patchlevel
it shouldn't matter if you use one compiled for windows 2008 and one for
windows 2003 (it's possible that it might matter but I can't think of
anything).

James

* RE: RE: Stability report GPLPV 0.11.0.308
  2011-09-30  9:17                       ` Andreas Kinzler
  2011-09-30 10:06                         ` James Harper
  2011-09-30 10:09                         ` James Harper
@ 2011-09-30 10:17                         ` James Harper
       [not found]                         ` <AEC6C66638C05B468B556EA548C1A77D01E5E2E4@trantor>
  3 siblings, 0 replies; 21+ messages in thread
From: James Harper @ 2011-09-30 10:17 UTC (permalink / raw)
  To: Andreas Kinzler; +Cc: xen-devel

> Hello James,
> 
> > 12961573357740: *** Assertion failed: pi.curr_mdl
> > 12961573357740: ***   Source File:
> > c:\projects\win-pvdrivers.hg\xennet\xennet6_tx.c, line 308
> 
> I took this report about a problem you had with "tx" to modify my tests
> and to make them mostly tx-based while previously they were mostly
> rx-based. Tests are running for 2d 18h now - no problems so far.
> 

I'm not quite sure but I think my testing was in the rx direction just
prior to the crash. I think the testing finished and then this crash
happened a few minutes later. I think it also happened when the machine
hadn't been up for very long... I haven't been able to reproduce it
since though, which is frustrating.

James

* Re: RE: Stability report GPLPV 0.11.0.308
       [not found]                         ` <AEC6C66638C05B468B556EA548C1A77D01E5E2E4@trantor>
@ 2011-10-01  9:28                           ` Andreas Kinzler
  0 siblings, 0 replies; 21+ messages in thread
From: Andreas Kinzler @ 2011-10-01  9:28 UTC (permalink / raw)
  To: James Harper, xen-devel

Hello,

> What version of Xen are you testing with? I'm using the latest xen 4.1

I am using Xen 4.1.1 official.

> from hg and this morning I couldn't log into my server via RDP because
> the date had advanced about 2 months. I was testing it hard enough that
> it lost network connectivity for a bit so I'm wondering if that had
> something to do with it... have you seen anything like that during
> intensive testing?

Yes, there might be additional problems that are caused by time issues, 
but that does not explain why the Windows log does not mention my login 
attempt (see earlier mail).

I have one production system with Xen 4.1.1 and GPLPV 0.11.0.213 which 
has an uptime of more than 100 days.

Regards Andreas

* Re: RE: Stability report GPLPV 0.11.0.308
  2011-09-30 10:09                         ` James Harper
@ 2011-10-01 11:10                           ` Andreas Kinzler
  2011-10-10 16:07                             ` Andreas Kinzler
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Kinzler @ 2011-10-01 11:10 UTC (permalink / raw)
  To: James Harper; +Cc: xen-devel

Hello James,

> Actually one other thing you could try is simply using the Windows 2003
> version of the drivers. That uses ndis5 and scsiport instead of ndis6
> and storport. If that worked we could try running with ndis5 + storport
> and see if that works okay. As long as they are from the same patchlevel
> it shouldn't matter if you use one compiled for windows 2008 and one for
> windows 2003 (it's possible that it might matter but I can't think of
> anything).

After 3d:18h I stopped my tx-based test with no problems so far. My 
conclusion: the switch from rx-based to tx-based did not change anything.

I now compiled 0.11.0.312 with scsiport and ndis5 (and patched the .inf 
file, deleted the [XenGplPv.NT$ARCH$.6.0] section). Test is now running.

Regards Andreas

* Re: RE: Stability report GPLPV 0.11.0.308
  2011-10-01 11:10                           ` Andreas Kinzler
@ 2011-10-10 16:07                             ` Andreas Kinzler
  0 siblings, 0 replies; 21+ messages in thread
From: Andreas Kinzler @ 2011-10-10 16:07 UTC (permalink / raw)
  To: James Harper; +Cc: xen-devel

Hello James,

>> Actually one other thing you could try is simply using the Windows
>> 2003 version of the drivers. That uses ndis5 and scsiport instead
>> of ndis6 and storport. If that worked we could try running with
>> ndis5 + storport and see if that works okay. As long as they are
>> from the same patchlevel it shouldn't matter if you use one
>> compiled for windows 2008 and one for windows 2003 (it's possible
>> that it might matter but I can't think of anything).
> I now compiled 0.11.0.312 with scsiport and ndis5 (and patched the
> .inf file, deleted the [XenGplPv.NT$ARCH$.6.0] section). Test is now
> running.

Crashed after 1-2 days, but I found that ndis5 of 0.11.0.213 has major
differences from ndis5 of 0.11.0.312, so I am not sure what that means.

The whole reason I am doing all the testing is because the net
performance of 0.11.0.213 is not good enough and 0.11.0.312 has near
native performance on gigabit links - but even the ndis5 driver of
0.11.0.312 has very good performance so it does not seem to be an NDIS 6
improvement.

Any news on your side?

Regards Andreas
