xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3
@ 2015-08-03 10:47 Ian Campbell
  2015-08-03 11:36 ` Ian Campbell
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Ian Campbell @ 2015-08-03 10:47 UTC (permalink / raw)
  To: Wei Liu, Jim Fehlig, Ian Jackson, Jan Beulich, Stefano Stabellini
  Cc: xen-devel

TL;DR: I've dropped a stop file for:
  xen-4.2-testing + qemu-upstream-4.2-testing
  xen-4.3-testing + qemu-upstream-4.3-testing
while we figure this out. I also killed flight 60373 (xen-4.2-testing).

Recently the amount of log files collected by osstest seems to have
rocketed for some flights.

$ du -shc * | sort -h
[...]
975M	60149
1.1G	60155
1.2G	60152
1.2G	60162
1.3G	60154
1.3G	60164
1.3G	60189
1.6G	60157
1.6G	60161
1.7G	60182
1.7G	60192
1.8G	60196
1.8G	60199
11G	60159
12G	60194
14G	60150
15G	60151
84G	total

The ~1G sizes are normal. For e.g. 60199 the single largest subdirectory is
71M, which is, I think, normal.

For those last 4 which are huge contain a few surprisingly large
subdirectories each:
1.6G	60159/test-amd64-i386-libvirt-qcow2
1.6G	60159/test-amd64-i386-libvirt-vhd
1.7G	60159/test-amd64-i386-libvirt-raw
2.4G	60159/test-amd64-amd64-libvirt-raw
2.4G	60159/test-amd64-amd64-libvirt-vhd

1.3G	60194/test-amd64-i386-libvirt-raw
2.0G	60194/test-amd64-i386-libvirt-qcow2
2.4G	60194/test-amd64-i386-libvirt-vhd
2.7G	60194/test-amd64-amd64-libvirt-raw
2.8G	60194/test-amd64-amd64-libvirt-vhd

1.7G	60150/test-amd64-i386-libvirt-vhd
1.8G	60150/test-amd64-i386-libvirt-raw
2.0G	60150/test-amd64-amd64-libvirt
2.4G	60150/test-amd64-amd64-libvirt-raw
2.4G	60150/test-amd64-amd64-libvirt-vhd
2.4G	60150/test-i386-i386-libvirt

2.2G	60151/test-amd64-i386-libvirt-vhd
2.3G	60151/test-amd64-i386-libvirt-qcow2
2.3G	60151/test-amd64-i386-libvirt-raw
2.4G	60151/test-amd64-amd64-libvirt-qcow2
2.4G	60151/test-amd64-amd64-libvirt-raw
2.5G	60151/test-amd64-amd64-libvirt-vhd

Note that it is not all test-*-libvirt*, e.g. taking 60151 one:

$ du -shc 60151/test-*libvirt* | sort -h
1.9M	60151/test-armhf-armhf-libvirt-qcow2
4.3M	60151/test-armhf-armhf-libvirt-vhd
4.4M	60151/test-armhf-armhf-libvirt
7.5M	60151/test-amd64-i386-libvirt
9.0M	60151/test-amd64-amd64-libvirt
9.1M	60151/test-armhf-armhf-libvirt-raw

2.2G	60151/test-amd64-i386-libvirt-vhd
2.3G	60151/test-amd64-i386-libvirt-qcow2
2.3G	60151/test-amd64-i386-libvirt-raw
2.4G	60151/test-amd64-amd64-libvirt-qcow2
2.4G	60151/test-amd64-amd64-libvirt-raw
2.5G	60151/test-amd64-amd64-libvirt-vhd
14G	total

These are the affected branches:

60159: qemu-upstream-4.3-testing
60194: xen-4.3-testing
60150: xen-4.2-testing
60151: xen-4.3-testing

and they are all pretty new:

$ ls -ld 60159 60194 60150 60151
drwxrwsr-x 81 osstest osstest 4096 Aug  2 00:58 60150
drwxrwsr-x 73 osstest osstest 4096 Jul 31 09:44 60151
drwxrwsr-x 45 osstest osstest 4096 Aug  1 22:56 60159
drwxrwsr-x 73 osstest osstest 4096 Aug  3 00:07 60194

By comparison xen-4.5-testing flight 60189 has 1.3G of logging and 
$ ls -ld 60189
drwxrwsr-x 76 osstest osstest 4096 Aug  2 15:29 60189

The last xen-4.4-testing (flight 60152) is also ~1G as expected.

Picking at random http://logs.test-lab.xenproject.org/osstest/logs/60194/test-amd64-amd64-libvirt-raw/info.html
I see:
2.7G	godello1---var-log-libvirt-libxl-libxl-driver.log

and
http://logs.test-lab.xenproject.org/osstest/logs/60194/test-amd64-amd64-libvirt-raw/15.ts-logs-capture.log
shows:

2015-08-02 19:12:41 Z fetching /var/log/libvirt/libxl/libxl-driver.log to godello1---var-log-libvirt-libxl-libxl-driver.log
2015-08-02 19:12:41 Z executing scp ... root@172.16.144.37:/var/log/libvirt/libxl/libxl-driver.log /home/logs/logs/60194/test-amd64-amd64-libvirt-raw/godello1---var-log-libvirt-libxl-libxl-driver.log
2015-08-02 19:13:41 Z command timed out [60]: timeout 90 scp -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=100 -o ServerAliveInterval=100 -o PasswordAuthentication=no -o ChallengeResponseAuthentication=no -o UserKnownHostsFile=tmp/t.known_hosts_60194.test-amd64-amd64-libvirt-raw root@172.16.144.37:/var/log/libvirt/libxl/libxl-driver.log /home/logs/logs/60194/test-amd64-amd64-libvirt-raw/godello1---var-log-libvirt-libxl-libxl-driver.log
2015-08-02 19:13:41 Z logfile /var/log/libvirt/libxl/libxl-driver.log fetch failed: status (timed out) at Osstest/TestSupport.pm line 394.

So apparently the file is actually even bigger!

After the initial expected logging the file is simply full of:

2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1004:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3] from domid=0 nentries=1 rc=1
2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1015:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3]   got=domaininfos[0] got->domain=0
2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1015:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3]   got=domaininfos[1] got->domain=-1
2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1023:domain_death_xswatch_callback:  got==gotend

Repeated at around 51KHz.

This appears to have also filled the disk on the logs server:
To xen@xenbits.xen.org:git/osstest.git
 * [new branch]      HEAD -> flight-60151
+ rsync --compress --compress-level=9 --stats --delete -auH -e ssh -o batchmode=yes /home/logs/logs/60151/ osstest@www:/var/www/osstest/logs/60151
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "/var/www/osstest/logs/60151/test-amd64-i386-libvirt-qcow2/serial-huxelrebe0.log.0": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(322) [receiver=3.0.9]
rsync: connection unexpectedly closed (67019 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]
rsync 3072  at ./cr-publish-flight-logs line 59.
[2015-07-31 09:59:08 UTC 24411] ./cr-daily-branch --real xen-4.3-testing status=12

resulting in no email being published for these flights.

As well as the underlying log spam I think in osstest there is a bug in
that it did not either delete or compress the huge log file.

I am going to manually run gzip on these files to alleviate some of the
pressure. They should compress well. If not I may truncate them to around
the point the repetition starts.

Ian.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3
  2015-08-03 10:47 Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3 Ian Campbell
@ 2015-08-03 11:36 ` Ian Campbell
  2015-08-03 13:26   ` Ian Campbell
  2015-08-03 12:44 ` Ian Campbell
  2015-08-11 10:19 ` Ian Campbell
  2 siblings, 1 reply; 7+ messages in thread
From: Ian Campbell @ 2015-08-03 11:36 UTC (permalink / raw)
  To: Wei Liu, Jim Fehlig, Ian Jackson, Jan Beulich, Stefano Stabellini
  Cc: xen-devel

On Mon, 2015-08-03 at 11:47 +0100, Ian Campbell wrote:
> 
> I am going to manually run gzip on these files to alleviate some of the
> pressure. They should compress well. If not I may truncate them to around
> the point the repetition starts.

Each one got 99.6% compression, so the logs partition now has ~50% free...

Ian.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3
  2015-08-03 10:47 Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3 Ian Campbell
  2015-08-03 11:36 ` Ian Campbell
@ 2015-08-03 12:44 ` Ian Campbell
  2015-08-11 10:19 ` Ian Campbell
  2 siblings, 0 replies; 7+ messages in thread
From: Ian Campbell @ 2015-08-03 12:44 UTC (permalink / raw)
  To: Wei Liu, Jim Fehlig, Ian Jackson, Jan Beulich, Stefano Stabellini
  Cc: xen-devel

On Mon, 2015-08-03 at 11:47 +0100, Ian Campbell wrote:
> TL;DR: I've dropped a stop file for:
>   xen-4.2-testing + qemu-upstream-4.2-testing
>   xen-4.3-testing + qemu-upstream-4.3-testing
> while we figure this out. I also killed flight 60373 (xen-4.2-testing).

I also subsequently spotted 60229 on qemu-upstream-4.3-testing which I have
killed.

I also observed via flight 60207 on qemu-upstream-4.2-testing that there is
no runtime testing of libvirt in that branch, only build so I removed that
stop file since it isn't going to hit this issue.

Ian.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3
  2015-08-03 11:36 ` Ian Campbell
@ 2015-08-03 13:26   ` Ian Campbell
  2015-08-03 14:16     ` Ian Campbell
  0 siblings, 1 reply; 7+ messages in thread
From: Ian Campbell @ 2015-08-03 13:26 UTC (permalink / raw)
  To: Wei Liu, Jim Fehlig, Ian Jackson, Jan Beulich, Stefano Stabellini
  Cc: xen-devel

On Mon, 2015-08-03 at 12:36 +0100, Ian Campbell wrote:
> On Mon, 2015-08-03 at 11:47 +0100, Ian Campbell wrote:
> > 
> > I am going to manually run gzip on these files to alleviate some of the
> > pressure. They should compress well. If not I may truncate them to 
> > around
> > the point the repetition starts.
> 
> Each one got 99.6% compression, so the logs partition now has ~50% 
> free...

And I now did the same on the copies in www:/var/www since that was full
too. A bunch of reports failed to go out due to this.

Ian.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3
  2015-08-03 13:26   ` Ian Campbell
@ 2015-08-03 14:16     ` Ian Campbell
  0 siblings, 0 replies; 7+ messages in thread
From: Ian Campbell @ 2015-08-03 14:16 UTC (permalink / raw)
  To: Wei Liu, Jim Fehlig, Ian Jackson, Jan Beulich, Stefano Stabellini
  Cc: xen-devel

On Mon, 2015-08-03 at 14:26 +0100, Ian Campbell wrote:
> On Mon, 2015-08-03 at 12:36 +0100, Ian Campbell wrote:
> > On Mon, 2015-08-03 at 11:47 +0100, Ian Campbell wrote:
> > > 
> > > I am going to manually run gzip on these files to alleviate some of 
> > > the
> > > pressure. They should compress well. If not I may truncate them to 
> > > around
> > > the point the repetition starts.
> > 
> > Each one got 99.6% compression, so the logs partition now has ~50% 
> > free...
> 
> And I now did the same on the copies in www:/var/www since that was full
> too. A bunch of reports failed to go out due to this.

A quick

$ zgrep 'No space left on device' for-*/tmp/cr-for-branches.log*

lead me to the following list:

for-libvirt.git				60183
for-osstest.git				60196
for-qemu-mainline.git			60164
for-qemu-upstream-4.3-testing.git	60159
for-seabios.git				60165
for-xen-4.2-testing.git			60150
for-xen-4.2-testing.git			60373
for-xen-4.3-testing.git			60194
for-xen-4.3-testing.git			60151

For each of those I did, as osstest@, from the appropriate for-*
    flight=
    ./cr-publish-flight-logs $flight
    /usr/sbin/sendmail -odi -oee -oi -t < tmp/$flight.email
    mv tmp/$flight.email tmp/$flight.email.sent

That's no doubt imperfect/incomplete, but I think at least makes the
results public.

Ian.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3
  2015-08-03 10:47 Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3 Ian Campbell
  2015-08-03 11:36 ` Ian Campbell
  2015-08-03 12:44 ` Ian Campbell
@ 2015-08-11 10:19 ` Ian Campbell
  2015-08-12 13:15   ` Ian Campbell
  2 siblings, 1 reply; 7+ messages in thread
From: Ian Campbell @ 2015-08-11 10:19 UTC (permalink / raw)
  To: Wei Liu, Jim Fehlig, Ian Jackson, Jan Beulich, Stefano Stabellini
  Cc: xen-devel

On Mon, 2015-08-03 at 11:47 +0100, Ian Campbell wrote:
> After the initial expected logging the file is simply full of:
> 
> 2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1004:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3] from domid=0 nentries=1 rc=1
> 2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1015:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3]   got=domaininfos[0] got->domain=0
> 2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1015:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3]   got=domaininfos[1] got->domain=-1
> 2015-08-02 19:12:12 UTC libxl: debug: libxl.c:1023:domain_death_xswatch_callback:  got==gotend
> 
> Repeated at around 51KHz.

This sounds a lot like 4783c99aab8 (see below for full log message), which
perhaps ought to be backported to the effected branches, i.e. 4.2 and 4.3. 

Looks like it was backported to 4.5 (as 0b19348f3cd1) and 4.4 (as
13623d5d8e85) already.

Ian?

Ian.


commit 4783c99aab866f470bd59368cfbf5ad5f677b0ec
Author: Ian Jackson <Ian.Jackson@eu.citrix.com>
Date:   Tue Mar 17 09:30:57 2015 -0600

    libxl: In domain death search, start search at first domid we want
    
    From: Ian Jackson <Ian.Jackson@eu.citrix.com>
    
    When domain_death_xswatch_callback needed a further call to
    xc_domain_getinfolist it would restart it with the last domain it
    found rather than the first one it wants.
    
    If it only wants one it will also only ask for one domain.  The result
    would then be that it gets the previous domain again (ie, the previous
    one to the one it wants), which still doesn't reveal the answer to the
    question, and it would therefore loop again.
    
    It's completely unclear to me why I thought it was a good idea to
    start the xc_domain_getinfolist with the last domain previously found
    rather than the first one left un-confirmed.  The code has been that
    way since it was introduced.
    
    Instead, start each xc_domain_getinfolist at the next domain whose
    status we need to check.
    
    We also need to move the test for !evg into the loop, we now need evg
    to compute the arguments to getinfolist.
    
    Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
    Reported-by: Jim Fehlig <jfehlig@suse.com>
    Reviewed-by: Jim Fehlig <jfehlig@suse.com>
    Tested-by: Jim Fehlig <jfehlig@suse.com>
    Acked-by: Wei Liu <wei.liu2@citrix.com>
    Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3
  2015-08-11 10:19 ` Ian Campbell
@ 2015-08-12 13:15   ` Ian Campbell
  0 siblings, 0 replies; 7+ messages in thread
From: Ian Campbell @ 2015-08-12 13:15 UTC (permalink / raw)
  To: Wei Liu, Jim Fehlig, Ian Jackson, Jan Beulich, Stefano Stabellini
  Cc: xen-devel

On Tue, 2015-08-11 at 11:19 +0100, Ian Campbell wrote:
> On Mon, 2015-08-03 at 11:47 +0100, Ian Campbell wrote:
> > After the initial expected logging the file is simply full of:
> > 
> > 2015-08-02 19:12:12 UTC libxl: debug: 
> > libxl.c:1004:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3] from 
> > domid=0 nentries=1 rc=1
> > 2015-08-02 19:12:12 UTC libxl: debug: 
> > libxl.c:1015:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3]  
> >  got=domaininfos[0] got->domain=0
> > 2015-08-02 19:12:12 UTC libxl: debug: 
> > libxl.c:1015:domain_death_xswatch_callback: [evg=0x7f3cc44fa3f0:3]  
> >  got=domaininfos[1] got->domain=-1
> > 2015-08-02 19:12:12 UTC libxl: debug: 
> > libxl.c:1023:domain_death_xswatch_callback:  got==gotend
> > 
> > Repeated at around 51KHz.
> 
> This sounds a lot like 4783c99aab8 (see below for full log message), 
> which
> perhaps ought to be backported to the effected branches, i.e. 4.2 and 
> 4.3. 
> 
> Looks like it was backported to 4.5 (as 0b19348f3cd1) and 4.4 (as
> 13623d5d8e85) already.
> 
> Ian?

I just saw this one go past on the changelog list, so I've removed 

branches/for-qemu-upstream-4.3-testing.git/stop
branches/for-xen-4.2-testing.git/stop
branches/for-xen-4.3-testing.git/stop 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-08-12 13:15 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-03 10:47 Enormous size of libvirt libxl-driver.log with Xen 4.2 and 4.3 Ian Campbell
2015-08-03 11:36 ` Ian Campbell
2015-08-03 13:26   ` Ian Campbell
2015-08-03 14:16     ` Ian Campbell
2015-08-03 12:44 ` Ian Campbell
2015-08-11 10:19 ` Ian Campbell
2015-08-12 13:15   ` Ian Campbell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).