All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [Bug 1793904] Re: files are randomly overwritten by Zero Bytes
       [not found] <153763821634.24720.16203204034487714411.malonedeb@gac.canonical.com>
@ 2019-04-21 11:05 ` Hans
  2019-04-23  5:31 ` Hans
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: Hans @ 2019-04-21 11:05 UTC (permalink / raw)
  To: qemu-devel

I will have to put more efford into that: The bug applies to oVirt 4.3
too which seems to use a fuse filesystem.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793904

Title:
  files are randomly overwritten by Zero Bytes

Status in QEMU:
  New

Bug description:
  Hello together,

  I am currently tracking down a "Hard to reproduce" bug on my systems
  that I first discovered during gitlab installation:

  
  Here is the Text from the Gitlab Bug https://gitlab.com/gitlab-org/gitlab-ce/issues/51023
  ----------------------------------------------------------------------------------------------

  Steps to reproduce

  I still do not have all the steps together to reproduce, so far it is:
  apt install gitlab-ce and
  gitlab-rake backup:recovery
  Then it works for some time before it fails.

  What is the current bug behavior?

  I have a 12 hour old Installation of gitlab ce 11.2.3-ce.0 for debian
  stretch on a fresh debian stretch system together with our imported
  data. However it turns out that some gitlab related files contain Zero
  bytes instead of actual data.

  root@gitlab:~# xxd -l 16 /opt/gitlab/bin/gitlab-ctl
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  This behaviour is somewhat strange because it was working for a few
  minutes/hours. I did write a shell script to find out which files are
  affected of this memory loss. It turns out that only files located
  under /opt/gitlab are affected, if I rule out files like
  /var/log/faillog and some postgresql table files.

  What I find even stranger is that it does not seem to affect
  Logfiles/databases/git_repositorys but application files, like .rb
  scripts. and not all of them. No non gitlab package is affected.

  What is the expected correct behavior?
  Binarys and .rb files should stay as they are.

  Possible fixes

  I am still investigating, I hope that it is not an infrastructure problem (libvirt/qemu/glusterfs) it can still be one but the point that files of /opt/gitlab are affected and not any logfile and that we to not have similar problems with any other system leads me to the application for now.
  If I would have used docker the same problem might have caused a reboot of the container.
  But for the Debian package it is a bit of work to recover. That is all a workaround, however.
  ---------------------------------------------------------------------------------------------

  I do have found 2 more systems having the same problem with different
  software:

  root@erp:~# xxd -l 16 /usr/share/perl/5.26.2/constant.pm
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  The Filesize itself is, compared with another machine 00001660 Bytes
  for both the corrupted and the intact file. It looks to me from the
  outside that if some data in the qcow2 file is written too many bytes
  get written so it sometimes overwites data of existing files located
  right after the position in memory where the write goes to.

  I would like to rule out Linux+Ext4 filesystems because I find it
  highly unlikely that such an error keeps undiscovered in that part of
  the environment for long. I think the same might go for qemu.

  Which leaves qemu, gemu+gluster:// mount, qcow2 volumes, glusterfs,
  network. So I am now going to check if I can find any system which
  gets its volumes via fusermount instead of gluster:// path if the
  error is gone there. This may take a while.

  
  ----- some software versions---------------

  QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3)
  Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

  libvirt-daemon-driver-storage-gluster/testing,unstable,now 4.6.0-2
  amd64 [installed]

  ii  glusterfs-client   4.1.3-1        amd64

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793904/+subscriptions

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Qemu-devel] [Bug 1793904] Re: files are randomly overwritten by Zero Bytes
       [not found] <153763821634.24720.16203204034487714411.malonedeb@gac.canonical.com>
  2019-04-21 11:05 ` [Qemu-devel] [Bug 1793904] Re: files are randomly overwritten by Zero Bytes Hans
@ 2019-04-23  5:31 ` Hans
  2019-04-29 19:58 ` Hans
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: Hans @ 2019-04-23  5:31 UTC (permalink / raw)
  To: qemu-devel

I did some undocumented testing to circle the issue down: It seems to
affect Files that are stored as sparsed files either raw or qcow2.

There was a big event and I had the chance to find some people using
glusterfs with qemu, too. the biggest difference as far as I could see
was that they all where using xfs based backends while I am using ext4.

I have to fix a few more issues in Production this week, then I will run
more tests.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793904

Title:
  files are randomly overwritten by Zero Bytes

Status in QEMU:
  New

Bug description:
  Hello together,

  I am currently tracking down a "Hard to reproduce" bug on my systems
  that I first discovered during gitlab installation:

  
  Here is the Text from the Gitlab Bug https://gitlab.com/gitlab-org/gitlab-ce/issues/51023
  ----------------------------------------------------------------------------------------------

  Steps to reproduce

  I still do not have all the steps together to reproduce, so far it is:
  apt install gitlab-ce and
  gitlab-rake backup:recovery
  Then it works for some time before it fails.

  What is the current bug behavior?

  I have a 12 hour old Installation of gitlab ce 11.2.3-ce.0 for debian
  stretch on a fresh debian stretch system together with our imported
  data. However it turns out that some gitlab related files contain Zero
  bytes instead of actual data.

  root@gitlab:~# xxd -l 16 /opt/gitlab/bin/gitlab-ctl
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  This behaviour is somewhat strange because it was working for a few
  minutes/hours. I did write a shell script to find out which files are
  affected of this memory loss. It turns out that only files located
  under /opt/gitlab are affected, if I rule out files like
  /var/log/faillog and some postgresql table files.

  What I find even stranger is that it does not seem to affect
  Logfiles/databases/git_repositorys but application files, like .rb
  scripts. and not all of them. No non gitlab package is affected.

  What is the expected correct behavior?
  Binarys and .rb files should stay as they are.

  Possible fixes

  I am still investigating, I hope that it is not an infrastructure problem (libvirt/qemu/glusterfs) it can still be one but the point that files of /opt/gitlab are affected and not any logfile and that we to not have similar problems with any other system leads me to the application for now.
  If I would have used docker the same problem might have caused a reboot of the container.
  But for the Debian package it is a bit of work to recover. That is all a workaround, however.
  ---------------------------------------------------------------------------------------------

  I do have found 2 more systems having the same problem with different
  software:

  root@erp:~# xxd -l 16 /usr/share/perl/5.26.2/constant.pm
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  The Filesize itself is, compared with another machine 00001660 Bytes
  for both the corrupted and the intact file. It looks to me from the
  outside that if some data in the qcow2 file is written too many bytes
  get written so it sometimes overwites data of existing files located
  right after the position in memory where the write goes to.

  I would like to rule out Linux+Ext4 filesystems because I find it
  highly unlikely that such an error keeps undiscovered in that part of
  the environment for long. I think the same might go for qemu.

  Which leaves qemu, gemu+gluster:// mount, qcow2 volumes, glusterfs,
  network. So I am now going to check if I can find any system which
  gets its volumes via fusermount instead of gluster:// path if the
  error is gone there. This may take a while.

  
  ----- some software versions---------------

  QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3)
  Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

  libvirt-daemon-driver-storage-gluster/testing,unstable,now 4.6.0-2
  amd64 [installed]

  ii  glusterfs-client   4.1.3-1        amd64

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793904/+subscriptions

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Qemu-devel] [Bug 1793904] Re: files are randomly overwritten by Zero Bytes
       [not found] <153763821634.24720.16203204034487714411.malonedeb@gac.canonical.com>
  2019-04-21 11:05 ` [Qemu-devel] [Bug 1793904] Re: files are randomly overwritten by Zero Bytes Hans
  2019-04-23  5:31 ` Hans
@ 2019-04-29 19:58 ` Hans
  2021-04-29  9:53 ` Thomas Huth
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: Hans @ 2019-04-29 19:58 UTC (permalink / raw)
  To: qemu-devel

Please note the updates on:

https://bugzilla.redhat.com/show_bug.cgi?id=1701736

It turns out that you can reproduce the broken images on glusterfs fuse
mounts by using:

                aio=native
                cache=none,
                write-cache=on


I have a set of vms running here on my fedora 29 desktop providing a test glusterfs and a vm to reproduce the bug, at least for the current ovirt case. 

** Bug watch added: Red Hat Bugzilla #1701736
   https://bugzilla.redhat.com/show_bug.cgi?id=1701736

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793904

Title:
  files are randomly overwritten by Zero Bytes

Status in QEMU:
  New

Bug description:
  Hello together,

  I am currently tracking down a "Hard to reproduce" bug on my systems
  that I first discovered during gitlab installation:

  
  Here is the Text from the Gitlab Bug https://gitlab.com/gitlab-org/gitlab-ce/issues/51023
  ----------------------------------------------------------------------------------------------

  Steps to reproduce

  I still do not have all the steps together to reproduce, so far it is:
  apt install gitlab-ce and
  gitlab-rake backup:recovery
  Then it works for some time before it fails.

  What is the current bug behavior?

  I have a 12 hour old Installation of gitlab ce 11.2.3-ce.0 for debian
  stretch on a fresh debian stretch system together with our imported
  data. However it turns out that some gitlab related files contain Zero
  bytes instead of actual data.

  root@gitlab:~# xxd -l 16 /opt/gitlab/bin/gitlab-ctl
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  This behaviour is somewhat strange because it was working for a few
  minutes/hours. I did write a shell script to find out which files are
  affected of this memory loss. It turns out that only files located
  under /opt/gitlab are affected, if I rule out files like
  /var/log/faillog and some postgresql table files.

  What I find even stranger is that it does not seem to affect
  Logfiles/databases/git_repositorys but application files, like .rb
  scripts. and not all of them. No non gitlab package is affected.

  What is the expected correct behavior?
  Binarys and .rb files should stay as they are.

  Possible fixes

  I am still investigating, I hope that it is not an infrastructure problem (libvirt/qemu/glusterfs) it can still be one but the point that files of /opt/gitlab are affected and not any logfile and that we to not have similar problems with any other system leads me to the application for now.
  If I would have used docker the same problem might have caused a reboot of the container.
  But for the Debian package it is a bit of work to recover. That is all a workaround, however.
  ---------------------------------------------------------------------------------------------

  I do have found 2 more systems having the same problem with different
  software:

  root@erp:~# xxd -l 16 /usr/share/perl/5.26.2/constant.pm
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  The Filesize itself is, compared with another machine 00001660 Bytes
  for both the corrupted and the intact file. It looks to me from the
  outside that if some data in the qcow2 file is written too many bytes
  get written so it sometimes overwites data of existing files located
  right after the position in memory where the write goes to.

  I would like to rule out Linux+Ext4 filesystems because I find it
  highly unlikely that such an error keeps undiscovered in that part of
  the environment for long. I think the same might go for qemu.

  Which leaves qemu, gemu+gluster:// mount, qcow2 volumes, glusterfs,
  network. So I am now going to check if I can find any system which
  gets its volumes via fusermount instead of gluster:// path if the
  error is gone there. This may take a while.

  
  ----- some software versions---------------

  QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3)
  Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

  libvirt-daemon-driver-storage-gluster/testing,unstable,now 4.6.0-2
  amd64 [installed]

  ii  glusterfs-client   4.1.3-1        amd64

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793904/+subscriptions

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug 1793904] Re: files are randomly overwritten by Zero Bytes
       [not found] <153763821634.24720.16203204034487714411.malonedeb@gac.canonical.com>
                   ` (2 preceding siblings ...)
  2019-04-29 19:58 ` Hans
@ 2021-04-29  9:53 ` Thomas Huth
  2021-05-02  5:50 ` Thomas Huth
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: Thomas Huth @ 2021-04-29  9:53 UTC (permalink / raw)
  To: qemu-devel

** Tags removed: qemu

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793904

Title:
  files are randomly overwritten by Zero Bytes

Status in QEMU:
  New

Bug description:
  Hello together,

  I am currently tracking down a "Hard to reproduce" bug on my systems
  that I first discovered during gitlab installation:

  
  Here is the Text from the Gitlab Bug https://gitlab.com/gitlab-org/gitlab-ce/issues/51023
  ----------------------------------------------------------------------------------------------

  Steps to reproduce

  I still do not have all the steps together to reproduce, so far it is:
  apt install gitlab-ce and
  gitlab-rake backup:recovery
  Then it works for some time before it fails.

  What is the current bug behavior?

  I have a 12 hour old Installation of gitlab ce 11.2.3-ce.0 for debian
  stretch on a fresh debian stretch system together with our imported
  data. However it turns out that some gitlab related files contain Zero
  bytes instead of actual data.

  root@gitlab:~# xxd -l 16 /opt/gitlab/bin/gitlab-ctl
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  This behaviour is somewhat strange because it was working for a few
  minutes/hours. I did write a shell script to find out which files are
  affected of this memory loss. It turns out that only files located
  under /opt/gitlab are affected, if I rule out files like
  /var/log/faillog and some postgresql table files.

  What I find even stranger is that it does not seem to affect
  Logfiles/databases/git_repositorys but application files, like .rb
  scripts. and not all of them. No non gitlab package is affected.

  What is the expected correct behavior?
  Binarys and .rb files should stay as they are.

  Possible fixes

  I am still investigating, I hope that it is not an infrastructure problem (libvirt/qemu/glusterfs) it can still be one but the point that files of /opt/gitlab are affected and not any logfile and that we to not have similar problems with any other system leads me to the application for now.
  If I would have used docker the same problem might have caused a reboot of the container.
  But for the Debian package it is a bit of work to recover. That is all a workaround, however.
  ---------------------------------------------------------------------------------------------

  I do have found 2 more systems having the same problem with different
  software:

  root@erp:~# xxd -l 16 /usr/share/perl/5.26.2/constant.pm
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  The Filesize itself is, compared with another machine 00001660 Bytes
  for both the corrupted and the intact file. It looks to me from the
  outside that if some data in the qcow2 file is written too many bytes
  get written so it sometimes overwites data of existing files located
  right after the position in memory where the write goes to.

  I would like to rule out Linux+Ext4 filesystems because I find it
  highly unlikely that such an error keeps undiscovered in that part of
  the environment for long. I think the same might go for qemu.

  Which leaves qemu, gemu+gluster:// mount, qcow2 volumes, glusterfs,
  network. So I am now going to check if I can find any system which
  gets its volumes via fusermount instead of gluster:// path if the
  error is gone there. This may take a while.

  
  ----- some software versions---------------

  QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3)
  Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

  libvirt-daemon-driver-storage-gluster/testing,unstable,now 4.6.0-2
  amd64 [installed]

  ii  glusterfs-client   4.1.3-1        amd64

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793904/+subscriptions


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug 1793904] Re: files are randomly overwritten by Zero Bytes
       [not found] <153763821634.24720.16203204034487714411.malonedeb@gac.canonical.com>
                   ` (3 preceding siblings ...)
  2021-04-29  9:53 ` Thomas Huth
@ 2021-05-02  5:50 ` Thomas Huth
  2021-06-18 15:31 ` Thomas Huth
  2021-07-02  4:17 ` Launchpad Bug Tracker
  6 siblings, 0 replies; 7+ messages in thread
From: Thomas Huth @ 2021-05-02  5:50 UTC (permalink / raw)
  To: qemu-devel

The QEMU project is currently considering to move its bug tracking to
another system. For this we need to know which bugs are still valid
and which could be closed already. Thus we are setting older bugs to
"Incomplete" now.

If you still think this bug report here is valid, then please switch
the state back to "New" within the next 60 days, otherwise this report
will be marked as "Expired". Or please mark it as "Fix Released" if
the problem has been solved with a newer version of QEMU already.

Thank you and sorry for the inconvenience.


** Changed in: qemu
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793904

Title:
  files are randomly overwritten by Zero Bytes

Status in QEMU:
  Incomplete

Bug description:
  Hello together,

  I am currently tracking down a "Hard to reproduce" bug on my systems
  that I first discovered during gitlab installation:

  
  Here is the Text from the Gitlab Bug https://gitlab.com/gitlab-org/gitlab-ce/issues/51023
  ----------------------------------------------------------------------------------------------

  Steps to reproduce

  I still do not have all the steps together to reproduce, so far it is:
  apt install gitlab-ce and
  gitlab-rake backup:recovery
  Then it works for some time before it fails.

  What is the current bug behavior?

  I have a 12 hour old Installation of gitlab ce 11.2.3-ce.0 for debian
  stretch on a fresh debian stretch system together with our imported
  data. However it turns out that some gitlab related files contain Zero
  bytes instead of actual data.

  root@gitlab:~# xxd -l 16 /opt/gitlab/bin/gitlab-ctl
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  This behaviour is somewhat strange because it was working for a few
  minutes/hours. I did write a shell script to find out which files are
  affected of this memory loss. It turns out that only files located
  under /opt/gitlab are affected, if I rule out files like
  /var/log/faillog and some postgresql table files.

  What I find even stranger is that it does not seem to affect
  Logfiles/databases/git_repositorys but application files, like .rb
  scripts. and not all of them. No non gitlab package is affected.

  What is the expected correct behavior?
  Binarys and .rb files should stay as they are.

  Possible fixes

  I am still investigating, I hope that it is not an infrastructure problem (libvirt/qemu/glusterfs) it can still be one but the point that files of /opt/gitlab are affected and not any logfile and that we to not have similar problems with any other system leads me to the application for now.
  If I would have used docker the same problem might have caused a reboot of the container.
  But for the Debian package it is a bit of work to recover. That is all a workaround, however.
  ---------------------------------------------------------------------------------------------

  I do have found 2 more systems having the same problem with different
  software:

  root@erp:~# xxd -l 16 /usr/share/perl/5.26.2/constant.pm
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  The Filesize itself is, compared with another machine 00001660 Bytes
  for both the corrupted and the intact file. It looks to me from the
  outside that if some data in the qcow2 file is written too many bytes
  get written so it sometimes overwites data of existing files located
  right after the position in memory where the write goes to.

  I would like to rule out Linux+Ext4 filesystems because I find it
  highly unlikely that such an error keeps undiscovered in that part of
  the environment for long. I think the same might go for qemu.

  Which leaves qemu, gemu+gluster:// mount, qcow2 volumes, glusterfs,
  network. So I am now going to check if I can find any system which
  gets its volumes via fusermount instead of gluster:// path if the
  error is gone there. This may take a while.

  
  ----- some software versions---------------

  QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3)
  Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

  libvirt-daemon-driver-storage-gluster/testing,unstable,now 4.6.0-2
  amd64 [installed]

  ii  glusterfs-client   4.1.3-1        amd64

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793904/+subscriptions


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug 1793904] Re: files are randomly overwritten by Zero Bytes
       [not found] <153763821634.24720.16203204034487714411.malonedeb@gac.canonical.com>
                   ` (4 preceding siblings ...)
  2021-05-02  5:50 ` Thomas Huth
@ 2021-06-18 15:31 ` Thomas Huth
  2021-07-02  4:17 ` Launchpad Bug Tracker
  6 siblings, 0 replies; 7+ messages in thread
From: Thomas Huth @ 2021-06-18 15:31 UTC (permalink / raw)
  To: qemu-devel

** Bug watch removed: Red Hat Bugzilla #1701736
   https://bugzilla.redhat.com/show_bug.cgi?id=1701736

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793904

Title:
  files are randomly overwritten by Zero Bytes

Status in QEMU:
  Incomplete

Bug description:
  Hello together,

  I am currently tracking down a "Hard to reproduce" bug on my systems
  that I first discovered during gitlab installation:

  
  Here is the Text from the Gitlab Bug https://gitlab.com/gitlab-org/gitlab-ce/issues/51023
  ----------------------------------------------------------------------------------------------

  Steps to reproduce

  I still do not have all the steps together to reproduce, so far it is:
  apt install gitlab-ce and
  gitlab-rake backup:recovery
  Then it works for some time before it fails.

  What is the current bug behavior?

  I have a 12 hour old Installation of gitlab ce 11.2.3-ce.0 for debian
  stretch on a fresh debian stretch system together with our imported
  data. However it turns out that some gitlab related files contain Zero
  bytes instead of actual data.

  root@gitlab:~# xxd -l 16 /opt/gitlab/bin/gitlab-ctl
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  This behaviour is somewhat strange because it was working for a few
  minutes/hours. I did write a shell script to find out which files are
  affected of this memory loss. It turns out that only files located
  under /opt/gitlab are affected, if I rule out files like
  /var/log/faillog and some postgresql table files.

  What I find even stranger is that it does not seem to affect
  Logfiles/databases/git_repositorys but application files, like .rb
  scripts. and not all of them. No non gitlab package is affected.

  What is the expected correct behavior?
  Binarys and .rb files should stay as they are.

  Possible fixes

  I am still investigating, I hope that it is not an infrastructure problem (libvirt/qemu/glusterfs) it can still be one but the point that files of /opt/gitlab are affected and not any logfile and that we to not have similar problems with any other system leads me to the application for now.
  If I would have used docker the same problem might have caused a reboot of the container.
  But for the Debian package it is a bit of work to recover. That is all a workaround, however.
  ---------------------------------------------------------------------------------------------

  I do have found 2 more systems having the same problem with different
  software:

  root@erp:~# xxd -l 16 /usr/share/perl/5.26.2/constant.pm
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  The Filesize itself is, compared with another machine 00001660 Bytes
  for both the corrupted and the intact file. It looks to me from the
  outside that if some data in the qcow2 file is written too many bytes
  get written so it sometimes overwites data of existing files located
  right after the position in memory where the write goes to.

  I would like to rule out Linux+Ext4 filesystems because I find it
  highly unlikely that such an error keeps undiscovered in that part of
  the environment for long. I think the same might go for qemu.

  Which leaves qemu, gemu+gluster:// mount, qcow2 volumes, glusterfs,
  network. So I am now going to check if I can find any system which
  gets its volumes via fusermount instead of gluster:// path if the
  error is gone there. This may take a while.

  
  ----- some software versions---------------

  QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3)
  Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

  libvirt-daemon-driver-storage-gluster/testing,unstable,now 4.6.0-2
  amd64 [installed]

  ii  glusterfs-client   4.1.3-1        amd64

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793904/+subscriptions


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug 1793904] Re: files are randomly overwritten by Zero Bytes
       [not found] <153763821634.24720.16203204034487714411.malonedeb@gac.canonical.com>
                   ` (5 preceding siblings ...)
  2021-06-18 15:31 ` Thomas Huth
@ 2021-07-02  4:17 ` Launchpad Bug Tracker
  6 siblings, 0 replies; 7+ messages in thread
From: Launchpad Bug Tracker @ 2021-07-02  4:17 UTC (permalink / raw)
  To: qemu-devel

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1793904

Title:
  files are randomly overwritten by Zero Bytes

Status in QEMU:
  Expired

Bug description:
  Hello together,

  I am currently tracking down a "Hard to reproduce" bug on my systems
  that I first discovered during gitlab installation:

  
  Here is the Text from the Gitlab Bug https://gitlab.com/gitlab-org/gitlab-ce/issues/51023
  ----------------------------------------------------------------------------------------------

  Steps to reproduce

  I still do not have all the steps together to reproduce, so far it is:
  apt install gitlab-ce and
  gitlab-rake backup:recovery
  Then it works for some time before it fails.

  What is the current bug behavior?

  I have a 12 hour old Installation of gitlab ce 11.2.3-ce.0 for debian
  stretch on a fresh debian stretch system together with our imported
  data. However it turns out that some gitlab related files contain Zero
  bytes instead of actual data.

  root@gitlab:~# xxd -l 16 /opt/gitlab/bin/gitlab-ctl
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  This behaviour is somewhat strange because it was working for a few
  minutes/hours. I did write a shell script to find out which files are
  affected of this memory loss. It turns out that only files located
  under /opt/gitlab are affected, if I rule out files like
  /var/log/faillog and some postgresql table files.

  What I find even stranger is that it does not seem to affect
  Logfiles/databases/git_repositorys but application files, like .rb
  scripts. and not all of them. No non gitlab package is affected.

  What is the expected correct behavior?
  Binarys and .rb files should stay as they are.

  Possible fixes

  I am still investigating, I hope that it is not an infrastructure problem (libvirt/qemu/glusterfs) it can still be one but the point that files of /opt/gitlab are affected and not any logfile and that we to not have similar problems with any other system leads me to the application for now.
  If I would have used docker the same problem might have caused a reboot of the container.
  But for the Debian package it is a bit of work to recover. That is all a workaround, however.
  ---------------------------------------------------------------------------------------------

  I do have found 2 more systems having the same problem with different
  software:

  root@erp:~# xxd -l 16 /usr/share/perl/5.26.2/constant.pm
  00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

  The Filesize itself is, compared with another machine 00001660 Bytes
  for both the corrupted and the intact file. It looks to me from the
  outside that if some data in the qcow2 file is written too many bytes
  get written so it sometimes overwites data of existing files located
  right after the position in memory where the write goes to.

  I would like to rule out Linux+Ext4 filesystems because I find it
  highly unlikely that such an error keeps undiscovered in that part of
  the environment for long. I think the same might go for qemu.

  Which leaves qemu, gemu+gluster:// mount, qcow2 volumes, glusterfs,
  network. So I am now going to check if I can find any system which
  gets its volumes via fusermount instead of gluster:// path if the
  error is gone there. This may take a while.

  
  ----- some software versions---------------

  QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3)
  Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

  libvirt-daemon-driver-storage-gluster/testing,unstable,now 4.6.0-2
  amd64 [installed]

  ii  glusterfs-client   4.1.3-1        amd64

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1793904/+subscriptions


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-07-02  4:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <153763821634.24720.16203204034487714411.malonedeb@gac.canonical.com>
2019-04-21 11:05 ` [Qemu-devel] [Bug 1793904] Re: files are randomly overwritten by Zero Bytes Hans
2019-04-23  5:31 ` Hans
2019-04-29 19:58 ` Hans
2021-04-29  9:53 ` Thomas Huth
2021-05-02  5:50 ` Thomas Huth
2021-06-18 15:31 ` Thomas Huth
2021-07-02  4:17 ` Launchpad Bug Tracker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.