From: Alexandre arents <1912224@bugs.launchpad.net>
To: qemu-devel@nongnu.org
Subject: [Bug 1912224] Re: qemu may freeze during drive-mirroring on fragmented FS
Date: Tue, 02 Feb 2021 08:36:10 -0000
Message-ID: <161225497153.13540.11108404669369396416.malone@soybean.canonical.com>
In-Reply-To: 161098039664.6686.1246044899603761821.malonedeb@wampee.canonical.com

I think the issue comes from the SEEK_HOLE call.
SEEK_HOLE is cheap as long as there is a hole close to the offset,
but it becomes very expensive when the only hole is at the
end of a big file (or of a smaller but fragmented file),
because the FS driver has to walk a large number of extents.

When we run a mirror on a fully written (not fragmented) 400GB raw file,
for each 1MB chunk we do:
 mirror_iteration(MirrorBlockJob *s)
   -> bdrv_block_status_above(..)
      ... -> find_allocation (file-posix.c)
         -> offs = lseek(s->fd, start, SEEK_HOLE);

In strace this looks like:
[pid 105339] 17:41:05.548334 lseek(38, 0, SEEK_HOLE) = 429496729600 <0.015172>
[pid 105339] 17:41:05.564798 lseek(38, 1048576, SEEK_HOLE) = 429496729600 <0.008762>
[pid 105339] 17:41:05.576223 lseek(38, 2097152, SEEK_HOLE) = 429496729600 <0.006250>
[pid 105339] 17:41:05.583299 lseek(38, 3145728, SEEK_HOLE) = 429496729600 <0.005511>
[pid 105339] 17:41:05.589771 lseek(38, 4194304, SEEK_HOLE) = 429496729600 <0.005181>
[pid 105339] 17:41:05.596390 lseek(38, 5242880, SEEK_HOLE) = 429496729600 <0.005829>
[pid 105339] 17:41:05.603473 lseek(38, 6291456, SEEK_HOLE) = 429496729600 <0.005276>
[pid 105339] 17:41:05.609833 lseek(38, 7340032, SEEK_HOLE) = 429496729600 <0.006089>

^^ For each MB, the FS driver walks all file extents up to the end of the file;
 the QEMU instability comes from that.
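
The access pattern above can be reproduced outside QEMU. Here is a minimal
sketch, assuming Linux and Python 3 (for os.SEEK_HOLE). probe_holes and
make_full_file are names made up for this illustration, not QEMU code. It
issues one lseek(SEEK_HOLE) per 1MB chunk on a fully written file, and each
call reports the implicit hole at the end of the file.

```python
import os
import tempfile

CHUNK = 1024 * 1024  # 1MB, the step seen in the strace output above


def probe_holes(path, chunk=CHUNK):
    """Call lseek(SEEK_HOLE) once per chunk, as the mirror loop does.

    Returns the hole offset reported for each chunk start.
    """
    results = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        for start in range(0, size, chunk):
            results.append(os.lseek(fd, start, os.SEEK_HOLE))
    finally:
        os.close(fd)
    return results


def make_full_file(size_mb):
    """Create a fully written (hole-free) temporary file; return its path."""
    block = b"\xff" * CHUNK  # non-zero data, so no block can be left sparse
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for _ in range(size_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    return f.name


if __name__ == "__main__":
    path = make_full_file(8)
    try:
        # Every entry is the file size: the only hole is the one at EOF.
        print(probe_holes(path))
    finally:
        os.unlink(path)
```

On a small, unfragmented file each call is cheap. Timing each call (for
example with time.perf_counter) against the fragmented /mnt/disk from the
bug description is one way to compare with the strace latencies.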

Maybe one way to fix that is to not run SEEK_HOLE at each iteration,
but only when needed.
Something like adding a property to MirrorBlockJob, e.g. hole_offset,
that stores the last known offset at which there is a hole,
then passing it to find_allocation and evaluating there whether
SEEK_HOLE needs to run or not.

Like this:
typedef struct MirrorBlockJob {
...
   int64_t hole_offset; /* last known hole offset during migration */
} MirrorBlockJob;

mirror_iteration(MirrorBlockJob *s)
 -> bdrv_block_status_above(...., &s->hole_offset)
   ...
   -> find_allocation (...., hole_offset)
        compare offset and hole_offset to decide whether to run SEEK_HOLE.

Note this involves adding an additional argument to bdrv_block_status_above(),
and we would need to update the code of all drivers.
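
To make the idea concrete, here is a rough user-space sketch in Python, not a
QEMU patch. HoleCache, next_hole and demo are hypothetical names. It remembers
the range [data_start, hole_offset) that the last SEEK_HOLE call reported as
data, and skips the syscall while the requested offset stays inside that
range. It assumes nothing punches a hole or discards blocks in that range in
the meantime, which a real implementation would have to handle.

```python
import os
import tempfile

MB = 1024 * 1024


class HoleCache:
    """Hypothetical helper that caches the result of the last SEEK_HOLE."""

    def __init__(self, fd):
        self.fd = fd
        self.data_start = 0   # start of the range known to contain data
        self.hole_offset = 0  # first hole after data_start (empty range at first)
        self.seek_calls = 0   # number of real lseek() calls, for illustration

    def next_hole(self, start):
        """Return the offset of the first hole at or after start.

        start must be below the file size, otherwise lseek() fails with ENXIO.
        """
        # [data_start, hole_offset) was reported as data by the last call, so
        # any start inside it has the same answer and needs no syscall.
        if self.data_start <= start < self.hole_offset:
            return self.hole_offset
        self.seek_calls += 1
        self.hole_offset = os.lseek(self.fd, start, os.SEEK_HOLE)
        self.data_start = start
        return self.hole_offset


def demo(size_mb=8):
    """Scan a fully written temp file in 1MB steps; return (offsets, seek_calls)."""
    with tempfile.TemporaryFile() as f:
        for _ in range(size_mb):
            f.write(b"\xff" * MB)
        f.flush()
        os.fsync(f.fileno())
        cache = HoleCache(f.fileno())
        offsets = [cache.next_hole(s) for s in range(0, size_mb * MB, MB)]
        return offsets, cache.seek_calls


if __name__ == "__main__":
    offsets, calls = demo()
    print(offsets[0], calls)
```

With a fully written file, the first call finds the hole at end of file and
all later chunks are answered from the cache, so only one lseek() is issued
for the whole scan.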

Is there a better way to fix this issue?

-- 
https://bugs.launchpad.net/bugs/1912224

Title:
  qemu may freeze during drive-mirroring on fragmented FS

Status in QEMU:
  New

Bug description:
  
  We see odd behavior in operation, where QEMU freezes for several
  seconds at a time. We started a thread about this issue here:
  https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05623.html

  It happens at least during an OpenStack Nova snapshot (QEMU blockdev-mirror)
  or a live block migration (which includes a network copy of the disk).

  After further troubleshooting, it seems related to FS fragmentation on
  the host.

  Reproducible at least on:
  Ubuntu 18.04.3/4.18.0-25-generic/qemu-4.0
  Ubuntu 16.04.6/5.10.6/qemu-5.2.0-rc2

  # Let's create a dedicated file system on an SSD/NVMe disk (60GB in my case):
  $sudo mkfs.ext4 /dev/sda3
  $sudo mount /dev/sda3 /mnt
  $df -h /mnt
  Filesystem      Size  Used Avail Use% Mounted on
  /dev/sda3         59G   53M   56G   1% /mnt

  # Create a fragmented disk on it using 2MB chunks (about 30 min):
  $sudo python3 create_fragged_disk.py /mnt 2
  Filling up FS by Creating chunks files in:  /mnt/chunks
  We are probably full as expected!!:  [Errno 28] No space left on device
  Creating fragged disk file:  /mnt/disk

  $ls -lhs 
  59G -rw-r--r-- 1 root root 59G Jan 15 14:08 /mnt/disk

  $ sudo e4defrag -c /mnt/disk
   Total/best extents                             41971/30
   Average size per extent                        1466 KB
   Fragmentation score                            2
   [0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag]
   This file (/mnt/disk) does not need defragmentation.
   Done.

  # The tool ^^^ says the file is not fragmented enough to need defragmentation.

  # Inject an image onto the fragmented disk
  sudo chown ubuntu /mnt/disk
  wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
  qemu-img convert -O raw  bionic-server-cloudimg-amd64.img \
                           bionic-server-cloudimg-amd64.img.raw
  dd conv=notrunc iflag=fullblock if=bionic-server-cloudimg-amd64.img.raw \
                  of=/mnt/disk bs=1M
  virt-customize -a /mnt/disk --root-password password:xxxx

  # Log on and run some console activity, e.g.: ping -i 0.3 127.0.0.1
  $qemu-system-x86_64 -m 2G -enable-kvm  -nographic \
      -chardev socket,id=test,path=/tmp/qmp-monitor,server,nowait \
      -mon chardev=test,mode=control \
      -drive file=/mnt/disk,format=raw,if=none,id=drive-virtio-disk0,cache=none,discard\
      -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on

  $sync
  $echo 3 | sudo tee -a /proc/sys/vm/drop_caches

  # Start drive-mirror via QMP, targeting another SSD/NVMe partition
  nc -U /tmp/qmp-monitor
  {"execute":"qmp_capabilities"}
  {"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"/home/ubuntu/mirror","sync":"full","format":"qcow2"}}
  ^^^ The QEMU console may start to freeze at this step.

  NOTE:
   - The smaller the chunk size and the bigger the disk, the worse it gets.
     In operation we also hit the issue on a 400GB disk with 13MB/extent on average.
   - Also reproducible on XFS

  
  Expected behavior:
  -------------------
  QEMU should remain steady; at worst, storage or mirroring
  performance should decrease because of the fragmented FS.

  Observed behavior:
  -------------------
  Mirroring performance is still quite good even on a fragmented FS,
  but it breaks QEMU.

  
  ######################  create_fragged_disk.py ############
  import sys
  import os
  import tempfile
  import glob
  import errno

  MNT_DIR = sys.argv[1]
  CHUNK_SZ_MB = int(sys.argv[2])
  CHUNKS_DIR = MNT_DIR + '/chunks'
  DISK_FILE = MNT_DIR + '/disk'

  if not os.path.exists(CHUNKS_DIR):
      os.makedirs(CHUNKS_DIR)

  with open("/dev/urandom", "rb") as f_rand:
      mb_rand = f_rand.read(1024 * 1024)

  print("Filling up FS by Creating chunks files in: ",CHUNKS_DIR)
  try:
      while True:
          tp = tempfile.NamedTemporaryFile(dir=CHUNKS_DIR,delete=False)
          for x in range(CHUNK_SZ_MB):
              tp.write(mb_rand)
          os.fsync(tp)
          tp.close()
  except Exception as ex:
      print("We are probably full as expected!!: ",ex)

  chunks = glob.glob(CHUNKS_DIR + '/*')

  print("Creating fragged disk file: ",DISK_FILE)
  with open(DISK_FILE, "w+b") as f_disk:
      for chunk in chunks:
          try:
              os.unlink(chunk)
              for x in range(CHUNK_SZ_MB):
                  f_disk.write(mb_rand)
              os.fsync(f_disk)
          except IOError as ex:
              if ex.errno != errno.ENOSPC:
                  raise
  ###########################################################



