Hi Lukas,

Thanks for your reply.

However, when we tested question 1 using the steps listed below the error
message, we noticed that the secondary VM's image becomes corrupted when the
VM reboots.

Here is the error message:
-------------------------------------------------------------------
[ 1.280299] XFS (sda1): Mounting V5 Filesystem
[ 1.428418] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
[ 1.501320] XFS (sda1): Starting recovery (logdev: internal)
[ 1.504076] tsc: Refined TSC clocksource calibration: 3492.211 MHz
[ 1.505534] Switched to clocksource tsc
[ 2.031027] XFS (sda1): Internal error XFS_WANT_CORRUPTED_GOTO at line 1635 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_extent+0xfc/0x130 [xfs]
[ 2.032743] CPU: 0 PID: 300 Comm: mount Not tainted 3.10.0-693.11.6.el7.x86_64 #1
[ 2.033982] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 2.035882] Call Trace:
[ 2.036494] [] dump_stack+0x19/0x1b
[ 2.037315] [] xfs_error_report+0x3b/0x40 [xfs]
[ 2.038150] [] ? xfs_free_extent+0xfc/0x130 [xfs]
[ 2.039046] [] xfs_free_ag_extent+0x20a/0x780 [xfs]
[ 2.039920] [] xfs_free_extent+0xfc/0x130 [xfs]
[ 2.040768] [] xfs_trans_free_extent+0x26/0x60 [xfs]
[ 2.041642] [] xlog_recover_process_efi+0x17e/0x1c0 [xfs]
[ 2.042558] [] xlog_recover_process_efis.isra.30+0x77/0xe0 [xfs]
[ 2.043771] [] xlog_recover_finish+0x21/0xb0 [xfs]
[ 2.044650] [] xfs_log_mount_finish+0x34/0x50 [xfs]
[ 2.045518] [] xfs_mountfs+0x5d1/0x8b0 [xfs]
[ 2.046341] [] ? xfs_filestream_get_parent+0x80/0x80 [xfs]
[ 2.047260] [] xfs_fs_fill_super+0x3bb/0x4d0 [xfs]
[ 2.048116] [] mount_bdev+0x1b0/0x1f0
[ 2.048881] [] ? xfs_test_remount_options.isra.11+0x70/0x70 [xfs]
[ 2.050105] [] xfs_fs_mount+0x15/0x20 [xfs]
[ 2.050906] [] mount_fs+0x39/0x1b0
[ 2.051963] [] ? __alloc_percpu+0x15/0x20
[ 2.059431] [] vfs_kern_mount+0x67/0x110
[ 2.060283] [] do_mount+0x233/0xaf0
[ 2.061081] [] ? strndup_user+0x4b/0xa0
[ 2.061844] [] SyS_mount+0x96/0xf0
[ 2.062619] [] system_call_fastpath+0x16/0x1b
[ 2.063512] XFS (sda1): Internal error xfs_trans_cancel at line 984 of file fs/xfs/xfs_trans.c. Caller xlog_recover_process_efi+0x18e/0x1c0 [xfs]
[ 2.065260] CPU: 0 PID: 300 Comm: mount Not tainted 3.10.0-693.11.6.el7.x86_64 #1
[ 2.066489] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 2.068023] Call Trace:
[ 2.068590] [] dump_stack+0x19/0x1b
[ 2.069403] [] xfs_error_report+0x3b/0x40 [xfs]
[ 2.070318] [] ? xlog_recover_process_efi+0x18e/0x1c0 [xfs]
[ 2.071538] [] xfs_trans_cancel+0xbd/0xe0 [xfs]
[ 2.072429] [] xlog_recover_process_efi+0x18e/0x1c0 [xfs]
[ 2.073339] [] xlog_recover_process_efis.isra.30+0x77/0xe0 [xfs]
[ 2.074561] [] xlog_recover_finish+0x21/0xb0 [xfs]
[ 2.075421] [] xfs_log_mount_finish+0x34/0x50 [xfs]
[ 2.076301] [] xfs_mountfs+0x5d1/0x8b0 [xfs]
[ 2.077128] [] ? xfs_filestream_get_parent+0x80/0x80 [xfs]
[ 2.078049] [] xfs_fs_fill_super+0x3bb/0x4d0 [xfs]
[ 2.078900] [] mount_bdev+0x1b0/0x1f0
[ 2.079667] [] ? xfs_test_remount_options.isra.11+0x70/0x70 [xfs]
[ 2.080883] [] xfs_fs_mount+0x15/0x20 [xfs]
[ 2.081687] [] mount_fs+0x39/0x1b0
[ 2.082457] [] ? __alloc_percpu+0x15/0x20
[ 2.083258] [] vfs_kern_mount+0x67/0x110
[ 2.084057] [] do_mount+0x233/0xaf0
[ 2.084797] [] ? strndup_user+0x4b/0xa0
[ 2.085568] [] SyS_mount+0x96/0xf0
[ 2.086324] [] system_call_fastpath+0x16/0x1b
[ 2.087161] XFS (sda1): xfs_do_force_shutdown(0x8) called from line 985 of file fs/xfs/xfs_trans.c. Return address = 0xffffffffc0195966
[ 2.088795] XFS (sda1): Corruption of in-memory data detected. Shutting down filesystem
[ 2.090273] XFS (sda1): Please umount the filesystem and rectify the problem(s)
[ 2.091519] XFS (sda1): Failed to recover EFIs
[ 2.092299] XFS (sda1): log mount finish failed
[FAILED] Failed to mount /sysroot.
.
.
.
Generating "/run/initramfs/rdsosreport.txt"
[ 2.178103] blk_update_request: I/O error, dev fd0, sector 0
[ 2.246106] blk_update_request: I/O error, dev fd0, sector 0
-------------------------------------------------------------------

Here are the steps to reproduce:

*1. Start the primary VM (PVM) with the following command, then do whatever you want on the PVM*

qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 2048 -qmp stdio -vnc :5 \
  -device piix3-usb-uhci,id=puu -device usb-tablet,id=ut -name primary \
  -netdev tap,id=hn0,vhost=off,helper=/usr/local/ceph/libexec/qemu-bridge-helper \
  -device rtl8139,id=e0,netdev=hn0 \
  -drive if=ide,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=$image_path,children.0.driver=qcow2

*2. Add the devices and objects to the PVM with QMP commands*

{'execute':'qmp_capabilities'}
{"execute":"chardev-add", "arguments":{ "id" : "mirror0", "backend" : { "type" : "socket", "data" : { "server": true, "wait": false, "addr": { "type": "inet", "data":{ "host": "127.0.0.1", "port": "9003" } } } } }}
{"execute":"chardev-add", "arguments":{ "id" : "compare1", "backend" : { "type" : "socket", "data" : { "server": true, "wait": true, "addr": { "type": "inet", "data":{ "host": "127.0.0.1", "port": "9004" } } } } }}
{"execute":"chardev-add", "arguments":{ "id" : "compare0", "backend" : { "type" : "socket", "data" : { "server": true, "wait": false, "addr": { "type": "inet", "data":{ "host": "127.0.0.1", "port": "9001" } } } } }}
{"execute":"chardev-add", "arguments":{ "id" : "compare0-0", "backend" : { "type" : "socket", "data" : { "server": false, "addr": { "type": "inet", "data":{ "host": "127.0.0.1", "port": "9001" } } } } }}
{"execute":"chardev-add", "arguments":{ "id" : "compare_out", "backend" : { "type" : "socket", "data" : { "server": true, "wait": false, "addr": { "type": "inet", "data":{ "host": "127.0.0.1", "port": "9005" } } } } }}
{"execute":"chardev-add", "arguments":{ "id" : "compare_out0", "backend" : { "type" : "socket", "data" : { "server": false, "addr": { "type": "inet", "data":{ "host": "127.0.0.1", "port": "9005" } } } } }}
{"execute":"object-add", "arguments":{ "qom-type" : "filter-mirror", "id" : "m0", "props": { "netdev": "hn0", "outdev" : "mirror0", "queue" : "tx" } } }
{"execute":"object-add", "arguments":{ "qom-type" : "filter-redirector", "id" : "redire0", "props": { "netdev": "hn0", "indev" : "compare_out", "queue" : "rx" } } }
{"execute":"object-add", "arguments":{ "qom-type" : "filter-redirector", "id" : "redire1", "props": { "netdev": "hn0", "outdev" : "compare0", "queue" : "rx" } } }
{"execute":"object-add", "arguments":{ "qom-type" : "iothread", "id" : "iothread1", "props": {} } }
{"execute":"object-add", "arguments":{ "qom-type" : "colo-compare", "id" : "comp0", "props": { "primary_in" : "compare0-0", "secondary_in" : "compare1", "outdev" : "compare_out0", "iothread" : "iothread1"} } }

*3. Start the secondary VM (SVM) with the following command*

qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 2048 -qmp stdio \
  -vnc :6 -device piix3-usb-uhci -device usb-tablet -name secondary \
  -netdev tap,id=hn0,vhost=off,helper=/usr/local/ceph/libexec/qemu-bridge-helper \
  -device rtl8139,id=e0,netdev=hn0 \
  -chardev socket,id=red0,host=127.0.0.1,port=9003,reconnect=1 \
  -chardev socket,id=red1,host=127.0.0.1,port=9004,reconnect=1 \
  -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
  -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
  -object filter-rewriter,id=rew0,netdev=hn0,queue=all \
  -drive if=none,id=colo-disk0,file.filename=$image_path,driver=qcow2,node-name=node1 \
  -drive if=ide,id=active-disk0,driver=replication,mode=secondary,file.driver=qcow2,top-id=active-disk0,file.file.filename=active-disk.qcow2,file.backing.driver=qcow2,file.backing.file.filename=hidden-disk.qcow2,file.backing.backing=colo-disk0,node-name=node2 \
  -incoming tcp:0:9998

*4. As described in the document, create the rbd server and start the migration with QMP commands*

[image: image.png]

*5. Kill the PVM and fail over to the SVM*

[image: image.png]

*6. Reboot the secondary VM; the error above then appears.*

This error occurs with high probability. We can work around the image
problem by running *xfs_repair*; after that, the VM reboots and works
normally.

Command: xfs_repair -L /dev/sda1

Do you have any idea what causes this problem?

Best regards,
Daniel Cho.

On Tue, Nov 5, 2019 at 2:37 AM, Lukas Straub wrote:

> On Thu, 31 Oct 2019 17:05:20 +0800
> Daniel Cho wrote:
>
> > Hello all,
> > I have some questions about the COLO.
> > 1) Could we dynamic set fault tolerance feature on running VM?
> > In your document, the primary VM could not start first (if you start
> > primary VM, the secondary VM will need to start), it means to if I
> > want this VM with fault-tolerance feature, it needs to be set while
> > we boot it.
>
> Hi Daniel,
> Yes, this is possible as long you have a quorum block node. The rest
> can be added while running.
>
> > 2) If primary VM or secondary VM broke, could we start the third VM
> > to keep fault tolerance feature?
>
> I'm currently working on this, see my latest PATCH series here:
>
> https://lore.kernel.org/qemu-devel/cover.1571925699.git.lukasstraub2@web.de/
>
> >
> > Best regard,
> > Daniel Cho.
> >
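
P.S. The six chardev-add commands in step 2 differ only in the id, the port,
and the server/wait flags, so they can be generated instead of typed by hand.
Below is a minimal Python sketch of that. The helper names chardev_add and
build_commands are hypothetical (they are not part of QEMU), and the script
only prints the JSON; the output would still have to be pasted into the QMP
monitor after qmp_capabilities.

```python
import json


def chardev_add(chardev_id, port, server, wait=None, host="127.0.0.1"):
    """Build one QMP chardev-add command for an inet socket chardev.

    Hypothetical helper for illustration; it mirrors the JSON layout
    used in step 2 of this mail and nothing more.
    """
    data = {"server": server}
    if wait is not None:
        # "wait" appears only on the server-side chardevs in step 2.
        data["wait"] = wait
    # The port is passed as a string, as in the commands above.
    data["addr"] = {"type": "inet", "data": {"host": host, "port": str(port)}}
    return {
        "execute": "chardev-add",
        "arguments": {
            "id": chardev_id,
            "backend": {"type": "socket", "data": data},
        },
    }


# (id, port, server, wait) for the six chardevs listed in step 2.
CHARDEVS = [
    ("mirror0", 9003, True, False),
    ("compare1", 9004, True, True),
    ("compare0", 9001, True, False),
    ("compare0-0", 9001, False, None),
    ("compare_out", 9005, True, False),
    ("compare_out0", 9005, False, None),
]


def build_commands():
    """Return the chardev-add commands as a list of dicts, in step-2 order."""
    return [chardev_add(*entry) for entry in CHARDEVS]


if __name__ == "__main__":
    for command in build_commands():
        print(json.dumps(command))
```

Keeping the table of ids and ports in one place makes it easier to spot a
mismatched pair (for example compare0 and compare0-0 must share port 9001).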