From: Srinivas Eeda
Date: Thu, 21 Jan 2016 10:14:53 -0800
Subject: [Ocfs2-devel] OCFS2 causing system instability
References: <569F4A74.4020309@oracle.com>
Message-ID: <56A1201D.9050704@oracle.com>
To: ocfs2-devel@oss.oracle.com

Hi Guy,

On 01/21/2016 09:46 AM, Guy 2212112 wrote:
> Hi,
> First, I'm well aware that OCFS2 is not a distributed file system but a
> shared, clustered file system. This is the main reason I want to use it -
> to access the same filesystem from multiple nodes.
> I've checked the latest kernel 4.4 release, which includes the
> "errors=continue" option, and also installed (manually) the patch
> described in this thread - "[PATCH V2] ocfs2: call ocfs2_abort when
> journal abort".
>
> Unfortunately the issues I've described were not solved.
>
> Also, I understand that OCFS2 relies on the SAN availability and does
> not replicate the data to other locations (like a distributed file
> system), so I don't expect to be able to access the data when a
> disk/volume is not accessible (for example because of a hardware
> failure).
>
> In other filesystems, clustered or even local, when a disk/volume fails,
> only that disk/volume cannot be accessed; all the other filesystems
> continue to function and can be accessed, and the whole system's
> stability is definitely not compromised.
>
> Of course, I can understand that if this specific disk/volume contains
> the operating system it will probably cause a panic/reboot, or that if
> the disk/volume is used by the cluster for heartbeat it may affect the
> whole cluster - if it is the only way the nodes in the cluster
> communicate with each other.
>
> The configuration I use relies on global heartbeat on three different
> dedicated disks, and the "simulated error" is on an additional, fourth
> disk that does not carry a heartbeat.
By design this should have worked fine; even if one or more hb disks fail,
the systems should survive as long as more than n/2 hb disks are good
(where n stands for the number of global hb disks, which is <= the number
of fs disks). So this looks like a bug and needs to be looked into. I
logged a bz to track it:
https://oss.oracle.com/bugzilla/show_bug.cgi?id=1362
(I modified your description as I was running into some trouble with the
bz application.)
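To make that rule concrete, here is a minimal sketch of the survival check
(illustrative struct and function names only, not the actual o2hb code):

#include <stdbool.h>
#include <stdio.h>

struct hb_state {
	int configured;  /* n: global heartbeat disks configured */
	int healthy;     /* heartbeat disks still accepting writes */
};

/* survive while strictly more than n/2 heartbeat disks are good */
static bool node_should_survive(const struct hb_state *hb)
{
	return 2 * hb->healthy > hb->configured;
}

int main(void)
{
	/*
	 * The setup described above: 3 global heartbeat disks, and the failure
	 * is injected on a fourth, non-heartbeat disk, so all 3 stay healthy.
	 */
	struct hb_state hb = { .configured = 3, .healthy = 3 };
	printf("node survives: %s\n", node_should_survive(&hb) ? "yes" : "no");

	/* even one failed heartbeat disk (2 of 3 healthy) still passes: 2*2 > 3 */
	hb.healthy = 2;
	printf("node survives: %s\n", node_should_survive(&hb) ? "yes" : "no");
	return 0;
}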
> Errors may occur on storage arrays, and if I'm connecting my OCFS2
> cluster to 4 storage arrays with 10 disks/volumes each, I don't expect
> the whole OCFS2 cluster to fail when only one array is down. I still
> expect the other 30 disks from the 3 remaining arrays to keep working.
> Of course, I will not have any access to the failed array's disks.
>
> I hope this describes the situation better,
>
> Thanks,
>
> Guy
>
> On Wed, Jan 20, 2016 at 10:51 AM, Junxiao Bi wrote:
>
> Hi Guy,
>
> ocfs2 is a shared-disk fs; there is no way to do replication like a dfs,
> and no volume manager is integrated into ocfs2. Ocfs2 depends on the
> underlying storage stack to handle disk failure, so you can configure
> multipath, RAID or the storage itself to handle the removed-disk case.
> If the io error is still reported to ocfs2, then there is no way to work
> around it; ocfs2 will be set read-only or even panic to avoid fs
> corruption. This is the same behavior as a local fs.
>
> If the io error is not reported to ocfs2, then there is a fix I just
> posted to ocfs2-devel to avoid the node panic; please try the patch
> series [ocfs2: o2hb: not fence self if storage down]. Note this is only
> useful for the o2cb stack. Nodes will hang on io and wait for the
> storage to come online again.
>
> For the endless loop you met in "Appendix A1", it is a bug and is fixed
> by "[PATCH V2] ocfs2: call ocfs2_abort when journal abort"; you can get
> it from ocfs2-devel. This patch will set the fs read-only or panic the
> node, since the io error has been reported to ocfs2.
>
> Thanks,
> Junxiao.
>
> On 01/20/2016 03:19 AM, Guy 1234 wrote:
> > Dear OCFS2 guys,
> >
> > My name is Guy, and I'm testing ocfs2 due to its features as a
> > clustered filesystem that I need.
> >
> > As part of the stability and reliability tests I've performed, I've
> > encountered an issue with ocfs2 (format + mount + remove disk...) that
> > I wanted to make sure is a real issue and not just a misconfiguration.
> >
> > The main concern is that the stability of the whole system is
> > compromised when a single disk/volume fails. It looks like OCFS2 is
> > not handling the error correctly but is stuck in an endless loop that
> > interferes with the work of the server.
> >
> > I've tested two cluster configurations - (1) Corosync/Pacemaker and
> > (2) o2cb - and they react similarly.
> >
> > The process and the log entries follow; additional configurations that
> > were tested are also listed below.
> >
> > Node 1:
> > =======
> >
> > 1. service corosync start
> >
> > 2. service dlm start
> >
> > 3. mkfs.ocfs2 -v -Jblock64 -b 4096 --fs-feature-level=max-features
> >    --cluster-stack=pcmk --cluster-name=cluster-name -N 2 /dev/<device>
> >
> > 4. mount -o rw,noatime,nodiratime,data=writeback,heartbeat=none,cluster_stack=pcmk
> >    /dev/<device> /mnt/ocfs2-mountpoint
> >
> > Node 2:
> > =======
> >
> > 5. service corosync start
> >
> > 6. service dlm start
> >
> > 7. mount -o rw,noatime,nodiratime,data=writeback,heartbeat=none,cluster_stack=pcmk
> >    /dev/<device> /mnt/ocfs2-mountpoint
> >
> > So far all is working well, including reading and writing.
> >
> > Next:
> >
> > 8. I've physically pulled out the disk at /dev/<device> to simulate a
> >    hardware failure (which may occur...); in real life the disk is
> >    (hardware or software) protected. Nonetheless, I'm testing a
> >    hardware failure in which one of the OCFS2 filesystems in my server
> >    fails.
> >
> > Following this - the messages observed in the system log (see below), and
> >
> > ==> 9. kernel panic(!) ... on one of the nodes or on both, or a reboot
> >    of one of the nodes or both.
> >
> > Is there any configuration or set of parameters that will enable the
> > system to continue working, disabling access to the failed disk
> > without compromising the system's stability and without causing the
> > kernel to panic?!
> >
> > From my point of view it looks basic - when a hardware failure occurs:
> >
> > 1. All remaining hardware should continue working.
> >
> > 2. The failed disk/volume should be inaccessible - but it should not
> >    compromise the whole system's availability (kernel panic).
> >
> > 3. OCFS2 "understands" there is a failed disk and stops trying to
> >    access it.
> >
> > 4. All disk commands such as mount/umount, df etc. should continue
> >    working.
> >
> > 5. When a new/replacement drive is connected to the system, it can be
> >    accessed.
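For reference against this list: what ocfs2 actually does on a fatal error
is selected by its errors= mount option (remount-ro by default, panic, or,
since kernel 4.4, continue). Below is a simplified sketch of the three
policies; the struct and function names are made up for illustration and
are not the real fs/ocfs2 code.

#include <stdio.h>
#include <stdlib.h>

enum error_policy {
	ERRORS_REMOUNT_RO,  /* default: force the volume read-only */
	ERRORS_PANIC,       /* take the whole node down */
	ERRORS_CONTINUE,    /* kernel >= 4.4: flag the error, stay mounted */
};

/* made-up stand-in for per-mount state, for illustration only */
struct mount_state {
	enum error_policy policy;
	int read_only;
	int error_flagged;
};

/* what a fatal journal/io error (like the status = -5 in Appendix A1) triggers */
static void on_fatal_error(struct mount_state *m)
{
	switch (m->policy) {
	case ERRORS_CONTINUE:
		m->error_flagged = 1;  /* later operations on this fs return EIO */
		break;
	case ERRORS_REMOUNT_RO:
		m->read_only = 1;      /* no further writes reach the disk */
		break;
	case ERRORS_PANIC:
		fprintf(stderr, "simulated panic\n");
		abort();               /* the kernel would panic the node here */
	}
}

int main(void)
{
	struct mount_state m = { .policy = ERRORS_REMOUNT_RO, .read_only = 0, .error_flagged = 0 };
	on_fatal_error(&m);
	printf("read_only=%d error_flagged=%d\n", m.read_only, m.error_flagged);
	return 0;
}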
> >
> > My settings:
> >
> > Ubuntu 14.04
> > Linux: 3.16.0-46-generic
> > mkfs.ocfs2 1.8.4 (downloaded from git)
> >
> > Some other scenarios that were also tested:
> >
> > 1. Removing max-features from the mkfs (i.e. mkfs.ocfs2 -v -Jblock64
> >    -b 4096 --cluster-stack=pcmk --cluster-name=cluster-name -N 2
> >    /dev/<device>).
> >    This improved some of the cases - no kernel panic - but the
> >    stability of the system was still compromised; the syslog indicates
> >    that something unrecoverable is going on (see below - Appendix A1).
> >    Furthermore, the system hangs when trying a software reboot.
> >
> > 2. Also tried with the o2cb stack, with similar outcomes.
> >
> > 3. The configuration was also tested with (1, 2 and 3) local and
> >    global heartbeat(s) that were NOT on the simulated failed disk, but
> >    on other physical disks.
> >
> > 4. Also tested:
> >    Ubuntu 15.15
> >    Kernel: 4.2.0-23-generic
> >    mkfs.ocfs2 1.8.4 (git clone git://oss.oracle.com/git/ocfs2-tools.git)
> >
> > ==============
> > Appendix A1:
> > ==============
> >
> > From syslog:
> >
> > [ 1676.608123] (ocfs2cmt,5316,14):ocfs2_commit_thread:2195 ERROR: status = -5, journal is already aborted.
> > [ 1677.611827] (ocfs2cmt,5316,14):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1678.616634] (ocfs2cmt,5316,15):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1679.621419] (ocfs2cmt,5316,15):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1680.626175] (ocfs2cmt,5316,15):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1681.630981] (ocfs2cmt,5316,9):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1682.107356] INFO: task kworker/u64:0:6 blocked for more than 120 seconds.
> > [ 1682.108440] Not tainted 3.16.0-46-generic #62~14.04.1
> > [ 1682.109388] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 1682.110381] kworker/u64:0 D ffff88103fcb30c0 0 6 2 0x00000000
> > [ 1682.110401] Workqueue: fw_event0 _firmware_event_work [mpt3sas]
> > [ 1682.110405] ffff88102910b8a0 0000000000000046 ffff88102977b2f0 00000000000130c0
> > [ 1682.110411] ffff88102910bfd8 00000000000130c0 ffff88102928c750 ffff88201db284b0
> > [ 1682.110415] ffff88201db28000 ffff881028cef000 ffff88201db28138 ffff88201db28268
> > [ 1682.110419] Call Trace:
> > [ 1682.110427] [] schedule+0x29/0x70
> > [ 1682.110458] [] ocfs2_clear_inode+0x3b1/0xa30 [ocfs2]
> > [ 1682.110464] [] ? prepare_to_wait_event+0x100/0x100
> > [ 1682.110487] [] ocfs2_evict_inode+0x6e/0x730 [ocfs2]
> > [ 1682.110493] [] evict+0xb4/0x180
> > [ 1682.110498] [] dispose_list+0x39/0x50
> > [ 1682.110501] [] invalidate_inodes+0x134/0x150
> > [ 1682.110506] [] __invalidate_device+0x3a/0x60
> > [ 1682.110510] [] invalidate_partition+0x31/0x50
> > [ 1682.110513] [] del_gendisk+0xf5/0x290
> > [ 1682.110519] [] sd_remove+0x61/0xc0
> > [ 1682.110524] [] __device_release_driver+0x7f/0xf0
> > [ 1682.110529] [] device_release_driver+0x23/0x30
> > [ 1682.110534] [] bus_remove_device+0x108/0x180
> > [ 1682.110538] [] device_del+0x129/0x1c0
> > [ 1682.110543] [] __scsi_remove_device+0xd5/0xe0
> > [ 1682.110547] [] scsi_remove_device+0x26/0x40
> > [ 1682.110551] [] scsi_remove_target+0x170/0x230
> > [ 1682.110561] [] sas_rphy_remove+0x65/0x80 [scsi_transport_sas]
> > [ 1682.110570] [] sas_port_delete+0x2d/0x170 [scsi_transport_sas]
> > [ 1682.110575] [] ? sysfs_remove_link+0x19/0x30
> > [ 1682.110588] [] mpt3sas_transport_port_remove+0x1c9/0x1e0 [mpt3sas]
> > [ 1682.110598] [] _scsih_remove_device+0x55/0x80 [mpt3sas]
> > [ 1682.110610] [] _scsih_device_remove_by_handle.part.21+0x79/0xa0 [mpt3sas]
> > [ 1682.110619] [] _firmware_event_work+0x1337/0x1690 [mpt3sas]
> > [ 1682.110626] [] ? native_sched_clock+0x35/0x90
> > [ 1682.110630] [] ? sched_clock+0x9/0x10
> > [ 1682.110636] [] ? __switch_to+0xe4/0x580
> > [ 1682.110640] [] ? pwq_activate_delayed_work+0x39/0x80
> > [ 1682.110644] [] process_one_work+0x182/0x450
> > [ 1682.110648] [] worker_thread+0x121/0x570
> > [ 1682.110652] [] ? rescuer_thread+0x380/0x380
> > [ 1682.110657] [] kthread+0xc9/0xe0
> > [ 1682.110662] [] ? kthread_create_on_node+0x1c0/0x1c0
> > [ 1682.110667] [] ret_from_fork+0x58/0x90
> > [ 1682.110672] [] ? kthread_create_on_node+0x1c0/0x1c0
> > [ 1682.635761] (ocfs2cmt,5316,9):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1683.640549] (ocfs2cmt,5316,9):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1684.645336] (ocfs2cmt,5316,9):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1685.650114] (ocfs2cmt,5316,9):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1686.654911] (ocfs2cmt,5316,9):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1687.659684] (ocfs2cmt,5316,9):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1688.664466] (ocfs2cmt,5316,9):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1689.669252] (ocfs2cmt,5316,9):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1690.674026] (ocfs2cmt,5316,9):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1691.678810] (ocfs2cmt,5316,9):ocfs2_commit_cache:324 ERROR: status = -5
> > [ 1691.679920] (ocfs2cmt,5316,9):ocfs2_commit_thread:2195 ERROR: status = -5, journal is already aborted.
> >
> > Thanks in advance,
> >
> > Guy