* WARNING in xfs_lwr.c, xfs_write()
From: Roman Kononov @ 2010-05-23 5:20 UTC
To: xfs, linux-kernel

Under some workload, once per ~10 seconds, I'm getting the following
warnings with 2.6.32.13 and 2.6.33.4 (x86_64). Why do they occur? Thanks.

May 22 23:53:13 hrech kernel: WARNING: at /home/stuff/base/linux-2.6.32.13/fs/xfs/linux-2.6/xfs_lrw.c:714 xfs_write+0x8a2/0x8c0()
May 22 23:53:13 hrech kernel: Modules linked in: ib_mthca sata_nv 3w_9xxx
May 22 23:53:13 hrech kernel: Pid: 30650, comm: postmaster Not tainted 2.6.32.13 #2
May 22 23:53:13 hrech kernel: Call Trace:
May 22 23:53:13 hrech kernel: [<ffffffff8118baf2>] ? xfs_write+0x8a2/0x8c0
May 22 23:53:13 hrech kernel: [<ffffffff8118baf2>] ? xfs_write+0x8a2/0x8c0
May 22 23:53:13 hrech kernel: [<ffffffff8103a775>] ? warn_slowpath_common+0x85/0xb0
May 22 23:53:13 hrech kernel: [<ffffffff8118baf2>] ? xfs_write+0x8a2/0x8c0
May 22 23:53:13 hrech kernel: [<ffffffff811b7293>] ? cpumask_next_and+0x23/0x40
May 22 23:53:13 hrech kernel: [<ffffffff81036826>] ? select_task_rq_fair+0x326/0x6a0
May 22 23:53:13 hrech kernel: [<ffffffff810a7869>] ? do_sync_write+0xd9/0x120
May 22 23:53:13 hrech kernel: [<ffffffff8104ef20>] ? autoremove_wake_function+0x0/0x30
May 22 23:53:13 hrech kernel: [<ffffffff81037b1d>] ? wake_up_new_task+0x9d/0xc0
May 22 23:53:13 hrech kernel: [<ffffffff81039b02>] ? do_fork+0x102/0x330
May 22 23:53:13 hrech kernel: [<ffffffff810a8088>] ? vfs_write+0xc8/0x180
May 22 23:53:13 hrech kernel: [<ffffffff810a88a1>] ? sys_pwrite64+0x91/0xa0
May 22 23:53:13 hrech kernel: [<ffffffff8100bc6b>] ? system_call_fastpath+0x16/0x1b
May 22 23:53:13 hrech kernel: ---[ end trace 615b846a6bbdf833 ]---

May 22 09:06:25 hrech kernel: WARNING: at /home/stuff/base/linux-2.6.33.4/fs/xfs/linux-2.6/xfs_lrw.c:651 xfs_write+0x961/0x970()
May 22 09:06:25 hrech kernel: Modules linked in: dm_mod ib_mthca sata_nv 3w_9xxx
May 22 09:06:25 hrech kernel: Pid: 1937, comm: postmaster Not tainted 2.6.33.4 #2
May 22 09:06:25 hrech kernel: Call Trace:
May 22 09:06:25 hrech kernel: [<ffffffff81036bb3>] ? warn_slowpath_common+0x73/0xb0
May 22 09:06:25 hrech kernel: [<ffffffff8119cce1>] ? xfs_write+0x961/0x970
May 22 09:06:25 hrech kernel: [<ffffffff810b2b3f>] ? do_sync_write+0xbf/0x100
May 22 09:06:25 hrech kernel: [<ffffffff81030a52>] ? wake_up_new_task+0xc2/0xe0
May 22 09:06:25 hrech kernel: [<ffffffff81035f60>] ? do_fork+0xf0/0x380
May 22 09:06:25 hrech kernel: [<ffffffff810b3236>] ? vfs_write+0xb6/0x170
May 22 09:06:25 hrech kernel: [<ffffffff810b36a3>] ? sys_pwrite64+0x83/0xa0
May 22 09:06:25 hrech kernel: [<ffffffff81002ceb>] ? system_call_fastpath+0x16/0x1b
May 22 09:06:25 hrech kernel: ---[ end trace 62b123c1948e55fa ]---
* Re: WARNING in xfs_lwr.c, xfs_write()
From: Dave Chinner @ 2010-05-23 10:18 UTC
To: Roman Kononov
Cc: xfs, linux-kernel

On Sun, May 23, 2010 at 12:20:23AM -0500, Roman Kononov wrote:
> Under some workload, once per ~10 seconds, I'm getting the following warnings
> with 2.6.32.13 and 2.6.33.4 (x86_64). Why do they occur?

You've got some workload that is mixing direct IO writes with some
form of buffered or mmap IO on the same file and they are racing.
Mixing different types of IO on the one inode is also known as A
Really Bad Idea because there is no guarantee of coherency between
them....

Can you find out what application is triggering this?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
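One possible way to track down which process has a file open for direct IO is to look at the open flags Linux exposes in `/proc/<pid>/fdinfo/<fd>`. The following is only a sketch under stated assumptions: the helper names are hypothetical, the `flags:` field is taken to be printed in octal, and the `O_DIRECT` value is hard-coded for x86_64 (it differs on some other architectures).

```python
import os

# Assumption: O_DIRECT is 0o40000 on x86_64 Linux; other architectures differ.
O_DIRECT_X86_64 = 0o40000


def fdinfo_flags(fdinfo_text):
    """Return the open flags found in /proc/<pid>/fdinfo/<fd> text, or None."""
    for line in fdinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return int(value.strip(), 8)  # the flags field is printed in octal
    return None


def uses_direct_io(fdinfo_text, o_direct=O_DIRECT_X86_64):
    """True if the descriptor described by fdinfo_text was opened with O_DIRECT."""
    flags = fdinfo_flags(fdinfo_text)
    return flags is not None and bool(flags & o_direct)


def direct_io_fds(pid, path):
    """List descriptors of `pid` that refer to `path` and carry O_DIRECT."""
    hits = []
    fd_dir = f"/proc/{pid}/fd"
    target = os.path.realpath(path)
    for fd in os.listdir(fd_dir):
        try:
            if os.path.realpath(os.path.join(fd_dir, fd)) != target:
                continue
            with open(f"/proc/{pid}/fdinfo/{fd}") as f:
                if uses_direct_io(f.read()):
                    hits.append(int(fd))
        except OSError:
            continue  # descriptor closed while scanning, or permission denied
    return hits
```

Running `direct_io_fds` for each process that has the file open (for example the `postmaster` processes named in the trace) would show which of them mix direct and buffered descriptors on it. It only reports how the file was opened, not whether the writes actually race.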
* Re: WARNING in xfs_lwr.c, xfs_write()
From: Roman Kononov @ 2010-05-23 14:23 UTC
To: Dave Chinner
Cc: xfs, linux-kernel

On 2010-05-23, 20:18:56 +1000, Dave Chinner <david@fromorbit.com> wrote:
> You've got some workload that is mixing direct IO writes with some
> form of buffered or mmap IO on the same file and they are racing.
> Mixing different types of IO on the one inode is also known as A
> Really Bad Idea because there is no guarantee of coherency between
> them....
>
> Can you find out what application is triggering this?

This is a severely modified PostgreSQL, which does mix direct IO with
buffered IO.

You say "they are racing". Do you mean that this can cause file system
corruption? Does it simply warn that direct user data races with
buffered user data and one of them wins?

This warning "taints" the kernel.

Should it be safe to do different types of IO on different
non-overlapping 4-KiB-aligned regions of the same file (I am unsure
if this is what the application really does)?

Thanks,

Roman
* Re: WARNING in xfs_lwr.c, xfs_write()
From: Dave Chinner @ 2010-05-24 1:19 UTC
To: Roman Kononov
Cc: xfs, linux-kernel

On Sun, May 23, 2010 at 09:23:44AM -0500, Roman Kononov wrote:
> On 2010-05-23, 20:18:56 +1000, Dave Chinner <david@fromorbit.com> wrote:
> > You've got some workload that is mixing direct IO writes with some
> > form of buffered or mmap IO on the same file and they are racing.
> > Mixing different types of IO on the one inode is also known as A
> > Really Bad Idea because there is no guarantee of coherency between
> > them....
> >
> > Can you find out what application is triggering this?
>
> This is a severely modified PostgreSQL, which does mix direct IO with
> buffered IO.

I hope you keep plenty of backups, then...

> You say "they are racing". Do you mean that this can cause file system
> corruption?

... because it's not filesystem corruption you need to be worried
about, it's *silent data corruption* that these races can cause.

> Does it simply warn that direct user data races with
> buffered user data and one of them wins?

Yes, that's right. No guarantee of who wins is given, though.

> This warning "taints" the kernel.

Yup, the application is doing something dangerous, and this warning
is there to let us know that the data corruption is the user's
fault, not the filesystem...

> Should it be safe to do different types of IO on different
> non-overlapping 4-KiB-aligned regions of the same file (I am unsure
> if this is what the application really does)?

Yes, it should be safe, but the kernel code can't know whether this
is true or not - there are no specific interlocks with direct IO to
prevent concurrent buffered IO to the same region while a direct IO
is in progress. XFS makes best-effort attempts to maintain coherency
but does not provide any guarantees, hence the warning when known race
conditions are tripped.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
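Since the kernel cannot tell whether direct and buffered IO stay on separate regions, an application that relies on that property has to check it itself. Below is a minimal sketch of such a check, assuming a 4 KiB block as in the question above. The function names are hypothetical, and passing the check is not a coherency guarantee; it only tests the precondition being discussed.

```python
BLOCK = 4096  # assumption: 4 KiB alignment unit, as in the question above


def is_aligned(offset, length, block=BLOCK):
    """True if [offset, offset + length) starts and ends on block boundaries."""
    return length > 0 and offset % block == 0 and length % block == 0


def block_span(offset, length, block=BLOCK):
    """Half-open range of block indexes touched by [offset, offset + length)."""
    first = offset // block
    last = (offset + length + block - 1) // block
    return first, last


def share_a_block(a, b, block=BLOCK):
    """True if two (offset, length) ranges touch at least one common block."""
    a_first, a_last = block_span(a[0], a[1], block)
    b_first, b_last = block_span(b[0], b[1], block)
    return a_first < b_last and b_first < a_last
```

For example, a direct write of `(0, 4096)` and a buffered write of `(4096, 4096)` do not share a block, while `(100, 10)` and `(2000, 10)` do, even though their bytes are disjoint. Comparing at block granularity rather than byte granularity is a deliberate choice here, on the assumption that cached data is tracked in whole pages.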
* Re: WARNING in xfs_lwr.c, xfs_write()
From: Ilia Mirkin @ 2010-06-12 5:00 UTC
To: Dave Chinner
Cc: Roman Kononov, xfs, linux-kernel

Sorry to pick up an old-ish thread, but I have a similar situation:

On Sun, May 23, 2010 at 9:19 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Sun, May 23, 2010 at 09:23:44AM -0500, Roman Kononov wrote:
>> On 2010-05-23, 20:18:56 +1000, Dave Chinner <david@fromorbit.com> wrote:
>> > Can you find out what application is triggering this?

I noticed this happening with mysql and xtrabackup -- the latter opens
up mysql's files while mysql is still running (and modifying its own
files) and backs them up in a (hopefully) safe way. mysql had been
running on the machine without any such warnings for a while before we
ran the backup, so I'm pretty sure that the backup is involved,
although its process is never listed. Specifically the warning is:

[2584257.839386] ------------[ cut here ]------------
[2584257.839395] WARNING: at fs/xfs/linux-2.6/xfs_lrw.c:651 xfs_write+0x3dc/0x784()
[2584257.839398] Hardware name: PowerEdge R710
[2584257.839399] Modules linked in: nfsd cifs iTCO_wdt iTCO_vendor_support
[2584257.839406] Pid: 7761, comm: mysqld Not tainted 2.6.33-gentoo-r2 #1
[2584257.839407] Call Trace:
[2584257.839411] [<ffffffff8120da46>] ? xfs_write+0x3dc/0x784
[2584257.839415] [<ffffffff81038733>] warn_slowpath_common+0x77/0xa4
[2584257.839417] [<ffffffff8103876f>] warn_slowpath_null+0xf/0x11
[2584257.839419] [<ffffffff8120da46>] xfs_write+0x3dc/0x784
[2584257.839424] [<ffffffff810033ce>] ? apic_timer_interrupt+0xe/0x20
[2584257.839427] [<ffffffff8120a51a>] xfs_file_aio_write+0x5a/0x5c
[2584257.839430] [<ffffffff810d7cbe>] do_sync_write+0xc0/0x106
[2584257.839435] [<ffffffff810ff862>] ? __fsnotify_parent+0xc7/0xd3
[2584257.839437] [<ffffffff810d8624>] vfs_write+0xab/0x105
[2584257.839439] [<ffffffff810d86da>] sys_pwrite64+0x5c/0x7d
[2584257.839442] [<ffffffff81002a6b>] system_call_fastpath+0x16/0x1b
[2584257.839444] ---[ end trace 8b0c2a6e5e86745f ]---

> Yes, it should be safe, but the kernel code can't know whether this
> is true or not - there are no specific interlocks with direct IO to
> prevent concurrent buffered IO to the same region while a direct IO
> is in progress. XFS makes best-effort attempts to maintain coherency
> but does not provide any guarantees, hence the warning when known race
> conditions are tripped.

Would it be safe to remove the warning at
fs/xfs/linux-2.6/xfs_lrw.c:651 (which looks like it has moved to
xfs_file.c in 2.6.34)? It seems undesirable to get a long stream of
these (51 in this particular instance) every time we run a backup...

IOW, is the warning purely something along the lines of "Userspace is
doing something wonky, but the underlying FS will still be fine no
matter what" kind of deal, or could there be an actual problem with
the XFS metadata itself?

Thanks for any advice,

Ilia Mirkin
imirkin@alum.mit.edu
* Re: WARNING in xfs_lwr.c, xfs_write()
From: Dave Chinner @ 2010-06-13 22:47 UTC
To: Ilia Mirkin
Cc: Roman Kononov, xfs, linux-kernel

On Sat, Jun 12, 2010 at 01:00:52AM -0400, Ilia Mirkin wrote:
> Sorry to pick up an old-ish thread, but I have a similar situation:
>
> On Sun, May 23, 2010 at 9:19 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Sun, May 23, 2010 at 09:23:44AM -0500, Roman Kononov wrote:
> >> On 2010-05-23, 20:18:56 +1000, Dave Chinner <david@fromorbit.com> wrote:
> >> > Can you find out what application is triggering this?
>
> I noticed this happening with mysql and xtrabackup -- the latter opens
> up mysql's files while mysql is still running (and modifying its own
> files) and backs them up in a (hopefully) safe way.

That's not safe at all - there's no guarantee you'll end up with a
consistent database image doing backups like this. Have you ever
tried to restore and use one of these backups?

> mysql had been
> running on the machine without any such warnings for a while before we
> ran the backup, so I'm pretty sure that the backup is involved,
> although its process is never listed. Specifically the warning is:
>
> [2584257.839386] ------------[ cut here ]------------
> [2584257.839395] WARNING: at fs/xfs/linux-2.6/xfs_lrw.c:651
> xfs_write+0x3dc/0x784()
> [2584257.839398] Hardware name: PowerEdge R710
> [2584257.839399] Modules linked in: nfsd cifs iTCO_wdt iTCO_vendor_support
> [2584257.839406] Pid: 7761, comm: mysqld Not tainted 2.6.33-gentoo-r2 #1
> [2584257.839407] Call Trace:
> [2584257.839411] [<ffffffff8120da46>] ? xfs_write+0x3dc/0x784
> [2584257.839415] [<ffffffff81038733>] warn_slowpath_common+0x77/0xa4
> [2584257.839417] [<ffffffff8103876f>] warn_slowpath_null+0xf/0x11
> [2584257.839419] [<ffffffff8120da46>] xfs_write+0x3dc/0x784
> [2584257.839424] [<ffffffff810033ce>] ? apic_timer_interrupt+0xe/0x20
> [2584257.839427] [<ffffffff8120a51a>] xfs_file_aio_write+0x5a/0x5c
> [2584257.839430] [<ffffffff810d7cbe>] do_sync_write+0xc0/0x106
> [2584257.839435] [<ffffffff810ff862>] ? __fsnotify_parent+0xc7/0xd3
> [2584257.839437] [<ffffffff810d8624>] vfs_write+0xab/0x105
> [2584257.839439] [<ffffffff810d86da>] sys_pwrite64+0x5c/0x7d
> [2584257.839442] [<ffffffff81002a6b>] system_call_fastpath+0x16/0x1b
> [2584257.839444] ---[ end trace 8b0c2a6e5e86745f ]---
>
> > Yes, it should be safe, but the kernel code can't know whether this
> > is true or not - there are no specific interlocks with direct IO to
> > prevent concurrent buffered IO to the same region while a direct IO
> > is in progress. XFS makes best-effort attempts to maintain coherency
> > but does not provide any guarantees, hence the warning when known race
> > conditions are tripped.
>
> Would it be safe to remove the warning at
> fs/xfs/linux-2.6/xfs_lrw.c:651 (which looks like it has moved to
> xfs_file.c in 2.6.34)? It seems undesirable to get a long stream of
> these (51 in this particular instance) every time we run a backup...

You can if you want, but then you won't know when your backup or
database might have been corrupted, right?

> IOW, is the warning purely something along the lines of "Userspace is
> doing something wonky, but the underlying FS will still be fine no
> matter what" kind of deal, or could there be an actual problem with
> the XFS metadata itself?

Nothing wrong with the filesystem metadata will occur - as I said
earlier in the thread, this is a warning to tell us that data
corruption is possible due to userspace doing something stupid, not
a filesystem bug.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: WARNING in xfs_lwr.c, xfs_write()
From: Ilia Mirkin @ 2010-06-13 23:10 UTC
To: Dave Chinner
Cc: Roman Kononov, xfs, linux-kernel

On Sun, Jun 13, 2010 at 6:47 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Sat, Jun 12, 2010 at 01:00:52AM -0400, Ilia Mirkin wrote:
>> Sorry to pick up an old-ish thread, but I have a similar situation:
>>
>> On Sun, May 23, 2010 at 9:19 PM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Sun, May 23, 2010 at 09:23:44AM -0500, Roman Kononov wrote:
>> >> On 2010-05-23, 20:18:56 +1000, Dave Chinner <david@fromorbit.com> wrote:
>> >> > Can you find out what application is triggering this?
>>
>> I noticed this happening with mysql and xtrabackup -- the latter opens
>> up mysql's files while mysql is still running (and modifying its own
>> files) and backs them up in a (hopefully) safe way.
>
> That's not safe at all - there's no guarantee you'll end up with a
> consistent database image doing backups like this. Have you ever
> tried to restore and use one of these backups?

Yep, works great. [Used it to initialize a slave, did the full
checksums, so it's unlikely to have randomly corrupt data.] It's the
only credible way to back up a sizeable mysql db, since it works online
with InnoDB; the other options involve either only using MyISAM
(non-transactional) or locking the db for the duration (we couldn't
wait that long, but attempting to do it on a backup machine looked
like it was going to take somewhere between 3 and 7 days, although we
gave up after 24 hours... not something we can afford to do with any
kind of regularity).

>>
>> Would it be safe to remove the warning at
>> fs/xfs/linux-2.6/xfs_lrw.c:651 (which looks like it has moved to
>> xfs_file.c in 2.6.34)? It seems undesirable to get a long stream of
>> these (51 in this particular instance) every time we run a backup...
>
> You can if you want, but then you won't know when your backup or
> database might have been corrupted, right?

No, but I wouldn't know that without the warnings either -- for all I
know xtrabackup could be buggy in all kinds of ways. The only real way
to check is to use the backup data in some way.

>
>> IOW, is the warning purely something along the lines of "Userspace is
>> doing something wonky, but the underlying FS will still be fine no
>> matter what" kind of deal, or could there be an actual problem with
>> the XFS metadata itself?
>
> Nothing wrong with the filesystem metadata will occur - as I said
> earlier in the thread, this is a warning to tell us that data
> corruption is possible due to userspace doing something stupid, not
> a filesystem bug.

OK, thanks for the clarification. Ideally these wouldn't taint the
kernel either -- perhaps these can be downgraded to a message that
explicitly suggests that nothing is wrong with kernel-space things,
only user-space? The backtrace doesn't really get you much, so really
all you want to show is the offending process...

Thanks,

Ilia Mirkin
imirkin@alum.mit.edu
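Until the kernel message itself changes, the "just show the offending process" view can be approximated from userspace by condensing the logged traces. The sketch below is an assumption-laden illustration, not an existing tool: the function name is hypothetical, and the regular expressions assume the `WARNING: at <file>:<line> <symbol>()` and `Pid: <pid>, comm: <name>` line formats seen in the traces quoted in this thread.

```python
import re
from collections import Counter

# Assumed line formats, taken from the traces quoted in this thread.
WARN_LINE = re.compile(r"WARNING: at (\S+) (\S+)\(\)")
PID_LINE = re.compile(r"Pid: (\d+), comm: (\S+)")


def summarize_warnings(log_text):
    """Count warnings per (source location, process name, pid)."""
    counts = Counter()
    site = None  # location of the most recent WARNING not yet attributed
    for line in log_text.splitlines():
        warn = WARN_LINE.search(line)
        if warn:
            site = warn.group(1)
            continue
        pid = PID_LINE.search(line)
        if pid and site is not None:
            counts[(site, pid.group(2), int(pid.group(1)))] += 1
            site = None
    return counts
```

Feeding it the kernel log from a backup run would reduce the stream of backtraces to one line per warning site and process, which is roughly the information requested above.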
* Re: WARNING in xfs_lwr.c, xfs_write()
  2010-06-13 23:10 ` Ilia Mirkin
@ 2010-06-14  1:29 ` Dave Chinner
From: Dave Chinner @ 2010-06-14 1:29 UTC
To: Ilia Mirkin; +Cc: Roman Kononov, xfs, linux-kernel

On Sun, Jun 13, 2010 at 07:10:30PM -0400, Ilia Mirkin wrote:
> On Sun, Jun 13, 2010 at 6:47 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Sat, Jun 12, 2010 at 01:00:52AM -0400, Ilia Mirkin wrote:
> >> Sorry to pick up an old-ish thread, but I have a similar situation:
> >>
> >> On Sun, May 23, 2010 at 9:19 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> > On Sun, May 23, 2010 at 09:23:44AM -0500, Roman Kononov wrote:
> >> >> On 2010-05-23, 20:18:56 +1000, Dave Chinner <david@fromorbit.com> wrote:
> >> >> > Can you find out what the application is triggering this?
> >>
> >> I noticed this happening with mysql and xtrabackup -- the latter opens
> >> up mysql's files while mysql is still running (and modifying its own
> >> files) and backs them up in a (hopefully) safe way.
> >
> > That's not safe at all - there's no guarantee you'll end up with a
> > consistent database image doing backups like this. Have you ever
> > tried to restore and use one of these backups?
>
> Yep, works great. [Used it to initialize a slave, did the full
> checksums, so it's unlikely to have randomly corrupt data.]

You were lucky, I'd say. xtrabackup is supposed to be tightly
integrated with mysql, so perhaps it should be using the same IO
methods that the admin has selected for their database. Maybe you
need to talk to the xtrabackup folks to get them to add a "backup
via direct IO" method if the mysql database is using direct IO so
that other users don't have the same issues.

> >> Would it be safe to remove the warning at
> >> fs/xfs/linux-2.6/xfs_lrw.c:651 (which looks like it has moved to
> >> xfs_file.c in 2.6.34)? It seems undesirable to get a long stream of
> >> these (51 in this particular instance) every time we run a backup...
> >
> > You can if you want, but then you won't know when your backup or
> > database might have been corrupted, right?
>
> No, but I wouldn't know that without the warnings either -- for all I
> know xtrabackup could be buggy in all kinds of ways. The only real way
> to check is to use the backup data in some way.

Yup, but you still can't rely on the backup for disaster recovery
without first doing a full application-level consistency check on it
if one of these warnings was generated while it was being taken.

> >> IOW, is the warning purely something along the lines of "Userspace is
> >> doing something wonky, but the underlying FS will still be fine no
> >> matter what" kind of deal, or could there be an actual problem with
> >> the XFS metadata itself?
> >
> > Nothing wrong with the filesystem metadata will occur - as I said
> > earlier in the thread that this is a warning to tell us that data
> > corruption is possible due to userspace doing something stupid, not
> > a filesystem bug.
>
> OK, thanks for the clarification. Ideally these wouldn't taint the
> kernel either

Why not? Something has potentially compromised the integrity of the
system and that's exactly what the taint flag is there for.

> -- perhaps these can be downgraded to a message that
> explicitly suggests that nothing is wrong with kernel-space things,
> only user-space? The backtrace doesn't really get you much, so really
> all you want to show is the offending process...

They are there to be meaningful to the XFS developer, not the user,
and it conveys all the information we need to start a deeper
investigation.

IOWs, it's a defensive mechanism that we have in place because
direct IO is effectively handing responsibility for data integrity
to userspace. Hence when userspace is doing something obviously
dangerous to data integrity we want loud, noticeable warnings so that
the filesystem is not blamed for the data corruption that will
inevitably occur.

And from a "I read it on the interwebs so it must be true"
perspective, without a loud obnoxious warning we'll never hear about
problems until someone flames us about silent data corruption on a
random blog that gets slashdotted and then referenced for the next
10 years as the next canonical "XFS eats my data!" reference for the
clueless....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: WARNING in xfs_lwr.c, xfs_write()
  2010-06-14  1:29 ` Dave Chinner
@ 2010-06-14  3:27 ` Ilia Mirkin
From: Ilia Mirkin @ 2010-06-14 3:27 UTC
To: Dave Chinner; +Cc: Roman Kononov, xfs, linux-kernel

On Sun, Jun 13, 2010 at 9:29 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Sun, Jun 13, 2010 at 07:10:30PM -0400, Ilia Mirkin wrote:
>> On Sun, Jun 13, 2010 at 6:47 PM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Sat, Jun 12, 2010 at 01:00:52AM -0400, Ilia Mirkin wrote:
>> >> Sorry to pick up an old-ish thread, but I have a similar situation:
>> >>
>> >> On Sun, May 23, 2010 at 9:19 PM, Dave Chinner <david@fromorbit.com> wrote:
>> >> > On Sun, May 23, 2010 at 09:23:44AM -0500, Roman Kononov wrote:
>> >> >> On 2010-05-23, 20:18:56 +1000, Dave Chinner <david@fromorbit.com> wrote:
>> >> >> > Can you find out what the application is triggering this?
>> >>
>> >> I noticed this happening with mysql and xtrabackup -- the latter opens
>> >> up mysql's files while mysql is still running (and modifying its own
>> >> files) and backs them up in a (hopefully) safe way.
>> >
>> > That's not safe at all - there's no guarantee you'll end up with a
>> > consistent database image doing backups like this. Have you ever
>> > tried to restore and use one of these backups?
>>
>> Yep, works great. [Used it to initialize a slave, did the full
>> checksums, so it's unlikely to have randomly corrupt data.]
>
> You were lucky, I'd say. xtrabackup is supposed to be tightly
> integrated with mysql, so perhaps it should be using the same IO
> methods that the admin has selected for their database. Maybe you
> need to talk to the xtrabackup folks to get them to add a "backup
> via direct IO" method if the mysql database is using direct IO so
> that other uses don't have the same issues.

Maybe. We've been using this technique, although on a different
physical machine and with ext3, for quite some time (and we verify all
backups). I did notice that there is a minor difference in
configuration, esp wrt direct IO, so I'll check it out in more detail.
[We're now setting innodb_flush_method to O_DIRECT whereas we weren't
before... although based on the documentation and a cursory
understanding of how xtrabackup works, this shouldn't be harmful.]

> And from a "I read it on the interwebs so it must be true"
> perspective, without a loud obnoxious warning we'll never hear about
> problems until someone flames us about silent data corruption on a
> random blog that gets slashdotted and then referenced for the next
> 10 years as the next canonical "XFS eats my data!" reference for the
> clueless....

Instead it will be "mysql works fine on ext3, but with xfs it spams
the logs with warnings, therefore xfs must be broken". I don't think
there's anything realistically that you can do about uninformed users
and FUD. Although I wasn't suggesting to get rid of the warning,
rather to make it more explicit as to what it's warning about. I
interpret a WARN as a BUG that can be recovered but where the
underlying system needs a careful look; my first inclination after
seeing a fs-related WARN would be to take the system down and run an
fsck. What's happening here seems more akin to getting a WARN when
calling an ioctl with invalid parameters.

---
Ilia Mirkin
imirkin@alum.mit.edu
* Re: WARNING in xfs_lwr.c, xfs_write()
  2010-06-14  3:27 ` Ilia Mirkin
@ 2010-06-14 15:11 ` Roman Kononov
From: Roman Kononov @ 2010-06-14 15:11 UTC
To: Ilia Mirkin; +Cc: Dave Chinner, xfs, linux-kernel

2010-06-13 23:27 CDT, Ilia Mirkin <imirkin@alum.mit.edu> said:
>Instead it will be "mysql works fine on ext3, but with xfs it spams
>the logs with warnings, therefore xfs must be broken". I don't think
>there's anything realistically that you can do about uninformed users
>and FUD. Although I wasn't suggesting to get rid of the warning,
>rather to make it more explicit as to what it's warning about. I
>interpret a WARN as a BUG that can be recovered but where the
>underlying system needs a careful look; my first inclination after
>seeing a fs-related WARN would be to take the system down and run an
>fsck. What's happening here seems more akin to getting a WARN when
>calling an ioctl with invalid parameters.

I agree. My reaction to this WARN was horrible: I brought the system
down, started fsck-ing and re-installing older kernels, with all kinds
of FUD, which took me considerable time. The message was not well
explained on the Internet, nor was it clear from reading the source
code.

After talking to the mailing list and investigating my S/W, I've
realized that the system works fine, and the warning now sounds to me
like useless and unwanted noise of quite high volume.

I suggest issuing a notice once per filesystem/mount, without taint.
The notice could be:

"WARNING: Userspace issues direct IO which races with buffered or mmap
IO on the same file (inode <number>, device <name>). File data
corruption is possible. This message is issued only once per mount".

Thanks.
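For illustration only, the once-per-mount notice proposed above could be modelled in plain userspace C roughly as follows. This is a sketch under stated assumptions, not XFS code: `struct mount_state`, its `dio_race_warned` flag and `warn_dio_race_once()` are hypothetical names invented here, and a real kernel implementation would need an atomic test-and-set on the flag rather than a plain bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-mount state; NOT the real struct xfs_mount. */
struct mount_state {
    const char *device;        /* device name shown in the notice */
    bool dio_race_warned;      /* set after the first notice is printed */
};

/*
 * Print the proposed notice at most once per mount.
 * Returns 1 if the notice was printed, 0 if it was suppressed.
 * Not thread-safe: concurrent callers would need an atomic flag.
 */
int warn_dio_race_once(struct mount_state *m, unsigned long long ino)
{
    assert(m != NULL);

    if (m->dio_race_warned)
        return 0;
    m->dio_race_warned = true;

    fprintf(stderr,
            "WARNING: Userspace issues direct IO which races with "
            "buffered or mmap IO on the same file (inode %llu, device %s). "
            "File data corruption is possible. "
            "This message is issued only once per mount.\n",
            ino, m->device);
    return 1;
}
```

With this shape, a second racing write on the same mount stays silent, while a different mount still gets its own single notice. Whether losing the per-event backtrace is acceptable is exactly the point under debate in this thread.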
* Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-23 14:23 ` Roman Kononov
@ 2010-05-24  4:12 ` Stan Hoeppner
  2010-05-24  5:16   ` Stewart Smith
  2010-05-24 19:34   ` Roman Kononov
From: Stan Hoeppner @ 2010-05-24 4:12 UTC
To: xfs

Roman Kononov put forth on 5/23/2010 9:23 AM:
> On 2010-05-23, 20:18:56 +1000, Dave Chinner <david@fromorbit.com> wrote:
>> You've got some workload that is mixing direct IO writes with some
>> form of buffered or mmap IO on the same file and they are racing.
>> Mixing different types of IO on the one inode is also known as A
>> Really Bad Idea because there is no guarantee of coherency between
>> them....
>>
>> Can you find out what the application is triggering this?
>
> This is severely modified Postgresql, which does mix direct IO with
> buffered one.

"The whole notion of "direct IO" is totally braindamaged. Just say no.

	This is your brain: O
	This is your brain on O_DIRECT: .

	Any questions?"

		Linus

From: http://lkml.org/lkml/2007/1/10/235

--
Stan

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-24  4:12 ` Stan Hoeppner
@ 2010-05-24  5:16   ` Stewart Smith
  2010-05-24 19:34   ` Roman Kononov
From: Stewart Smith @ 2010-05-24 5:16 UTC
To: Stan Hoeppner, xfs

On Sun, 23 May 2010 23:12:24 -0500, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> "The whole notion of "direct IO" is totally braindamaged. Just say no.
>
> 	This is your brain: O
> 	This is your brain on O_DIRECT: .
>
> 	Any questions?"
>
>
> 		Linus
>
> From: http://lkml.org/lkml/2007/1/10/235

and the alternative is...... \0 (null).

We can have very explicit knowledge about buffers and IO in
userspace. Much better than you are ever going to have guessing it in
kernel IO paths. There currently exists *no* usable and reliable way
of transmitting this information to the kernel other than O_DIRECT.

--
Stewart Smith
* Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-24  4:12 ` Stan Hoeppner
  2010-05-24  5:16   ` Stewart Smith
@ 2010-05-24 19:34   ` Roman Kononov
  2010-05-26  7:06     ` Dave Chinner
From: Roman Kononov @ 2010-05-24 19:34 UTC
To: xfs

On Sun, 23 May 2010 23:12:24 -0500 Stan Hoeppner
<stan@hardwarefreak.com> wrote:
> "The whole notion of "direct IO" is totally braindamaged. Just say no.
...
> From: http://lkml.org/lkml/2007/1/10/235

I definitely measure dramatic overall performance benefit using
O_DIRECT carefully.

In that thread, it is doubtful that madvise+mmap+msync allow
asynchronous zero-copy reads and writes to/from already pinned by a
device driver memory of data produced/consumed by that device, without
cache pollution and with intelligent handling of disk errors. Am I
wrong?

Thanks,

Roman
* Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-24 19:34 ` Roman Kononov
@ 2010-05-26  7:06 ` Dave Chinner
  2010-05-26 15:07   ` NOW: o_direct -- WAS: " Stan Hoeppner
From: Dave Chinner @ 2010-05-26 7:06 UTC
To: Roman Kononov; +Cc: xfs

On Mon, May 24, 2010 at 02:34:28PM -0500, Roman Kononov wrote:
> On Sun, 23 May 2010 23:12:24 -0500 Stan Hoeppner
> <stan@hardwarefreak.com> wrote:
> > "The whole notion of "direct IO" is totally braindamaged. Just say no.
> ...
> > From: http://lkml.org/lkml/2007/1/10/235
>
> I definitely measure dramatic overall performance benefit using
> O_DIRECT carefully.
>
> In that thread, it is doubtful that madvise+mmap+msync allow
> asynchronous zero-copy reads and writes to/from already pinned by a
> device driver memory of data produced/consumed by that device, without
> cache pollution and with intelligent handling of disk errors. Am I
> wrong?

No, you are not wrong.

Remember, just because Linus asserts something it doesn't mean he is
right. Yes, he's right an awful lot of the time, but not always. In
this case, most people with experience in writing high performance
IO engines will tell you that mmap() and advisory interfaces are no
substitute for the fine grained control of IO issue that direct IO
provides you with.

And in the case of XFS, mmap serialises write page faults to
different areas of the same file, whereas direct IO allows
concurrent reads and writes to different regions of the same file.
That makes direct IO far more scalable than any mmap interface
will ever be....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
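As a rough illustration of the "fine grained control" being described, here is a minimal Linux-only sketch of a careful direct IO write: an aligned buffer, an aligned offset and length, and a descriptor that is only ever used for direct IO. The helper names `dio_alloc()` and `dio_write_block()` and the 4096-byte `DIO_ALIGN` are assumptions made for this example; real code should query the actual alignment requirement of the filesystem and device, and open() with O_DIRECT may fail with EINVAL on filesystems that do not support it.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment for this sketch; not queried from the device. */
#define DIO_ALIGN 4096

/* Allocate a zeroed buffer whose address is DIO_ALIGN-aligned. */
void *dio_alloc(size_t len)
{
    void *buf = NULL;

    assert(len > 0 && len % DIO_ALIGN == 0);
    if (posix_memalign(&buf, DIO_ALIGN, len) != 0)
        return NULL;
    assert((uintptr_t)buf % DIO_ALIGN == 0);
    memset(buf, 0, len);
    return buf;
}

/*
 * Write len bytes at offset using direct IO only.
 * Returns 0 on success, -1 on failure (errno describes the failure
 * when open() or pwrite() reported an error).
 */
int dio_write_block(const char *path, off_t offset,
                    const void *buf, size_t len)
{
    int fd, saved_errno;
    ssize_t n;

    assert(offset % DIO_ALIGN == 0 && len % DIO_ALIGN == 0);

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0)
        return -1;

    /* pwrite() takes an explicit offset, so independent regions of the
     * same file can be written without sharing a file position. */
    n = pwrite(fd, buf, len, offset);
    saved_errno = errno;
    close(fd);
    errno = saved_errno;

    return n == (ssize_t)len ? 0 : -1;
}
```

The point relevant to this thread is what the sketch leaves out: no second buffered descriptor and no mmap of the same file. Mixing those with the direct IO path is the pattern the XFS warning is reporting.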
* NOW: o_direct -- WAS: Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-26  7:06 ` Dave Chinner
@ 2010-05-26 15:07 ` Stan Hoeppner
  2010-05-27 11:05   ` Michael Monnerie
  ` (2 more replies)
From: Stan Hoeppner @ 2010-05-26 15:07 UTC
To: xfs

Dave Chinner put forth on 5/26/2010 2:06 AM:
> On Mon, May 24, 2010 at 02:34:28PM -0500, Roman Kononov wrote:
>> On Sun, 23 May 2010 23:12:24 -0500 Stan Hoeppner
>> <stan@hardwarefreak.com> wrote:
>>> "The whole notion of "direct IO" is totally braindamaged. Just say no.
>> ...
>>> From: http://lkml.org/lkml/2007/1/10/235
>>
>> I definitely measure dramatic overall performance benefit using
>> O_DIRECT carefully.
>>
>> In that thread, it is doubtful that madvise+mmap+msync allow
>> asynchronous zero-copy reads and writes to/from already pinned by a
>> device driver memory of data produced/consumed by that device, without
>> cache pollution and with intelligent handling of disk errors. Am I
>> wrong?
>
> No, you are not wrong.
>
> Remember, just because Linus asserts something it doesn't mean he is
> right. Yes, he's right an awful lot of the time, but not always. In
> this case, most people with experience in writing high performance
> IO engines will tell you that mmap() and advisory interfaces are no
> substitute for the fine grained control of IO issue that direct IO
> provides you with.
>
> And in the case of XFS, mmap serialises write page faults to
> different areas of the same file, whereas direct IO allows
> concurrent reads and writes to different regions of the same file.
> That makes direct IO far more scalable than any mmap interface
> will ever be....
>
> Cheers,
>
> Dave.

Please educate the ignorant a little bit Dave. I'm not a programmer, or at
least, haven't been one for a couple of decades. If o_direct is superior to
mmap, why then don't, say, Postfix and Dovecot use it instead of mmap? Email
servers are some of the most disk I/O bound applications on the planet.

I would think on heavily loaded mail servers (smtp or imap), at $big_isp for
example, buffer cache would yield very little performance gain, and may even
slow the system down due to buffer cache thrashing.

Why do you think Wietse and Timo don't use o_direct instead of mmap? Timo is
working on a complex and aggressive totally asynchronous I/O subsystem for a
future Dovecot release in an effort to speed up I/O on loaded systems. Could
o_direct not be the solution?

AFAIK, both Postfix and Dovecot support running on just about every Unix-like
OS on the planet. Is o_direct not a portable interface, limited to Linux
only? Is o_direct a POSIX standard?

I'm sure o_direct isn't the best fit for many I/O bound applications. I'm
just trying to understand why. What applications are the best candidates for
using o_direct? Why would one want to avoid using o_direct in an I/O bound
application?

My apologies for the newbish questions. If this has all been asked/answered
before, please just point me to any papers that explain the pros/cons of
o_direct. Thanks.

--
Stan
* Re: NOW: o_direct -- WAS: Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-26 15:07 ` NOW: o_direct -- WAS: " Stan Hoeppner
@ 2010-05-27 11:05   ` Michael Monnerie
  2010-05-27 11:47   ` Christoph Hellwig
  2010-05-27 14:05   ` Stewart Smith
From: Michael Monnerie @ 2010-05-27 11:05 UTC
To: xfs

On Wednesday, 26 May 2010 Stan Hoeppner wrote:
> My apologies for the newbish questions. If this has all been
> asked/answered before, please just point me to any papers that
> explain the pros/cons of o_direct. Thanks.

I'd recommend the kernel pages on http://lwn.net
From time to time you see discussions about o_direct there; it is
always a subject of debate. I'm not a programmer so I don't care ;-)

--
Best regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/
* Re: NOW: o_direct -- WAS: Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-26 15:07 ` NOW: o_direct -- WAS: " Stan Hoeppner
  2010-05-27 11:05   ` Michael Monnerie
@ 2010-05-27 11:47   ` Christoph Hellwig
  2010-05-27 13:58     ` Stewart Smith
  2010-05-28  0:25     ` Stan Hoeppner
  2010-05-27 14:05   ` Stewart Smith
From: Christoph Hellwig @ 2010-05-27 11:47 UTC
To: Stan Hoeppner; +Cc: xfs

O_DIRECT is not a Posix standard and not very portable. It originated
on IRIX, and Linux inherited it during the 2.4 kernel series days.
These days FreeBSD/NetBSD and AIX support it as well, but for example
Solaris, HP-UX and OpenBSD don't, never mind Windows or Mac OS.

I have no idea why the MTAs don't want to use it - it's generally
easier to use than memory mapped I/O, and has much more deterministic
performance.
* Re: NOW: o_direct -- WAS: Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-27 11:47 ` Christoph Hellwig
@ 2010-05-27 13:58   ` Stewart Smith
  2010-05-27 14:57     ` Christoph Hellwig
  2010-05-28  0:25   ` Stan Hoeppner
From: Stewart Smith @ 2010-05-27 13:58 UTC
To: Christoph Hellwig, Stan Hoeppner; +Cc: xfs

On Thu, 27 May 2010 07:47:37 -0400, Christoph Hellwig <hch@infradead.org> wrote:
> O_DIRECT is not a Posix standard and not very portable. It originated
> on IRIX, and Linux inherited it during the 2.4 kernel series days.
> These days FreeBSD/NetBSD and AIX support it as well, but for example
> Solaris, HP-UX and OpenBSD don't, nevermind Windows or Mac OS.

There is O_DIRECT type functionality available on Windows, with similar
restrictions for aligned IO too. You have to use the Win32 APIs to do it
though, the POSIX ones won't get you it (or more than 2048 files open at
once).

In practice we've only ever found Solaris (other than linux) to be
reliable with O_DIRECT (at least on UFS... ZFS is... well... I wouldn't
run a database server on it yet).

--
Stewart Smith
* Re: NOW: o_direct -- WAS: Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-27 13:58 ` Stewart Smith
@ 2010-05-27 14:57 ` Christoph Hellwig
  2010-05-27 15:45 ` Stewart Smith
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2010-05-27 14:57 UTC
To: Stewart Smith; +Cc: Christoph Hellwig, Stan Hoeppner, xfs

On Thu, May 27, 2010 at 11:58:55PM +1000, Stewart Smith wrote:
> There is O_DIRECT-type functionality available on Windows, with similar
> restrictions for aligned I/O too. You have to use the Win32 APIs to do it,
> though; the POSIX ones won't get you that (or more than 2048 files open at
> once).
>
> In practice we've only ever found Solaris (other than Linux) to be
> reliable with O_DIRECT (at least on UFS... ZFS is... well... I wouldn't
> run a database server on it yet).

Solaris doesn't support O_DIRECT either; instead it has a separate
directio() call - just another pointless difference.
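The split described above (an open() flag on Linux, a separate directio() call on Solaris) can be hidden behind one small wrapper. What follows is only a sketch under stated assumptions: `open_direct()` and `direct_io_mechanism()` are hypothetical helper names invented for this illustration, the `__sun` macro is assumed to identify Solaris builds, and platforms offering neither mechanism simply get ENOTSUP.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc only exposes O_DIRECT with this set */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#if defined(__sun)
#include <sys/fcntl.h>          /* Solaris: directio() and DIRECTIO_ON */
#endif

/* Hypothetical helper: names the mechanism this build would use. */
static const char *direct_io_mechanism(void)
{
#if defined(__sun)
    return "directio";
#elif defined(O_DIRECT)
    return "O_DIRECT";
#else
    return "none";
#endif
}

/*
 * Hypothetical helper: open a file with the page cache bypassed.
 * Returns a file descriptor, or -1 with errno set.
 */
static int open_direct(const char *path, int flags, mode_t mode)
{
#if defined(__sun)
    /* Solaris: open normally, then ask for direct I/O on the descriptor. */
    int fd = open(path, flags, mode);
    if (fd >= 0 && directio(fd, DIRECTIO_ON) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
#elif defined(O_DIRECT)
    /* Linux and others: direct I/O is requested as an open() flag. */
    return open(path, flags | O_DIRECT, mode);
#else
    /* No known mechanism on this platform. */
    (void)path; (void)flags; (void)mode;
    errno = ENOTSUP;
    return -1;
#endif
}
```

A caller would use `open_direct(path, O_RDWR | O_CREAT, 0644)` in place of open(). Note that some filesystems reject the O_DIRECT flag at open time, so a real program would probably want to retry with buffered I/O in that case.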
* Re: NOW: o_direct -- WAS: Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-27 14:57 ` Christoph Hellwig
@ 2010-05-27 15:45 ` Stewart Smith
  0 siblings, 0 replies; 33+ messages in thread
From: Stewart Smith @ 2010-05-27 15:45 UTC
To: Christoph Hellwig; +Cc: Stan Hoeppner, xfs

On Thu, 27 May 2010 10:57:14 -0400, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, May 27, 2010 at 11:58:55PM +1000, Stewart Smith wrote:
> > There is O_DIRECT-type functionality available on Windows, with similar
> > restrictions for aligned I/O too. You have to use the Win32 APIs to do it,
> > though; the POSIX ones won't get you that (or more than 2048 files open at
> > once).
> >
> > In practice we've only ever found Solaris (other than Linux) to be
> > reliable with O_DIRECT (at least on UFS... ZFS is... well... I wouldn't
> > run a database server on it yet).
>
> Solaris doesn't support O_DIRECT either; instead it has a separate
> directio() call - just another pointless difference.

Oh yeah, I casually forgot about that. Shows how much new
I/O-performance-critical code I'm writing on Solaris.

--
Stewart Smith
* Re: NOW: o_direct -- WAS: Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-27 11:47 ` Christoph Hellwig
  2010-05-27 13:58 ` Stewart Smith
@ 2010-05-28 0:25 ` Stan Hoeppner
  1 sibling, 0 replies; 33+ messages in thread
From: Stan Hoeppner @ 2010-05-28 0:25 UTC
To: xfs

Christoph Hellwig put forth on 5/27/2010 6:47 AM:
> O_DIRECT is not a POSIX standard and not very portable. It originated
> on IRIX, and Linux inherited it during the 2.4 kernel series days.
> These days FreeBSD/NetBSD and AIX support it as well, but for example
> Solaris, HP-UX and OpenBSD don't, never mind Windows or Mac OS.
>
> I have no idea why the MTAs don't want to use it - it's generally easier
> to use than memory-mapped I/O, and has much more deterministic
> performance.

Thanks for the background, Christoph. I can now see why Postfix and Dovecot
in particular don't use O_DIRECT: portability. Both are developed to run on
every Unix-like OS you mention above, half of which don't offer O_DIRECT.
I'm guessing the same is likely true for the other SMTP MTAs and IMAP
servers.

--
Stan
* Re: NOW: o_direct -- WAS: Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-26 15:07 ` NOW: o_direct -- WAS: " Stan Hoeppner
  2010-05-27 11:05 ` Michael Monnerie
  2010-05-27 11:47 ` Christoph Hellwig
@ 2010-05-27 14:05 ` Stewart Smith
  2010-05-28 0:42 ` Stan Hoeppner
  2 siblings, 1 reply; 33+ messages in thread
From: Stewart Smith @ 2010-05-27 14:05 UTC
To: Stan Hoeppner, xfs

On Wed, 26 May 2010 10:07:18 -0500, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> Please educate the ignorant a little bit Dave. I'm not a programmer, or at
> least, haven't been one for a couple of decades. If o_direct is superior to
> mmap, why then don't, say, Postfix and Dovecot use it instead of mmap? Email
> servers are some of the most disk I/O bound applications on the planet. I
> would think on heavily loaded mail servers (smtp or imap), at $big_isp for
> example, buffer cache would yield very little performance gain, and may even
> slow the system down due to buffer cache thrashing.

Email servers are metadata-heavy workloads, not data-heavy ones. They do
lots of create/rename/delete of small files.

O_DIRECT requires you to do I/O in multiples of 512 bytes, aligned to
512-byte boundaries. Things like email servers generally don't need to do
that. Database servers tend to do that, so they use O_DIRECT.

Also, with an smtpd delivering a message on a machine, you could quite
likely have imapd come along and read it soon after, so using the
cache makes sense.

> Why do you think Wietse and Timo don't use o_direct instead of mmap? Timo is
> working on a complex and aggressive totally asynchronous I/O subsystem for a
> future Dovecot release in an effort to speed up I/O on loaded systems. Could
> o_direct not be the solution? AFAIK, both Postfix and Dovecot support running
> on just about every Unix like OS on the planet. Is o_direct not a portable
> interface, limited to Linux only? Is o_direct a POSIX standard?

Not POSIX. But you can get the functionality on Linux by opening with
O_DIRECT, and on Solaris by calling directio(), and who cares about the
rest :)

--
Stewart Smith
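To make the alignment rule above concrete, here is a minimal sketch of what an application has to do before issuing a direct write: round the length up to the block size, allocate a buffer whose address is block-aligned, and make sure the file offset is aligned too. The 512-byte figure is taken from the message above and is an assumption (the real requirement depends on the device and filesystem); all `dio_*` names are hypothetical helpers invented for this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc only exposes O_DIRECT with this set */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment, taken from the 512-byte figure in the thread. */
#define DIO_ALIGN 512

/* Round len up to the next multiple of align (a power of two). */
static size_t dio_round_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Nonzero if buffer address, file offset and length are all aligned. */
static int dio_is_aligned(const void *buf, off_t off, size_t len, size_t align)
{
    return (uintptr_t)buf % align == 0 &&
           off % (off_t)align == 0 &&
           len % align == 0;
}

/*
 * Allocate a zero-filled buffer whose address and size are multiples
 * of align. The padded size is stored in *padded_len when non-NULL.
 * Returns NULL on failure; release with free().
 */
static void *dio_alloc(size_t len, size_t align, size_t *padded_len)
{
    void *buf = NULL;
    size_t n = dio_round_up(len, align);

    if (posix_memalign(&buf, align, n) != 0)
        return NULL;
    memset(buf, 0, n);
    if (padded_len != NULL)
        *padded_len = n;
    return buf;
}

/*
 * Copy len bytes into an aligned, zero-padded buffer and write it at
 * an aligned offset. fd is expected to have been opened for direct
 * I/O, but the same call also works on an ordinary descriptor.
 */
static ssize_t dio_write_record(int fd, const void *data, size_t len, off_t off)
{
    size_t padded = 0;
    void *buf = dio_alloc(len, DIO_ALIGN, &padded);
    ssize_t ret;

    if (buf == NULL)
        return -1;
    if (!dio_is_aligned(buf, off, padded, DIO_ALIGN)) {
        free(buf);
        errno = EINVAL;         /* unaligned offset: direct I/O would reject it */
        return -1;
    }
    memcpy(buf, data, len);
    ret = pwrite(fd, buf, padded, off);
    free(buf);
    return ret;
}
```

The padding is the reason this fits databases better than mail spools: a database already works in fixed-size pages, while a mail message of arbitrary length would have to be padded and then trimmed again (for example with ftruncate()) after the write.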
* Re: NOW: o_direct -- WAS: Re: WARNING in xfs_lwr.c, xfs_write()
  2010-05-27 14:05 ` Stewart Smith
@ 2010-05-28 0:42 ` Stan Hoeppner
  0 siblings, 0 replies; 33+ messages in thread
From: Stan Hoeppner @ 2010-05-28 0:42 UTC
To: xfs

Stewart Smith put forth on 5/27/2010 9:05 AM:
> Also, with an smtpd delivering a message on a machine, you could quite
> likely have imapd come along and read it soon after, so using the
> cache makes sense.

Ah, that's a good point. I've been using Postfix and the Dovecot Local
Delivery Agent long enough that I'd forgotten about those folks who have
their MTA drop mail into /var/mail/%u etc., which is then picked up by
their imapd. Dovecot LDA receives from Postfix via a pipe, so mail doesn't
actually hit the disk until Dovecot decides where it goes, via sieve
scripts, for example.

--
Stan
end of thread, other threads:[~2010-06-14 15:11 UTC | newest]

Thread overview: 33+ messages
2010-05-23 5:20 WARNING in xfs_lwr.c, xfs_write() Roman Kononov
2010-05-23 5:20 ` Roman Kononov
2010-05-23 10:18 ` Dave Chinner
2010-05-23 10:18 ` Dave Chinner
2010-05-23 14:23 ` Roman Kononov
2010-05-23 14:23 ` Roman Kononov
2010-05-24 1:19 ` Dave Chinner
2010-05-24 1:19 ` Dave Chinner
2010-06-12 5:00 ` Ilia Mirkin
2010-06-12 5:00 ` Ilia Mirkin
2010-06-13 22:47 ` Dave Chinner
2010-06-13 22:47 ` Dave Chinner
2010-06-13 23:10 ` Ilia Mirkin
2010-06-13 23:10 ` Ilia Mirkin
2010-06-14 1:29 ` Dave Chinner
2010-06-14 1:29 ` Dave Chinner
2010-06-14 3:27 ` Ilia Mirkin
2010-06-14 3:27 ` Ilia Mirkin
2010-06-14 15:11 ` Roman Kononov
2010-06-14 15:11 ` Roman Kononov
2010-05-24 4:12 ` Stan Hoeppner
2010-05-24 5:16 ` Stewart Smith
2010-05-24 19:34 ` Roman Kononov
2010-05-26 7:06 ` Dave Chinner
2010-05-26 15:07 ` NOW: o_direct -- WAS: " Stan Hoeppner
2010-05-27 11:05 ` Michael Monnerie
2010-05-27 11:47 ` Christoph Hellwig
2010-05-27 13:58 ` Stewart Smith
2010-05-27 14:57 ` Christoph Hellwig
2010-05-27 15:45 ` Stewart Smith
2010-05-28 0:25 ` Stan Hoeppner
2010-05-27 14:05 ` Stewart Smith
2010-05-28 0:42 ` Stan Hoeppner