From: Tao Ma <tao.ma@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2: avoid direct write if we fall back to buffered
Date: Wed, 14 Apr 2010 08:13:23 +0800
Message-ID: <4BC508A3.4070104@oracle.com>
In-Reply-To: <20100413235434.GA5530@mail.oracle.com>

Joel Becker wrote:
> On Mon, Apr 12, 2010 at 01:16:43PM +0800, Tao Ma wrote:
>   
>> Dong Yang Li wrote:
>>     
>>> I still get a bug with this check and without my patch:
>>>       
>> yes, the check doesn't work actually in this case.
>>     
>>> [16179.955148] (13400,1):ocfs2_truncate_file:465 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
>>> [16179.955157] (13400,1):ocfs2_truncate_file:465 ERROR: Inode 254789, inode i_size = 811008 != di i_size = 809011, i_flags = 0x1
>>> the call trace is the same.
>>>
>>>
>>> the problem is this check in ocfs2_direct_IO_get_blocks just check if we are going beyond the blocks right now,
>>> so if a direct write won't play with new blocks but extending the i_size still get a pass, like the error above said, di->i_size is 809011, using 198 blocks and the direct write end up with i_size 811008, just same 198 blocks.
>>>       
>> yeah, you are right.
>>     
>
> 	I think Sunil and I have found the real culprit.
> 	If a file is opened for O_DIRECT, and there are no holes,
> refcounts or anything, we are doing direct I/O.  ocfs2_file_aio_write()
> (o_f_a_w() from now on) locks things down like so:  lock(i_mutex),
> down_read(ip_alloc_sem), PR(rw_lock).  We have ip_alloc_sem preventing
> size changes on the local node and rw_lock preventing size changes on
> other nodes.  We call generic_file_direct_write() ourselves.
> 	If a file is not opened with O_DIRECT, we are doing regular
> buffered writes.  o_f_a_w() locks like so: lock(i_mutex),
> EX(rw_lock).  It is protecting against other nodes, but it does not
> touch ip_alloc_sem.  Why?  Because we call __generic_file_aio_write(),
> which will call ->write_begin().  ip_alloc_sem will be taken inside
> ->write_begin().  That's where we protect against other local processes.  
> 	You may already see where I'm going with this.  If we are open
> with O_DIRECT, but we have to fall back to buffered, we will do this
> locking:  lock(i_mutex), down_read(ip_alloc_sem), PR(rw_lock),
> NL(rw_lock), up_read(ip_alloc_sem), EX(rw_lock).  That is, we start with
> the direct I/O locking, then back off and do the buffered locking.  But
> when we get into __g_f_a_w(), it will try the direct I/O again.  If the
> leading portion of the I/O is capable of direct I/O, it will go into
> direct mode *without ever taking ip_alloc_sem*.  Once it gets to the
> portion of the I/O that cannot be done direct, it will fall back to
> buffered for the rest of the I/O and will call ->write_begin() as
> expected.
> 	So this I/O that extends i_size to the end of the allocation
> will proceed as a direct I/O but will not have ip_alloc_sem.  Thus
> truncate (and any other allocation change) can race on the local
> machine.
> 	I think some form of Dong Yang's patch is going to be necessary.
>   
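The three locking sequences quoted above can be modelled as plain lists of operations to see which paths still hold ip_alloc_sem when the I/O starts. This is only an illustrative sketch: the enum labels, the array names and the helper ip_alloc_sem_held_after() are hypothetical and do not exist in ocfs2; only the order of operations is taken from the analysis.

```c
#include <stddef.h>

/* Hypothetical labels for the lock operations named in the analysis. */
enum lock_op {
    OP_LOCK_I_MUTEX,
    OP_DOWN_READ_IP_ALLOC_SEM,
    OP_UP_READ_IP_ALLOC_SEM,
    OP_PR_RW_LOCK,
    OP_NL_RW_LOCK,
    OP_EX_RW_LOCK,
};

/* Return 1 if ip_alloc_sem is still held for read after the sequence runs. */
static int ip_alloc_sem_held_after(const enum lock_op *seq, size_t n)
{
    int held = 0;

    for (size_t i = 0; i < n; i++) {
        if (seq[i] == OP_DOWN_READ_IP_ALLOC_SEM)
            held = 1;
        else if (seq[i] == OP_UP_READ_IP_ALLOC_SEM)
            held = 0;
    }
    return held;
}

/* O_DIRECT, no fallback: lock(i_mutex), down_read(ip_alloc_sem), PR(rw_lock). */
static const enum lock_op direct_path[] = {
    OP_LOCK_I_MUTEX, OP_DOWN_READ_IP_ALLOC_SEM, OP_PR_RW_LOCK,
};

/* Buffered: lock(i_mutex), EX(rw_lock); ip_alloc_sem is taken later,
 * inside ->write_begin(), so it is not held on entry. */
static const enum lock_op buffered_path[] = {
    OP_LOCK_I_MUTEX, OP_EX_RW_LOCK,
};

/* O_DIRECT that falls back to buffered: the direct locks are dropped again. */
static const enum lock_op fallback_path[] = {
    OP_LOCK_I_MUTEX, OP_DOWN_READ_IP_ALLOC_SEM, OP_PR_RW_LOCK,
    OP_NL_RW_LOCK, OP_UP_READ_IP_ALLOC_SEM, OP_EX_RW_LOCK,
};
```

In this model the fallback path ends in the same state as the buffered path, without ip_alloc_sem, which is why a leading portion that is retried as direct I/O runs unprotected against a local truncate.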
Yes, your analysis makes sense.
But that doesn't prove that my get_block suggestion doesn't work in this
case.
If we can detect this situation in ocfs2_direct_IO_get_blocks and call
clear_buffer_mapped, it should fall back to buffered write for the last
block and update i_size properly.
Actually, the check should be easy:
sb->s_blocksize * (iblocks + contig_blocks) > inode->i_size

This way, we fall back to buffered write only when necessary.
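
The check can be tried in user space against the numbers from the bug report. This is a minimal sketch, not the actual ocfs2 code: the function name maps_past_i_size() and its parameter names are hypothetical, and the 4096-byte block size is an assumption inferred from 811008 / 198 rather than stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return nonzero if the mapped range [iblock, iblock + contig_blocks)
 * ends past i_size, which is the condition proposed above for clearing
 * the mapping and falling back to buffered write.
 */
static int maps_past_i_size(uint64_t blocksize, uint64_t iblock,
                            uint64_t contig_blocks, uint64_t i_size)
{
    assert(blocksize > 0);
    return blocksize * (iblock + contig_blocks) > i_size;
}
```

With di i_size = 809011 and 4096-byte blocks, a request that maps the last block (iblock 197, one block) ends at byte 811008 and trips the check, while a request that maps blocks 0 through 196 ends at byte 806912 and does not.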

Regards,
Tao

> Joel
>
>   

