From: Joel Becker <Joel.Becker@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2: avoid direct write if we fall back to buffered
Date: Wed, 14 Apr 2010 12:20:11 -0700
Message-ID: <20100414192011.GA29831@mail.oracle.com>
In-Reply-To: <201004141358.20777.lidongyang@novell.com>

On Wed, Apr 14, 2010 at 01:58:20PM +0800, Li Dongyang wrote:
> On Wednesday 14 April 2010 07:54:35 Joel Becker wrote:
> > 	I think Sunil and I have found the real culprit.
> > 	If a file is opened for O_DIRECT, and there are no holes,
> > refcounts or anything, we are doing direct I/O.  ocfs2_file_aio_write()
> > (o_f_a_w() from now on) locks things down like so:  lock(i_mutex),
> > down_read(ip_alloc_sem), PR(rw_lock).  We have ip_alloc_sem preventing
> > size changes on the local node and rw_lock preventing size changes on
> > other nodes.  We call generic_file_direct_write() ourselves.
> > 	If a file is not opened with O_DIRECT, we are doing regular
> > buffered writes.  o_f_a_w() locks like so: lock(i_mutex),
> > EX(rw_lock).  It is protecting against other nodes, but it does not
> > touch ip_alloc_sem.  Why?  Because we call __generic_file_aio_write(),
> > which will call ->write_begin().  ip_alloc_sem will be taken inside
> > ->write_begin().  That's where we protect against other local processes.
> > 	You may already see where I'm going with this.  If we are open
> > with O_DIRECT, but we have to fall back to buffered, we will do this
> > locking:  lock(i_mutex), down_read(ip_alloc_sem), PR(rw_lock),
> > NL(rw_lock), up_read(ip_alloc_sem), EX(rw_lock).  That is, we start with
> > the direct I/O locking, then back off and do the buffered locking.  But
> > when we get into __g_f_a_w(), it will try the direct I/O again.  If the
> > leading portion of the I/O is capable of direct I/O, it will go into
> > direct mode *without ever taking ip_alloc_sem*.  Once it gets to the
> > portion of the I/O that cannot be done direct, it will fall back to
> > buffered for the rest of the I/O and will call ->write_begin() as
> > expected.
> > 	So this I/O that extends i_size to the end of the allocation
> > will proceed as a direct I/O but will not have ip_alloc_sem.  Thus
> > truncate (and any other allocation change) can race on the local
> > machine.
> > 	I think some form of Dong Yang's patch is going to be necessary.
> > 
> Thanks for the great explanation and analysis, but I only see we down write the
> OCFS2_I(inode)->ip_alloc_sem in ->write_begin() and we are taking
> inode->i_alloc_sem in o_f_a_w() when we try to do a direct write, not the ip_alloc_sem.
> Am I missing something?

	You're right, we use i_alloc_sem in the direct case and
ip_alloc_sem in the buffered case.  Both are there for the same reason,
however.  i_alloc_sem is about competing with the VFS (e.g., against
vfs_truncate()).  ip_alloc_sem is about competing with ourselves
(ocfs2_truncate(), ocfs2_readpage(), etc).
	While I should be saying i_alloc_sem above for the direct I/O
case, the rest of the analysis is still correct.  We need to be holding
i_alloc_sem if we're going to be issuing direct I/Os, and we are not
holding it in the fallback to buffered case.
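
	To make the gap concrete, here is a toy user-space model of the
two O_DIRECT paths.  It is only a sketch: the struct and every function
name in it are made up for illustration and are not ocfs2 or VFS code.
It assumes the only thing that matters is which locks are held at the
moment a direct I/O is issued.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names here are hypothetical, not kernel code. */
struct held_locks {
	bool i_mutex;      /* stands in for inode->i_mutex */
	bool i_alloc_sem;  /* read side of inode->i_alloc_sem (vs. VFS truncate) */
	bool rw_lock;      /* cluster rw_lock held at some level (PR or EX) */
};

/* In this model a direct I/O is safe only with all three locks held. */
static bool direct_io_safe(const struct held_locks *h)
{
	return h->i_mutex && h->rw_lock && h->i_alloc_sem;
}

/* O_DIRECT and the whole write can go direct: nothing is dropped. */
static bool model_direct_write(void)
{
	struct held_locks h = {
		.i_mutex = true, .i_alloc_sem = true, .rw_lock = true,
	};

	return direct_io_safe(&h);
}

/*
 * O_DIRECT, but we back off to the buffered locking.  i_alloc_sem is
 * released and rw_lock is retaken, yet the generic write path may still
 * issue a direct I/O for the leading portion of the write.
 */
static bool model_fallback_write(void)
{
	struct held_locks h = {
		.i_mutex = true, .i_alloc_sem = true, .rw_lock = true,
	};

	h.i_alloc_sem = false;	/* models the up_read() on fallback */
	return direct_io_safe(&h);
}

/* Sanity check of the model: the fallback path is the unsafe one. */
static void check_model(void)
{
	assert(model_direct_write());
	assert(!model_fallback_write());
}
```

	Calling check_model() passes in this sketch precisely because the
fallback path reports an unsafe direct I/O, which is the window that
truncate can race through.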

Joel

-- 

"Depend on the rabbit's foot if you will, but remember, it didn't
 help the rabbit."
	- R. E. Shay

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127


