From: Andreas Dilger <adilger@turbolinux.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: Andreas Dilger <adilger@turbolinux.com>,
lvm-devel@sistina.com, Andi Kleen <ak@suse.de>,
Lance Larsh <llarsh@oracle.com>,
Brian Strand <bstrand@switchmanagement.com>,
linux-kernel@vger.kernel.org
Subject: Re: [lvm-devel] Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
Date: Fri, 13 Jul 2001 01:35:00 -0600 (MDT)
Message-ID: <200107130735.f6D7Z0Bl029176@webber.adilger.int>
In-Reply-To: <20010713005501.J19011@athlon.random> "from Andrea Arcangeli at Jul 13, 2001 00:55:01 am"
Andrea writes:
> With the current design of the pe_lock_req logic, when you return from
> the ioctl(PE_LOCK) syscall you never have the guarantee that all the
> in-flight writes are committed to disk: the
> fsync_dev(pe_lock_req.data.lv_dev) is just worthless. There's a huge
> race window between the fsync_dev and the pe_lock_req.lock = LOCK_PE
> where any I/O can be started without you finding it later in the
> _pe_request list.
Yes, there is a slight window there, but fsync_dev() serves to flush out the
majority of outstanding I/Os to disk (it waits for I/O completion). All
of these buffers should be on disk, right?
> Even apart from that window, we don't even wait for the
> requests running just after the lock test to complete; the only lock we
> have is in lvm_map, but we should really track which of those bh have
> been committed successfully to the platter before we can actually copy
> the pv under the lvm from userspace.
As soon as we set LOCK_PE, any new I/Os coming in on the LV device will
be put on the queue, so we don't need to worry about those. We have to
do something like sync_buffers(PV, 1) for the PV that is underneath the
PE being moved, to ensure any buffers that arrived between fsync_dev()
and LOCK_PE are flushed (they are the only buffers that can be in flight).
Is there another problem you are referring to?
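The ordering discussed above (set LOCK_PE first so new I/O gets queued, then wait for every request that got past the lock test before it, and only then copy) can be modelled in userspace. This is a minimal sketch, not LVM code: struct pe_model, pe_init(), pe_submit(), pe_complete(), pe_lock_and_drain() and pe_unlock() are hypothetical names, a pthread mutex/condvar stands in for whatever locking lvm_map() actually uses, and the in_flight counter stands in for tracking which bh's have completed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of one PE being moved (not LVM's structures). */
struct pe_model {
    pthread_mutex_t mu;
    pthread_cond_t drained;
    bool locked;    /* analogue of pe_lock_req.lock == LOCK_PE */
    int in_flight;  /* requests let through and not yet completed */
    int deferred;   /* requests parked while locked (cf. the _pe_request list) */
};

static void pe_init(struct pe_model *pe)
{
    pthread_mutex_init(&pe->mu, NULL);
    pthread_cond_init(&pe->drained, NULL);
    pe->locked = false;
    pe->in_flight = 0;
    pe->deferred = 0;
}

/* Map-time check: the lock test and the in_flight increment happen under
 * the same mutex, so no request can slip between "not locked" and "counted".
 * Returns true if the request may go to disk now, false if it was deferred. */
static bool pe_submit(struct pe_model *pe)
{
    pthread_mutex_lock(&pe->mu);
    bool go = !pe->locked;
    if (go)
        pe->in_flight++;
    else
        pe->deferred++;
    pthread_mutex_unlock(&pe->mu);
    return go;
}

/* Completion of a request that pe_submit() let through. */
static void pe_complete(struct pe_model *pe)
{
    pthread_mutex_lock(&pe->mu);
    assert(pe->in_flight > 0);
    if (--pe->in_flight == 0)
        pthread_cond_broadcast(&pe->drained);
    pthread_mutex_unlock(&pe->mu);
}

/* Thread entry point so a request can complete concurrently. */
static void *pe_complete_thread(void *arg)
{
    pe_complete(arg);
    return NULL;
}

/* Set the lock FIRST, then wait for everything that got past the check
 * before it; only after this returns is it safe to copy the PE. */
static void pe_lock_and_drain(struct pe_model *pe)
{
    pthread_mutex_lock(&pe->mu);
    pe->locked = true;
    while (pe->in_flight > 0)
        pthread_cond_wait(&pe->drained, &pe->mu);
    pthread_mutex_unlock(&pe->mu);
}

/* Clear the lock; returns how many deferred requests must be resubmitted. */
static int pe_unlock(struct pe_model *pe)
{
    pthread_mutex_lock(&pe->mu);
    int n = pe->deferred;
    pe->locked = false;
    pe->deferred = 0;
    pthread_mutex_unlock(&pe->mu);
    return n;
}
```

In this model an fsync_dev()-style flush before locking would only reduce how long pe_lock_and_drain() waits; correctness comes from counting under the same lock that is tested.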
AFAICS, there would only be a large window for missed buffers if you
were doing two PE moves at once and had contention for _pe_lock;
otherwise the window from fsync_dev to LOCK_PE is very small, I think.
However, I think we are also protected by the global LVM lock from
doing multiple PE moves at one time.
> If the logic had been sane, your patch would also have been ok
> (besides the C breakage of the missing volatile, but we abuse gcc this
> way in other parts of the kernel too, after all).
Yes, I never thought about GCC optimizing away the two references to the
same var before and after making the check.
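On the volatile point, a minimal userspace sketch of the check/remap/recheck pattern, using C11 atomics rather than any kernel primitive. pe_locked, PE_OFFSET and map_request() are hypothetical. Re-reading the flag only stops the compiler from merging the two checks; on its own it does not wait for requests already in flight.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for pe_lock_req.lock (not LVM's declaration). */
static atomic_int pe_locked;

/* Hypothetical remap offset: LV sector -> PV sector. */
#define PE_OFFSET 4096UL

/*
 * Check the lock, remap, then check again before issuing the request.
 * With a plain int flag the compiler may reuse the first load for the
 * second test, since nothing in between tells it another CPU can change
 * the flag, and the recheck silently disappears. atomic_load() forces an
 * actual load each time and, unlike volatile, is also a defined way to
 * share the flag between threads in C11.
 */
static bool map_request(unsigned long sector, unsigned long *phys)
{
    if (atomic_load(&pe_locked))
        return false;                    /* first check */
    *phys = sector + PE_OFFSET;          /* work done between the checks */
    return !atomic_load(&pe_locked);     /* second check re-reads memory */
}
```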
> I think the whole pv_move logic needs to be redesigned and rewritten; if
> you could rewrite it and send patches (possibly also against beta7 if
> a new lvm release is not scheduled shortly) that would be more than
> welcome!
Yes, well, the correct solution is to do it all in a kernel thread, so
that you don't need to do kernel->user->kernel data copying. I already
discussed this with Joe Thornber (I think) and it was decided to be too
much for now (needs changes to user tools, IOP version, etc.). Later.
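For reference, the userspace copy that a kernel-thread mover would replace looks roughly like the loop below. This is a minimal sketch, assuming the source and destination PVs are open file descriptors and the extent is given as byte offsets; move_extent() and MOVE_CHUNK are hypothetical, and this is not the actual pvmove tool code. Each chunk crosses the user/kernel boundary twice, which is the kernel->user->kernel copying mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define MOVE_CHUNK 65536  /* hypothetical copy granularity in bytes */

/* Copy len bytes from src_fd@src_off to dst_fd@dst_off.
 * pread() copies kernel -> user, pwrite() copies user -> kernel.
 * Returns 0 on success, -1 with errno set on error. */
static int move_extent(int src_fd, off_t src_off, int dst_fd, off_t dst_off,
                       size_t len)
{
    /* A forward chunked copy is only safe for non-overlapping ranges
     * (best-effort check: only catches the same descriptor). */
    assert(src_fd != dst_fd || src_off + (off_t)len <= dst_off ||
           dst_off + (off_t)len <= src_off);

    char *buf = malloc(MOVE_CHUNK);
    if (buf == NULL)
        return -1;                      /* malloc sets errno (ENOMEM) */

    int rc = -1;
    while (len > 0) {
        size_t want = len < MOVE_CHUNK ? len : MOVE_CHUNK;
        ssize_t got = pread(src_fd, buf, want, src_off);
        if (got < 0 && errno == EINTR)
            continue;
        if (got <= 0) {
            if (got == 0)
                errno = EIO;            /* source ended before the extent did */
            goto out;
        }
        ssize_t done = 0;
        while (done < got) {            /* handle short writes */
            ssize_t w = pwrite(dst_fd, buf + done, (size_t)(got - done),
                               dst_off + done);
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0) {
                if (w == 0)
                    errno = EIO;
                goto out;
            }
            done += w;
        }
        src_off += got;
        dst_off += got;
        len -= (size_t)got;
    }
    rc = 0;
out:
    {
        int saved = errno;              /* keep the error for the caller */
        free(buf);
        errno = saved;
    }
    return rc;
}
```

A kernel-thread version would issue the reads and writes against the block devices directly, so the data would never pass through a userspace buffer.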
Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert