linux-kernel.vger.kernel.org archive mirror
From: Andreas Dilger <adilger@turbolinux.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: Andreas Dilger <adilger@turbolinux.com>,
	lvm-devel@sistina.com, Andi Kleen <ak@suse.de>,
	Lance Larsh <llarsh@oracle.com>,
	Brian Strand <bstrand@switchmanagement.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [lvm-devel] Re: 2x Oracle slowdown from 2.2.16 to 2.4.4
Date: Fri, 13 Jul 2001 01:35:00 -0600 (MDT)
Message-ID: <200107130735.f6D7Z0Bl029176@webber.adilger.int>
In-Reply-To: <20010713005501.J19011@athlon.random> "from Andrea Arcangeli at Jul 13, 2001 00:55:01 am"

Andrea writes:
> With the current design of the pe_lock_req logic when you return from
> the ioctl(PE_LOCK) syscall, you never have the guarantee that all the
> in-flight writes are committed to disk, the
> fsync_dev(pe_lock_req.data.lv_dev) is just worthless, there's a huge
> race window between the fsync_dev and the pe_lock_req.lock = LOCK_PE
> where whatever I/O can be started without you finding it later in the
> _pe_request list.

Yes there is a slight window there, but fsync_dev() serves to flush out the
majority of outstanding I/Os to disk (it waits for I/O completion).  All
of these buffers should be on disk, right?

> Even despite that window we don't even wait for the
> requests running just after the lock test to complete, the only lock we
> have is in lvm_map, but we should really track which of those bh have
> been committed successfully to the platter before we can actually copy
> the pv under the lvm from userspace.

As soon as we set LOCK_PE, any new I/Os coming in on the LV device will
be put on the queue, so we don't need to worry about those.  We have to
do something like sync_buffers(PV, 1) for the PV that is underneath the
PE being moved, to ensure any buffers that arrived between fsync_dev()
and LOCK_PE are flushed (they are the only buffers that can be in flight).
Is there another problem you are referring to?
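
To make the proposed ordering concrete, here is a rough user-space model
of it.  This is only a sketch under stated assumptions: struct pe_model
and the model_* functions are made up for illustration and are not the
LVM code, the flush is treated as a simple "wait for everything in
flight", and the case Andrea raises (a request that has already passed
the lock test in lvm_map but has not yet been submitted) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the real LVM driver. */
struct pe_model {
    bool locked;    /* stands in for pe_lock_req.lock == LOCK_PE */
    int in_flight;  /* writes submitted to the PV but not yet on disk */
    int queued;     /* writes deferred while the PE is locked */
    int on_disk;    /* writes that have completed */
};

/* Models the mapping path: defer the write if locked, else submit it. */
static void model_submit_write(struct pe_model *m)
{
    if (m->locked)
        m->queued++;
    else
        m->in_flight++;
}

/* Models a flush that waits for completion of all in-flight writes. */
static void model_flush(struct pe_model *m)
{
    m->on_disk += m->in_flight;
    m->in_flight = 0;
}

/* Flush, take the lock, then flush again to catch the stragglers. */
static void model_pe_lock(struct pe_model *m, int racing_writes)
{
    model_flush(m);                  /* first flush, like fsync_dev() */
    for (int i = 0; i < racing_writes; i++)
        model_submit_write(m);       /* writes that slip into the window */
    m->locked = true;                /* LOCK_PE */
    model_flush(m);                  /* second flush closes the window */
    assert(m->in_flight == 0);       /* nothing left in flight on the PV */
}

/* Unlock and resubmit everything that was deferred. */
static void model_pe_unlock(struct pe_model *m)
{
    m->locked = false;
    m->in_flight += m->queued;
    m->queued = 0;
}
```

In this model, writes that arrive after the lock is set only ever land in
queued, so the second flush has a bounded amount of work to wait for.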

AFAICS, there would only be a large window for missed buffers if you
were doing two PE moves at once and had contention for _pe_lock;
otherwise the window from fsync_dev to LOCK_PE is very small, I think.
However, I think we are also protected by the global LVM lock from
doing multiple PE moves at one time.

> If the logic had been sane, your patch would also have been ok
> (besides the C breakage of the missing volatile but we abuse gcc this
> way in other parts of the kernel too after all).

Yes, I never thought about GCC optimizing away the two references to the
same var before and after making the check.
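
For illustration, a minimal sketch of the pattern in question.  The
names pe_lock_state and lock_seen_either_time are hypothetical and do
not come from the patch.  Without the volatile qualifier the compiler is
allowed to assume the variable cannot change between the two reads and
to reuse the first value; with it, each read has to be a real load.

```c
#include <assert.h>

/* Hypothetical stand-in for a lock word that another CPU may change. */
static volatile int pe_lock_state;

/* Check the lock word twice, before and after some (omitted) work. */
static int lock_seen_either_time(void)
{
    int first = pe_lock_state;   /* check before doing the work */
    /* ... work during which another CPU could set the lock ... */
    int second = pe_lock_state;  /* volatile forces a fresh load here */
    return first || second;
}
```

Note that volatile only stops the compiler from merging the two loads; it
does not by itself provide any ordering or atomicity between CPUs.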

> I think the whole pv_move logic needs to be redesigned and rewritten, if
> you could rewrite it and send patches (possibly also against beta7 if
> a new lvm release is not scheduled shortly) that would be more than
> welcome!

Yes, well the correct solution is to do it all in a kernel thread, so
that you don't need to do kernel->user->kernel data copying.  I already
discussed this with Joe Thornber (I think) and it was decided to be too
much for now (needs changes to user tools, IOP version, etc).  Later.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
