From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Daniel McNeil <daniel@osdl.org>
Cc: Andrew Morton <akpm@osdl.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org, "linux-aio@kvack.org" <linux-aio@kvack.org>
Subject: Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
Date: Thu, 4 Dec 2003 10:10:29 +0530
Message-ID: <20031204044029.GA3353@in.ibm.com>
In-Reply-To: <1070493246.1969.15.camel@ibm-c.pdx.osdl.net>
Interesting :)
Could it be the extra atomic_dec in dio_bio_end_io in the current code
that makes the difference? Or are spin_lock_irq and atomic_inc/dec
really that close? It would be good to hear from other archs, just to
keep in mind for the future.
As for the fallback case, performance doesn't even matter - it's not a
typical situation, so we don't need to bother with that.
Unless someone thinks otherwise, we should go with your fix.
Regards
Suparna
On Wed, Dec 03, 2003 at 03:14:06PM -0800, Daniel McNeil wrote:
> Suparna,
>
> I did a quick test of your patch and my patch by running my aiocp
> program to write zeros to a file and to read a file. I used
> a 50MB file on an ext2 file system on a ramdisk. The machine
> is a 2-proc IBM box with:
>
> model name : Intel(R) Xeon(TM) CPU 1700MHz
> stepping : 10
> cpu MHz : 1686.033
> cache size : 256 KB
>
> The write test was:
>
> time aiocp -n 32 -b 1k -s 50m -z -f DIRECT file
>
> The read test was:
>
> time aiocp -n 32 -b 1k -s 50m -w -f DIRECT file
>
> I ran each test more than 10 times and here are the averages:
>
> my patch your patch
>
> aiocp write real 0.7328 real 0.7756
> user 0.01425 user 0.01221
> sys 0.716 sys 0.76157
>
> aiocp read real 0.7250 real 0.7456
> user 0.0144 user 0.0130
> sys 0.07149 sys 0.7307
>
> It looks like using the spin_lock instead of the atomic inc/dec
> is very close performance-wise. The spin_lock averages a bit
> faster. This is not testing the fallback case, but both patches
> would be very similar in performance for that case.
>
> I don't have any non-intel hardware to test with.
>
> Daniel
>
>
>
> On Tue, 2003-12-02 at 07:25, Suparna Bhattacharya wrote:
> > I suspect the degree to which the spin_lock_irq is costlier
> > than atomic_inc/dec would vary across architectures - cli/sti
> > is probably more expensive on certain archs than others.
> >
> > The patch I sent just kept things the way they were in terms of
> > locking costs, assuming that those choices were thought through
> > at that time (should check with akpm). Yours changes it by
> > switching to spin_lock(unlock)_irq instead of atomic_dec in
> > the normal (common) path for finished_one_bio, for both sync
> > and async i/o. At the same time, for the sync i/o case, as
> > you observe it takes away one atomic_dec from dio_bio_end_io.
> >
> > Since these probably aren't really very hot paths ... possibly
> > the difference doesn't matter that much. I do agree that your
> > patch makes the locking easier to follow.
> >
> > Regards
> > Suparna
> >
> > On Mon, Dec 01, 2003 at 05:35:18PM -0800, Daniel McNeil wrote:
> > > Suparna,
> > >
> > > Sorry I did not respond sooner, I was on vacation.
> > >
> > > Your patch should also fix the problem. I like mine better
> > > because of the cleaner locking.
> > >
> > > I am not sure your approach has less overhead. At least
> > > on x86, cli/sti are fairly inexpensive. The locked xchg or locked
> > > inc/dec is what is expensive (from what I understand).
> > >
> > > So comparing:
> > >
> > > my patch: Your patch:
> > >
> > > dio_bio_submit()
> > > spin_lock() atomic_inc(bio_count);
> > > bio_count++ atomic_inc(bios_in_flight);
> > > bios_in_flight++
> > > spin_unlock
> > >
> > > My guess is that the spin_lock/spin_unlock is faster than 2 atomic_inc's,
> > > since it is only 1 locked operation (the spin_lock) versus 2 (the atomic_inc's).
> > >
> > > finished_one_bio() (normal case)
> > >
> > > My patch:				Your patch:
> > > spin_lock()			atomic_dec_and_test(bio_count)
> > > bio_count--
> > > spin_unlock()
> > >
> > > 1 locked instruction each, so very close -- atomic_dec_and_test() does
> > > not disable interrupts, so it is probably a little bit faster.
> > >
> > > finished_one_bio() (fallback case):
> > >
> > > spin_lock() spin_lock()
> > > bio_count--;			dio->waiter = NULL
> > > spin_unlock() spin_unlock()
> > >
> > > Both approaches are the same.
> > >
> > > dio_bio_complete()
> > >
> > > spin_lock()			spin_lock()
> > > bios_in_flight--		atomic_dec()
> > > spin_unlock()			spin_unlock()
> > >
> > > My patch is faster since it removed 1 locked instruction.
> > >
> > > Conclusion:
> > >
> > > My guess would be that both approaches are close, but my patch
> > > has fewer locked instructions while disabling interrupts more.
> > > My preference is for the cleaner locking approach that is easier
> > > to understand and modify in the future.
> > >
> > > Daniel
> > >
> > > On Tue, 2003-11-25 at 23:55, Suparna Bhattacharya wrote:
> > > > On Tue, Nov 25, 2003 at 03:49:31PM -0800, Daniel McNeil wrote:
> > > > > Suparna,
> > > > >
> > > > > Yes, your patch did help. I originally had CONFIG_DEBUG_SLAB=y, which
> > > > > was helping me see problems because the freed dio was getting
> > > > > poisoned. I also tested with CONFIG_DEBUG_PAGEALLOC=y, which is
> > > > > very good at catching these.
> > > >
> > > > Ah I see - perhaps that explains why neither Janet nor I could
> > > > recreate the problem that you were hitting so easily. So we
> > > > should probably try running with CONFIG_DEBUG_SLAB and
> > > > CONFIG_DEBUG_PAGEALLOC as well.
> > > >
> > > > >
> > > > > I updated your AIO fallback patch plus your AIO race plus I fixed
> > > > > the bio_count decrement fix. This patch has all three fixes and
> > > > > it is working for me.
> > > > >
> > > > > I fixed the bio_count race by changing bio_list_lock into bio_lock
> > > > > and using that for all the bio fields. I changed bio_count and
> > > > > bios_in_flight from atomics into ints. They are now protected by
> > > > > the bio_lock. I fixed the race in finished_one_bio() by
> > > > > leaving the bio_count at 1 until after the dio_complete(),
> > > > > and then doing the bio_count decrement and wakeup holding the bio_lock.
> > > > >
> > > > > Take a look, give it a try, and let me know what you think.
> > > >
> > > > I had been trying a slightly different kind of fix -- appended is
> > > > the updated version of the patch I last posted. It uses the bio_list_lock
> > > > to protect the dio->waiter field, which finished_one_bio sets back
> > > > to NULL after it has issued the wakeup; and the code that waits for
> > > > i/o to drain out checks the dio->waiter field instead of bio_count.
> > > > This might not seem very obvious given the nomenclature of the
> > > > bio_list_lock, so I was holding back wondering if it could be
> > > > improved.
> > > >
> > > > Your approach looks clearer in that sense -- it's pretty unambiguous
> > > > about which lock protects which fields. The only thing that bothers me (and
> > > > this is what I was trying to avoid in my patch) is the increased
> > > > use of spin_lock_irq's (the overhead of turning interrupts off and on)
> > > > instead of simple atomic inc/dec in most places.
> > > >
> > > > Thoughts ?
> > > >
> > > > Regards
> > > > Suparna
> > >
> > > --
> > > To unsubscribe, send a message with 'unsubscribe linux-aio' in
> > > the body to majordomo@kvack.org. For more info on Linux AIO,
> > > see: http://www.kvack.org/aio/
> > > Don't email: aart@kvack.org
>
--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India