linux-kernel.vger.kernel.org archive mirror
* O_DIRECT please; Sybase 12.5
@ 2001-06-29  9:39 Dan Kegel
  2001-06-29  9:50 ` Alan Cox
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Dan Kegel @ 2001-06-29  9:39 UTC (permalink / raw)
  To: linux-kernel

At work I had to sit through a meeting where I heard
the boss say "If Linux makes Sybase go through the page cache on
reads, maybe we'll just have to switch to Solaris.  That's
a serious performance problem."
All I could say was "I expect Linux will support O_DIRECT
soon, and Sybase will support that within a year."  

Er, so did I promise too much?  Andrea mentioned O_DIRECT recently
( http://marc.theaimsgroup.com/?l=linux-kernel&m=99253913516599&w=2,
 http://lwn.net/2001/0510/bigpage.php3 )
Is it supported yet in 2.4, or is this a 2.5 thing?

And what are the chances Sybase will support that flag any time
soon?  I just read on news://forums.sybase.com/sybase.public.ase.linux
that Sybase ASE 12.5 was released today, and a 60 day eval is downloadable
for NT and Linux.  I'm downloading now; it's a biggie.

It supports raw partitions, which is good; that might satisfy my
boss (although the administration will be a pain, and I'm not
sure whether it's really supported by Dell RAID devices).
I'd prefer O_DIRECT :-(

Hope somebody can give me encouraging news.

Thanks,
Dan


* Re: O_DIRECT please; Sybase 12.5
  2001-06-29  9:39 O_DIRECT please; Sybase 12.5 Dan Kegel
@ 2001-06-29  9:50 ` Alan Cox
  2001-06-29 10:16   ` Dan Kegel
  2001-07-05 13:59   ` Andrea Arcangeli
  2001-06-29 15:23 ` Steve Lord
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Alan Cox @ 2001-06-29  9:50 UTC (permalink / raw)
  To: Dan Kegel; +Cc: linux-kernel

> the boss say "If Linux makes Sybase go through the page cache on
> reads, maybe we'll just have to switch to Solaris.  That's
> a serious performance problem."

That's something you'd have to benchmark. It depends on a very large number
of factors, including whether the database uses mmap, the average I/O size,
and the like.

> It supports raw partitions, which is good; that might satisfy my
> boss (although the administration will be a pain, and I'm not
> sure whether it's really supported by Dell RAID devices).
> I'd prefer O_DIRECT :-(

We already support raw direct I/O to the devices themselves, so Sybase should
support that. If not, I believe Oracle already does.



* Re: O_DIRECT please; Sybase 12.5
  2001-06-29  9:50 ` Alan Cox
@ 2001-06-29 10:16   ` Dan Kegel
  2001-06-29 12:49     ` Mike Harrold
  2001-07-05 13:59   ` Andrea Arcangeli
  1 sibling, 1 reply; 15+ messages in thread
From: Dan Kegel @ 2001-06-29 10:16 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Alan Cox wrote:
> 
> > the boss say "If Linux makes Sybase go through the page cache on
> > reads, maybe we'll just have to switch to Solaris.  That's
> > a serious performance problem."
> 
> That's something you'd have to benchmark. It depends on a very large number
> of factors, including whether the database uses mmap, the average I/O size,
> and the like.

I'll probably benchmark raw vs. non-raw I/O with Sybase ASE 12.5
on our application once we've come up to speed on basic performance
issues (we're database newbies).
 
> > It supports raw partitions, which is good; that might satisfy my
> > boss (although the administration will be a pain, and I'm not
> > sure whether it's really supported by Dell RAID devices).
> > I'd prefer O_DIRECT :-(
> 
> We already support raw direct I/O to devices themselves so they should support
> that - if not then Oracle I believe already does.

Haven't seen Sybase talk about O_DIRECT.  Not sure we want to
pony up the Sybase license fees.  (I'm still in denial about
databases in general, and hope I can switch to PostgreSQL
at some point.)

BTW, 
http://eval.veritas.com/webfiles/whitepapers/sybaseedition/sybase14_performance_paper.pdf
seems to show that raw beats O_DIRECT hands down on Solaris.
Will that hold on Linux, or is your (forthcoming?) O_DIRECT
higher performance than the one on Solaris?

Thanks,
Dan


* Re: O_DIRECT please; Sybase 12.5
  2001-06-29 10:16   ` Dan Kegel
@ 2001-06-29 12:49     ` Mike Harrold
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Harrold @ 2001-06-29 12:49 UTC (permalink / raw)
  To: Dan Kegel; +Cc: Alan Cox, linux-kernel

> 
> Alan Cox wrote:
> > 
> > > the boss say "If Linux makes Sybase go through the page cache on
> > > reads, maybe we'll just have to switch to Solaris.  That's
> > > a serious performance problem."
> > 
> > That's something you'd have to benchmark. It depends on a very large number
> > of factors, including whether the database uses mmap, the average I/O size,
> > and the like.
> 
> I'll probably benchmark raw vs. non-raw I/O with Sybase ASE 12.5
> on our application once we've come up to speed on basic performance
> issues (we're database newbies).

Quite obviously. One of the primary things a DBA is supposed to do is ensure
that the disk is accessed as *few* times as possible. What size database do
you have? How much memory does the machine have? How much memory does the
database have? How many engines is the database running?

We can take this off-list if you want, but disk I/O shouldn't really be an
issue for any database as long as other parameters are set correctly. Sybase
recommends raw devices *not* because they are faster, but because it's the
only way that they (Sybase) can guarantee the data is actually written to
disk (legal liability, etc.).

/Mike (Sybase DBA)



* Re: O_DIRECT please; Sybase 12.5
  2001-06-29  9:39 O_DIRECT please; Sybase 12.5 Dan Kegel
  2001-06-29  9:50 ` Alan Cox
@ 2001-06-29 15:23 ` Steve Lord
  2001-07-03  9:42 ` Stephen C. Tweedie
  2001-07-05 13:53 ` Andrea Arcangeli
  3 siblings, 0 replies; 15+ messages in thread
From: Steve Lord @ 2001-06-29 15:23 UTC (permalink / raw)
  To: Dan Kegel; +Cc: linux-kernel


XFS supports O_DIRECT on Linux, and has done for a while.

Steve

> At work I had to sit through a meeting where I heard
> the boss say "If Linux makes Sybase go through the page cache on
> reads, maybe we'll just have to switch to Solaris.  That's
> a serious performance problem."
> All I could say was "I expect Linux will support O_DIRECT
> soon, and Sybase will support that within a year."  
> 
> Er, so did I promise too much?  Andrea mentioned O_DIRECT recently
> ( http://marc.theaimsgroup.com/?l=linux-kernel&m=99253913516599&w=2,
>  http://lwn.net/2001/0510/bigpage.php3 )
> Is it supported yet in 2.4, or is this a 2.5 thing?
> 
> And what are the chances Sybase will support that flag any time
> soon?  I just read on news://forums.sybase.com/sybase.public.ase.linux
> that Sybase ASE 12.5 was released today, and a 60 day eval is downloadable
> for NT and Linux.  I'm downloading now; it's a biggie.
> 
> It supports raw partitions, which is good; that might satisfy my
> boss (although the administration will be a pain, and I'm not
> sure whether it's really supported by Dell RAID devices).
> I'd prefer O_DIRECT :-(
> 
> Hope somebody can give me encouraging news.
> 
> Thanks,
> Dan




* Re: O_DIRECT please; Sybase 12.5
  2001-06-29  9:39 O_DIRECT please; Sybase 12.5 Dan Kegel
  2001-06-29  9:50 ` Alan Cox
  2001-06-29 15:23 ` Steve Lord
@ 2001-07-03  9:42 ` Stephen C. Tweedie
  2001-07-03 15:10   ` Daryll Strauss
  2001-07-05 13:53 ` Andrea Arcangeli
  3 siblings, 1 reply; 15+ messages in thread
From: Stephen C. Tweedie @ 2001-07-03  9:42 UTC (permalink / raw)
  To: Dan Kegel; +Cc: linux-kernel, Stephen Tweedie

Hi,

On Fri, Jun 29, 2001 at 02:39:00AM -0700, Dan Kegel wrote:

> It supports raw partitions, which is good; that might satisfy my
> boss (although the administration will be a pain, and I'm not
> sure whether it's really supported by Dell RAID devices).

All block devices support raw IO --- the raw IO mechanism talks to the
device driver through the normal kernel-internal block IO entry
points.

> I'd prefer O_DIRECT :-(

Andrea Arcangeli has already posted patches you can try for ext2.  The
functionality isn't in the mainline kernel yet, though.

--Stephen


* Re: O_DIRECT please; Sybase 12.5
  2001-07-03  9:42 ` Stephen C. Tweedie
@ 2001-07-03 15:10   ` Daryll Strauss
  2001-07-03 15:48     ` Stephen C. Tweedie
  0 siblings, 1 reply; 15+ messages in thread
From: Daryll Strauss @ 2001-07-03 15:10 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Dan Kegel, linux-kernel

On Tue, Jul 03, 2001 at 10:42:53AM +0100, Stephen C. Tweedie wrote:
> On Fri, Jun 29, 2001 at 02:39:00AM -0700, Dan Kegel wrote:
> 
> > It supports raw partitions, which is good; that might satisfy my
> > boss (although the administration will be a pain, and I'm not
> > sure whether it's really supported by Dell RAID devices).
> 
> All block devices support raw IO --- the raw IO mechanism talks to the
> device driver through the normal kernel-internal block IO entry
> points.
> 
> > I'd prefer O_DIRECT :-(
> 
> Andrea Arcangeli has already posted patches you can try for ext2.  The
> functionality isn't in the mainline kernel yet, though.

I recall hearing about a problem with the md device and raw IO. It was
something about the block sizes not matching causing performance
problems. Has anything been done to improve those issues?

					    - |Daryll


* Re: O_DIRECT please; Sybase 12.5
  2001-07-03 15:10   ` Daryll Strauss
@ 2001-07-03 15:48     ` Stephen C. Tweedie
  0 siblings, 0 replies; 15+ messages in thread
From: Stephen C. Tweedie @ 2001-07-03 15:48 UTC (permalink / raw)
  To: Daryll Strauss; +Cc: Stephen C. Tweedie, Dan Kegel, linux-kernel

Hi,

On Tue, Jul 03, 2001 at 08:10:39AM -0700, Daryll Strauss wrote:

> I recall hearing about a problem with the md device and raw IO. It was
> something about the block sizes not matching causing performance
> problems. Has anything been done to improve those issues?

The problem is a combination of two things.  First, raw IO is always
fully synchronous, so with raw IO (and O_DIRECT) you are, in effect,
explicitly instructing the kernel not to do any readahead.  That makes
it hard to keep two disks running in parallel with soft raid if you
are using small IOs, obviously.

Secondly, raw IO pins buffers in physical memory, and to avoid
causing serious VM problems due to having too much unswappable memory
pinned by arbitrary applications, the current raw IO driver limits the
pinned chunk size to 64k.  That, combined with the sequential nature
of raw IO, can limit performance, certainly.
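
As a rough illustration of the 64k limit described above, assume a large
request is simply carved into 64k chunks that are submitted one after another.
The function name and this splitting model are illustrative assumptions, not
the raw driver's actual code. Under that model, the number of synchronous
round trips per request is:

```c
#include <stddef.h>

/* The 64k pinned-chunk limit mentioned in the message above. */
#define RAW_CHUNK_BYTES (64u * 1024u)

/*
 * Number of back-to-back synchronous chunks needed to satisfy one raw IO
 * request of `request_bytes` bytes (hypothetical helper, rounds up).
 */
size_t raw_chunks(size_t request_bytes)
{
    return (request_bytes + RAW_CHUNK_BYTES - 1) / RAW_CHUNK_BYTES;
}
```

For example, a 1 MB request would become 16 chunks, each one waiting for the
previous chunk to complete before the next is issued.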

Raw IO is quite capable of running with larger chunk sizes, but we
really need a kernel limiter of some description to prevent users from
using this mechanism to pin massive amounts of memory for raw IO at
once.  There are several candidate mechanisms for that, but none in
the main kernel right now.

Cheers,
 Stephen


* Re: O_DIRECT please; Sybase 12.5
  2001-06-29  9:39 O_DIRECT please; Sybase 12.5 Dan Kegel
                   ` (2 preceding siblings ...)
  2001-07-03  9:42 ` Stephen C. Tweedie
@ 2001-07-05 13:53 ` Andrea Arcangeli
  2001-07-05 14:28   ` Andrew Morton
  3 siblings, 1 reply; 15+ messages in thread
From: Andrea Arcangeli @ 2001-07-05 13:53 UTC (permalink / raw)
  To: Dan Kegel; +Cc: linux-kernel

On Fri, Jun 29, 2001 at 02:39:00AM -0700, Dan Kegel wrote:
> At work I had to sit through a meeting where I heard
> the boss say "If Linux makes Sybase go through the page cache on
> reads, maybe we'll just have to switch to Solaris.  That's
> a serious performance problem."
> All I could say was "I expect Linux will support O_DIRECT
> soon, and Sybase will support that within a year."  
> 
> Er, so did I promise too much?  Andrea mentioned O_DIRECT recently
> ( http://marc.theaimsgroup.com/?l=linux-kernel&m=99253913516599&w=2,
>  http://lwn.net/2001/0510/bigpage.php3 )
> Is it supported yet in 2.4, or is this a 2.5 thing?

All 2.4 kernels in SuSE 7.2 ship with O_DIRECT enabled by default for
ext2; just open your files with O_DIRECT as luser and there you go.
Today I got in my inbox a patch from Chris Wedgwood for reiserfs, and
Andrew Morton took care of ext3 O_DIRECT support (included in the ext3
patch and conditional on the #ifdef KERNEL_HAS_O_DIRECT that he asked me to
add to the latest o_direct patches). (You know O_DIRECT is 99% common
code, so supporting a new fs is almost a no-brainer.)
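
A minimal sketch of what "open your files with O_DIRECT" could look like from
user space on Linux with glibc. The helper names and the 4096-byte alignment
are assumptions made for illustration; the real alignment requirement depends
on the filesystem and the device, and the open fails (typically with EINVAL)
on a filesystem without O_DIRECT support.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment for buffer address and transfer size. */
#define DIO_ALIGN 4096

/* Round n up to the next multiple of DIO_ALIGN. */
size_t dio_round_up(size_t n)
{
    return (n + DIO_ALIGN - 1) / DIO_ALIGN * DIO_ALIGN;
}

/*
 * Read up to `len` bytes from the start of `path`, bypassing the page cache.
 * Returns an aligned buffer that the caller must free(), or NULL on error.
 * On success, *got holds the number of bytes actually read.
 */
void *dio_read(const char *path, size_t len, ssize_t *got)
{
    void *buf = NULL;
    size_t want = dio_round_up(len);
    int fd = open(path, O_RDONLY | O_DIRECT);

    if (fd < 0)
        return NULL;                 /* e.g. the fs does not support O_DIRECT */
    if (posix_memalign(&buf, DIO_ALIGN, want) != 0) {
        close(fd);
        return NULL;
    }
    *got = read(fd, buf, want);      /* aligned buffer, aligned length, offset 0 */
    close(fd);
    if (*got < 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The same alignment rules apply to writes and to the file offset, which is why
the buffer comes from posix_memalign() rather than plain malloc().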

I will send the o_direct patch to Linus for 2.4 too, but possibly this is
2.5 material. However, I will fully support it for 2.4 as well, since it
is rock solid and you can just use it in production, the same thing that
everybody has to do for rawio in 2.2.

I will release a new patch against 2.4.7pre2 in the next aa
patchkit as soon as I have finished synchronizing my tree.

Andrea


* Re: O_DIRECT please; Sybase 12.5
  2001-06-29  9:50 ` Alan Cox
  2001-06-29 10:16   ` Dan Kegel
@ 2001-07-05 13:59   ` Andrea Arcangeli
  1 sibling, 0 replies; 15+ messages in thread
From: Andrea Arcangeli @ 2001-07-05 13:59 UTC (permalink / raw)
  To: Alan Cox; +Cc: Dan Kegel, linux-kernel

On Fri, Jun 29, 2001 at 10:50:15AM +0100, Alan Cox wrote:
> > the boss say "If Linux makes Sybase go through the page cache on
> > reads, maybe we'll just have to switch to Solaris.  That's
> > a serious performance problem."
> 
> That's something you'd have to benchmark. It depends on a very large number
> of factors, including whether the database uses mmap, the average I/O size,
> and the like.

Correct, here are the benchmarks:

	http://boudicca.tux.org/hypermail/linux-kernel/2001week17/1175.html
        http://boudicca.tux.org/hypermail/linux-kernel/2001week17/att-1175/01-directio.png

Of course, the huge improvement is also partly due to the broken VM in the
buffered-io case.

Andrea


* Re: O_DIRECT please; Sybase 12.5
  2001-07-05 13:53 ` Andrea Arcangeli
@ 2001-07-05 14:28   ` Andrew Morton
  2001-07-05 14:37     ` Andrea Arcangeli
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2001-07-05 14:28 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Dan Kegel, linux-kernel

Andrea Arcangeli wrote:
> 
> Andrew Morton took care of ext3 O_DIRECT support (included into the ext3
> patch and conditional to #ifdef KERNEL_HAS_O_DIRECT that he asked me to
> add to the latest o_direct patches). (you know O_DIRECT is 99% common
> code, so supporting new fs is almost a no brainer)

Sorry, haven't looked at that yet.

ext3 journals data.  That's unique and it breaks things (or rather,
things break it).   It'd be trivial to support O_DIRECT in ext3's
writeback mode (metadata-only), but nobody uses that.

From a quick look it seems that we'll need fs-private implementations
of generic_direct_IO() and brw_kiovec() at least.

I'll take a closer look.

-


* Re: O_DIRECT please; Sybase 12.5
  2001-07-05 14:28   ` Andrew Morton
@ 2001-07-05 14:37     ` Andrea Arcangeli
  2001-07-05 15:06       ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Andrea Arcangeli @ 2001-07-05 14:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dan Kegel, linux-kernel

On Fri, Jul 06, 2001 at 12:28:15AM +1000, Andrew Morton wrote:
> ext3 journals data.  That's unique and it breaks things (or rather,
> things break it).   It'd be trivial to support O_DIRECT in ext3's
> writeback mode (metadata-only), but nobody uses that.

I thought everybody uses metadata-only to avoid killing data-write
performance. So I thought it was ok to support O_DIRECT only for
metadata journaling at first; doing that should be a three-liner, as you
said, and that is what I expected.

> > From a quick look it seems that we'll need fs-private implementations
> of generic_direct_IO() and brw_kiovec() at least.

brw_kiovec is called by generic_direct_IO, so yes, all you need is a
private generic_direct_IO implementation to deal with the journaled data
writes.

> I'll take a closer look.

OK, thanks!

Andrea


* Re: O_DIRECT please; Sybase 12.5
  2001-07-05 14:37     ` Andrea Arcangeli
@ 2001-07-05 15:06       ` Andrew Morton
  2001-07-06  0:25         ` Keith Owens
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2001-07-05 15:06 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Dan Kegel, linux-kernel

Andrea Arcangeli wrote:
> 
> On Fri, Jul 06, 2001 at 12:28:15AM +1000, Andrew Morton wrote:
> > ext3 journals data.  That's unique and it breaks things (or rather,
> > things break it).   It'd be trivial to support O_DIRECT in ext3's
> > writeback mode (metadata-only), but nobody uses that.
> 
> I thought everybody uses metadata-only to avoid killing data-write
> performance.

ext3 has three modes:

data=journal

	Data is journalled.  Yes, this slows things down
	significantly.

data=ordered

	The default mode and the most popular.  All data is written
	to disk prior to a commit.  Write throughput is good, and
	you don't have uninitialised data in your files after a
	crash.

data=writeback

	Metadata-only.   Better write throughput (in dbench, anyway),
	but only metadata integrity is preserved after a crash. ie:
	fsck says the fs is fine, but files can (and almost always do)
	contain random stuff after a crash.
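
For illustration only, the three modes listed above are chosen with the
data= mount option, which a program could pass through mount(2) as the
filesystem-specific data string. The enum, the helper names, and any device
or mount-point paths a caller passes in are hypothetical, and the call
needs root privileges.

```c
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical enum naming the three ext3 journalling modes. */
enum ext3_data_mode { EXT3_JOURNAL, EXT3_ORDERED, EXT3_WRITEBACK };

/* Map a mode to the option string that ext3 parses at mount time. */
const char *ext3_data_option(enum ext3_data_mode m)
{
    switch (m) {
    case EXT3_JOURNAL:   return "data=journal";    /* data and metadata journalled */
    case EXT3_ORDERED:   return "data=ordered";    /* data flushed before commit */
    case EXT3_WRITEBACK: return "data=writeback";  /* metadata-only */
    }
    return NULL;
}

/* Mount `dev` on `dir` as ext3 in the requested mode; returns 0 on success. */
int mount_ext3(const char *dev, const char *dir, enum ext3_data_mode m)
{
    return mount(dev, dir, "ext3", 0, ext3_data_option(m));
}
```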

Ordered data mode is really nice.  It's not magical though - for example,
if you reset the machine during a kernel build, a subsequent `make' will
fail because you have a number of .o files which have zero length.
That's the length they happened to have when the machine went down.

For ordered-data mode we need to keep track of all the buffers which
are associated with a transaction's journalled metadata and ensure that
they are written out before the transaction commits.  That is done with
a little structure which hangs off ->b_private.

> So I thought it was ok to at first support O_DIRECT only
> for metadata journaling, doing that should be a three liner as you said
> and that is what I expected.

Yup.  metadata-only journalling is all-round much, much simpler.


* Re: O_DIRECT please; Sybase 12.5
  2001-07-05 15:06       ` Andrew Morton
@ 2001-07-06  0:25         ` Keith Owens
  0 siblings, 0 replies; 15+ messages in thread
From: Keith Owens @ 2001-07-06  0:25 UTC (permalink / raw)
  To: linux-kernel

On Fri, 06 Jul 2001 01:06:53 +1000, 
Andrew Morton <andrewm@uow.edu.au> wrote:
>Ordered data mode is really nice.  It's not magical though - for example,
>if you reset the machine during a kernel build, a subsequent `make' will
>fail because you have a number of .o files which have zero length.

FYI, that particular problem will disappear with the 2.5 Makefiles.
The zero length .o files will still exist but the post-compile
dependency data (.o.d) will not exist so a subsequent make kernel will
rebuild the incomplete objects.  This is a general workaround for
incomplete kernel objects, independent of the file system type.



* Re: O_DIRECT please; Sybase 12.5
       [not found] <3B3C4CB4.6B3D2B2F@kegel.com.suse.lists.linux.kernel>
@ 2001-06-29 10:42 ` Andi Kleen
  0 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2001-06-29 10:42 UTC (permalink / raw)
  To: Dan Kegel; +Cc: linux-kernel

Dan Kegel <dank@kegel.com> writes:
> 
> And what are the chances Sybase will support that flag any time
> soon?  I just read on news://forums.sybase.com/sybase.public.ase.linux

If Sybase always submits its buffers block-aligned (the same requirement as
for raw io), you can do it with a simple LD_PRELOAD hack.
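
A sketch of what such an LD_PRELOAD hack might look like, not the code
actually used here: a small shared library that interposes open() and adds
O_DIRECT for files under a chosen directory. The DIRECT_IO_PREFIX environment
variable and the helper name are assumptions invented for this example, and a
program that calls open64() or openat() instead would need those wrapped too.

```c
#define _GNU_SOURCE            /* for O_DIRECT and RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Return 1 if `path` starts with `prefix`; a NULL or empty prefix matches nothing. */
int path_has_prefix(const char *path, const char *prefix)
{
    if (path == NULL || prefix == NULL || prefix[0] == '\0')
        return 0;
    return strncmp(path, prefix, strlen(prefix)) == 0;
}

/* Interposed open(): add O_DIRECT for files under $DIRECT_IO_PREFIX. */
int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    mode_t mode = 0;

    if (real_open == NULL)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
    if (real_open == NULL) {
        errno = ENOSYS;             /* could not find the libc open() */
        return -1;
    }

    if (flags & O_CREAT) {          /* the mode argument is only read with O_CREAT here */
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    if (path_has_prefix(path, getenv("DIRECT_IO_PREFIX")))
        flags |= O_DIRECT;

    return real_open(path, flags, mode);
}
```

Built as a shared object (for example with gcc -shared -fPIC, plus -ldl on
older glibc) and named in LD_PRELOAD when starting the server, this only works
if every read and write the application issues on those files already meets
the alignment requirement.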

I hacked sapdb (which has source available, unlike sybase) to do direct IO,
and it seems at least not to hurt.

> It supports raw partitions, which is good; that might satisfy my
> boss (although the administration will be a pain, and I'm not
> sure whether it's really supported by Dell RAID devices).
> I'd prefer O_DIRECT :-(

LVM makes raw partitions much less painful than they used to be. It is
basically a file system of raw partitions, allowing you to move and resize
them.

-Andi


end of thread, other threads:[~2001-07-06  0:26 UTC | newest]

Thread overview: 15+ messages
2001-06-29  9:39 O_DIRECT please; Sybase 12.5 Dan Kegel
2001-06-29  9:50 ` Alan Cox
2001-06-29 10:16   ` Dan Kegel
2001-06-29 12:49     ` Mike Harrold
2001-07-05 13:59   ` Andrea Arcangeli
2001-06-29 15:23 ` Steve Lord
2001-07-03  9:42 ` Stephen C. Tweedie
2001-07-03 15:10   ` Daryll Strauss
2001-07-03 15:48     ` Stephen C. Tweedie
2001-07-05 13:53 ` Andrea Arcangeli
2001-07-05 14:28   ` Andrew Morton
2001-07-05 14:37     ` Andrea Arcangeli
2001-07-05 15:06       ` Andrew Morton
2001-07-06  0:25         ` Keith Owens
     [not found] <3B3C4CB4.6B3D2B2F@kegel.com.suse.lists.linux.kernel>
2001-06-29 10:42 ` Andi Kleen
