linux-kernel.vger.kernel.org archive mirror
* kiobuf wrong changes in 2.4.9ac9
@ 2001-09-06  1:02 Andrea Arcangeli
From: Andrea Arcangeli @ 2001-09-06  1:02 UTC
  To: linux-kernel, Alan Cox; +Cc: rohit.seth

I suggest backing out the kiobuf patch in 2.4.9ac9. The right performance
fix is already in 2.4.10pre4aa1 and it depends on O_DIRECT.

see:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.10pre4aa1/00_o_direct-15
	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.10pre4aa1/10_rawio-f_iobuf-1

Porting to 2.4.5 is very very very trivial if truly needed.

I couldn't care less if, with two hundred hard disks and two hundred
tasks all doing simultaneous I/O to all the two hundred hard disks, two
hundred mbytes of RAM are statically allocated in kiobufs. If you
have the money for such a configuration you *definitely* don't want to
waste CPU on kiobuf allocation; you want to keep them preallocated and
spend the money on the two hundred mbytes of RAM (today in Italy, a couple
of kilometers from my home, I can buy a 128 mbyte 133 MHz DIMM for 20
EUR [in the US it has to be cheaper]; compare that with the price of the
rest of the system). I didn't even attempt to count the static RAM you
also spend on the large preallocated I/O queues for each hard disk, for
the same reason.

In a low-end configuration with a few disks and a few tasks doing I/O the
RAM overhead is a few hundred kbytes, so it's fine.

For the thread/process issue there's no difference at all (I'm not
penalizing threads); it's just that you must reopen the file if the
child thread or process will do simultaneous I/O to the same rawio
device as the parent (the only difference between process and thread
is that you will be forced to share the same fd space with the parent in
the thread case, but the fd space has had a 1024*1024 fd upper bound for
a long time [since 2.2]).

Now I'm not saying we don't need to shrink the size of the kiobuf so we
can save RAM [notably for non-I/O-backed kiobuf users] and make the
contention case faster as well (btw, having KIO_MAX_ATOMIC_IO at
512k is useful only in -aa, with the other changes that allow 512k
scsi commands:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.10pre4aa1/00_sd-max_sectors-1

). But my plan was to split the kiobuf in two entities to save RAM and
to try to slabify it again; that's much lower prio work (the high
prio stuff is what I'm shipping above in -aa), and my point here is that
this lower prio work is not in the direction of the patch.

The above is all about performance and design; the real-world
showstopper in 2.4.9ac9 is that kiobuf allocations are going to
fail during read/writes due to mem fragmentation (this is why it was using
vmalloc indeed) [those failures should be easily reproducible on x86 boxes
with PAGE_SIZE = 4k]. The reason kmem allocations larger than PAGE_SIZE
aren't reliable is that the slab, like everything else, is alloc_pages
backed and the main allocator can't reliably allocate anything larger
than PAGE_SIZE.  OTOH for the kernel stack we also allocate 2*PAGE_SIZE
physically contiguous, but here the kiobuf structure would generate an
order 2 allocation that will definitely fail eventually with the current
vm [ask Daniel] (not order 1 like the kernel stack).
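
The order arithmetic above can be sketched as follows, assuming the 4k
x86 page size mentioned in the text. The function is a hypothetical
userspace stand-in for what the kernel's get_order() computes: the
smallest power-of-two number of contiguous pages that covers a size.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: x86, PAGE_SIZE = 4k */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
unsigned int alloc_order(size_t size)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    while (span < size) {
        span <<= 1;   /* each order doubles the contiguous span */
        order++;
    }
    return order;
}
```

A 2*PAGE_SIZE kernel stack (8192 bytes) is order 1, while any structure
even one byte larger than that rounds up to four contiguous pages, i.e.
order 2.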

I told Rohit a few days ago about some of these issues as the reason why
I didn't accept the patch. He raised a few issues that I hope to have
addressed in this email. I was busy with other things and so I managed
to answer only now; I'm sorry for the delay, Rohit.

Rohit, could you please do a run of the benchmark on top of
2.4.10pre4aa1 to verify I'm right about the "high prio" stuff? Then
we'll address the "low prio" contention optimization and the fine-grained
memory-saving part at a more relaxed pace in a larger patch.

Andrea


* Re: kiobuf wrong changes in 2.4.9ac9
  2001-09-06  1:02 kiobuf wrong changes in 2.4.9ac9 Andrea Arcangeli
@ 2001-09-06 12:29 ` Alan Cox
From: Alan Cox @ 2001-09-06 12:29 UTC
  To: Andrea Arcangeli; +Cc: linux-kernel, Alan Cox, rohit.seth

> The above is all about performance and design, about real world
> showstopper the one in 2.4.9ac9 is that kiobuf allocations are going to
> fail during read/writes due to mem fragmentation (this is why it was using
> vmalloc indeed) [those failures should be easily reproducible on x86 boxes

Vmalloc is extremely expensive on many platforms. It looks very easy to
simply flip between slab and vmalloc based on size.

Let me know how the testing goes - if it works out well then I'll migrate
the -ac tree to the -aa patch when I have time to do the merging


* Re: kiobuf wrong changes in 2.4.9ac9
  2001-09-06 12:29 ` Alan Cox
@ 2001-09-06 13:59   ` Andrea Arcangeli
From: Andrea Arcangeli @ 2001-09-06 13:59 UTC
  To: Alan Cox; +Cc: linux-kernel, rohit.seth

On Thu, Sep 06, 2001 at 01:29:53PM +0100, Alan Cox wrote:
> > The above is all about performance and design, about real world
> > showstopper the one in 2.4.9ac9 is that kiobuf allocations are going to
> > fail during read/writes due to mem fragmentation (this is why it was using
> > vmalloc indeed) [those failures should be easily reproducible on x86 boxes
> 
> Vmalloc is extremely expensive on many platforms. It looks very easy to
> simply flip between slab and vmalloc based on size.

Based on size in turn means based on source, because the kiobuf has a
fixed size. This is why I'm saying it has to be vmalloced in these
kernel trees until we shrink it, and the plan to shrink it is first of
all to split the I/O backend out of the memory management part.
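
A minimal sketch of the point about fixed size, with hypothetical names
throughout: a "flip by size" policy applied to a structure whose size is
known at compile time always picks the same allocator, so the choice is
effectively made in the source. The struct below is only an illustrative
stand-in and is not the real struct kiobuf layout.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: PAGE_SIZE = 4k */

enum backing { BACKING_SLAB, BACKING_VMALLOC };

/* Hypothetical policy: physically contiguous (slab-style) memory only
 * when the object fits in one page, virtually contiguous otherwise. */
enum backing choose_backing(size_t size)
{
    return size <= SKETCH_PAGE_SIZE ? BACKING_SLAB : BACKING_VMALLOC;
}

/* Illustrative fixed-size descriptor with large embedded arrays. */
struct fixed_desc {
    unsigned long blocks[1024];
    void *slots[1024];
};
```

Because sizeof(struct fixed_desc) is a compile-time constant larger than
a page, choose_backing(sizeof(struct fixed_desc)) can never return
BACKING_SLAB; only shrinking the structure changes the outcome.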

> Let me know how the testing goes - if it works out well then I'll migrate
> the -ac tree to the -aa patch when I have time to do the merging

btw, I actually don't have the workload here (my software simulations say
it works, but it's not exactly the same workload; the workload I
simulated to test it is a pure simultaneous rawio I/O load to the same
rawio device via different filps), so I'd like to know too ;)

Andrea
