From: Jan Harkes <jaharkes@cs.cmu.edu>
To: Jesse Pollard <pollard@tomcat.admin.navo.hpc.mil>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: 64-bit block sizes on 32-bit systems
Date: Tue, 27 Mar 2001 15:20:12 -0500
Message-ID: <20010327152011.A1354@cs.cmu.edu>
In-Reply-To: <200103271957.NAA13547@tomcat.admin.navo.hpc.mil>; from pollard@tomcat.admin.navo.hpc.mil on Tue, Mar 27, 2001 at 01:57:42PM -0600

On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > Using numbers similar to those presented. If we are working our way
> > through every single block in a petabyte filesystem, and the blocksize
> > is 512 bytes, then the 1us in extra CPU cycles because of 64-bit
> > operations would add, according to my back-of-the-envelope calculation,
> > 2199023 seconds of CPU time, a bit more than 25 days.
> 
> Ummm... I don't think it adds that much. You seem to be leaving out the
> overlap of disk I/O and computation from read-ahead. That should
> eliminate the majority of the delay effect.

1024 TB is around 2*10^12 512-byte blocks; multiply that by the "assumed"
1us of overhead per block operation and you get 2*10^6 seconds, so I
believe I'm pretty close there. I am considering everything to be
"available in the cache", i.e. no waiting for disk access.

> > Seriously, there is a lot more that needs to be done than introducing a
> > 64-bit blocknumber. Effectively 512-byte blocks are far too small for
> > that kind of data, and going to pagesize blocks (and increasing pagesize
> > to 64KB or 2MB at the same time) is a solution that is far more likely
> > to give good results, since it reduces both the total number of 'blocks'
> > on the device and the total number of calls throughout kernel space,
> > instead of increasing the cost per call.
> 
> Talk about adding overhead... How long do you think it takes to read a
> 2MB block (not to mention the time to update that page)? The additional
> contention on the Fibre Channel I/O alone might kill it if the filesystem
> is busy.

The time to update the page tables when the OS is using a 2MB pagesize is
identical to the time to update them for a 4KB page: either way, a single
page-table entry is touched. Of course it will take more time to load the
data into the page; however, it should be a consecutive stretch of data on
disk, which should give a more efficient transfer than small blocks
scattered around the disk.
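The page-table arithmetic is trivial (the 512x factor below simply
follows from 2MB / 4KB; an illustrative sketch, not kernel code):

#include <stdio.h>

int main(void)
{
	long chunk = 2 * 1024 * 1024;	/* 2MB of file data */

	/* One page-table entry per page, whatever the pagesize. */
	printf("4KB pages: %ld PTE updates\n", chunk / (4 * 1024));
	printf("2MB pages: %ld PTE update\n", chunk / (2 * 1024 * 1024));
	return 0;
}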

> Granted, 512 bytes could be considered too small for some things, but
> once you pass 32K you start adding a lot of rotational delay problems.
> I've used filesystems with 256K blocks - they are slow when compared
> to the throughput using 32K. I wasn't the one running the benchmarks,
> but on a MaxStrat 400GB RAID, 256K-sized data transfers were much
> slower (around 3 times slower) than 32K. (The target application was
> a GIS server using Oracle.)

But your subsystem (the disk) was probably still using 512-byte blocks,
possibly scattered. And the OS was still using 4KB pages; it takes more
time to reclaim and gather 64 pages per IO operation than one. That's
why I'm saying that the pagesize needs to scale along with the blocksize.

The application might have been assuming a small block size as well, so
the OS was forced to do several read/modify/write cycles, moving perhaps
512 times as much data as necessary.
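As a sketch of that amplification (the 512-byte and 256K figures come
from the benchmark described above; the application's write size is an
assumption):

#include <stdio.h>

int main(void)
{
	long app_write  = 512;		/* assumed application write size */
	long block_size = 256 * 1024;	/* filesystem block size as above */

	/* Each small write forces the whole block through a
	 * read/modify/write cycle. */
	printf("each %ld-byte write moves %ldx the necessary data\n",
	       app_write, block_size / app_write);
	return 0;
}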

I'm not saying that the current system will perform well when working
with large blocks, but compared to increasing the size of block_t, a
larger blocksize has more potential to give improvements in the long
term without adding an unrecoverable performance hit.

Jan


Thread overview: 34+ messages
2001-03-27 19:57 64-bit block sizes on 32-bit systems Jesse Pollard
2001-03-27 20:20 ` Jan Harkes [this message]
2001-03-27 21:55   ` LA Walsh
  -- strict thread matches above, loose matches on Subject: below --
2001-03-27 22:23 Jesse Pollard
2001-03-27 23:56 ` Steve Lord
2001-03-28  8:09   ` Brad Boyer
2001-03-28 14:53     ` Dave Kleikamp
2001-03-27 19:30 Jesse Pollard
     [not found] <Pine.LNX.4.30.0103270022500.21075-100000@age.cs.columbia.edu>
     [not found] ` <3AC0CA9C.3D804361@sgi.com>
2001-03-27 19:00   ` Jan Harkes
2001-03-27 17:22 LA Walsh
2001-03-26 21:27 Jesse Pollard
2001-03-26 22:07 ` Jonathan Morton
2001-03-27  4:14   ` Jesse Pollard
2001-03-26 19:26 Jesse Pollard
2001-03-26 18:01 Manfred Spraul
2001-03-26 18:07 ` Matthew Wilcox
2001-03-26 19:40 ` LA Walsh
2001-03-26 21:53   ` Manfred Spraul
2001-03-26 22:07     ` LA Walsh
2001-03-26 17:35 LA Walsh
2001-03-26 16:39 LA Walsh
2001-03-26 17:18 ` Matthew Wilcox
2001-03-26 17:47   ` Andreas Dilger
2001-03-26 18:09     ` Matthew Wilcox
2001-03-26 18:37       ` Eric W. Biederman
2001-03-26 19:36         ` Martin Dalecki
2001-03-26 23:03         ` AJ Lewis
2001-03-26 19:05       ` Scott Laird
2001-03-26 19:09       ` Andreas Dilger
2001-03-26 20:31         ` Dan Hollis
2001-03-26 19:20       ` Rik van Riel
2001-03-26 20:14       ` Jes Sorensen
2001-03-26 17:58 ` Eric W. Biederman
2001-03-28  8:06 ` Matthew Wilcox
