linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: i_blksize
@ 2003-08-05  9:20 Andries.Brouwer
  2003-08-05 15:55 ` i_blksize Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Andries.Brouwer @ 2003-08-05  9:20 UTC (permalink / raw)
  To: aebr, akpm; +Cc: Andries.Brouwer, linux-kernel, torvalds

    From: Andrew Morton <akpm@osdl.org>

    >  You propose to remove i_blksize.
    >  It is used in stat only, so we have to produce some value for stat.
    >  Do you want to replace
    >      inode->i_blksize
    >  by
    >      inode->i_sb->s_optimal_io_size
    >  ?

    No, that's different.  You are referring to stat.st_blksize.  That is a
    different animal, and we can leave that alone.

    inode->i_blksize is equal to (1 << inode->i_blkbits) always, all the time. 
    It is just duplicated nonsense and usually leads to poorer code -
    multiplications instead of shifts.

    It should be a pretty easy incremental set of patches to ease i_blksize out
    of the kernel.

Hmm. Let me first read stat.c.

generic_fillattr():
	stat->blksize = inode->i_blksize;
vfs_getattr():
	generic_fillattr(inode, stat);
	if (!stat->blksize)
		stat->blksize = s->s_blocksize;
cp_new_stat64():
	tmp.st_blksize = stat->blksize;
	copy_to_user(statbuf, &tmp, sizeof(tmp));

So really, if inode->i_blksize has a nonzero value,
this value is returned in stat.st_blksize.

Remains your other claim: inode->i_blksize == (1 << inode->i_blkbits).
I don't see any code that would enforce that.

Andries

[and there are lots of comments around:
	inode->i_blksize = PAGE_SIZE;   /* This is the optimal IO size ... */
]

Maybe also a good candidate for renaming if it is not eliminated?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: i_blksize
  2003-08-05  9:20 i_blksize Andries.Brouwer
@ 2003-08-05 15:55 ` Andrew Morton
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2003-08-05 15:55 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: aebr, Andries.Brouwer, linux-kernel, torvalds

Andries.Brouwer@cwi.nl wrote:
>
> Hmm. Let me first read stat.c.

hmm indeed.  Looks like I got myself confused:

fs/ext2/ialloc.c:       inode->i_blksize = PAGE_SIZE;   /* This is the optimal IO size (for stat), not the fs block size */



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: i_blksize
@ 2003-08-05 19:48 Andries.Brouwer
  0 siblings, 0 replies; 8+ messages in thread
From: Andries.Brouwer @ 2003-08-05 19:48 UTC (permalink / raw)
  To: adilger, akpm; +Cc: Andries.Brouwer, linux-kernel, torvalds

>> I am not aware of anybody who actually uses i_blksize
>> to give per-file advice.

> Actually, Lustre uses this

Hmm. Not in the sources I have. Probably too old.

> reiserfs

I think constant per filesystem. The variable reiserfs_default_io_size
is a global constant in reiserfs/super.c.

Andries


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: i_blksize
  2003-08-05 17:50 ` i_blksize Andrew Morton
@ 2003-08-05 18:09   ` Andreas Dilger
  0 siblings, 0 replies; 8+ messages in thread
From: Andreas Dilger @ 2003-08-05 18:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andries.Brouwer, linux-kernel, torvalds

On Aug 05, 2003  10:50 -0700, Andrew Morton wrote:
> Andries.Brouwer@cwi.nl wrote:
> > Yes. But nevertheless, now that you brought this up,
> > we might consider throwing out i_blksize.
> > 
> > I am not aware of anybody who actually uses this to give
> > per-file advice. So, it could be in the superblock.
> 
> I suppose so.  reiserfs plays with it.
> 
> I can't really see that anyone would want to set the I/O size hint on a
> per-inode basis, especially as the readahead and writebehind code will
> cheerfully ignore it.

Actually, Lustre uses this, because each file can be striped over a
different number of storage targets, and you want read and write requests
large enough to try and write to all of the targets at one time.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: i_blksize
  2003-08-05 16:27 i_blksize Andries.Brouwer
@ 2003-08-05 17:50 ` Andrew Morton
  2003-08-05 18:09   ` i_blksize Andreas Dilger
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2003-08-05 17:50 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: linux-kernel, torvalds

Andries.Brouwer@cwi.nl wrote:
>
> > Looks like I got myself confused
> 
> Yes. But nevertheless, now that you brought this up,
> we might consider throwing out i_blksize.
> 
> I am not aware of anybody who actually uses this to give
> per-file advice. So, it could be in the superblock.

I suppose so.  reiserfs plays with it.

I can't really see that anyone would want to set the I/O size hint on a
per-inode basis, especially as the readahead and writebehind code will
cheerfully ignore it.

> Any objections?

I don't think it's worth fiddling with at this time, really.

> If sizeof(struct inode) decreases by 1% then we can keep 1% more inodes.
>
> That reminds me - I threw out i_dev and i_cdev, but Al reintroduced i_cdev.
> We should do as some comment says and make a union with i_bdev and i_pipe.
> Another 8 bytes gone.

Well all the inode slab caches are using SLAB_HWCACHE_ALIGN at present, so
it's a little moot.  Especially on a pentium4-compiled kernel.

But I expect most distributed 2.6 kernels will be pII or pIII-compiled. 
Let's look:

SMP:
	sizeof(struct ext2_inode_info) = 0x1d0
	sizeof(struct ext3_inode_info) = 0x1e0

Both of these pack eight-per-page.  Need to get them to 0x1c4 (and remove
SLAB_HWCACHE_ALIGN) to get to nine-per-page.

UP:
	sizeof(struct ext3_inode_info) = 0x1c4		(whew!)
	sizeof(struct ext2_inode_info) = 0x1b4

So for these filesystems at least, we need to remove SLAB_HWCACHE_ALIGN and
we will get a 12% improvement in packing density on uniprocessor.

unionification of i_[bcp]dev sounds like a good idea to give us a little
margin there.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: i_blksize
@ 2003-08-05 16:27 Andries.Brouwer
  2003-08-05 17:50 ` i_blksize Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Andries.Brouwer @ 2003-08-05 16:27 UTC (permalink / raw)
  To: Andries.Brouwer, akpm; +Cc: aebr, linux-kernel, torvalds

> Looks like I got myself confused

Yes. But nevertheless, now that you brought this up,
we might consider throwing out i_blksize.

I am not aware of anybody who actually uses this to give
per-file advice. So, it could be in the superblock.
There is no reason why it would be a power of two -
the case mentioned yesterday or so was cifs, with

        inode->i_blksize =
            (pTcon->ses->server->maxBuf - MAX_CIFS_HDR_SIZE) & 0xFFFFFE00;

I see no reason not to replace i_blksize by i_sb->s_optimal_io_size.

Any objections?

If sizeof(struct inode) decreases by 1% then we can keep 1% more inodes.

That reminds me - I threw out i_dev and i_cdev, but Al reintroduced i_cdev.
We should do as some comment says and make a union with i_bdev and i_pipe.
Another 8 bytes gone.

Andries


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: i_blksize
  2003-08-05  2:58   ` i_blksize Andries Brouwer
@ 2003-08-05  6:10     ` Andrew Morton
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2003-08-05  6:10 UTC (permalink / raw)
  To: Andries Brouwer; +Cc: Andries.Brouwer, linux-kernel, torvalds

Andries Brouwer <aebr@win.tue.nl> wrote:
>
> You propose to remove i_blksize.
>  Yes, a good plan, half an hour's work.
>  It is used in stat only, so we have to produce some value for stat.
>  Do you want to replace
>  	inode->i_blksize
>  by
>  	inode->i_sb->s_optimal_io_size
>  ?

No, that's different.  You are referring to stat.st_blksize.  That is a
different animal, and we can leave that alone.

inode->i_blksize is equal to (1 << inode->i_blkbits) always, all the time. 
It is just duplicated nonsense and usually leads to poorer code -
multiplications instead of shifts.

It should be a pretty easy incremental set of patches to ease i_blksize out
of the kernel.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: i_blksize
  2003-08-04  2:29 ` Andrew Morton
@ 2003-08-05  2:58   ` Andries Brouwer
  2003-08-05  6:10     ` i_blksize Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Andries Brouwer @ 2003-08-05  2:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andries.Brouwer, linux-kernel, torvalds

On Sun, Aug 03, 2003 at 07:29:19PM -0700, Andrew Morton wrote:

> and inode.i_blksize should probably be removed from the kernel.

I see that Linus has added a comment and does not want to rename,
so that settles that half of that letter.

You propose to remove i_blksize.
Yes, a good plan, half an hour's work.
It is used in stat only, so we have to produce some value for stat.
Do you want to replace
	inode->i_blksize
by
	inode->i_sb->s_optimal_io_size
?

Andries


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2003-08-05 19:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-08-05  9:20 i_blksize Andries.Brouwer
2003-08-05 15:55 ` i_blksize Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2003-08-05 19:48 i_blksize Andries.Brouwer
2003-08-05 16:27 i_blksize Andries.Brouwer
2003-08-05 17:50 ` i_blksize Andrew Morton
2003-08-05 18:09   ` i_blksize Andreas Dilger
2003-08-04  2:03 do_div considered harmful Andries.Brouwer
2003-08-04  2:29 ` Andrew Morton
2003-08-05  2:58   ` i_blksize Andries Brouwer
2003-08-05  6:10     ` i_blksize Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).