* do_div considered harmful
@ 2003-08-04 2:03 Andries.Brouwer
2003-08-04 2:29 ` Andrew Morton
2003-08-04 5:54 ` do_div considered harmful Linus Torvalds
0 siblings, 2 replies; 11+ messages in thread
From: Andries.Brouwer @ 2003-08-04 2:03 UTC (permalink / raw)
To: linux-kernel; +Cc: torvalds
Writing this ide capacity patch an hour ago or so
I split off a helper sectors_to_MB() since Erik's recent
patch uses this also.
Now that I compare, he wrote
nativeMb = do_div(nativeMb, 1000000);
to divide nativeMb by 1000000.
Similarly, I find in fs/cifs/inode.c
inode->i_blocks = do_div(findData.NumOfBytes, inode->i_blksize);
So, it seems natural to expect that do_div() gives the quotient.
But it gives the remainder.
(Strange, Erik showed correct output.)
Since the semantics of this object are very unlike that of a C function,
I wonder whether we should write DO_DIV instead, or DO_DIV_AND_REM
to show that a remainder is returned.
Andries
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: do_div considered harmful
2003-08-04 2:03 do_div considered harmful Andries.Brouwer
@ 2003-08-04 2:29 ` Andrew Morton
2003-08-05 2:58 ` i_blksize Andries Brouwer
2003-08-04 5:54 ` do_div considered harmful Linus Torvalds
1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2003-08-04 2:29 UTC (permalink / raw)
To: Andries.Brouwer; +Cc: linux-kernel, torvalds
Andries.Brouwer@cwi.nl wrote:
>
> Writing this ide capacity patch an hour ago or so
> I split off a helper sectors_to_MB() since Erik's recent
> patch uses this also.
> Now that I compare, he wrote
> nativeMb = do_div(nativeMb, 1000000);
> to divide nativeMb by 1000000.
> Similarly, I find in fs/cifs/inode.c
> inode->i_blocks = do_div(findData.NumOfBytes, inode->i_blksize);
This should be
int blocksize = 1 << inode->i_blkbits;
inode->i_blocks = (findData.NumOfBytes + blocksize - 1)
>> inode->i_blkbits;
and inode.i_blksize should probably be removed from the kernel.
> So, it seems natural to expect that do_div() gives the quotient.
> But it gives the remainder.
> (Strange, Erik showed correct output.)
>
> Since the semantics of this object are very unlike that of a C function,
> I wonder whether we should write DO_DIV instead, or DO_DIV_AND_REM
> to show that a remainder is returned.
Sometimes the slash-star operator comes in handy.
--- 25/include/asm-i386/div64.h~do_div-comment 2003-08-03 19:20:58.000000000 -0700
+++ 25-akpm/include/asm-i386/div64.h 2003-08-03 19:21:11.000000000 -0700
@@ -1,6 +1,16 @@
#ifndef __I386_DIV64
#define __I386_DIV64
+/*
+ * The semantics of do_div() are:
+ *
+ * uint32_t do_div(uint64_t *n, uint32_t base)
+ * {
+ * uint32_t remainder = *n % base;
+ * *n = *n / base;
+ * return remainder;
+ * }
+ */
#define do_div(n,base) ({ \
unsigned long __upper, __low, __high, __mod; \
asm("":"=a" (__low), "=d" (__high):"A" (n)); \
_
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: i_blksize
2003-08-04 2:29 ` Andrew Morton
@ 2003-08-05 2:58 ` Andries Brouwer
2003-08-05 6:10 ` i_blksize Andrew Morton
0 siblings, 1 reply; 11+ messages in thread
From: Andries Brouwer @ 2003-08-05 2:58 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andries.Brouwer, linux-kernel, torvalds
On Sun, Aug 03, 2003 at 07:29:19PM -0700, Andrew Morton wrote:
> and inode.i_blksize should probably be removed from the kernel.
I see that Linus has added a comment and does not want to rename,
so that settles that half of that letter.
You propose to remove i_blksize.
Yes, a good plan, half an hour's work.
It is used in stat only, so we have to produce some value for stat.
Do you want to replace
inode->i_blksize
by
inode->i_sb->s_optimal_io_size
?
Andries
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: i_blksize
2003-08-05 2:58 ` i_blksize Andries Brouwer
@ 2003-08-05 6:10 ` Andrew Morton
0 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2003-08-05 6:10 UTC (permalink / raw)
To: Andries Brouwer; +Cc: Andries.Brouwer, linux-kernel, torvalds
Andries Brouwer <aebr@win.tue.nl> wrote:
>
> You propose to remove i_blksize.
> Yes, a good plan, half an hour's work.
> It is used in stat only, so we have to produce some value for stat.
> Do you want to replace
> inode->i_blksize
> by
> inode->i_sb->s_optimal_io_size
> ?
No, that's different. You are referring to stat.st_blksize. That is a
different animal, and we can leave that alone.
inode->i_blksize is equal to (1 << inode->i_blkbits) always, all the time.
It is just duplicated nonsense and usually leads to poorer code -
multiplications instead of shifts.
It should be a pretty easy incremental set of patches to ease i_blksize out
of the kernel.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: do_div considered harmful
2003-08-04 2:03 do_div considered harmful Andries.Brouwer
2003-08-04 2:29 ` Andrew Morton
@ 2003-08-04 5:54 ` Linus Torvalds
1 sibling, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2003-08-04 5:54 UTC (permalink / raw)
To: Andries.Brouwer; +Cc: linux-kernel
On Mon, 4 Aug 2003 Andries.Brouwer@cwi.nl wrote:
>
> Now that I compare, he wrote
> nativeMb = do_div(nativeMb, 1000000);
> to divide nativeMb by 1000000.
>
> So, it seems natural to expect that do_div() gives the quotient.
> But it gives the remainder.
> (Strange, Erik showed correct output.)
Actually, the above is "undefined behaviour", since it has _two_
assignments in the same thing. Exactly because "do_div()" modifies both
the first argument _and_ returns a value. So depending on the
implementation of do_div() (whether there are any sequence points etc) and
on random compiler behaviour (if there are no sequence points in do_div()
internally), in the example above "nativeMb" migth be totally undefined
after the above.
And yes, as a special case, it might be the divisor.
I agree that "do_div()" has strange semantics and is very likely misnamed,
but they are kind of forced upon us by the fact that C functions cannot
return two values. Renaming do_div() at this point is just going to make
it harder to write kernel- portable source, so I suspect we're better off
just commenting it, and making people aware of how do_div() works.
Not very many people should use "do_div()" directly (and a quick grep
shows that not very many people do). It's generally a mistake to do so, I
suspect. The thing was originally written explicitly for "printk()" and
nothing else.
Linus
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: i_blksize
@ 2003-08-05 9:20 Andries.Brouwer
2003-08-05 15:55 ` i_blksize Andrew Morton
0 siblings, 1 reply; 11+ messages in thread
From: Andries.Brouwer @ 2003-08-05 9:20 UTC (permalink / raw)
To: aebr, akpm; +Cc: Andries.Brouwer, linux-kernel, torvalds
From: Andrew Morton <akpm@osdl.org>
> You propose to remove i_blksize.
> It is used in stat only, so we have to produce some value for stat.
> Do you want to replace
> inode->i_blksize
> by
> inode->i_sb->s_optimal_io_size
> ?
No, that's different. You are referring to stat.st_blksize. That is a
different animal, and we can leave that alone.
inode->i_blksize is equal to (1 << inode->i_blkbits) always, all the time.
It is just duplicated nonsense and usually leads to poorer code -
multiplications instead of shifts.
It should be a pretty easy incremental set of patches to ease i_blksize out
of the kernel.
Hmm. Let me first read stat.c.
generic_fillattr():
stat->blksize = inode->i_blksize;
vfs_getattr():
generic_fillattr(inode, stat);
if (!stat->blksize)
stat->blksize = s->s_blocksize;
cp_new_stat64():
tmp.st_blksize = stat->blksize;
copy_to_user(statbuf, &tmp, sizeof(tmp));
So really, if inode->i_blksize has a nonzero value,
this value is returned in stat.st_blksize.
Remains your other claim: inode->i_blksize == (1 << inode->i_blkbits).
I don't see any code that would enforce that.
Andries
[and there are lots of comments around:
inode->i_blksize = PAGE_SIZE; /* This is the optimal IO size ... */
]
Maybe also a good candidate for renaming if it is not eliminated?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: i_blksize
@ 2003-08-05 16:27 Andries.Brouwer
2003-08-05 17:50 ` i_blksize Andrew Morton
0 siblings, 1 reply; 11+ messages in thread
From: Andries.Brouwer @ 2003-08-05 16:27 UTC (permalink / raw)
To: Andries.Brouwer, akpm; +Cc: aebr, linux-kernel, torvalds
> Looks like I got myself confused
Yes. But nevertheless, now that you brought this up,
we might consider throwing out i_blksize.
I am not aware of anybody who actually uses this to give
per-file advice. So, it could be in the superblock.
There is no reason why it would be a power of two -
the case mentioned yesterday or so was cifs, with
inode->i_blksize =
(pTcon->ses->server->maxBuf - MAX_CIFS_HDR_SIZE) & 0xFFFFFE00;
I see no reason not to replace i_blksize by i_sb->s_optimal_io_size.
Any objections?
If sizeof(struct inode) decreases by 1% then we can keep 1% more inodes.
That reminds me - I threw out i_dev and i_cdev, but Al reintroduced i_cdev.
We should do as some comment says and make a union with i_bdev and i_pipe.
Another 8 bytes gone.
Andries
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: i_blksize
2003-08-05 16:27 i_blksize Andries.Brouwer
@ 2003-08-05 17:50 ` Andrew Morton
2003-08-05 18:09 ` i_blksize Andreas Dilger
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2003-08-05 17:50 UTC (permalink / raw)
To: Andries.Brouwer; +Cc: linux-kernel, torvalds
Andries.Brouwer@cwi.nl wrote:
>
> > Looks like I got myself confused
>
> Yes. But nevertheless, now that you brought this up,
> we might consider throwing out i_blksize.
>
> I am not aware of anybody who actually uses this to give
> per-file advice. So, it could be in the superblock.
I suppose so. reiserfs plays with it.
I can't really see that anyone would want to set the I/O size hint on a
per-inode basis, especially as the readahead and writebehind code will
cheerfully ignore it.
> Any objections?
I don't think it's worth fiddling with at this time, really.
> If sizeof(struct inode) decreases by 1% then we can keep 1% more inodes.
>
> That reminds me - I threw out i_dev and i_cdev, but Al reintroduced i_cdev.
> We should do as some comment says and make a union with i_bdev and i_pipe.
> Another 8 bytes gone.
Well all the inode slab caches are using SLAB_HWCACHE_ALIGN at present, so
it's a little moot. Especially on a pentium4-compiled kernel.
But I expect most distributed 2.6 kernels will be pII or pIII-compiled.
Let's look:
SMP:
sizeof(struct ext2_inode_info) = 0x1d0
sizeof(struct ext3_inode_info) = 0x1e0
Both of these pack eight-per-page. Need to get them to 0x1c4 (and remove
SLAB_HWCACHE_ALIGN) to get to nine-per-page.
UP:
sizeof(struct ext3_inode_info) = 0x1c4 (whew!)
sizeof(struct ext2_inode_info) = 0x1b4
So for these filesystems at least, we need to remove SLAB_HWCACHE_ALIGN and
we will get a 12% improvement in packing density on uniprocessor.
unionification of i_[bcp]dev sounds like a good idea to give us a little
margin there.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: i_blksize
2003-08-05 17:50 ` i_blksize Andrew Morton
@ 2003-08-05 18:09 ` Andreas Dilger
0 siblings, 0 replies; 11+ messages in thread
From: Andreas Dilger @ 2003-08-05 18:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andries.Brouwer, linux-kernel, torvalds
On Aug 05, 2003 10:50 -0700, Andrew Morton wrote:
> Andries.Brouwer@cwi.nl wrote:
> > Yes. But nevertheless, now that you brought this up,
> > we might consider throwing out i_blksize.
> >
> > I am not aware of anybody who actually uses this to give
> > per-file advice. So, it could be in the superblock.
>
> I suppose so. reiserfs plays with it.
>
> I can't really see that anyone would want to set the I/O size hint on a
> per-inode basis, especially as the readahead and writebehind code will
> cheerfully ignore it.
Actually, Lustre uses this, because each file can be striped over a
different number of storage targets, and you want read and write requests
large enough to try and write to all of the targets at one time.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: i_blksize
@ 2003-08-05 19:48 Andries.Brouwer
0 siblings, 0 replies; 11+ messages in thread
From: Andries.Brouwer @ 2003-08-05 19:48 UTC (permalink / raw)
To: adilger, akpm; +Cc: Andries.Brouwer, linux-kernel, torvalds
>> I am not aware of anybody who actually uses i_blksize
>> to give per-file advice.
> Actually, Lustre uses this
Hmm. Not in the sources I have. Probably too old.
> reiserfs
I think constant per filesystem. The variable reiserfs_default_io_size
is a global constant in reiserfs/super.c.
Andries
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2003-08-05 19:48 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-08-04 2:03 do_div considered harmful Andries.Brouwer
2003-08-04 2:29 ` Andrew Morton
2003-08-05 2:58 ` i_blksize Andries Brouwer
2003-08-05 6:10 ` i_blksize Andrew Morton
2003-08-04 5:54 ` do_div considered harmful Linus Torvalds
2003-08-05 9:20 i_blksize Andries.Brouwer
2003-08-05 15:55 ` i_blksize Andrew Morton
2003-08-05 16:27 i_blksize Andries.Brouwer
2003-08-05 17:50 ` i_blksize Andrew Morton
2003-08-05 18:09 ` i_blksize Andreas Dilger
2003-08-05 19:48 i_blksize Andries.Brouwer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).