All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG/PATCH] md bitmap broken on big endian machines
@ 2006-09-21 20:26 Paul Clements
  2006-09-22 15:29 ` [PATCH] md SET_BITMAP_FILE compat ioctl Paul Clements
  2006-09-28  6:34 ` [BUG/PATCH] md bitmap broken on big endian machines Neil Brown
  0 siblings, 2 replies; 7+ messages in thread
From: Paul Clements @ 2006-09-21 20:26 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 875 bytes --]

Neil,

We just discovered this problem on a 64-bit IBM POWER (ppc64) system. 
The symptom was this BUG():

Sep 20 20:55:51 caspian kernel: kernel BUG in sync_request at 
drivers/md/raid1.c
:1743!
Sep 20 20:55:51 caspian kernel: Oops: Exception in kernel mode, sig: 5 [#1]
Sep 20 20:55:51 caspian kernel: SMP NR_CPUS=128 NUMA PSERIES LPAR
Sep 20 20:55:51 caspian kernel: Modules linked in: nbd raid1 ipv6 nfs 
lockd nfs_
acl sunrpc apparmor aamatch_pcre loop dm_mod e1000 ide_cd cdrom lpfc 
scsi_transp
ort_fc pdc202xx_new sg st ipr firmware_class sd_mod scsi_mod


I traced the problem back to a bad value in bitmap->chunkshift (the 
value was 8, instead of 16, with a chunksize of 64K). I think this means 
that bitmaps are broken on all big endian systems. It looks like we 
should be using ffs instead of find_first_bit. Patch has been compile 
tested only.

Thanks,
Paul

[-- Attachment #2: md_bitmap_find_first_bug.diff --]
[-- Type: text/plain, Size: 488 bytes --]

--- ./drivers/md/bitmap.c.orig	2006-09-21 16:10:17.000000000 -0400
+++ ./drivers/md/bitmap.c	2006-09-21 16:12:16.000000000 -0400
@@ -1578,8 +1578,7 @@ int bitmap_create(mddev_t *mddev)
 	if (err)
 		goto error;
 
-	bitmap->chunkshift = find_first_bit(&bitmap->chunksize,
-					sizeof(bitmap->chunksize));
+	bitmap->chunkshift = ffs(bitmap->chunksize) - 1;
 
 	/* now that chunksize and chunkshift are set, we can use these macros */
  	chunks = (blocks + CHUNK_BLOCK_RATIO(bitmap) - 1) /

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] md SET_BITMAP_FILE compat ioctl
  2006-09-21 20:26 [BUG/PATCH] md bitmap broken on big endian machines Paul Clements
@ 2006-09-22 15:29 ` Paul Clements
  2006-09-28  7:31   ` Neil Brown
  2006-09-28  6:34 ` [BUG/PATCH] md bitmap broken on big endian machines Neil Brown
  1 sibling, 1 reply; 7+ messages in thread
From: Paul Clements @ 2006-09-22 15:29 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 328 bytes --]

The SET_BITMAP_FILE ioctl currently doesn't have a compat entry, so on 
64-bit systems a 32-bit mdadm will fail:

Sep 20 16:34:52 caspian kernel: ioctl32(mdadm:8056): Unknown cmd fd(3) 
cmd(8004092b){00} arg(00000004) on /dev/md0

The fix is to build a 64-bit mdadm or apply the following patch (compile 
tested only).

--
Paul

[-- Attachment #2: md_set_bitmap_file_compat_ioctl.diff --]
[-- Type: text/plain, Size: 400 bytes --]

--- ./include/linux/compat_ioctl.h.orig	2006-09-22 11:14:53.000000000 -0400
+++ ./include/linux/compat_ioctl.h	2006-09-22 11:15:24.000000000 -0400
@@ -121,6 +121,7 @@ ULONG_IOCTL(START_ARRAY)
 COMPATIBLE_IOCTL(STOP_ARRAY)
 COMPATIBLE_IOCTL(STOP_ARRAY_RO)
 COMPATIBLE_IOCTL(RESTART_ARRAY_RW)
+ULONG_IOCTL(SET_BITMAP_FILE)
 /* DM */
 COMPATIBLE_IOCTL(DM_VERSION_32)
 COMPATIBLE_IOCTL(DM_REMOVE_ALL_32)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG/PATCH] md bitmap broken on big endian machines
  2006-09-21 20:26 [BUG/PATCH] md bitmap broken on big endian machines Paul Clements
  2006-09-22 15:29 ` [PATCH] md SET_BITMAP_FILE compat ioctl Paul Clements
@ 2006-09-28  6:34 ` Neil Brown
  2006-09-28 22:09   ` Michael Tokarev
  1 sibling, 1 reply; 7+ messages in thread
From: Neil Brown @ 2006-09-28  6:34 UTC (permalink / raw)
  To: Paul Clements; +Cc: linux-raid

On Thursday September 21, paul.clements@steeleye.com wrote:
> Neil,
> 
> We just discovered this problem on a 64-bit IBM POWER (ppc64) system. 
> The symptom was this BUG():
> 
> Sep 20 20:55:51 caspian kernel: kernel BUG in sync_request at 
> drivers/md/raid1.c:1743!

groan....

> 
> I traced the problem back to a bad value in bitmap->chunkshift (the 
> value was 8, instead of 16, with a chunksize of 64K). I think this means 
> that bitmaps are broken on all big endian systems. It looks like we 
> should be using ffs instead of find_first_bit. Patch has been compile 
> tested only.

Does this mean we need to up the superblock version number?  When
reading a version-4 (or earlier) bitmap you cannot be sure if the
correct chunk_shift was used so you cannot trust it.
But I don't like the idea of forcing everyone to upgrade their
bitmap... 
Maybe it doesn't matter too much.  If there does turn out to be a
problem - which there won't be for the little-endian majority - a
simple 'repair' process will resolve everything.

> 
> Thanks,
> Paul
> --- ./drivers/md/bitmap.c.orig	2006-09-21 16:10:17.000000000 -0400
> +++ ./drivers/md/bitmap.c	2006-09-21 16:12:16.000000000 -0400
> @@ -1578,8 +1578,7 @@ int bitmap_create(mddev_t *mddev)
>  	if (err)
>  		goto error;
>  
> -	bitmap->chunkshift = find_first_bit(&bitmap->chunksize,
> -					sizeof(bitmap->chunksize));
> +	bitmap->chunkshift = ffs(bitmap->chunksize) - 1;

bitmap->chunksize is 'unsigned long'. ffs takes an 'int'.
So there are type issues there.
ffz takes an 'unsigned long', so it is probably safer to use that.
See below.

Thanks,
NeilBrown

--------------------

Use ffz instead of find_first_set to convert multiplier to shift.

From: Paul Clements <paul.clements@steeleye.com>

find_first_set doesn't find the least-significant bit on bigendian
machines, so it is really wrong to use it.

ffs is closer, but takes an 'int' and we have a 'unsigned long'.
So use ffz(~X) to convert a chunksize into a chunkshift.


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/bitmap.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c	2006-09-28 16:30:51.000000000 +1000
+++ ./drivers/md/bitmap.c	2006-09-28 16:31:04.000000000 +1000
@@ -1444,8 +1444,7 @@ int bitmap_create(mddev_t *mddev)
 	if (err)
 		goto error;
 
-	bitmap->chunkshift = find_first_bit(&bitmap->chunksize,
-					sizeof(bitmap->chunksize));
+	bitmap->chunkshift = ffz(~bitmap->chunksize);
 
 	/* now that chunksize and chunkshift are set, we can use these macros */
  	chunks = (blocks + CHUNK_BLOCK_RATIO(bitmap) - 1) /


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] md SET_BITMAP_FILE compat ioctl
  2006-09-22 15:29 ` [PATCH] md SET_BITMAP_FILE compat ioctl Paul Clements
@ 2006-09-28  7:31   ` Neil Brown
  0 siblings, 0 replies; 7+ messages in thread
From: Neil Brown @ 2006-09-28  7:31 UTC (permalink / raw)
  To: Paul Clements; +Cc: linux-raid

On Friday September 22, paul.clements@steeleye.com wrote:
> The SET_BITMAP_FILE ioctl currently doesn't have a compat entry, so on 
> 64-bit systems a 32-bit mdadm will fail:
> 
> Sep 20 16:34:52 caspian kernel: ioctl32(mdadm:8056): Unknown cmd fd(3) 
> cmd(8004092b){00} arg(00000004) on /dev/md0
> 
> The fix is to build a 64-bit mdadm or apply the following patch (compile 
> tested only).

Thanks. I'll make sure it gets to akpm.

NeilBrown

> 
> --
> Paul
> --- ./include/linux/compat_ioctl.h.orig	2006-09-22 11:14:53.000000000 -0400
> +++ ./include/linux/compat_ioctl.h	2006-09-22 11:15:24.000000000 -0400
> @@ -121,6 +121,7 @@ ULONG_IOCTL(START_ARRAY)
>  COMPATIBLE_IOCTL(STOP_ARRAY)
>  COMPATIBLE_IOCTL(STOP_ARRAY_RO)
>  COMPATIBLE_IOCTL(RESTART_ARRAY_RW)
> +ULONG_IOCTL(SET_BITMAP_FILE)
>  /* DM */
>  COMPATIBLE_IOCTL(DM_VERSION_32)
>  COMPATIBLE_IOCTL(DM_REMOVE_ALL_32)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG/PATCH] md bitmap broken on big endian machines
  2006-09-28  6:34 ` [BUG/PATCH] md bitmap broken on big endian machines Neil Brown
@ 2006-09-28 22:09   ` Michael Tokarev
  2006-09-29  0:48     ` Paul Clements
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Tokarev @ 2006-09-28 22:09 UTC (permalink / raw)
  To: Neil Brown; +Cc: Paul Clements, linux-raid

Neil Brown wrote:
[]
> Use ffz instead of find_first_set to convert multiplier to shift.
> 
> From: Paul Clements <paul.clements@steeleye.com>
> 
> find_first_set doesn't find the least-significant bit on bigendian
> machines, so it is really wrong to use it.
> 
> ffs is closer, but takes an 'int' and we have a 'unsigned long'.
> So use ffz(~X) to convert a chunksize into a chunkshift.

So we don't use ffs(int) for an unsigned value because of int vs
unsigned int, but we use ffz() with negated UNSIGNED.  Looks even
more broken to me, even if it happens to work correctly... ;)

/mjt

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG/PATCH] md bitmap broken on big endian machines
  2006-09-28 22:09   ` Michael Tokarev
@ 2006-09-29  0:48     ` Paul Clements
  2006-09-29  7:03       ` Michael Tokarev
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Clements @ 2006-09-29  0:48 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Neil Brown, linux-raid

Michael Tokarev wrote:
> Neil Brown wrote:

>> ffs is closer, but takes an 'int' and we have a 'unsigned long'.
>> So use ffz(~X) to convert a chunksize into a chunkshift.
> 
> So we don't use ffs(int) for an unsigned value because of int vs
> unsigned int, but we use ffz() with negated UNSIGNED.  Looks even
> more broken to me, even if it happens to work correctly... ;)

No, it doesn't matter about the signedness, these are just bit 
operations. The problem is the size (int vs. long), even though in 
practice it's very unlikely you'd ever have a bitmap chunk size that 
exceeded 32 bits. But it's better to be correct and not have to worry 
about it.

--
Paul

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG/PATCH] md bitmap broken on big endian machines
  2006-09-29  0:48     ` Paul Clements
@ 2006-09-29  7:03       ` Michael Tokarev
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Tokarev @ 2006-09-29  7:03 UTC (permalink / raw)
  To: Paul Clements; +Cc: Neil Brown, linux-raid

Paul Clements wrote:
> Michael Tokarev wrote:
>> Neil Brown wrote:
> 
>>> ffs is closer, but takes an 'int' and we have a 'unsigned long'.
>>> So use ffz(~X) to convert a chunksize into a chunkshift.
>>
>> So we don't use ffs(int) for an unsigned value because of int vs
>> unsigned int, but we use ffz() with negated UNSIGNED.  Looks even
>> more broken to me, even if it happens to work correctly... ;)
> 
> No, it doesn't matter about the signedness, these are just bit
> operations. The problem is the size (int vs. long), even though in
> practice it's very unlikely you'd ever have a bitmap chunk size that
> exceeded 32 bits. But it's better to be correct and not have to worry
> about it.

I understand the point, in the first place (I didn't mentioned long
vs int above, however).  The thing is: when reading the code, it looks
just plain wrong.  Esp. since function prototypes aren't here, but for
those ffs(), ffz() etc they're hidden somewhere in include/asm/* (as
they're architecture-dependent), and it's not at all obvious which is
signed and which is unsigned, which is long or int etc.

At the very least, return -ENOCOMMENT :)

/mjt

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2006-09-29  7:03 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-09-21 20:26 [BUG/PATCH] md bitmap broken on big endian machines Paul Clements
2006-09-22 15:29 ` [PATCH] md SET_BITMAP_FILE compat ioctl Paul Clements
2006-09-28  7:31   ` Neil Brown
2006-09-28  6:34 ` [BUG/PATCH] md bitmap broken on big endian machines Neil Brown
2006-09-28 22:09   ` Michael Tokarev
2006-09-29  0:48     ` Paul Clements
2006-09-29  7:03       ` Michael Tokarev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.