* Custom driver FS brokenness at 4GB?
@ 2015-05-27 13:56 Rob Harris
  2015-05-28 10:59 ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Rob Harris @ 2015-05-27 13:56 UTC (permalink / raw)
  To: linux-ext4

Greetings. I have an odd issue and need some ideas of where to go next 
-- I'm out of hair to rip out.

I'm writing a custom block device driver that talks to some custom RAID 
hardware (>32TB) using DMA scatter-gather, with no partitions, and I'm 
using make_request() to service all the BIO requests to simplify 
debugging. I have the driver working to the point where using dd against 
the block device seems to work fine (I'm setting iflag|oflag=direct to 
ensure it's actually hitting the disk). I also have the blk_queue set to 
request only a single 4k I/O per BIO (again, to simplify debugging for 
now). Also, again to debug, I have a mutex wrapping the entire 
make_request call to ensure that only a single request is serviced at a 
time. So this should be as "simple" an environment as I can make it to 
debug this problem.

Once the driver is loaded, when I try to create a file system (ext4, but 
the same thing happens with xfs) there seems to be some corruption 
occurring, but only when I set the size of the block device over 4GB. 
For instance, when I set the size to 4G, I can mkfs.ext4, but after 2 or 
3 mount/umounts the FS refuses to mount anymore and the kernel log 
complains that the journal is missing. This was discovered running this 
loop...

#!/bin/sh
COUNT=4032

while true ; do

figlet ${COUNT}

( umount /mnt ; rmmod smc ) || true
modprobe smc capacity_in_mb=${COUNT} debug=1
mkfs.ext4 -m 0 /dev/smcd

mount /dev/smcd /mnt
cp count_512m.dat /mnt/test
umount /mnt
mount /dev/smcd /mnt
umount /mnt
mount /dev/smcd /mnt
cmp count_512m.dat /mnt/test
umount /mnt
mount /dev/smcd /mnt # ***
sync
umount /mnt
mount /dev/smcd /mnt
sleep 1
umount /mnt

COUNT=$(( COUNT + 64 ))
sleep 1

done

Sometimes I'll get in the kernel log:
May 27 09:39:01 febtober kernel: [64547.304695] EXT4-fs (smcd): ext4_check_descriptors: Checksum for group 0 failed (7009!=0)
May 27 09:39:01 febtober kernel: [64547.305744] EXT4-fs (smcd): group descriptors corrupted!

Other times I'll get:
May 27 09:46:49 ryftone-smcdrv kernel: [65014.342850] EXT4-fs (smcd): no journal found


I've seen this loop fail as early as COUNT=4096, but as late as 
COUNT=4220; removing the sync changes the behavior.
When it fails, it usually does so on the 3rd mount (***).
FYI, I effectively call: set_capacity(disk, capacity_in_mb * 2048);
(2048 * 512-byte kernel sectors = 1M)

Another example: if I set the sector count of the disk to 16G, I can run 
mkfs.ext4 but the first mount fails and I see:
May 27 09:07:27 febtober kernel: [62653.269387] EXT4-fs (smcd): ext4_check_descriptors: Block bitmap for group 0 not in group (block 4294967295)!

But, again, if I set the size < 4G, everything seems fine. I can 
currently dd read and write across that 4G boundary without issue -- 
it's ONLY the filesystem accesses. My gut is screaming that there's a 
32/64-bit overflow condition somewhere, but for the life of me I can't 
find it. Is there something I need to set to tell the block layer I have 
a 64-bit addressable device? set_capacity() always takes the number of 
LINUX KERNEL sectors (not what I set blk_queue_logical|physical_block_size 
to), correct?

I'm currently on 3.16.0 (Ubuntu 14.04.2 LTS) if it matters.

Any help/pointers would be greatly appreciated.

--Rob Harris



* Re: Custom driver FS brokenness at 4GB?
  2015-05-27 13:56 Custom driver FS brokenness at 4GB? Rob Harris
@ 2015-05-28 10:59 ` Jan Kara
  2015-05-28 17:43   ` Andreas Dilger
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2015-05-28 10:59 UTC (permalink / raw)
  To: Rob Harris; +Cc: linux-ext4

On Wed 27-05-15 09:56:29, Rob Harris wrote:
> Greetings. I have an odd issue and need some ideas of where to go
> next -- I'm out of hair to rip out.
> 
> I'm writing a custom block device driver talking to some custom RAID
> hardware (>32TB) using DMA scatter-gather, with no partitions and am
> using make_request() to service all the BIO requests to simplify
> debugging. I have the driver working to the point where using DD
> against the block device seems to work fine (I'm setting
> iflag|oflag=direct to ensure it's writing to the disk). I also have
> the blk_queue set to only request a single 4k I/O per BIO (again to
> simplify debugging for now.) Also, again to debug, I have a mutex
> wrapping the entire make_request call to ensure that only a single
> request is being serviced at a time. So, this should be as "simple"
> as I can make the environment to debug this problem.
> 
> Once the driver is loaded, when I try to create a file system (ext4
> but the same thing happens with xfs) it seems like there is some
> corruption occurring, but only when I set the sector size of the
> block device over 4GB. For instance, when I set the size to 4G, I
> can mkfs.ext4, but after 2 or 3 mount/umounts the FS refuses to
> mount anymore and the kernel log complains that the journal is
> missing. This was discovered running this loop...
  Hard to tell exactly, but with 4GB being the 32-bit limit, I would first
look for an int / unsigned int overflow somewhere. You could probably debug
this better by writing a pattern via dd that is different for each block, to
verify that each block indeed lands in the expected location...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Custom driver FS brokenness at 4GB?
  2015-05-28 10:59 ` Jan Kara
@ 2015-05-28 17:43   ` Andreas Dilger
  2015-05-28 18:30     ` Rob Harris
  0 siblings, 1 reply; 4+ messages in thread
From: Andreas Dilger @ 2015-05-28 17:43 UTC (permalink / raw)
  To: Jan Kara; +Cc: Rob Harris, linux-ext4

On May 28, 2015, at 4:59 AM, Jan Kara <jack@suse.cz> wrote:
> 
>  Hard to tell exactly but with 4GB being 32-bit limit, I would first look
> for some int / unsigned int number overflow. You could possibly better
> debug this when writing some pattern via DD that is different for each
> block to verify that each block indeed lands in the expected location...

We have a tool, "llverdev", which does exactly this: it writes a pattern
to each block in the block device (or to sparse regions covering the
device) with a timestamp and block number, to track down sources of
block-addressing errors:

http://git.hpdd.intel.com/fs/lustre-release.git/blob/HEAD:/lustre/utils/llverdev.c

Cheers, Andreas


* Re: Custom driver FS brokenness at 4GB?
  2015-05-28 17:43   ` Andreas Dilger
@ 2015-05-28 18:30     ` Rob Harris
  0 siblings, 0 replies; 4+ messages in thread
From: Rob Harris @ 2015-05-28 18:30 UTC (permalink / raw)
  To: Andreas Dilger, Jan Kara; +Cc: linux-ext4

Thanks for the pointers, everyone. After further testing and code review, 
it turns out I was boneheadedly truncating a u64 sector address to a u32, 
in a function signature hidden behind an obscured typedef.

*facepalm*

All seems well now. Thanks for the help!
-R



