Random I/O errors without any error in dmesg

From: Szalma László @ 2015-10-26 11:23 UTC
  To: linux-btrfs

Hi,

I have been seeing this error for some time. It is not easy to reproduce,
so I am writing down everything I know at the moment.

I maintain some servers running Xen (4.5.1) with a Gentoo dom0 and recent
kernels (3.18.*, 4.1.6, 4.2.3, 4.2.4), using the gentoo-sources patchset.
They run Xen domUs for www and MySQL.
The MySQL servers in the domUs are under high load (lots of reads and
writes). These systems are identical in terms of configuration and kernel.

I get MySQL errors at random (sometimes more than one a day, sometimes one
a week), but they are more frequent under high load.

The MySQL errors occur because a file cannot be read from the filesystem.
If I run md5sum on that file, it reports an I/O error.
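
To find every file that currently fails this way, instead of testing files
one at a time with md5sum, a read-back scan of the whole mount can help.
This is only a sketch, not the script used on these servers: it assumes
Python 3, and find_unreadable and the 1 MiB chunk size are hypothetical
choices. It reads every regular file to the end and records the ones whose
open or read raises an error.

```python
import errno
import os
import sys


def find_unreadable(root, chunk_size=1 << 20):
    """Return (path, errno) pairs for regular files under root that fail to read.

    Hypothetical helper: it only reads data (as md5sum would) and never
    writes, but it does add I/O load to a busy filesystem.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, fifos, etc.
            try:
                with open(path, "rb") as f:
                    # Read to EOF in chunks so large files do not fill memory.
                    while f.read(chunk_size):
                        pass
            except OSError as exc:
                # Any OSError is recorded; EIO is the "Input/output error" case.
                failures.append((path, exc.errno))
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    for path, err in find_unreadable(sys.argv[1]):
        label = "EIO" if err == errno.EIO else errno.errorcode.get(err, str(err))
        print(f"{label}: {path}")
```

Run it with the mount point as the only argument, for example
/mnt/mysql_naplo_b2; it prints one line per failing file.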

At this point, stopping MySQL, unmounting and remounting the filesystem,
and starting MySQL again (mysql stop && umount && mount && mysql start)
solves the problem.

Running
echo 3 > /proc/sys/vm/drop_caches
sometimes clears the I/O error, but not every time. Rarely, the problem
goes away on its own without a remount.
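
The drop_caches workaround can be paired with a retry to check whether a
specific file becomes readable again once the caches are cleared. A minimal
sketch, assuming Python 3 and root privileges; retry_after_drop_caches and
read_ok are hypothetical names. The kernel documentation notes that
drop_caches does not free dirty objects, so the sketch calls os.sync() first.

```python
import os


def read_ok(path, chunk_size=1 << 20):
    """Return True if the whole file can be read, False on any OSError."""
    try:
        with open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        return True
    except OSError:
        return False


def retry_after_drop_caches(path, drop=True):
    """Hypothetical helper: report readability before and after dropping caches.

    Returns a (before, after) tuple of booleans. With drop=True it writes to
    /proc/sys/vm/drop_caches, like `echo 3 > ...`, and therefore needs root.
    """
    before = read_ok(path)
    if drop:
        os.sync()  # flush dirty pages first; drop_caches only frees clean ones
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")  # 3 = free page cache plus dentries and inodes
    after = read_ok(path)
    return before, after
```

A result of (False, True) would mean the cache drop cleared the error for
that file; (False, False) is the case where only a remount helps.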

The problem does not seem to depend on the dom0 kernel or the Xen version.
For example, I have seen it on these dom0s:

kernel: 3.19.3  xen 4.5.0
kernel: 4.2.3 xen 4.5.1

The problem seems to have started with the 4.0 kernel series, but I am not
sure. In the summer the load was low, and the problem occurred very rarely.

When the I/O error occurs:
btrfs scrub finds no errors.
There are no memory or HDD/SSD hardware errors (checked with SMART,
memtest, etc.), more than one physical server is affected, and there are
no errors in dmesg at all.
I have tried different kernel configs, but I don't think I have anything
extraordinary.
I use the deadline scheduler.
I use these mount options:

/dev/xvdb1 on /mnt/mysql_naplo_b2 type btrfs 
(rw,noatime,compress=zlib,nossd,noacl,space_cache,subvolid=5,subvol=/)

I tried reformatting the filesystem with recent btrfs-progs (and with older
versions before that):
btrfs-progs v4.2.2
I use the default mkfs options (skinny extents).
After reformatting, the problem disappeared for some days (perhaps it
correlates with the age of the filesystem?).
I defragment the filesystem manually with a script that recursively runs
filefrag to count each file's extents and defragments a file if it has
more than 50 extents and is larger than 64 KB (this sometimes lowers the
frequency of the problem).
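
The defragmentation rule above can be sketched as follows. This is a
reconstruction under assumptions, not the actual script: it assumes
Python 3.7+, the filefrag tool, and `btrfs filesystem defragment` on each
selected file; parse_extents, should_defrag and defrag_tree are
hypothetical names. It defaults to a dry run that only lists candidates.

```python
import os
import re
import subprocess

# Thresholds from the description above (assumed: extents and bytes).
MIN_EXTENTS = 50        # defragment only if filefrag reports more than this
MIN_SIZE = 64 * 1024    # ... and the file is larger than 64 KiB

# filefrag prints e.g. "/path/file: 2 extents found" or "1 extent found".
_EXTENTS_RE = re.compile(r":\s+(\d+)\s+extents?\s+found\s*$")


def parse_extents(filefrag_line):
    """Extract the extent count from one line of filefrag output, or None."""
    m = _EXTENTS_RE.search(filefrag_line)
    return int(m.group(1)) if m else None


def should_defrag(extents, size):
    return extents is not None and extents > MIN_EXTENTS and size > MIN_SIZE


def defrag_tree(root, dry_run=True):
    """Walk root and return files that meet both thresholds.

    With dry_run=False it also runs `btrfs filesystem defragment` on each
    selected file, which needs root.
    """
    selected = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size <= MIN_SIZE:
                continue  # cheap size check first; skip filefrag on small files
            out = subprocess.run(["filefrag", path], capture_output=True, text=True)
            extents = parse_extents(out.stdout.strip())
            if should_defrag(extents, size):
                selected.append(path)
                if not dry_run:
                    subprocess.run(["btrfs", "filesystem", "defragment", path],
                                   check=True)
    return selected
```

On a compress=zlib mount, filefrag may report each compressed chunk (at
most 128 KiB) as a separate extent, so the counts can overstate the real
fragmentation of compressed files.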
The unreadable files are usually small, for example:

filefrag:
/mnt/mysql_naplo_b2/mozanaplo_boly_altisk_2015/n_helyettesites.MYD: 2 extents found
ls -l:
-rw-rw---- 1 mysql mysql 8092 okt   22 08.24 /mnt/mysql_naplo_b2/mozanaplo_boly_altisk_2015/n_helyettesites.MYD

There is nothing in dmesg at all: no I/O errors, no kernel panic, etc.

The (virtual) servers have 3-4 GB of memory, and I use a 2 GB tmpfs for
temporary tables (so physical memory usage fluctuates considerably).

The filesystem has no snapshots, but sometimes (when rebuilding
replication) I take one and delete it afterwards. (The problem also
happens on filesystems where no snapshot was ever created.)

I have not tried downgrading the kernel (to 3.18); I have always upgraded
instead.

I suspect this problem is connected to memory usage (but there is no
out-of-memory condition).

I am able to try any debug mode you suggest, but the problem is not
reproducible; it happens randomly. I would expect some errors in dmesg
when I hit I/O errors, but I am not sure whether this error is directly
related to btrfs at all. I have not tried other filesystems.
The problem has occurred with kernel versions 4.0.1, 4.0.4, 4.1.6,
4.2.1, 4.2.3 and 4.2.4.

I checked Bugzilla and searched Google for a similar problem, but I
couldn't find anything.

This problem (I think it is the same one) sometimes happens on a www
server too, with Apache log files (which are heavily fragmented), but
very rarely. I don't have any problems with this configuration on other
servers, even MySQL servers with lower load.

I welcome any suggestions.

László Szalma
