* timed out in osd1 error in dmes
From: madhusudhana @ 2012-03-13 7:35 UTC
To: ceph-devel

Hi all,

The server on which I have mounted the file system using mount -t ceph
is showing the errors below in dmesg.

libceph: tid 79987 timed out on osd2, will reset osd
libceph: tid 81516 timed out on osd0, will reset osd
libceph: tid 81133 timed out on osd1, will reset osd
libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
libceph: tid 80108 timed out on osd2, will reset osd
libceph: tid 81134 timed out on osd1, will reset osd
libceph: tid 81641 timed out on osd1, will reset osd

Is this why write/copy operations in my cluster are slow? Is this an
error that needs attention, or can it be safely ignored?

Thanks
Madhusudhan

* Re: timed out in osd1 error in dmes
From: Josh Durgin @ 2012-03-13 20:23 UTC
To: madhusudhana; +Cc: ceph-devel

On 03/13/2012 12:35 AM, madhusudhana wrote:
> Hi all,
> The server in which i have mounted file system using mount -t ceph
> is showing below errors in dmesg.
>
> libceph: tid 79987 timed out on osd2, will reset osd
> libceph: tid 81516 timed out on osd0, will reset osd
> libceph: tid 81133 timed out on osd1, will reset osd
> libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
> libceph: tid 80108 timed out on osd2, will reset osd
> libceph: tid 81134 timed out on osd1, will reset osd
> libceph: tid 81641 timed out on osd1, will reset osd
>
> Is is because of this, write/copy operation in my cluster
> is slow ? is this a error which needs attention or can be
> safely ignored ?

These are usually harmless, and could just mean the osds can't keep up
with the requests you're giving them. Given your other issues, it might
be a symptom of a problem with your osds.

What filesystem are the osds using? Are there any warnings from these
filesystems in dmesg?

> Thanks
> Madhusudhan

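Editor's sketch: to see whether the timeouts in the log above cluster on one OSD, the libceph lines can be tallied per OSD. This is only an illustration built on the message format quoted in this thread; the function name `count_timeouts` and the `SAMPLE` constant are hypothetical, and the regular expression assumes the kernel prints the message exactly as shown.

```python
import re
from collections import Counter

# Matches the kernel client message quoted in this thread, e.g.
#   libceph: tid 79987 timed out on osd2, will reset osd
# (assumption: other kernel versions may word this differently)
TIMEOUT_RE = re.compile(r"libceph: tid (\d+) timed out on osd(\d+)")


def count_timeouts(dmesg_text):
    """Return a Counter mapping OSD id -> number of timed-out requests."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# The dmesg excerpt posted in the original message.
SAMPLE = """\
libceph: tid 79987 timed out on osd2, will reset osd
libceph: tid 81516 timed out on osd0, will reset osd
libceph: tid 81133 timed out on osd1, will reset osd
libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
libceph: tid 80108 timed out on osd2, will reset osd
libceph: tid 81134 timed out on osd1, will reset osd
libceph: tid 81641 timed out on osd1, will reset osd
"""

if __name__ == "__main__":
    for osd_id, n in sorted(count_timeouts(SAMPLE).items()):
        print(f"osd{osd_id}: {n} timed-out requests")
```

In the posted excerpt the timeouts are spread over all three OSDs rather than concentrated on one, which fits the "osds can't keep up" reading better than a single faulty daemon, though seven lines are too few to conclude much.
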
* Re: timed out in osd1 error in dmes
From: madhusudhana @ 2012-03-14 4:20 UTC
To: ceph-devel

Josh Durgin <josh.durgin <at> dreamhost.com> writes:
>
> On 03/13/2012 12:35 AM, madhusudhana wrote:
> > Hi all,
> > The server in which i have mounted file system using mount -t ceph
> > is showing below errors in dmesg.
> >
> > libceph: tid 79987 timed out on osd2, will reset osd
> > libceph: tid 81516 timed out on osd0, will reset osd
> > libceph: tid 81133 timed out on osd1, will reset osd
> > libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
> > libceph: tid 80108 timed out on osd2, will reset osd
> > libceph: tid 81134 timed out on osd1, will reset osd
> > libceph: tid 81641 timed out on osd1, will reset osd
> >
> > Is is because of this, write/copy operation in my cluster
> > is slow ? is this a error which needs attention or can be
> > safely ignored ?
>
> These are usually harmless, and could just mean the osds can't keep up
> with the requests you're giving them. Given your other issues, it might
> be a symptom of a problem with your osds.
>
> What filesystem are the osds using? Are there any warnings from these
> filesystems in dmesg?

All my OSDs are using btrfs. Below is the tail of dmesg from each OSD node.

ceph-node-6
generic-usb 0003:0603:00F2.0004: input,hiddev0: USB HID v1.10 Device [NOVATEK USB Keyboard] on usb-0000:00:1d.1-1/input1
usb 5-1: USB disconnect, device number 3
device fsid aed12ad8-4053-4066-9074-9a9f2419c03f devid 1 transid 7 /dev/sda5
device fsid aed12ad8-4053-4066-9074-9a9f2419c03f devid 1 transid 7 /dev/sda5
device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 7 /dev/sda5
device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 7 /dev/sda5
device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 12 /dev/sda5
btrfs: truncated 1 orphans
btrfs: truncated 1 orphans

ceph-node-7
device fsid 7baa8339-8d1e-4cca-9e61-c5f9bd4c3ab0 devid 1 transid 10 /dev/sda5
device fsid b8aa714a-347a-4d6c-8bae-8a732bfc380f devid 1 transid 13 /dev/sda4
device fsid 3c3a56cf-2d00-4fea-a49d-c2cb19af1ea2 devid 1 transid 7 /dev/sda5
device fsid 3c3a56cf-2d00-4fea-a49d-c2cb19af1ea2 devid 1 transid 7 /dev/sda5
device fsid b8aa714a-347a-4d6c-8bae-8a732bfc380f devid 1 transid 13 /dev/sda4
device fsid 7c3d2b55-118f-447e-9e65-767005893fec devid 1 transid 7 /dev/sda5
device fsid 7c3d2b55-118f-447e-9e65-767005893fec devid 1 transid 7 /dev/sda5
device fsid b8aa714a-347a-4d6c-8bae-8a732bfc380f devid 1 transid 13 /dev/sda4
device fsid 7c3d2b55-118f-447e-9e65-767005893fec devid 1 transid 12 /dev/sda5
btrfs: truncated 1 orphans

ceph-node-8
usb 5-1: New USB device found, idVendor=0603, idProduct=00f2
usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 5-1: Product: USB Keyboard
usb 5-1: Manufacturer: NOVATEK
input: NOVATEK USB Keyboard as /devices/pci0000:00/0000:00:1d.1/usb5/5-1/5-1:1.0/input/input3
generic-usb 0003:0603:00F2.0001: input: USB HID v1.10 Keyboard [NOVATEK USB Keyboard] on usb-0000:00:1d.1-1/input0
input: NOVATEK USB Keyboard as /devices/pci0000:00/0000:00:1d.1/usb5/5-1/5-1:1.1/input/input4
generic-usb 0003:0603:00F2.0002: input,hiddev0: USB HID v1.10 Device [NOVATEK USB Keyboard] on usb-0000:00:1d.1-1/input1
usb 5-1: USB disconnect, device number 2
btrfs: truncated 1 orphans

Do you see any issue with the OSDs? All three are showing the
"btrfs: truncated 1 orphans" message.

* Re: timed out in osd1 error in dmes
From: Sage Weil @ 2012-03-14 17:59 UTC
To: madhusudhana; +Cc: ceph-devel

On Wed, 14 Mar 2012, madhusudhana wrote:
> Josh Durgin <josh.durgin <at> dreamhost.com> writes:
> >
> > On 03/13/2012 12:35 AM, madhusudhana wrote:
> > > Hi all,
> > > The server in which i have mounted file system using mount -t ceph
> > > is showing below errors in dmesg.
> > >
> > > libceph: tid 79987 timed out on osd2, will reset osd
> > > libceph: tid 81516 timed out on osd0, will reset osd
> > > libceph: tid 81133 timed out on osd1, will reset osd
> > > libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
> > > libceph: tid 80108 timed out on osd2, will reset osd
> > > libceph: tid 81134 timed out on osd1, will reset osd
> > > libceph: tid 81641 timed out on osd1, will reset osd
> > >
> > > Is is because of this, write/copy operation in my cluster
> > > is slow ? is this a error which needs attention or can be
> > > safely ignored ?
> >
> > These are usually harmless, and could just mean the osds can't keep up
> > with the requests you're giving them. Given your other issues, it might
> > be a symptom of a problem with your osds.
> >
> > What filesystem are the osds using? Are there any warnings from these
> > filesystems in dmesg?
>
> All my osd's are using btrfs. below are the dmesg tailed from all osd's

Heh, I should read my mail in order. It sounds like the cp's are probably
slow due to the OSDs.

> ceph-node-6
> generic-usb 0003:0603:00F2.0004: input,hiddev0: USB HID v1.10 Device [NOVATEK USB Keyboard] on usb-0000:00:1d.1-1/input1
> usb 5-1: USB disconnect, device number 3
> device fsid aed12ad8-4053-4066-9074-9a9f2419c03f devid 1 transid 7 /dev/sda5
> device fsid aed12ad8-4053-4066-9074-9a9f2419c03f devid 1 transid 7 /dev/sda5
> device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 7 /dev/sda5
> device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 7 /dev/sda5
> device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 12 /dev/sda5
> btrfs: truncated 1 orphans
> btrfs: truncated 1 orphans

These are harmless noise, BTW, you can ignore them.

Can you tell us how your OSDs are configured? Where are the data
directories and journals located? (The [osd] section of ceph.conf would
be helpful.)

Another useful piece of information would be the ceph-osd's raw
performance writing to the local disk+journal, which you can get with

    $ ceph tell osd.0 bench

You might want to check it for several nodes to see if it's consistent,
etc.

Thanks!
sage

* Re: timed out in osd1 error in dmes
From: madhusudhana @ 2012-03-15 7:46 UTC
To: ceph-devel

>
> These are harmless noise, BTW, you can ignore them.
>
> Can you tell us how your OSDs are configured? Where are the data
> directories and journals located? (The [osd] section of ceph.conf would
> be helpful.)
>
> Another useful piece of information would be the ceph-osd's raw
> performance writing to the local disk+journal, which you can get with
>
> $ ceph tell osd.0 bench
>
> You might want to check it for several nodes to see if it's consistent,
> etc.
>

Below are the results of the above command, run against all OSDs:

2012-03-15 13:06:19.980924 osd.0 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 67.474949 sec at 15540 KB/sec' (0)
2012-03-15 13:09:20.573176 osd.1 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 70.815932 sec at 14807 KB/sec' (0)
2012-03-15 13:11:57.895738 osd.2 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 60.370233 sec at 17369 KB/sec' (0)

Do you see any issues?

Thanks

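Editor's sketch: the reported rates follow directly from the other numbers in each result line (1024 MB written, divided by the elapsed seconds), which works out to roughly 14.5 to 17 MB/s per OSD. The snippet below re-derives them from the three lines posted above. The names `parse_bench` and `RESULTS` are hypothetical, and the regular expression assumes the output format shown in this thread; other Ceph versions may print something different.

```python
import re

# Matches the result format quoted in this thread (an assumption about
# this particular Ceph version, not a documented stable format).
BENCH_RE = re.compile(
    r"(osd\.\d+) -> 'bench: wrote (\d+) MB in blocks of (\d+) KB "
    r"in ([\d.]+) sec at (\d+) KB/sec'"
)


def parse_bench(line):
    """Return (osd name, recomputed KB/sec, reported KB/sec) or None."""
    match = BENCH_RE.search(line)
    if match is None:
        return None
    name, total_mb, _block_kb, seconds, reported = match.groups()
    # 1 MB is taken as 1024 KB, which reproduces the reported figures.
    computed = int(total_mb) * 1024 / float(seconds)
    return name, computed, int(reported)


# The three result lines posted in the message above.
RESULTS = [
    "2012-03-15 13:06:19.980924 osd.0 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 67.474949 sec at 15540 KB/sec' (0)",
    "2012-03-15 13:09:20.573176 osd.1 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 70.815932 sec at 14807 KB/sec' (0)",
    "2012-03-15 13:11:57.895738 osd.2 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 60.370233 sec at 17369 KB/sec' (0)",
]

if __name__ == "__main__":
    for line in RESULTS:
        name, computed, reported = parse_bench(line)
        print(f"{name}: {computed / 1024:.1f} MB/s "
              f"(reported {reported} KB/sec)")
```

The three OSDs are within about 15% of each other, so the slowness looks uniform across nodes rather than specific to one machine.
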
* Re: timed out in osd1 error in dmes
From: Sage Weil @ 2012-03-15 15:47 UTC
To: madhusudhana; +Cc: ceph-devel

> > These are harmless noise, BTW, you can ignore them.
> >
> > Can you tell us how your OSDs are configured? Where are the data
> > directories and journals located? (The [osd] section of ceph.conf would
> > be helpful.)

Can you share your ceph.conf please?

> > Another useful piece of information would be the ceph-osd's raw
> > performance writing to the local disk+journal, which you can get with
> >
> > $ ceph tell osd.0 bench
> >
> > You might want to check it for several nodes to see if it's consistent,
> > etc.
> >
> Below are the results from above command run against all osd's
>
> 2012-03-15 13:06:19.980924 osd.0 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 67.474949 sec at 15540 KB/sec' (0)
> 2012-03-15 13:09:20.573176 osd.1 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 70.815932 sec at 14807 KB/sec' (0)
> 2012-03-15 13:11:57.895738 osd.2 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 60.370233 sec at 17369 KB/sec' (0)

This is pretty slow, and probably due to the way your osd journals are
configured. Please share your ceph.conf!

Thanks-
sage

* Re: timed out in osd1 error in dmes
From: madhusudhana @ 2012-03-15 17:02 UTC
To: ceph-devel

>
> Can you share your ceph.conf please?
>
> > > Another useful piece of information would be the ceph-osd's raw
> > > performance writing to the local disk+journal, which you can get with
> > >
> > > $ ceph tell osd.0 bench
> > >
> > > You might want to check it for several nodes to see if it's consistent,
> > > etc.
> > >
> > Below are the results from above command run against all osd's
> >
> > 2012-03-15 13:06:19.980924 osd.0 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 67.474949 sec at 15540 KB/sec' (0)
> > 2012-03-15 13:09:20.573176 osd.1 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 70.815932 sec at 14807 KB/sec' (0)
> > 2012-03-15 13:11:57.895738 osd.2 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 60.370233 sec at 17369 KB/sec' (0)
>
> This is pretty slow, and probably due to the way your osd journals are
> configured. Please share your ceph.conf!
>

Below is my ceph.conf file:

[root@ceph-node-8 ~]# cat /etc/ceph/ceph.conf
[global]
    ;auth supported = cephx
    keyring = /etc/ceph/admin.keyring
    debug ms = 1
    debug mds = 10
[mon]
    mon data = /data/mon.$id
[mon.a]
    host = ceph-node-4
    mon addr = xx.xx.xx.xx
[mon.b]
    host = ceph-node-5
    mon addr = xx.xx.xx.xx
[mon.c]
    host = ceph-node-6
    mon addr = xx.xx.xx.xx
[mds]
    keyring = /etc/ceph/keyring.$name
[mds.ceph-node-1]
    host = ceph-node-7
[mds.ceph-node-2]
    host = ceph-node-8
[osd]
    osd data = /data/osd.$id
    keyring = /etc/ceph/keyring.$name
    osd journal = /journal/osd.$id.journal
    osd journal size = 10000
    debug ms = 1
    debug osd = 20
    debug filestore = 20
    debug journal = 20
[osd.0]
    host = ceph-node-1
    btrfs devs = /dev/sda4
[osd.1]
    host = ceph-node-2
    btrfs devs = /dev/sda4
[osd.2]
    host = ceph-node-3
    btrfs devs = /dev/sda4

In brief, I have separate partitions for the journal and the OSD data:
/journal is the mount point for the journal
/data is the mount point for the OSD data
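
Editor's sketch: the question of where the data directories and journals live can be answered mechanically from the [osd] section. The snippet below expands `$id` in the two paths and, on a live node, compares the device numbers of two paths. Everything here is illustrative: `osd_paths`, `same_filesystem` and `SAMPLE_CONF` are hypothetical names, Python's `configparser` is only an approximation of Ceph's own config parser, and the sketch assumes option names are written with spaces as in the posted file.

```python
import configparser
import os

# Trimmed copy of the [osd] section posted in this thread.
SAMPLE_CONF = """\
[osd]
osd data = /data/osd.$id
keyring = /etc/ceph/keyring.$name
osd journal = /journal/osd.$id.journal
osd journal size = 10000
"""


def osd_paths(conf_text, osd_id):
    """Return (data path, journal path) for one OSD, with $id expanded."""
    # interpolation=None keeps '$id' and any '%' characters literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = parser["osd"]
    data = section["osd data"].replace("$id", str(osd_id))
    journal = section["osd journal"].replace("$id", str(osd_id))
    return data, journal


def same_filesystem(path_a, path_b):
    """True if both existing paths sit on the same mounted filesystem."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    data, journal = osd_paths(SAMPLE_CONF, 0)
    print("data:   ", data)
    print("journal:", journal)
    # On an OSD node one could then call same_filesystem(data, journal).
```

Note the limit of this check: `st_dev` only distinguishes mounted filesystems, so two partitions of one physical disk compare as different. It cannot show whether the journal and the data share a disk, which is a separate question the thread leaves open.
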