* timed out in osd1 error in dmes
From: madhusudhana @ 2012-03-13 7:35 UTC
To: ceph-devel
Hi all,
The server on which I have mounted the file system using mount -t ceph
is showing the errors below in dmesg.
libceph: tid 79987 timed out on osd2, will reset osd
libceph: tid 81516 timed out on osd0, will reset osd
libceph: tid 81133 timed out on osd1, will reset osd
libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
libceph: tid 80108 timed out on osd2, will reset osd
libceph: tid 81134 timed out on osd1, will reset osd
libceph: tid 81641 timed out on osd1, will reset osd
Is this why write/copy operations in my cluster are slow? Is this
an error that needs attention, or can it be safely ignored?
Thanks
Madhusudhan
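
To see whether the timeouts cluster on one OSD, the messages can be tallied per OSD. The following is a minimal sketch, not part of the original report: it assumes the dmesg text has the exact "libceph: tid N timed out on osdM" form shown above, and the names DMESG, TIMEOUT_RE and count_timeouts are made up for illustration.

```python
import re
from collections import Counter

# Sample lines copied from the dmesg output quoted in this message.
DMESG = """\
libceph: tid 79987 timed out on osd2, will reset osd
libceph: tid 81516 timed out on osd0, will reset osd
libceph: tid 81133 timed out on osd1, will reset osd
libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
libceph: tid 80108 timed out on osd2, will reset osd
libceph: tid 81134 timed out on osd1, will reset osd
libceph: tid 81641 timed out on osd1, will reset osd
"""

# Assumed message format; other kernel versions may word this differently.
TIMEOUT_RE = re.compile(r"libceph: tid (\d+) timed out on osd(\d+)")


def count_timeouts(text):
    """Return a Counter mapping OSD id -> number of timed-out requests."""
    counts = Counter()
    for line in text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


for osd, n in sorted(count_timeouts(DMESG).items()):
    print(f"osd{osd}: {n} timed-out request(s)")
```

In this sample all three OSDs appear, which hints at a cluster-wide slowdown rather than a single bad OSD, though seven lines are too few to conclude much.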
* Re: timed out in osd1 error in dmes
From: Josh Durgin @ 2012-03-13 20:23 UTC
To: madhusudhana; +Cc: ceph-devel
On 03/13/2012 12:35 AM, madhusudhana wrote:
> Hi all,
> The server in which i have mounted file system using mount -t ceph
> is showing below errors in dmesg.
>
>
> libceph: tid 79987 timed out on osd2, will reset osd
> libceph: tid 81516 timed out on osd0, will reset osd
> libceph: tid 81133 timed out on osd1, will reset osd
> libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
> libceph: tid 80108 timed out on osd2, will reset osd
> libceph: tid 81134 timed out on osd1, will reset osd
> libceph: tid 81641 timed out on osd1, will reset osd
>
>
> Is is because of this, write/copy operation in my cluster
> is slow ? is this a error which needs attention or can be
> safely ignored ?
These are usually harmless, and could just mean the osds can't keep up
with the requests you're giving them. Given your other issues, it might
be a symptom of a problem with your osds.
What filesystem are the osds using? Are there any warnings from these
filesystems in dmesg?
>
> Thanks
> Madhusudhan
* Re: timed out in osd1 error in dmes
From: madhusudhana @ 2012-03-14 4:20 UTC
To: ceph-devel
Josh Durgin <josh.durgin <at> dreamhost.com> writes:
>
> On 03/13/2012 12:35 AM, madhusudhana wrote:
> > Hi all,
> > The server in which i have mounted file system using mount -t ceph
> > is showing below errors in dmesg.
> >
> >
> > libceph: tid 79987 timed out on osd2, will reset osd
> > libceph: tid 81516 timed out on osd0, will reset osd
> > libceph: tid 81133 timed out on osd1, will reset osd
> > libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
> > libceph: tid 80108 timed out on osd2, will reset osd
> > libceph: tid 81134 timed out on osd1, will reset osd
> > libceph: tid 81641 timed out on osd1, will reset osd
> >
> >
> > Is is because of this, write/copy operation in my cluster
> > is slow ? is this a error which needs attention or can be
> > safely ignored ?
>
> These are usually harmless, and could just mean the osds can't keep up
> with the requests you're giving them. Given your other issues, it might
> be a symptom of a problem with your osds.
>
> What filesystem are the osds using? Are there any warnings from these
> filesystems in dmesg?
All my OSDs are using btrfs. Below is the tail of dmesg from each OSD node:
ceph-node-6
generic-usb 0003:0603:00F2.0004: input,hiddev0: USB HID v1.10 Device [NOVATEK USB Keyboard] on usb-0000:00:1d.1-1/input1
usb 5-1: USB disconnect, device number 3
device fsid aed12ad8-4053-4066-9074-9a9f2419c03f devid 1 transid 7 /dev/sda5
device fsid aed12ad8-4053-4066-9074-9a9f2419c03f devid 1 transid 7 /dev/sda5
device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 7 /dev/sda5
device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 7 /dev/sda5
device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 12 /dev/sda5
btrfs: truncated 1 orphans
btrfs: truncated 1 orphans
ceph-node-7
device fsid 7baa8339-8d1e-4cca-9e61-c5f9bd4c3ab0 devid 1 transid 10 /dev/sda5
device fsid b8aa714a-347a-4d6c-8bae-8a732bfc380f devid 1 transid 13 /dev/sda4
device fsid 3c3a56cf-2d00-4fea-a49d-c2cb19af1ea2 devid 1 transid 7 /dev/sda5
device fsid 3c3a56cf-2d00-4fea-a49d-c2cb19af1ea2 devid 1 transid 7 /dev/sda5
device fsid b8aa714a-347a-4d6c-8bae-8a732bfc380f devid 1 transid 13 /dev/sda4
device fsid 7c3d2b55-118f-447e-9e65-767005893fec devid 1 transid 7 /dev/sda5
device fsid 7c3d2b55-118f-447e-9e65-767005893fec devid 1 transid 7 /dev/sda5
device fsid b8aa714a-347a-4d6c-8bae-8a732bfc380f devid 1 transid 13 /dev/sda4
device fsid 7c3d2b55-118f-447e-9e65-767005893fec devid 1 transid 12 /dev/sda5
btrfs: truncated 1 orphans
ceph-node-8
usb 5-1: New USB device found, idVendor=0603, idProduct=00f2
usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 5-1: Product: USB Keyboard
usb 5-1: Manufacturer: NOVATEK
input: NOVATEK USB Keyboard as /devices/pci0000:00/0000:00:1d.1/usb5/5-1/5-1:1.0/input/input3
generic-usb 0003:0603:00F2.0001: input: USB HID v1.10 Keyboard [NOVATEK USB Keyboard] on usb-0000:00:1d.1-1/input0
input: NOVATEK USB Keyboard as /devices/pci0000:00/0000:00:1d.1/usb5/5-1/5-1:1.1/input/input4
generic-usb 0003:0603:00F2.0002: input,hiddev0: USB HID v1.10 Device [NOVATEK USB Keyboard] on usb-0000:00:1d.1-1/input1
usb 5-1: USB disconnect, device number 2
btrfs: truncated 1 orphans
Do you see any issue with the OSDs? All 3 OSD nodes are showing the
"btrfs: truncated 1 orphans" message.
* Re: timed out in osd1 error in dmes
From: Sage Weil @ 2012-03-14 17:59 UTC
To: madhusudhana; +Cc: ceph-devel
On Wed, 14 Mar 2012, madhusudhana wrote:
> Josh Durgin <josh.durgin <at> dreamhost.com> writes:
>
> >
> > On 03/13/2012 12:35 AM, madhusudhana wrote:
> > > Hi all,
> > > The server in which i have mounted file system using mount -t ceph
> > > is showing below errors in dmesg.
> > >
> > >
> > > libceph: tid 79987 timed out on osd2, will reset osd
> > > libceph: tid 81516 timed out on osd0, will reset osd
> > > libceph: tid 81133 timed out on osd1, will reset osd
> > > libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
> > > libceph: tid 80108 timed out on osd2, will reset osd
> > > libceph: tid 81134 timed out on osd1, will reset osd
> > > libceph: tid 81641 timed out on osd1, will reset osd
> > >
> > >
> > > Is is because of this, write/copy operation in my cluster
> > > is slow ? is this a error which needs attention or can be
> > > safely ignored ?
> >
> > These are usually harmless, and could just mean the osds can't keep up
> > with the requests you're giving them. Given your other issues, it might
> > be a symptom of a problem with your osds.
> >
> > What filesystem are the osds using? Are there any warnings from these
> > filesystems in dmesg?
>
> All my osd's are using btrfs. below are the dmesg tailed from all osd's
Heh, I should read my mail in order. It sounds like the cp's are probably
slow due to the OSDs.
> ceph-node-6
> generic-usb 0003:0603:00F2.0004: input,hiddev0: USB HID v1.10 Device [NOVATEK
> USB Keyboard] on usb-0000:00:1d.1-1/input1
> usb 5-1: USB disconnect, device number 3
> device fsid aed12ad8-4053-4066-9074-9a9f2419c03f devid 1 transid 7 /dev/sda5
> device fsid aed12ad8-4053-4066-9074-9a9f2419c03f devid 1 transid 7 /dev/sda5
> device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 7 /dev/sda5
> device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 7 /dev/sda5
> device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 12 /dev/sda5
> btrfs: truncated 1 orphans
> btrfs: truncated 1 orphans
These are harmless noise, BTW, you can ignore them.
Can you tell us how your OSDs are configured? Where are the data
directories and journals located? (The [osd] section of ceph.conf would
be helpful.)
Another useful piece of information would be the ceph-osd's raw
performance writing to the local disk+journal, which you can get with
$ ceph tell osd.0 bench
You might want to check it for several nodes to see if it's consistent,
etc.
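
One way to repeat the benchmark across OSDs is to generate the command for each id. This is only a sketch: the helper name bench_commands is hypothetical, the ids 0-2 are taken from the osd0/osd1/osd2 names in the dmesg output, and the commands are printed rather than executed.

```python
import shlex


def bench_commands(osd_ids):
    """Build one `ceph tell osd.N bench` argv list per OSD id."""
    return [["ceph", "tell", f"osd.{osd_id}", "bench"] for osd_id in osd_ids]


# OSD ids 0-2 are an assumption; substitute the ids present in your cluster.
for argv in bench_commands([0, 1, 2]):
    # Printed only. To actually run a benchmark, pass argv to
    # subprocess.run() on a node that has admin access to the cluster.
    print(shlex.join(argv))
```

Running the benchmarks one at a time, rather than in parallel, keeps the OSDs from competing for the network while being measured.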
Thanks!
sage
> ceph-node-7
> device fsid 7baa8339-8d1e-4cca-9e61-c5f9bd4c3ab0 devid 1 transid 10 /dev/sda5
> device fsid b8aa714a-347a-4d6c-8bae-8a732bfc380f devid 1 transid 13 /dev/sda4
> device fsid 3c3a56cf-2d00-4fea-a49d-c2cb19af1ea2 devid 1 transid 7 /dev/sda5
> device fsid 3c3a56cf-2d00-4fea-a49d-c2cb19af1ea2 devid 1 transid 7 /dev/sda5
> device fsid b8aa714a-347a-4d6c-8bae-8a732bfc380f devid 1 transid 13 /dev/sda4
> device fsid 7c3d2b55-118f-447e-9e65-767005893fec devid 1 transid 7 /dev/sda5
> device fsid 7c3d2b55-118f-447e-9e65-767005893fec devid 1 transid 7 /dev/sda5
> device fsid b8aa714a-347a-4d6c-8bae-8a732bfc380f devid 1 transid 13 /dev/sda4
> device fsid 7c3d2b55-118f-447e-9e65-767005893fec devid 1 transid 12 /dev/sda5
> btrfs: truncated 1 orphans
>
> ceph-node-8
> usb 5-1: New USB device found, idVendor=0603, idProduct=00f2
> usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> usb 5-1: Product: USB Keyboard
> usb 5-1: Manufacturer: NOVATEK
> input: NOVATEK USB Keyboard as /devices/pci0000:00/0000:00:1d.1/usb5/5-1/5-
> 1:1.0/input/input3
> generic-usb 0003:0603:00F2.0001: input: USB HID v1.10 Keyboard [NOVATEK USB
> Keyboard] on usb-0000:00:1d.1-1/input0
> input: NOVATEK USB Keyboard as /devices/pci0000:00/0000:00:1d.1/usb5/5-1/5-
> 1:1.1/input/input4
> generic-usb 0003:0603:00F2.0002: input,hiddev0: USB HID v1.10 Device [NOVATEK
> USB Keyboard] on usb-0000:00:1d.1-1/input1
> usb 5-1: USB disconnect, device number 2
> btrfs: truncated 1 orphans
>
> do you see any issue with osd? all 3 osd's are showing "btrfs: truncated 1
> orphans" error.
* Re: timed out in osd1 error in dmes
From: madhusudhana @ 2012-03-15 7:46 UTC
To: ceph-devel
>
> These are harmless noise, BTW, you can ignore them.
>
> Can you tell us how your OSDs are configured? Where are the data
> directories and journals located? (The [osd] section of ceph.conf would
> be helpful.)
>
> Another useful piece of information would be the ceph-osd's raw
> performance writing to the local disk+journal, which you can get with
>
> $ ceph tell osd.0 bench
>
> You might want to check it for several nodes to see if it's consistent,
> etc.
>
Below are the results of the above command, run against all OSDs:
2012-03-15 13:06:19.980924 osd.0 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 67.474949 sec at 15540 KB/sec' (0)
2012-03-15 13:09:20.573176 osd.1 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 70.815932 sec at 14807 KB/sec' (0)
2012-03-15 13:11:57.895738 osd.2 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 60.370233 sec at 17369 KB/sec' (0)
Do you see any issues?
Thanks
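
For comparing the three results, the reported KB/sec can be re-derived from the amount written and the elapsed time, and converted to MB/s. The sketch below is an illustration only: the regular expression assumes the exact output format quoted above, and the names RESULTS, BENCH_RE, parse_bench and mb_per_sec are hypothetical.

```python
import re

# Results copied from this message, with the wrapped lines rejoined.
RESULTS = """\
osd.0 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 67.474949 sec at 15540 KB/sec'
osd.1 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 70.815932 sec at 14807 KB/sec'
osd.2 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 60.370233 sec at 17369 KB/sec'
"""

# Assumed output format; captures OSD name, MB written, seconds, KB/sec.
BENCH_RE = re.compile(
    r"(osd\.\d+) -> 'bench: wrote (\d+) MB .* in ([\d.]+) sec at (\d+) KB/sec'"
)


def parse_bench(text):
    """Return (osd, megabytes, seconds, reported_kb_per_sec) tuples."""
    return [
        (m.group(1), int(m.group(2)), float(m.group(3)), int(m.group(4)))
        for m in BENCH_RE.finditer(text)
    ]


def mb_per_sec(megabytes, seconds):
    """Throughput in MB/s for a given amount written and elapsed time."""
    return megabytes / seconds


for osd, mb, sec, reported in parse_bench(RESULTS):
    # The reported figure matches total KB divided by seconds, rounded down.
    derived = int(mb * 1024 / sec)
    print(f"{osd}: {mb_per_sec(mb, sec):.1f} MB/s "
          f"(derived {derived} KB/sec, reported {reported} KB/sec)")
```

All three OSDs land between roughly 14 and 17 MB/s, so the results are consistent with each other.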
* Re: timed out in osd1 error in dmes
From: Sage Weil @ 2012-03-15 15:47 UTC
To: madhusudhana; +Cc: ceph-devel
> > These are harmless noise, BTW, you can ignore them.
> >
> > Can you tell us how your OSDs are configured? Where are the data
> > directories and journals located? (The [osd] section of ceph.conf would
> > be helpful.)
Can you share your ceph.conf please?
> > Another useful piece of information would be the ceph-osd's raw
> > performance writing to the local disk+journal, which you can get with
> >
> > $ ceph tell osd.0 bench
> >
> > You might want to check it for several nodes to see if it's consistent,
> > etc.
> >
> Below are the results from above command run against all osd's
>
>
> 2012-03-15 13:06:19.980924 osd.0 -> 'bench: wrote 1024 MB in blocks of
> 4096 KB in 67.474949 sec at 15540 KB/sec' (0)
> 2012-03-15 13:09:20.573176 osd.1 -> 'bench: wrote 1024 MB in blocks of
> 4096 KB in 70.815932 sec at 14807 KB/sec' (0)
> 2012-03-15 13:11:57.895738 osd.2 -> 'bench: wrote 1024 MB in blocks of
> 4096 KB in 60.370233 sec at 17369 KB/sec' (0)
This is pretty slow, and probably due to the way your osd journals are
configured. Please share your ceph.conf!
Thanks-
sage
* Re: timed out in osd1 error in dmes
From: madhusudhana @ 2012-03-15 17:02 UTC
To: ceph-devel
>
> Can you share your ceph.conf please?
>
> > > Another useful piece of information would be the ceph-osd's raw
> > > performance writing to the local disk+journal, which you can get with
> > >
> > > $ ceph tell osd.0 bench
> > >
> > > You might want to check it for several nodes to see if it's consistent,
> > > etc.
> > >
> > Below are the results from above command run against all osd's
> >
> >
> > 2012-03-15 13:06:19.980924 osd.0 -> 'bench: wrote 1024 MB in blocks of
> > 4096 KB in 67.474949 sec at 15540 KB/sec' (0)
> > 2012-03-15 13:09:20.573176 osd.1 -> 'bench: wrote 1024 MB in blocks of
> > 4096 KB in 70.815932 sec at 14807 KB/sec' (0)
> > 2012-03-15 13:11:57.895738 osd.2 -> 'bench: wrote 1024 MB in blocks of
> > 4096 KB in 60.370233 sec at 17369 KB/sec' (0)
>
> This is pretty slow, and probably due to the way your osd journals are
> configured. Please share your ceph.conf!
>
Below is my ceph.conf file:
[root@ceph-node-8 ~]# cat /etc/ceph/ceph.conf
[global]
;auth supported = cephx
keyring = /etc/ceph/admin.keyring
debug ms = 1
debug mds = 10
[mon]
mon data = /data/mon.$id
[mon.a]
host = ceph-node-4
mon addr = xx.xx.xx.xx
[mon.b]
host = ceph-node-5
mon addr = xx.xx.xx.xx
[mon.c]
host = ceph-node-6
mon addr = xx.xx.xx.xx
[mds]
keyring = /etc/ceph/keyring.$name
[mds.ceph-node-1]
host = ceph-node-7
[mds.ceph-node-2]
host = ceph-node-8
[osd]
osd data = /data/osd.$id
keyring = /etc/ceph/keyring.$name
osd journal = /journal/osd.$id.journal
osd journal size = 10000
debug ms = 1
debug osd = 20
debug filestore = 20
debug journal = 20
[osd.0]
host = ceph-node-1
btrfs devs = /dev/sda4
[osd.1]
host = ceph-node-2
btrfs devs = /dev/sda4
[osd.2]
host = ceph-node-3
btrfs devs = /dev/sda4
In brief, I have separate partitions for the journal and the OSD data:
/journal is the mount point for the journal
/data is the mount point for the OSD data
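
To make the layout concrete, the [osd] settings can be expanded per OSD. The following is a rough sketch under stated assumptions: it uses Python's configparser as a stand-in for Ceph's own config parser, substitutes only the $id metavariable by plain string replacement, and works on an excerpt of the file above. The names CEPH_CONF and osd_layout are hypothetical.

```python
import configparser

# Excerpt of the [osd] sections from the ceph.conf quoted above.
CEPH_CONF = """\
[osd]
osd data = /data/osd.$id
osd journal = /journal/osd.$id.journal
osd journal size = 10000

[osd.0]
host = ceph-node-1
[osd.1]
host = ceph-node-2
[osd.2]
host = ceph-node-3
"""


def osd_layout(conf_text):
    """Map each osd.N section to its host and expanded data/journal paths."""
    # interpolation=None so configparser leaves "$id" untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    defaults = parser["osd"]
    layout = {}
    for section in parser.sections():
        if not section.startswith("osd."):
            continue
        osd_id = section.split(".", 1)[1]
        layout[section] = {
            "host": parser[section]["host"],
            # Simplified expansion: only $id is handled here.
            "data": defaults["osd data"].replace("$id", osd_id),
            "journal": defaults["osd journal"].replace("$id", osd_id),
        }
    return layout


for name, info in sorted(osd_layout(CEPH_CONF).items()):
    print(f"{name} on {info['host']}: data={info['data']} journal={info['journal']}")
```

With this layout each journal is a file on the file system mounted at /journal, not a raw partition. The journal size of 10000 is, as far as I know, in megabytes.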