All of lore.kernel.org
* timed out in osd1 error in dmes
@ 2012-03-13  7:35 madhusudhana
  2012-03-13 20:23 ` Josh Durgin
  0 siblings, 1 reply; 7+ messages in thread
From: madhusudhana @ 2012-03-13  7:35 UTC (permalink / raw)
  To: ceph-devel

Hi all,
The server on which I have mounted the file system using mount -t ceph
is showing the errors below in dmesg.


libceph:  tid 79987 timed out on osd2, will reset osd
libceph:  tid 81516 timed out on osd0, will reset osd
libceph:  tid 81133 timed out on osd1, will reset osd
libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
libceph:  tid 80108 timed out on osd2, will reset osd
libceph:  tid 81134 timed out on osd1, will reset osd
libceph:  tid 81641 timed out on osd1, will reset osd


Is this the reason write/copy operations in my cluster
are slow? Is this an error that needs attention, or can it
be safely ignored?

Thanks
Madhusudhan


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: timed out in osd1 error in dmes
  2012-03-13  7:35 timed out in osd1 error in dmes madhusudhana
@ 2012-03-13 20:23 ` Josh Durgin
  2012-03-14  4:20   ` madhusudhana
From: Josh Durgin @ 2012-03-13 20:23 UTC (permalink / raw)
  To: madhusudhana; +Cc: ceph-devel

On 03/13/2012 12:35 AM, madhusudhana wrote:
> Hi all,
> The server on which I have mounted the file system using mount -t ceph
> is showing the errors below in dmesg.
>
>
> libceph:  tid 79987 timed out on osd2, will reset osd
> libceph:  tid 81516 timed out on osd0, will reset osd
> libceph:  tid 81133 timed out on osd1, will reset osd
> libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
> libceph:  tid 80108 timed out on osd2, will reset osd
> libceph:  tid 81134 timed out on osd1, will reset osd
> libceph:  tid 81641 timed out on osd1, will reset osd
>
>
> Is this the reason write/copy operations in my cluster
> are slow? Is this an error that needs attention, or can it
> be safely ignored?

These are usually harmless, and could just mean the osds can't keep up 
with the requests you're giving them. Given your other issues, it might 
be a symptom of a problem with your osds.
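
One way to check whether the timeouts concentrate on a single OSD is to tally them per OSD. This is a minimal sketch, assuming the message format in the excerpt above (the regular expression is inferred from these lines only; other kernel versions may word the message differently). The "skipping osd1 ... seq 1 expected 2" line is not a timeout and is deliberately not counted; count_timeouts is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern assumed from the libceph messages quoted in this thread, e.g.
# "libceph:  tid 79987 timed out on osd2, will reset osd"
TIMEOUT_RE = re.compile(r"libceph:\s+tid (\d+) timed out on (osd\d+)")

DMESG = """\
libceph:  tid 79987 timed out on osd2, will reset osd
libceph:  tid 81516 timed out on osd0, will reset osd
libceph:  tid 81133 timed out on osd1, will reset osd
libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
libceph:  tid 80108 timed out on osd2, will reset osd
libceph:  tid 81134 timed out on osd1, will reset osd
libceph:  tid 81641 timed out on osd1, will reset osd
"""


def count_timeouts(dmesg_text: str) -> Counter:
    """Return how many request timeouts were logged for each OSD."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            # group(2) is the OSD name, e.g. "osd1"
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    for osd, n in count_timeouts(DMESG).most_common():
        print(f"{osd}: {n} timeout(s)")
```

On the excerpt above it reports 3 timeouts for osd1, 2 for osd2 and 1 for osd0, so the timeouts are spread across all three OSDs rather than confined to one; six lines is a small sample, though.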

What filesystem are the osds using? Are there any warnings from these 
filesystems in dmesg?

>
> Thanks
> Madhusudhan



* Re: timed out in osd1 error in dmes
  2012-03-13 20:23 ` Josh Durgin
@ 2012-03-14  4:20   ` madhusudhana
  2012-03-14 17:59     ` Sage Weil
From: madhusudhana @ 2012-03-14  4:20 UTC (permalink / raw)
  To: ceph-devel

Josh Durgin <josh.durgin <at> dreamhost.com> writes:

> 
> On 03/13/2012 12:35 AM, madhusudhana wrote:
> > Hi all,
> > The server on which I have mounted the file system using mount -t ceph
> > is showing the errors below in dmesg.
> >
> >
> > libceph:  tid 79987 timed out on osd2, will reset osd
> > libceph:  tid 81516 timed out on osd0, will reset osd
> > libceph:  tid 81133 timed out on osd1, will reset osd
> > libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
> > libceph:  tid 80108 timed out on osd2, will reset osd
> > libceph:  tid 81134 timed out on osd1, will reset osd
> > libceph:  tid 81641 timed out on osd1, will reset osd
> >
> >
> > Is this the reason write/copy operations in my cluster
> > are slow? Is this an error that needs attention, or can it
> > be safely ignored?
> 
> These are usually harmless, and could just mean the osds can't keep up 
> with the requests you're giving them. Given your other issues, it might 
> be a symptom of a problem with your osds.
> 
> What filesystem are the osds using? Are there any warnings from these 
> filesystems in dmesg?

All my OSDs are using btrfs. Below is the tail of dmesg from each OSD:

ceph-node-6
generic-usb 0003:0603:00F2.0004: input,hiddev0: USB HID v1.10 Device [NOVATEK USB Keyboard] on usb-0000:00:1d.1-1/input1
usb 5-1: USB disconnect, device number 3
device fsid aed12ad8-4053-4066-9074-9a9f2419c03f devid 1 transid 7 /dev/sda5
device fsid aed12ad8-4053-4066-9074-9a9f2419c03f devid 1 transid 7 /dev/sda5
device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 7 /dev/sda5
device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 7 /dev/sda5
device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 12 /dev/sda5
btrfs: truncated 1 orphans
btrfs: truncated 1 orphans


ceph-node-7
device fsid 7baa8339-8d1e-4cca-9e61-c5f9bd4c3ab0 devid 1 transid 10 /dev/sda5
device fsid b8aa714a-347a-4d6c-8bae-8a732bfc380f devid 1 transid 13 /dev/sda4
device fsid 3c3a56cf-2d00-4fea-a49d-c2cb19af1ea2 devid 1 transid 7 /dev/sda5
device fsid 3c3a56cf-2d00-4fea-a49d-c2cb19af1ea2 devid 1 transid 7 /dev/sda5
device fsid b8aa714a-347a-4d6c-8bae-8a732bfc380f devid 1 transid 13 /dev/sda4
device fsid 7c3d2b55-118f-447e-9e65-767005893fec devid 1 transid 7 /dev/sda5
device fsid 7c3d2b55-118f-447e-9e65-767005893fec devid 1 transid 7 /dev/sda5
device fsid b8aa714a-347a-4d6c-8bae-8a732bfc380f devid 1 transid 13 /dev/sda4
device fsid 7c3d2b55-118f-447e-9e65-767005893fec devid 1 transid 12 /dev/sda5
btrfs: truncated 1 orphans

ceph-node-8
usb 5-1: New USB device found, idVendor=0603, idProduct=00f2
usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 5-1: Product: USB Keyboard
usb 5-1: Manufacturer: NOVATEK
input: NOVATEK USB Keyboard as /devices/pci0000:00/0000:00:1d.1/usb5/5-1/5-1:1.0/input/input3
generic-usb 0003:0603:00F2.0001: input: USB HID v1.10 Keyboard [NOVATEK USB Keyboard] on usb-0000:00:1d.1-1/input0
input: NOVATEK USB Keyboard as /devices/pci0000:00/0000:00:1d.1/usb5/5-1/5-1:1.1/input/input4
generic-usb 0003:0603:00F2.0002: input,hiddev0: USB HID v1.10 Device [NOVATEK USB Keyboard] on usb-0000:00:1d.1-1/input1
usb 5-1: USB disconnect, device number 2
btrfs: truncated 1 orphans

Do you see any issue with the OSDs? All 3 OSDs are showing the
"btrfs: truncated 1 orphans" error.




* Re: timed out in osd1 error in dmes
  2012-03-14  4:20   ` madhusudhana
@ 2012-03-14 17:59     ` Sage Weil
  2012-03-15  7:46       ` madhusudhana
From: Sage Weil @ 2012-03-14 17:59 UTC (permalink / raw)
  To: madhusudhana; +Cc: ceph-devel

On Wed, 14 Mar 2012, madhusudhana wrote:
> Josh Durgin <josh.durgin <at> dreamhost.com> writes:
> 
> > 
> > On 03/13/2012 12:35 AM, madhusudhana wrote:
> > > Hi all,
> > > The server on which I have mounted the file system using mount -t ceph
> > > is showing the errors below in dmesg.
> > >
> > >
> > > libceph:  tid 79987 timed out on osd2, will reset osd
> > > libceph:  tid 81516 timed out on osd0, will reset osd
> > > libceph:  tid 81133 timed out on osd1, will reset osd
> > > libceph: skipping osd1 10.25.12.127:6800 seq 1 expected 2
> > > libceph:  tid 80108 timed out on osd2, will reset osd
> > > libceph:  tid 81134 timed out on osd1, will reset osd
> > > libceph:  tid 81641 timed out on osd1, will reset osd
> > >
> > >
> > > Is this the reason write/copy operations in my cluster
> > > are slow? Is this an error that needs attention, or can it
> > > be safely ignored?
> > 
> > These are usually harmless, and could just mean the osds can't keep up 
> > with the requests you're giving them. Given your other issues, it might 
> > be a symptom of a problem with your osds.
> > 
> > What filesystem are the osds using? Are there any warnings from these 
> > filesystems in dmesg?
> 
> All my OSDs are using btrfs. Below is the tail of dmesg from each OSD:

Heh, I should read my mail in order.  It sounds like the cp's are probably 
slow due to the OSDs.  

> ceph-node-6
> generic-usb 0003:0603:00F2.0004: input,hiddev0: USB HID v1.10 Device [NOVATEK USB Keyboard] on usb-0000:00:1d.1-1/input1
> usb 5-1: USB disconnect, device number 3
> device fsid aed12ad8-4053-4066-9074-9a9f2419c03f devid 1 transid 7 /dev/sda5
> device fsid aed12ad8-4053-4066-9074-9a9f2419c03f devid 1 transid 7 /dev/sda5
> device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 7 /dev/sda5
> device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 7 /dev/sda5
> device fsid ee29fef4-5e07-4be7-bf2c-592e3b9fa62b devid 1 transid 12 /dev/sda5
> btrfs: truncated 1 orphans
> btrfs: truncated 1 orphans

These are harmless noise, BTW; you can ignore them.

Can you tell us how your OSDs are configured?  Where are the data 
directories and journals located?  (The [osd] section of ceph.conf would 
be helpful.)

Another useful piece of information would be the ceph-osd's raw 
performance writing to the local disk+journal, which you can get with

 $ ceph tell osd.0 bench

You might want to check it for several nodes to see if it's consistent, 
etc.

Thanks!
sage





* Re: timed out in osd1 error in dmes
  2012-03-14 17:59     ` Sage Weil
@ 2012-03-15  7:46       ` madhusudhana
  2012-03-15 15:47         ` Sage Weil
From: madhusudhana @ 2012-03-15  7:46 UTC (permalink / raw)
  To: ceph-devel

> 
> These are harmless noise, BTW; you can ignore them.
> 
> Can you tell us how your OSDs are configured?  Where are the data 
> directories and journals located?  (The [osd] section of ceph.conf would 
> be helpful.)
> 
> Another useful piece of information would be the ceph-osd's raw 
> performance writing to the local disk+journal, which you can get with
> 
>  $ ceph tell osd.0 bench
> 
> You might want to check it for several nodes to see if it's consistent, 
> etc.
> 
Below are the results of the above command, run against all OSDs:


2012-03-15 13:06:19.980924 osd.0 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 67.474949 sec at 15540 KB/sec' (0)
2012-03-15 13:09:20.573176 osd.1 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 70.815932 sec at 14807 KB/sec' (0)
2012-03-15 13:11:57.895738 osd.2 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 60.370233 sec at 17369 KB/sec' (0)
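
Sage suggested checking the bench result on several nodes to see whether it is consistent. A minimal sketch of that comparison, assuming the cluster-log format of the three lines above (the regular expression is inferred from them, not taken from Ceph documentation; parse_bench is a hypothetical helper name). It also recomputes each rate from size and time: for these lines the reported KB/sec equals MB * 1024 / seconds, truncated to an integer.

```python
import re

# Pattern inferred from the three result lines in this thread.
BENCH_RE = re.compile(
    r"(osd\.\d+) -> 'bench: wrote (\d+) MB in blocks of (\d+) KB "
    r"in ([\d.]+) sec at (\d+) KB/sec'"
)

BENCH = """\
2012-03-15 13:06:19.980924 osd.0 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 67.474949 sec at 15540 KB/sec' (0)
2012-03-15 13:09:20.573176 osd.1 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 70.815932 sec at 14807 KB/sec' (0)
2012-03-15 13:11:57.895738 osd.2 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 60.370233 sec at 17369 KB/sec' (0)
"""


def parse_bench(text: str) -> dict:
    """Map each OSD name to its reported and recomputed KB/sec."""
    results = {}
    for line in text.splitlines():
        m = BENCH_RE.search(line)
        if not m:
            continue
        osd, mb, _block_kb, secs, kb_per_sec = m.groups()
        results[osd] = {
            "reported_kb_s": int(kb_per_sec),
            # Recompute from the raw numbers: 1 MB = 1024 KB.
            "computed_kb_s": int(mb) * 1024 / float(secs),
        }
    return results


if __name__ == "__main__":
    res = parse_bench(BENCH)
    for osd, r in res.items():
        print(f"{osd}: {r['reported_kb_s'] / 1024:.1f} MB/s")
    rates = [r["reported_kb_s"] for r in res.values()]
    spread = (max(rates) - min(rates)) / min(rates)
    print(f"fastest vs slowest OSD: {spread:.0%} apart")
```

For this data it prints 15.2, 14.5 and 17.0 MB/s and a 17% spread, so the low throughput looks roughly uniform across the three OSDs rather than specific to one node.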

Do you see any issues?

Thanks





* Re: timed out in osd1 error in dmes
  2012-03-15  7:46       ` madhusudhana
@ 2012-03-15 15:47         ` Sage Weil
  2012-03-15 17:02           ` madhusudhana
From: Sage Weil @ 2012-03-15 15:47 UTC (permalink / raw)
  To: madhusudhana; +Cc: ceph-devel

> > These are harmless noise, BTW; you can ignore them.
> > 
> > Can you tell us how your OSDs are configured?  Where are the data 
> > directories and journals located?  (The [osd] section of ceph.conf would 
> > be helpful.)

Can you share your ceph.conf please?

> > Another useful piece of information would be the ceph-osd's raw 
> > performance writing to the local disk+journal, which you can get with
> > 
> >  $ ceph tell osd.0 bench
> > 
> > You might want to check it for several nodes to see if it's consistent, 
> > etc.
> > 
> Below are the results of the above command, run against all OSDs:
> 
> 
> 2012-03-15 13:06:19.980924 osd.0 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 67.474949 sec at 15540 KB/sec' (0)
> 2012-03-15 13:09:20.573176 osd.1 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 70.815932 sec at 14807 KB/sec' (0)
> 2012-03-15 13:11:57.895738 osd.2 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 60.370233 sec at 17369 KB/sec' (0)

This is pretty slow, and probably due to the way your osd journals are 
configured.  Please share your ceph.conf!
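
Why journal placement matters can be made concrete with a simple model. With a write-ahead journal, each client write is written once to the journal and once to the OSD data store. If both sit on the same physical disk, that disk carries every byte twice, so at best about half of its raw write rate is left for client data; with the journal on its own device, the slower of the two devices is the limit. The sketch below encodes only this upper bound under stated assumptions: it ignores seeks (which make a shared disk worse still) and caching, the function name is hypothetical, and the disk rates are illustrative inputs, not measurements from this cluster.

```python
from typing import Optional


def max_osd_write_mb_s(data_disk_mb_s: float,
                       journal_disk_mb_s: Optional[float] = None) -> float:
    """Upper bound on OSD write throughput in MB/s under a simple model.

    Assumed model (illustration only): every client write goes once to the
    journal and once to the data store; seeks and caching are ignored.
    journal_disk_mb_s=None means the journal shares the data disk.
    """
    if data_disk_mb_s <= 0:
        raise ValueError("data_disk_mb_s must be positive")
    if journal_disk_mb_s is None:
        # One disk absorbs both copies of every write.
        return data_disk_mb_s / 2
    # Separate devices: each copy goes to its own device, so the slower
    # device sets the pace.
    return min(data_disk_mb_s, journal_disk_mb_s)


if __name__ == "__main__":
    disk = 100.0  # hypothetical raw sequential write rate in MB/s
    print("journal on data disk:", max_osd_write_mb_s(disk))
    print("journal on separate disk:", max_osd_write_mb_s(disk, 100.0))
```

Because seeks between the journal and data partitions are ignored, a shared spindle can deliver noticeably less than this bound in practice.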

Thanks-
sage


* Re: timed out in osd1 error in dmes
  2012-03-15 15:47         ` Sage Weil
@ 2012-03-15 17:02           ` madhusudhana
From: madhusudhana @ 2012-03-15 17:02 UTC (permalink / raw)
  To: ceph-devel


> 
> Can you share your ceph.conf please?
> 
> > > Another useful piece of information would be the ceph-osd's raw 
> > > performance writing to the local disk+journal, which you can get with
> > > 
> > >  $ ceph tell osd.0 bench
> > > 
> > > You might want to check it for several nodes to see if it's consistent, 
> > > etc.
> > > 
> > Below are the results of the above command, run against all OSDs:
> > 
> > 
> > 2012-03-15 13:06:19.980924 osd.0 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 67.474949 sec at 15540 KB/sec' (0)
> > 2012-03-15 13:09:20.573176 osd.1 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 70.815932 sec at 14807 KB/sec' (0)
> > 2012-03-15 13:11:57.895738 osd.2 -> 'bench: wrote 1024 MB in blocks of 4096 KB in 60.370233 sec at 17369 KB/sec' (0)
> 
> This is pretty slow, and probably due to the way your osd journals are 
> configured.  Please share your ceph.conf!
> 

Below is my ceph.conf file:

[root@ceph-node-8 ~]# cat /etc/ceph/ceph.conf
[global]
        ;auth supported = cephx
        keyring = /etc/ceph/admin.keyring
        debug ms = 1
        debug mds = 10

[mon]
        mon data = /data/mon.$id

[mon.a]
        host = ceph-node-4
        mon addr = xx.xx.xx.xx

[mon.b]
        host = ceph-node-5
        mon addr = xx.xx.xx.xx

[mon.c]
        host = ceph-node-6
        mon addr = xx.xx.xx.xx
[mds]
        keyring = /etc/ceph/keyring.$name

[mds.ceph-node-1]
        host = ceph-node-7

[mds.ceph-node-2]
        host = ceph-node-8

[osd]
        osd data = /data/osd.$id
        keyring = /etc/ceph/keyring.$name
        osd journal = /journal/osd.$id.journal
        osd journal size = 10000
        debug ms = 1
        debug osd = 20
        debug filestore = 20
        debug journal = 20



[osd.0]
        host = ceph-node-1
        btrfs devs = /dev/sda4

[osd.1]
        host = ceph-node-2
        btrfs devs = /dev/sda4

[osd.2]
        host = ceph-node-3
        btrfs devs = /dev/sda4

In brief, I have separate partitions mounted for the journal and the OSD data:

/journal is where the journal partition is mounted
/data is where the OSD data partition is mounted
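
Ceph expands the $id metavariable separately for each daemon, so the shared [osd] section above resolves to a different data directory and journal file for every OSD. A minimal sketch of that expansion with Python's configparser, assuming the file parses as INI (this excerpt does, with ';' starting a comment). It expands only $id, not other metavariables such as $name, and copies just the [osd] and [osd.N] sections; osd_layout is a hypothetical helper name.

```python
import configparser

# Excerpt of the [osd] sections from the ceph.conf above.
CEPH_CONF = """\
[osd]
        osd data = /data/osd.$id
        keyring = /etc/ceph/keyring.$name
        osd journal = /journal/osd.$id.journal
        osd journal size = 10000

[osd.0]
        host = ceph-node-1
        btrfs devs = /dev/sda4

[osd.1]
        host = ceph-node-2
        btrfs devs = /dev/sda4

[osd.2]
        host = ceph-node-3
        btrfs devs = /dev/sda4
"""


def osd_layout(conf_text: str) -> dict:
    """Resolve per-OSD data dir, journal path and journal size (MB)."""
    # interpolation=None leaves '$id' untouched so we can expand it here.
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(conf_text)
    common = dict(cp["osd"]) if cp.has_section("osd") else {}
    layout = {}
    for section in cp.sections():
        if not section.startswith("osd."):
            continue
        osd_id = section.split(".", 1)[1]
        # Per-daemon settings override the shared [osd] section.
        merged = {**common, **dict(cp[section])}
        layout[section] = {
            "host": merged.get("host"),
            "data": merged["osd data"].replace("$id", osd_id),
            "journal": merged["osd journal"].replace("$id", osd_id),
            "journal_size_mb": int(merged["osd journal size"]),
            "btrfs_devs": merged.get("btrfs devs"),
        }
    return layout


if __name__ == "__main__":
    for name, info in osd_layout(CEPH_CONF).items():
        print(f"{name} on {info['host']}: data={info['data']} "
              f"journal={info['journal']} ({info['journal_size_mb']} MB)")
```

For this configuration osd.0 resolves to /data/osd.0 with journal /journal/osd.0.journal (10000 MB), and likewise for osd.1 and osd.2. What the file cannot show is whether the /journal partition sits on the same physical disk as /dev/sda4; running df /journal /data on each OSD host answers that.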



