* osd crashed after adding new osd
@ 2010-08-10 13:09 Henry C Chang
  2010-08-10 15:42 ` Wido den Hollander
  0 siblings, 1 reply; 6+ messages in thread
From: Henry C Chang @ 2010-08-10 13:09 UTC (permalink / raw)
  To: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 680 bytes --]

Hi,

I have a ceph cluster: 3 nodes running mon+osd and 2 nodes running mds.
When I tried to add the 4th osd to the cluster, osd0 and osd1 crashed.
The error logs are attached.

My procedure to add the 4th osd is:

add [osd3] in the conf file: /etc/ceph/ceph.conf
ceph -c /etc/ceph/ceph.conf mon getmap -o /tmp/monmap
cosd -c /etc/ceph/ceph.conf -i 3 --mkfs --monmap /tmp/monmap
ceph -c /etc/ceph/ceph.conf osd setmaxosd 4
osdmaptool --createsimple 4 --clobber /tmp/osdmap.junk --export-crush
/tmp/crush.new
ceph -c /etc/ceph.conf osd setcrushmap -i /tmp/crush.new
/etc/init.d/ceph -c
/etc/ceph/d2c5d946-b888-40b3-aac2-adda05477a81.conf start osd
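
(The same steps, restated below as a commented shell sketch for clarity; the
paths and the fsid-named config file come from the commands above, and the
/etc/ceph/ceph.conf path in the setcrushmap step is assumed to be the intended
config file, since the command above writes /etc/ceph.conf:)

  # 1. describe the new daemon: add an [osd3] section to /etc/ceph/ceph.conf
  # 2. fetch the current monitor map
  ceph -c /etc/ceph/ceph.conf mon getmap -o /tmp/monmap
  # 3. initialise the new osd's data directory against that monmap
  cosd -c /etc/ceph/ceph.conf -i 3 --mkfs --monmap /tmp/monmap
  # 4. raise the maximum osd id in the osdmap to cover osd3
  ceph -c /etc/ceph/ceph.conf osd setmaxosd 4
  # 5. build a simple 4-osd map and export its crush map
  osdmaptool --createsimple 4 --clobber /tmp/osdmap.junk --export-crush /tmp/crush.new
  # 6. load the new crush map (assumed path: /etc/ceph/ceph.conf)
  ceph -c /etc/ceph/ceph.conf osd setcrushmap -i /tmp/crush.new
  # 7. start the new osd from the generated per-cluster config
  /etc/init.d/ceph -c /etc/ceph/d2c5d946-b888-40b3-aac2-adda05477a81.conf start osd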

Is my procedure to add an osd incorrect?

Thanks,
Henry

[-- Attachment #2: osd.0.log --]
[-- Type: application/octet-stream, Size: 9436 bytes --]

10.08.10_08:26:52.395879 --- 2579 appending to log /var/log/ceph/osd.0.log ---
ceph version 0.22~rc ()
10.08.10_08:26:52.397723 7fb0857f0720 filestore(/spare/osd0) mount detected btrfs
10.08.10_08:26:52.397825 7fb0857f0720 filestore(/spare/osd0) mount btrfs CLONE_RANGE ioctl is supported
10.08.10_08:26:52.446427 7fb0857f0720 filestore(/spare/osd0) mount btrfs SNAP_CREATE is supported
10.08.10_08:26:52.553023 7fb0857f0720 filestore(/spare/osd0) mount btrfs SNAP_DESTROY is supported
10.08.10_08:26:52.553186 7fb0857f0720 filestore(/spare/osd0) mount found snaps <>
10.08.10_08:26:52.553557 7fb0857f0720 journal read_entry 4096 : seq 1 230 bytes
10.08.10_08:26:52.554123 7fb0857f0720 filestore(/spare/osd0) parse . -> meta = 0
10.08.10_08:26:52.554157 7fb0857f0720 filestore(/spare/osd0) parse .. -> meta = 0
10.08.10_08:26:52.554170 7fb0857f0720 filestore(/spare/osd0) parse commit_op_seq -> meta = 0
10.08.10_08:26:52.554181 7fb0857f0720 filestore(/spare/osd0) parse meta -> meta = 1
10.08.10_08:26:52.554193 7fb0857f0720 filestore(/spare/osd0) parse temp -> temp = 1
10.08.10_08:26:52.555618 7fb0857ef710 -- 0.0.0.0:6800/2579 >> 192.168.0.104:6789/0 pipe(0x1260b30 sd=-1 pgs=0 cs=0 l=0).fault first fault
10.08.10_08:26:55.554764 7fb07aae1710 -- 0.0.0.0:6800/2579 >> 192.168.0.105:6789/0 pipe(0x7fb074000b20 sd=-1 pgs=0 cs=0 l=0).fault first fault
10.08.10_08:26:59.949737 7fb069cf7710 -- 0.0.0.0:6801/2579 >> 192.168.0.103:6801/2735 pipe(0x7fb060000fc0 sd=20 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/2735 not 192.168.0.103:6801/2735 - presumably this is the same node!
10.08.10_08:44:39.257636 7fb0698f3710 -- 192.168.0.105:6800/2579 >> 192.168.0.108:0/2546804884 pipe(0x7fb04c0012c0 sd=19 pgs=0 cs=0 l=0).accept peer addr is really 192.168.0.108:0/2546804884 (socket is 192.168.0.108:58262/0)
10.08.10_08:47:08.044560 7fb0818ec710 journal check_for_full at 17854464 : JOURNAL FULL 17854464 >= 3805183 (max_size 104857600 start 21659648)
10.08.10_08:47:22.278966 7fb0818ec710 journal check_for_full at 98811904 : JOURNAL FULL 98811904 >= 3993599 (max_size 104857600 start 102805504)
10.08.10_08:47:26.565825 7fb0818ec710 journal check_for_full at 36401152 : JOURNAL FULL 36401152 >= 3993599 (max_size 104857600 start 40394752)
10.08.10_08:47:27.250480 7fb0818ec710 journal check_for_full at 86831104 : JOURNAL FULL 86831104 >= 3993599 (max_size 104857600 start 90824704)
10.08.10_08:47:40.729284 7fb0818ec710 journal check_for_full at 8527872 : JOURNAL FULL 8527872 >= 3993599 (max_size 104857600 start 12521472)
10.08.10_08:47:48.084289 7fb0818ec710 journal check_for_full at 92786688 : JOURNAL FULL 92786688 >= 3993599 (max_size 104857600 start 96780288)
10.08.10_08:53:00.317371 7fb0698f3710 -- 192.168.0.105:6800/2579 >> 192.168.0.108:0/2546804884 pipe(0x7fb04c0012c0 sd=19 pgs=0 cs=0 l=0).accept peer addr is really 192.168.0.108:0/2546804884 (socket is 192.168.0.108:41058/0)
10.08.10_08:54:37.119427 7fb0818ec710 journal check_for_full at 76455936 : JOURNAL FULL 76455936 >= 3993599 (max_size 104857600 start 80449536)
10.08.10_08:54:37.878130 7fb0818ec710 journal check_for_full at 22032384 : JOURNAL FULL 22032384 >= 3993599 (max_size 104857600 start 26025984)
10.08.10_08:54:38.803107 7fb0818ec710 journal check_for_full at 72462336 : JOURNAL FULL 72462336 >= 3993599 (max_size 104857600 start 76455936)
10.08.10_08:54:40.499484 7fb0818ec710 journal check_for_full at 68476928 : JOURNAL FULL 68476928 >= 3985407 (max_size 104857600 start 72462336)
10.08.10_08:54:42.140118 7fb0818ec710 journal check_for_full at 64528384 : JOURNAL FULL 64528384 >= 3948543 (max_size 104857600 start 68476928)
10.08.10_08:55:04.093597 7fb0818ec710 journal check_for_full at 54763520 : JOURNAL FULL 54763520 >= 3944447 (max_size 104857600 start 58707968)
10.08.10_08:55:10.596337 7fb0818ec710 journal check_for_full at 101199872 : JOURNAL FULL 101199872 >= 3993599 (max_size 104857600 start 339968)
10.08.10_08:55:12.089717 7fb0818ec710 journal check_for_full at 97206272 : JOURNAL FULL 97206272 >= 3993599 (max_size 104857600 start 101199872)
10.08.10_08:55:12.849763 7fb0818ec710 journal check_for_full at 42835968 : JOURNAL FULL 42835968 >= 3940351 (max_size 104857600 start 46776320)
10.08.10_08:55:15.167950 7fb0818ec710 journal check_for_full at 89272320 : JOURNAL FULL 89272320 >= 3993599 (max_size 104857600 start 93265920)
10.08.10_08:55:17.597263 7fb0818ec710 journal check_for_full at 30855168 : JOURNAL FULL 30855168 >= 3993599 (max_size 104857600 start 34848768)
10.08.10_08:55:18.365782 7fb0818ec710 journal check_for_full at 81285120 : JOURNAL FULL 81285120 >= 3993599 (max_size 104857600 start 85278720)
10.08.10_08:55:20.663808 7fb0818ec710 journal check_for_full at 22953984 : JOURNAL FULL 22953984 >= 3993599 (max_size 104857600 start 26947584)
10.08.10_08:55:22.156673 7fb0818ec710 journal check_for_full at 18960384 : JOURNAL FULL 18960384 >= 3993599 (max_size 104857600 start 22953984)
10.08.10_08:55:22.899287 7fb0818ec710 journal check_for_full at 69390336 : JOURNAL FULL 69390336 >= 3993599 (max_size 104857600 start 73383936)
10.08.10_08:55:29.493664 7fb0818ec710 journal check_for_full at 91410432 : JOURNAL FULL 91410432 >= 3993599 (max_size 104857600 start 95404032)
10.08.10_08:55:41.042623 7fb0818ec710 journal check_for_full at 58859520 : JOURNAL FULL 58859520 >= 3444735 (max_size 104857600 start 62304256)
10.08.10_08:55:48.235569 7fb0818ec710 journal check_for_full at 38264832 : JOURNAL FULL 38264832 >= 3993599 (max_size 104857600 start 42258432)
10.08.10_08:55:49.715487 7fb0818ec710 journal check_for_full at 34316288 : JOURNAL FULL 34316288 >= 3948543 (max_size 104857600 start 38264832)
10.08.10_08:55:51.203318 7fb0818ec710 journal check_for_full at 30322688 : JOURNAL FULL 30322688 >= 3993599 (max_size 104857600 start 34316288)
10.08.10_08:55:52.583301 7fb0818ec710 journal check_for_full at 26329088 : JOURNAL FULL 26329088 >= 3993599 (max_size 104857600 start 30322688)
10.08.10_08:55:55.934880 7fb0818ec710 journal check_for_full at 18386944 : JOURNAL FULL 18386944 >= 3948543 (max_size 104857600 start 22335488)
10.08.10_08:55:56.665266 7fb0818ec710 journal check_for_full at 68816896 : JOURNAL FULL 68816896 >= 3993599 (max_size 104857600 start 72810496)
10.08.10_08:55:57.480233 7fb0818ec710 journal check_for_full at 14393344 : JOURNAL FULL 14393344 >= 3993599 (max_size 104857600 start 18386944)
10.08.10_08:55:58.271648 7fb0818ec710 journal check_for_full at 64880640 : JOURNAL FULL 64880640 >= 3936255 (max_size 104857600 start 68816896)
10.08.10_08:55:59.809025 7fb0818ec710 journal check_for_full at 60887040 : JOURNAL FULL 60887040 >= 3993599 (max_size 104857600 start 64880640)
10.08.10_08:56:01.297314 7fb0818ec710 journal check_for_full at 56893440 : JOURNAL FULL 56893440 >= 3993599 (max_size 104857600 start 60887040)
10.08.10_08:56:03.829608 7fb0818ec710 journal check_for_full at 103329792 : JOURNAL FULL 103329792 >= 3993599 (max_size 104857600 start 2469888)
10.08.10_08:56:04.630210 7fb0818ec710 journal check_for_full at 48906240 : JOURNAL FULL 48906240 >= 3993599 (max_size 104857600 start 52899840)
10.08.10_08:56:05.418736 7fb0818ec710 journal check_for_full at 99336192 : JOURNAL FULL 99336192 >= 3993599 (max_size 104857600 start 103329792)
10.08.10_08:56:07.615618 7fb0818ec710 journal check_for_full at 40919040 : JOURNAL FULL 40919040 >= 3993599 (max_size 104857600 start 44912640)
10.08.10_08:56:10.591441 7fb0818ec710 journal check_for_full at 32931840 : JOURNAL FULL 32931840 >= 3993599 (max_size 104857600 start 36925440)
10.08.10_08:56:11.359894 7fb0818ec710 journal check_for_full at 83361792 : JOURNAL FULL 83361792 >= 3993599 (max_size 104857600 start 87355392)
10.08.10_08:56:12.139021 7fb0818ec710 journal check_for_full at 28946432 : JOURNAL FULL 28946432 >= 3985407 (max_size 104857600 start 32931840)
10.08.10_08:58:54.735354 7fb0698f3710 -- 192.168.0.105:6800/2579 >> 192.168.0.108:0/2546804884 pipe(0x7fb04c0012c0 sd=19 pgs=0 cs=0 l=0).accept peer addr is really 192.168.0.108:0/2546804884 (socket is 192.168.0.108:58449/0)
10.08.10_08:58:57.016447 7fb0818ec710 journal check_for_full at 114688 : JOURNAL FULL 114688 >= 3952639 (max_size 104857600 start 4067328)
10.08.10_08:59:07.159725 7fb0818ec710 journal check_for_full at 26464256 : JOURNAL FULL 26464256 >= 3911679 (max_size 104857600 start 30375936)
10.08.10_09:14:51.934207 7fb0698f3710 -- 192.168.0.105:6800/2579 >> 192.168.0.108:0/2546804884 pipe(0x7fb04c0012c0 sd=19 pgs=0 cs=0 l=0).accept peer addr is really 192.168.0.108:0/2546804884 (socket is 192.168.0.108:35725/0)
10.08.10_09:25:07.973805 7fb0698f3710 -- 192.168.0.105:6800/2579 >> 192.168.0.108:0/2546804884 pipe(0x7fb04c0012c0 sd=19 pgs=0 cs=0 l=0).accept peer addr is really 192.168.0.108:0/2546804884 (socket is 192.168.0.108:40756/0)
10.08.10_09:28:29.212338 7fb0818ec710 journal check_for_full at 35061760 : JOURNAL FULL 35061760 >= 3993599 (max_size 104857600 start 39055360)
osd/OSD.cc: In function 'void OSD::handle_pg_notify(MOSDPGNotify*)':
osd/OSD.cc:3377: FAILED assert(role == 0)
 1: (OSD::_dispatch(Message*)+0x35d) [0x4e65fd]
 2: (OSD::ms_dispatch(Message*)+0x39) [0x4e6859]
 3: (SimpleMessenger::dispatch_entry()+0x789) [0x46b099]
 4: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4580bc]
 5: (Thread::_entry_func(void*)+0xa) [0x46bcba]
 6: (()+0x6a3a) [0x7fb0851c6a3a]
 7: (clone()+0x6d) [0x7fb0843e477d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

[-- Attachment #3: osd.1.log --]
[-- Type: application/octet-stream, Size: 12423 bytes --]

10.08.10_08:26:54.092521 --- 2601 appending to log /var/log/ceph/osd.1.log ---
ceph version 0.22~rc ()
10.08.10_08:26:54.093953 7f85085c4720 filestore(/spare/osd1) mount detected btrfs
10.08.10_08:26:54.094064 7f85085c4720 filestore(/spare/osd1) mount btrfs CLONE_RANGE ioctl is supported
10.08.10_08:26:54.137778 7f85085c4720 filestore(/spare/osd1) mount btrfs SNAP_CREATE is supported
10.08.10_08:26:54.203108 7f85085c4720 filestore(/spare/osd1) mount btrfs SNAP_DESTROY is supported
10.08.10_08:26:54.240629 7f85085c4720 filestore(/spare/osd1) mount found snaps <>
10.08.10_08:26:54.246788 7f85085c4720 journal read_entry 4096 : seq 1 230 bytes
10.08.10_08:26:54.247583 7f85085c4720 filestore(/spare/osd1) parse . -> meta = 0
10.08.10_08:26:54.247618 7f85085c4720 filestore(/spare/osd1) parse .. -> meta = 0
10.08.10_08:26:54.247631 7f85085c4720 filestore(/spare/osd1) parse commit_op_seq -> meta = 0
10.08.10_08:26:54.247642 7f85085c4720 filestore(/spare/osd1) parse meta -> meta = 1
10.08.10_08:26:54.247655 7f85085c4720 filestore(/spare/osd1) parse temp -> temp = 1
10.08.10_08:26:54.248295 7f85085c3710 -- 0.0.0.0:6800/2601 >> 192.168.0.105:6789/0 pipe(0x1b3fb30 sd=-1 pgs=0 cs=0 l=0).fault first fault
10.08.10_08:26:57.652453 7f84ff6b8710 osd1 1 map says i am down.  switching to boot state.
10.08.10_08:26:57.652506 7f84ff6b8710 log [WRN] : map e1 wrongly marked me down
10.08.10_08:26:59.703903 7f84ecbf8710 -- 0.0.0.0:6801/2601 >> 192.168.0.103:6801/2735 pipe(0x7f84e4001330 sd=18 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/2735 not 192.168.0.103:6801/2735 - presumably this is the same node!
10.08.10_08:26:59.805552 7f84eccf9710 -- 192.168.0.104:6801/2601 >> 192.168.0.105:6801/2579 pipe(0x7f84e4000b80 sd=17 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/2579 not 192.168.0.105:6801/2579 - presumably this is the same node!
10.08.10_08:44:39.392347 7f84ec6f3710 -- 192.168.0.104:6800/2601 >> 192.168.0.108:0/2546804884 pipe(0x7f84a0000de0 sd=20 pgs=0 cs=0 l=0).accept peer addr is really 192.168.0.108:0/2546804884 (socket is 192.168.0.108:39159/0)
10.08.10_08:46:36.635351 7f85046c0710 journal check_for_full at 65949696 : JOURNAL FULL 65949696 >= 3993599 (max_size 104857600 start 69943296)
10.08.10_08:46:41.189374 7f85046c0710 journal check_for_full at 45596672 : JOURNAL FULL 45596672 >= 1105919 (max_size 104857600 start 46702592)
10.08.10_08:46:41.248857 7f85046c0710 journal check_for_full at 45596672 : JOURNAL FULL 45596672 >= 1105919 (max_size 104857600 start 46702592)
10.08.10_08:46:42.667815 7f85046c0710 journal check_for_full at 41603072 : JOURNAL FULL 41603072 >= 3993599 (max_size 104857600 start 45596672)
10.08.10_08:46:47.697259 7f85046c0710 journal check_for_full at 29704192 : JOURNAL FULL 29704192 >= 3993599 (max_size 104857600 start 33697792)
10.08.10_08:46:50.010934 7f85046c0710 journal check_for_full at 76185600 : JOURNAL FULL 76185600 >= 3948543 (max_size 104857600 start 80134144)
10.08.10_08:47:11.285719 7f85046c0710 journal check_for_full at 12038144 : JOURNAL FULL 12038144 >= 3928063 (max_size 104857600 start 15966208)
10.08.10_08:47:18.093547 7f85046c0710 journal check_for_full at 84148224 : JOURNAL FULL 84148224 >= 3993599 (max_size 104857600 start 88141824)
10.08.10_08:47:40.863509 7f85046c0710 journal check_for_full at 35422208 : JOURNAL FULL 35422208 >= 3993599 (max_size 104857600 start 39415808)
10.08.10_08:47:46.080811 7f85046c0710 journal check_for_full at 23441408 : JOURNAL FULL 23441408 >= 3993599 (max_size 104857600 start 27435008)
10.08.10_08:53:00.514699 7f84ec6f3710 -- 192.168.0.104:6800/2601 >> 192.168.0.108:0/2546804884 pipe(0x7f84a0000de0 sd=20 pgs=0 cs=0 l=0).accept peer addr is really 192.168.0.108:0/2546804884 (socket is 192.168.0.108:46796/0)
10.08.10_08:53:42.797728 7f85046c0710 journal check_for_full at 41906176 : JOURNAL FULL 41906176 >= 3895295 (max_size 104857600 start 45801472)
10.08.10_08:53:51.560559 7f85046c0710 journal check_for_full at 29925376 : JOURNAL FULL 29925376 >= 3993599 (max_size 104857600 start 33918976)
10.08.10_08:54:31.807478 7f85046c0710 journal check_for_full at 22118400 : JOURNAL FULL 22118400 >= 3993599 (max_size 104857600 start 26112000)
10.08.10_08:54:37.889781 7f85046c0710 journal check_for_full at 60698624 : JOURNAL FULL 60698624 >= 3985407 (max_size 104857600 start 64684032)
10.08.10_08:54:38.871116 7f85046c0710 journal check_for_full at 98521088 : JOURNAL FULL 98521088 >= 3985407 (max_size 104857600 start 102506496)
10.08.10_08:54:51.615431 7f85046c0710 journal check_for_full at 78606336 : JOURNAL FULL 78606336 >= 3993599 (max_size 104857600 start 82599936)
10.08.10_08:54:54.463588 7f85046c0710 journal check_for_full at 83095552 : JOURNAL FULL 83095552 >= 3940351 (max_size 104857600 start 87035904)
10.08.10_08:55:01.031310 7f85046c0710 journal check_for_full at 67129344 : JOURNAL FULL 67129344 >= 3993599 (max_size 104857600 start 71122944)
10.08.10_08:55:04.376707 7f85046c0710 journal check_for_full at 59142144 : JOURNAL FULL 59142144 >= 3993599 (max_size 104857600 start 63135744)
10.08.10_08:55:06.629499 7f85046c0710 journal check_for_full at 724992 : JOURNAL FULL 724992 >= 3993599 (max_size 104857600 start 4718592)
10.08.10_08:55:07.318006 7f85046c0710 journal check_for_full at 51154944 : JOURNAL FULL 51154944 >= 3993599 (max_size 104857600 start 55148544)
10.08.10_08:55:08.039358 7f85046c0710 journal check_for_full at 101584896 : JOURNAL FULL 101584896 >= 3993599 (max_size 104857600 start 724992)
10.08.10_08:55:09.589030 7f85046c0710 journal check_for_full at 97591296 : JOURNAL FULL 97591296 >= 3993599 (max_size 104857600 start 101584896)
10.08.10_08:55:11.931196 7f85046c0710 journal check_for_full at 39223296 : JOURNAL FULL 39223296 >= 3944447 (max_size 104857600 start 43167744)
10.08.10_08:55:13.505991 7f85046c0710 journal check_for_full at 35237888 : JOURNAL FULL 35237888 >= 3985407 (max_size 104857600 start 39223296)
10.08.10_08:55:16.000881 7f85046c0710 journal check_for_full at 81674240 : JOURNAL FULL 81674240 >= 3993599 (max_size 104857600 start 85667840)
10.08.10_08:55:17.390876 7f85046c0710 journal check_for_full at 77680640 : JOURNAL FULL 77680640 >= 3993599 (max_size 104857600 start 81674240)
10.08.10_08:55:19.896661 7f85046c0710 journal check_for_full at 19316736 : JOURNAL FULL 19316736 >= 3993599 (max_size 104857600 start 23310336)
10.08.10_08:55:21.543096 7f85046c0710 journal check_for_full at 15323136 : JOURNAL FULL 15323136 >= 3993599 (max_size 104857600 start 19316736)
10.08.10_08:55:22.203065 7f85046c0710 journal check_for_full at 65753088 : JOURNAL FULL 65753088 >= 3993599 (max_size 104857600 start 69746688)
10.08.10_08:55:24.815646 7f85046c0710 journal check_for_full at 7421952 : JOURNAL FULL 7421952 >= 3907583 (max_size 104857600 start 11329536)
10.08.10_08:55:25.524524 7f85046c0710 journal check_for_full at 57851904 : JOURNAL FULL 57851904 >= 3907583 (max_size 104857600 start 61759488)
10.08.10_08:55:26.245503 7f85046c0710 journal check_for_full at 3428352 : JOURNAL FULL 3428352 >= 3993599 (max_size 104857600 start 7421952)
10.08.10_08:55:27.911343 7f85046c0710 journal check_for_full at 104288256 : JOURNAL FULL 104288256 >= 3993599 (max_size 104857600 start 3428352)
10.08.10_08:55:30.272920 7f85046c0710 journal check_for_full at 46043136 : JOURNAL FULL 46043136 >= 3821567 (max_size 104857600 start 49864704)
10.08.10_08:55:31.930630 7f85046c0710 journal check_for_full at 42049536 : JOURNAL FULL 42049536 >= 3993599 (max_size 104857600 start 46043136)
10.08.10_08:55:34.465836 7f85046c0710 journal check_for_full at 88485888 : JOURNAL FULL 88485888 >= 3993599 (max_size 104857600 start 92479488)
10.08.10_08:55:38.332329 7f85046c0710 journal check_for_full at 26075136 : JOURNAL FULL 26075136 >= 3993599 (max_size 104857600 start 30068736)
10.08.10_08:55:40.628858 7f85046c0710 journal check_for_full at 72527872 : JOURNAL FULL 72527872 >= 3985407 (max_size 104857600 start 76513280)
10.08.10_08:55:42.908364 7f85046c0710 journal check_for_full at 14118912 : JOURNAL FULL 14118912 >= 3985407 (max_size 104857600 start 18104320)
10.08.10_08:55:47.308182 7f85046c0710 journal check_for_full at 2138112 : JOURNAL FULL 2138112 >= 3993599 (max_size 104857600 start 6131712)
10.08.10_08:55:49.495226 7f85046c0710 journal check_for_full at 48619520 : JOURNAL FULL 48619520 >= 3993599 (max_size 104857600 start 52613120)
10.08.10_08:55:50.299068 7f85046c0710 journal check_for_full at 99049472 : JOURNAL FULL 99049472 >= 3993599 (max_size 104857600 start 103043072)
10.08.10_08:55:50.968578 7f85046c0710 journal check_for_full at 44625920 : JOURNAL FULL 44625920 >= 3993599 (max_size 104857600 start 48619520)
10.08.10_08:55:53.437877 7f85046c0710 journal check_for_full at 91107328 : JOURNAL FULL 91107328 >= 3948543 (max_size 104857600 start 95055872)
10.08.10_08:55:56.841572 7f85046c0710 journal check_for_full at 83177472 : JOURNAL FULL 83177472 >= 3936255 (max_size 104857600 start 87113728)
10.08.10_08:56:02.831989 7f85046c0710 journal check_for_full at 67203072 : JOURNAL FULL 67203072 >= 3993599 (max_size 104857600 start 71196672)
10.08.10_08:56:04.279858 7f85046c0710 journal check_for_full at 63209472 : JOURNAL FULL 63209472 >= 3993599 (max_size 104857600 start 67203072)
10.08.10_08:56:07.597733 7f85046c0710 journal check_for_full at 93044736 : JOURNAL FULL 93044736 >= 3993599 (max_size 104857600 start 97038336)
10.08.10_08:56:09.981618 7f85046c0710 journal check_for_full at 34635776 : JOURNAL FULL 34635776 >= 3985407 (max_size 104857600 start 38621184)
10.08.10_08:58:54.970005 7f84ec6f3710 -- 192.168.0.104:6800/2601 >> 192.168.0.108:0/2546804884 pipe(0x7f84a0000de0 sd=20 pgs=0 cs=0 l=0).accept peer addr is really 192.168.0.108:0/2546804884 (socket is 192.168.0.108:54175/0)
10.08.10_08:58:58.081115 7f85046c0710 journal check_for_full at 56233984 : JOURNAL FULL 56233984 >= 3993599 (max_size 104857600 start 60227584)
10.08.10_08:59:00.023083 7f85046c0710 journal check_for_full at 52240384 : JOURNAL FULL 52240384 >= 3993599 (max_size 104857600 start 56233984)
10.08.10_08:59:00.849352 7f85046c0710 journal check_for_full at 2019328 : JOURNAL FULL 2019328 >= 3993599 (max_size 104857600 start 6012928)
10.08.10_08:59:00.916486 7f85046c0710 journal check_for_full at 2019328 : JOURNAL FULL 2019328 >= 3993599 (max_size 104857600 start 6012928)
10.08.10_08:59:15.808126 7f85046c0710 journal check_for_full at 20725760 : JOURNAL FULL 20725760 >= 3940351 (max_size 104857600 start 24666112)
10.08.10_08:59:23.465961 7f85046c0710 journal check_for_full at 42582016 : JOURNAL FULL 42582016 >= 3993599 (max_size 104857600 start 46575616)
10.08.10_08:59:26.382128 7f85046c0710 journal check_for_full at 89169920 : JOURNAL FULL 89169920 >= 3842047 (max_size 104857600 start 93011968)
10.08.10_08:59:27.107639 7f85046c0710 journal check_for_full at 34746368 : JOURNAL FULL 34746368 >= 3993599 (max_size 104857600 start 38739968)
10.08.10_09:00:15.345265 7f85046c0710 journal check_for_full at 74387456 : JOURNAL FULL 74387456 >= 3993599 (max_size 104857600 start 78381056)
10.08.10_09:14:52.039544 7f84ec6f3710 -- 192.168.0.104:6800/2601 >> 192.168.0.108:0/2546804884 pipe(0x7f84a0000de0 sd=20 pgs=0 cs=0 l=0).accept peer addr is really 192.168.0.108:0/2546804884 (socket is 192.168.0.108:34316/0)
10.08.10_09:25:08.143170 7f84ec6f3710 -- 192.168.0.104:6800/2601 >> 192.168.0.108:0/2546804884 pipe(0x7f84a0000de0 sd=20 pgs=0 cs=0 l=0).accept peer addr is really 192.168.0.108:0/2546804884 (socket is 192.168.0.108:51078/0)
10.08.10_09:26:55.743474 7f85046c0710 journal check_for_full at 74194944 : JOURNAL FULL 74194944 >= 3993599 (max_size 104857600 start 78188544)
10.08.10_09:28:26.796788 7f85046c0710 journal check_for_full at 82575360 : JOURNAL FULL 82575360 >= 3944447 (max_size 104857600 start 86519808)
10.08.10_09:28:53.191871 7f85046c0710 journal check_for_full at 99405824 : JOURNAL FULL 99405824 >= 3977215 (max_size 104857600 start 103383040)
osd/OSD.cc: In function 'void OSD::handle_pg_query(MOSDPGQuery*)':
osd/OSD.cc:3738: FAILED assert(role > 0)
 1: (OSD::_dispatch(Message*)+0x34d) [0x4e65ed]
 2: (OSD::ms_dispatch(Message*)+0x39) [0x4e6859]
 3: (SimpleMessenger::dispatch_entry()+0x789) [0x46b099]
 4: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4580bc]
 5: (Thread::_entry_func(void*)+0xa) [0x46bcba]
 6: (()+0x6a3a) [0x7f8507f9aa3a]
 7: (clone()+0x6d) [0x7f85071b877d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


* Re: osd crashed after adding new osd
  2010-08-10 13:09 osd crashed after adding new osd Henry C Chang
@ 2010-08-10 15:42 ` Wido den Hollander
  2010-08-10 17:27   ` Henry C Chang
  0 siblings, 1 reply; 6+ messages in thread
From: Wido den Hollander @ 2010-08-10 15:42 UTC (permalink / raw)
  To: Henry C Chang; +Cc: ceph-devel

Hi Henry,

Are there core dumps of these crashes in /? If so, they could help in
finding the cause.

See: http://ceph.newdream.net/wiki/Troubleshooting
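
(A minimal sketch of inspecting such a core dump with gdb; the binary path
/usr/bin/cosd is an assumption about this setup, and the core filename just
follows the usual /core.<pid> pattern:)

  # open the core against the matching, unstripped cosd binary
  gdb /usr/bin/cosd /core.2580
  # then, inside gdb:
  (gdb) bt                    # backtrace of the crashing thread
  (gdb) thread apply all bt   # backtraces of every thread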

And btw, which version of Ceph are you running?

Wido

On Tue, 2010-08-10 at 21:09 +0800, Henry C Chang wrote:
> Hi,
> 
> I have a ceph cluster: 3 (mon+osd) and 2 (mds).
> When I tried to add the 4th osd to the cluster, osd0 and osd1 crashed.
> The error logs are attached.
> 
> My procedure to add the 4th osd is:
> 
> add [osd3] in the conf file: /etc/ceph/ceph.conf
> ceph -c /etc/ceph/ceph.conf mon getmap -o /tmp/monmap
> cosd -c /etc/ceph/ceph.conf -i 3 --mkfs --monmap /tmp/monmap
> ceph -c /etc/ceph/ceph.conf osd setmaxosd 4
> osdmaptool --createsimple 4 --clobber /tmp/osdmap.junk --export-crush
> /tmp/crush.new
> ceph -c /etc/ceph.conf osd setcrushmap -i /tmp/crush.new
> /etc/init.d/ceph -c
> /etc/ceph/d2c5d946-b888-40b3-aac2-adda05477a81.conf start osd
> 
> Is my procedure to add an osd incorrect?
> 
> Thanks,
> Henry



* Re: osd crashed after adding new osd
  2010-08-10 15:42 ` Wido den Hollander
@ 2010-08-10 17:27   ` Henry C Chang
  2010-08-10 20:56     ` Sage Weil
  0 siblings, 1 reply; 6+ messages in thread
From: Henry C Chang @ 2010-08-10 17:27 UTC (permalink / raw)
  To: Wido den Hollander; +Cc: ceph-devel

Hi Wido,

I am running the unstable branch (commit b72c1bb6e9b77e1ab6c2), dated July 30.
I just uploaded the core dumps to

http://home.anet.net.tw/cycbbb/coredumps/core.2580.gz
http://home.anet.net.tw/cycbbb/coredumps/core.2602.gz



On Tue, Aug 10, 2010 at 11:42 PM, Wido den Hollander <wido@widodh.nl> wrote:
> Hi Henry,
>
> Is there a core-dump of these crashes in /? If so, these could help
> finding the cause of this.
>
> See: http://ceph.newdream.net/wiki/Troubleshooting
>
> And btw, which version of Ceph are you running?
>
> Wido
>
> On Tue, 2010-08-10 at 21:09 +0800, Henry C Chang wrote:
>> Hi,
>>
>> I have a ceph cluster: 3 (mon+osd) and 2 (mds).
>> When I tried to add the 4th osd to the cluster, osd0 and osd1 crashed.
>> The error logs are attached.
>>
>> My procedure to add the 4th osd is:
>>
>> add [osd3] in the conf file: /etc/ceph/ceph.conf
>> ceph -c /etc/ceph/ceph.conf mon getmap -o /tmp/monmap
>> cosd -c /etc/ceph/ceph.conf -i 3 --mkfs --monmap /tmp/monmap
>> ceph -c /etc/ceph/ceph.conf osd setmaxosd 4
>> osdmaptool --createsimple 4 --clobber /tmp/osdmap.junk --export-crush
>> /tmp/crush.new
>> ceph -c /etc/ceph.conf osd setcrushmap -i /tmp/crush.new
>> /etc/init.d/ceph -c
>> /etc/ceph/d2c5d946-b888-40b3-aac2-adda05477a81.conf start osd
>>
>> Is my procedure to add an osd incorrect?
>>
>> Thanks,
>> Henry
>
>


* Re: osd crashed after adding new osd
  2010-08-10 17:27   ` Henry C Chang
@ 2010-08-10 20:56     ` Sage Weil
  2010-08-11 11:16       ` Henry C Chang
  0 siblings, 1 reply; 6+ messages in thread
From: Sage Weil @ 2010-08-10 20:56 UTC (permalink / raw)
  To: Henry C Chang; +Cc: Wido den Hollander, ceph-devel

Hi Henry,

Your osd add procedure looks correct.  Did the osd start correctly after 
this point?  Are you able to reproduce the problem with osd logging 
turned up (debug osd = 20 in [osd])?
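
(In ceph.conf terms, the logging change suggested above would look roughly
like the fragment below; restarting the osds afterwards so they pick it up
is an assumption about this setup:)

  [osd]
          debug osd = 20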

I just tried this with the latest unstable and wasn't able to reproduce 
the problem.

sage

On Wed, 11 Aug 2010, Henry C Chang wrote:

> Hi Wido,
> 
> I am running unstable branch (commit: b72c1bb6e9b77e1ab6c2) dated July 30.
> I just uploaded the core dumps to
> 
> http://home.anet.net.tw/cycbbb/coredumps/core.2580.gz
> http://home.anet.net.tw/cycbbb/coredumps/core.2602.gz
> 
> 
> 
> On Tue, Aug 10, 2010 at 11:42 PM, Wido den Hollander <wido@widodh.nl> wrote:
> > Hi Henry,
> >
> > Is there a core-dump of these crashes in /? If so, these could help
> > finding the cause of this.
> >
> > See: http://ceph.newdream.net/wiki/Troubleshooting
> >
> > And btw, which version of Ceph are you running?
> >
> > Wido
> >
> > On Tue, 2010-08-10 at 21:09 +0800, Henry C Chang wrote:
> >> Hi,
> >>
> >> I have a ceph cluster: 3 (mon+osd) and 2 (mds).
> >> When I tried to add the 4th osd to the cluster, osd0 and osd1 crashed.
> >> The error logs are attached.
> >>
> >> My procedure to add the 4th osd is:
> >>
> >> add [osd3] in the conf file: /etc/ceph/ceph.conf
> >> ceph -c /etc/ceph/ceph.conf mon getmap -o /tmp/monmap
> >> cosd -c /etc/ceph/ceph.conf -i 3 --mkfs --monmap /tmp/monmap
> >> ceph -c /etc/ceph/ceph.conf osd setmaxosd 4
> >> osdmaptool --createsimple 4 --clobber /tmp/osdmap.junk --export-crush
> >> /tmp/crush.new
> >> ceph -c /etc/ceph.conf osd setcrushmap -i /tmp/crush.new
> >> /etc/init.d/ceph -c
> >> /etc/ceph/d2c5d946-b888-40b3-aac2-adda05477a81.conf start osd
> >>
> >> Is my procedure to add an osd incorrect?
> >>
> >> Thanks,
> >> Henry
> >
> >


* Re: osd crashed after adding new osd
  2010-08-10 20:56     ` Sage Weil
@ 2010-08-11 11:16       ` Henry C Chang
  2010-08-11 16:08         ` Sage Weil
  0 siblings, 1 reply; 6+ messages in thread
From: Henry C Chang @ 2010-08-11 11:16 UTC (permalink / raw)
  To: Sage Weil; +Cc: Wido den Hollander, ceph-devel

Hi Sage,

After adding osd3, only osd2 stayed alive, but the whole cluster recovered
after I restarted all the dead osds.

I read the wiki and reversed the order of my procedure: starting the new osd
before setting the crush map.
It seems to work ok now.
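
(A sketch of the reordered tail of the procedure described above, with the
earlier steps unchanged; the paths are the ones from my original message:)

  # start the new osd first ...
  /etc/init.d/ceph -c /etc/ceph/d2c5d946-b888-40b3-aac2-adda05477a81.conf start osd
  # ... then push the new crush map
  ceph -c /etc/ceph/ceph.conf osd setcrushmap -i /tmp/crush.new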

Thanks,
Henry


On Wed, Aug 11, 2010 at 4:56 AM, Sage Weil <sage@newdream.net> wrote:
> Hi Henry,
>
> Your osd add procedure looks correct.  Did the osd start correct after
> this point?  Are you able to reproduce the problem with osd logging
> turned up (debug osd = 20 in [osd])?
>
> I just tried this with the latest unstable and wasn't able to reproduce
> the problem.
>
> sage
>
> On Wed, 11 Aug 2010, Henry C Chang wrote:
>
>> Hi Wido,
>>
>> I am running unstable branch (commit: b72c1bb6e9b77e1ab6c2) dated July 30.
>> I just uploaded the core dumps to
>>
>> http://home.anet.net.tw/cycbbb/coredumps/core.2580.gz
>> http://home.anet.net.tw/cycbbb/coredumps/core.2602.gz
>>
>>
>>
>> On Tue, Aug 10, 2010 at 11:42 PM, Wido den Hollander <wido@widodh.nl> wrote:
>> > Hi Henry,
>> >
>> > Is there a core-dump of these crashes in /? If so, these could help
>> > finding the cause of this.
>> >
>> > See: http://ceph.newdream.net/wiki/Troubleshooting
>> >
>> > And btw, which version of Ceph are you running?
>> >
>> > Wido
>> >
>> > On Tue, 2010-08-10 at 21:09 +0800, Henry C Chang wrote:
>> >> Hi,
>> >>
>> >> I have a ceph cluster: 3 (mon+osd) and 2 (mds).
>> >> When I tried to add the 4th osd to the cluster, osd0 and osd1 crashed.
>> >> The error logs are attached.
>> >>
>> >> My procedure to add the 4th osd is:
>> >>
>> >> add [osd3] in the conf file: /etc/ceph/ceph.conf
>> >> ceph -c /etc/ceph/ceph.conf mon getmap -o /tmp/monmap
>> >> cosd -c /etc/ceph/ceph.conf -i 3 --mkfs --monmap /tmp/monmap
>> >> ceph -c /etc/ceph/ceph.conf osd setmaxosd 4
>> >> osdmaptool --createsimple 4 --clobber /tmp/osdmap.junk --export-crush
>> >> /tmp/crush.new
>> >> ceph -c /etc/ceph.conf osd setcrushmap -i /tmp/crush.new
>> >> /etc/init.d/ceph -c
>> >> /etc/ceph/d2c5d946-b888-40b3-aac2-adda05477a81.conf start osd
>> >>
>> >> Is my procedure to add an osd incorrect?
>> >>
>> >> Thanks,
>> >> Henry
>> >
>> >


* Re: osd crashed after adding new osd
  2010-08-11 11:16       ` Henry C Chang
@ 2010-08-11 16:08         ` Sage Weil
  0 siblings, 0 replies; 6+ messages in thread
From: Sage Weil @ 2010-08-11 16:08 UTC (permalink / raw)
  To: Henry C Chang; +Cc: Wido den Hollander, ceph-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3012 bytes --]

On Wed, 11 Aug 2010, Henry C Chang wrote:
> After adding osd3, only osd2 is alive.
> But the whole cluster can be recovered after restarting all dead osds.
> 
> I read the wiki and reversed the order of my procedure: starting osd
> before setting crush map.
> It seems to work ok now.

Hmm, the order shouldn't matter--there's definitely something going wrong.  
Are you able to reproduce the crash with the old order?  (With logs?  :)

sage


> 
> Thanks,
> Henry
> 
> 
> On Wed, Aug 11, 2010 at 4:56 AM, Sage Weil <sage@newdream.net> wrote:
> > Hi Henry,
> >
> > Your osd add procedure looks correct.  Did the osd start correct after
> > this point?  Are you able to reproduce the problem with osd logging
> > turned up (debug osd = 20 in [osd])?
> >
> > I just tried this with the latest unstable and wasn't able to reproduce
> > the problem.
> >
> > sage
> >
> > On Wed, 11 Aug 2010, Henry C Chang wrote:
> >
> >> Hi Wido,
> >>
> >> I am running unstable branch (commit: b72c1bb6e9b77e1ab6c2) dated July 30.
> >> I just uploaded the core dumps to
> >>
> >> http://home.anet.net.tw/cycbbb/coredumps/core.2580.gz
> >> http://home.anet.net.tw/cycbbb/coredumps/core.2602.gz
> >>
> >>
> >>
> >> On Tue, Aug 10, 2010 at 11:42 PM, Wido den Hollander <wido@widodh.nl> wrote:
> >> > Hi Henry,
> >> >
> >> > Is there a core-dump of these crashes in /? If so, these could help
> >> > finding the cause of this.
> >> >
> >> > See: http://ceph.newdream.net/wiki/Troubleshooting
> >> >
> >> > And btw, which version of Ceph are you running?
> >> >
> >> > Wido
> >> >
> >> > On Tue, 2010-08-10 at 21:09 +0800, Henry C Chang wrote:
> >> >> Hi,
> >> >>
> >> >> I have a ceph cluster: 3 (mon+osd) and 2 (mds).
> >> >> When I tried to add the 4th osd to the cluster, osd0 and osd1 crashed.
> >> >> The error logs are attached.
> >> >>
> >> >> My procedure to add the 4th osd is:
> >> >>
> >> >> add [osd3] in the conf file: /etc/ceph/ceph.conf
> >> >> ceph -c /etc/ceph/ceph.conf mon getmap -o /tmp/monmap
> >> >> cosd -c /etc/ceph/ceph.conf -i 3 --mkfs --monmap /tmp/monmap
> >> >> ceph -c /etc/ceph/ceph.conf osd setmaxosd 4
> >> >> osdmaptool --createsimple 4 --clobber /tmp/osdmap.junk --export-crush
> >> >> /tmp/crush.new
> >> >> ceph -c /etc/ceph.conf osd setcrushmap -i /tmp/crush.new
> >> >> /etc/init.d/ceph -c
> >> >> /etc/ceph/d2c5d946-b888-40b3-aac2-adda05477a81.conf start osd
> >> >>
> >> >> Is my procedure to add an osd incorrect?
> >> >>
> >> >> Thanks,
> >> >> Henry
> >> >
> >> >
