* Force an OSD to try to peer
@ 2015-03-31 2:15 Robert LeBlanc
2015-03-31 2:16 ` Fwd: " Robert LeBlanc
[not found] ` <CAANLjFowYybKKueFeHHT4Ug2eTW_-RGVQRtRE7vfgF-1XXJwkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 2 replies; 8+ messages in thread
From: Robert LeBlanc @ 2015-03-31 2:15 UTC (permalink / raw)
To: Ceph-User, ceph-devel
[-- Attachment #1.1: Type: text/plain, Size: 28454 bytes --]
I've been working at this peering problem all day. I've done a lot of
testing at the network layer and I just don't believe that we have a
problem that would prevent OSDs from peering. When looking though osd_debug
20/20 logs, it just doesn't look like the OSDs are trying to peer. I don't
know if it is because there are so many outstanding creations or what. OSDs
will peer with OSDs on other hosts, but for reason only chooses a certain
number and not one that it needs to finish the peering process.
I've check: firewall, open files, number of threads allowed. These usually
have given me an error in the logs that helped me fix the problem.
I can't find a configuration item that specifies how many peers an OSD
should contact or anything that would be artificially limiting the peering
connections. I've restarted the OSDs a number of times, as well as
rebooting the hosts. I beleive if the OSDs finish peering everything will
clear up. I can't find anything in pg query that would help me figure out
what is blocking it (peering blocked by is empty). The PGs are scattered
across all the hosts so we can't pin it down to a specific host.
Any ideas on what to try would be appreciated.
[ulhglive-root@ceph9 ~]# ceph --version
ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
[ulhglive-root@ceph9 ~]# ceph status
cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs stuck
inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
monmap e2: 3 mons at {mon1=
10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0},
election epoch 30, quorum 0,1,2 mon1,mon2,mon3
osdmap e704: 120 osds: 120 up, 120 in
pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
11447 MB used, 436 TB / 436 TB avail
727 active+clean
990 peering
37 creating+peering
1 down+peering
290 remapped+peering
3 creating+remapped+peering
{ "state": "peering",
"epoch": 707,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91],
"info": { "pgid": "7.171",
"last_update": "0'0",
"last_complete": "0'0",
"log_tail": "0'0",
"last_user_version": 0,
"last_backfill": "MAX",
"purged_snaps": "[]",
"history": { "epoch_created": 293,
"last_epoch_started": 343,
"last_epoch_clean": 343,
"last_epoch_split": 0,
"same_up_since": 688,
"same_interval_since": 688,
"same_primary_since": 608,
"last_scrub": "0'0",
"last_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_clean_scrub_stamp": "0.000000"},
"stats": { "version": "0'0",
"reported_seq": "326",
"reported_epoch": "707",
"state": "peering",
"last_fresh": "2015-03-30 20:10:39.509855",
"last_change": "2015-03-30 19:44:17.361601",
"last_active": "2015-03-30 11:37:56.956417",
"last_clean": "2015-03-30 11:37:56.956417",
"last_became_active": "0.000000",
"last_unstale": "2015-03-30 20:10:39.509855",
"mapping_epoch": 683,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 293,
"last_epoch_clean": 343,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_clean_scrub_stamp": "0.000000",
"log_size": 0,
"ondisk_log_size": 0,
"stats_invalid": "0",
"stat_sum": { "num_bytes": 0,
"num_objects": 0,
"num_object_clones": 0,
"num_object_copies": 0,
"num_objects_missing_on_primary": 0,
"num_objects_degraded": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 0,
"num_whiteouts": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 0,
"num_write_kb": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0},
"stat_cat_sum": {},
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91],
"up_primary": 40,
"acting_primary": 40},
"empty": 1,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 348,
"hit_set_history": { "current_last_update": "0'0",
"current_last_stamp": "0.000000",
"current_info": { "begin": "0.000000",
"end": "0.000000",
"version": "0'0"},
"history": []}},
"peer_info": [
{ "peer": "48",
"pgid": "7.171",
"last_update": "0'0",
"last_complete": "0'0",
"log_tail": "0'0",
"last_user_version": 0,
"last_backfill": "MAX",
"purged_snaps": "[]",
"history": { "epoch_created": 293,
"last_epoch_started": 343,
"last_epoch_clean": 343,
"last_epoch_split": 0,
"same_up_since": 688,
"same_interval_since": 688,
"same_primary_since": 608,
"last_scrub": "0'0",
"last_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_clean_scrub_stamp": "0.000000"},
"stats": { "version": "0'0",
"reported_seq": "24",
"reported_epoch": "348",
"state": "peering",
"last_fresh": "2015-03-30 11:39:02.979742",
"last_change": "2015-03-30 11:39:01.650897",
"last_active": "2015-03-30 11:37:56.956417",
"last_clean": "2015-03-30 11:37:56.956417",
"last_became_active": "0.000000",
"last_unstale": "2015-03-30 11:39:02.979742",
"mapping_epoch": 683,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 293,
"last_epoch_clean": 343,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_clean_scrub_stamp": "0.000000",
"log_size": 0,
"ondisk_log_size": 0,
"stats_invalid": "0",
"stat_sum": { "num_bytes": 0,
"num_objects": 0,
"num_object_clones": 0,
"num_object_copies": 0,
"num_objects_missing_on_primary": 0,
"num_objects_degraded": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 0,
"num_whiteouts": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 0,
"num_write_kb": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0},
"stat_cat_sum": {},
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91],
"up_primary": 40,
"acting_primary": 40},
"empty": 1,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 348,
"hit_set_history": { "current_last_update": "0'0",
"current_last_stamp": "0.000000",
"current_info": { "begin": "0.000000",
"end": "0.000000",
"version": "0'0"},
"history": []}},
{ "peer": "110",
"pgid": "7.171",
"last_update": "0'0",
"last_complete": "0'0",
"log_tail": "0'0",
"last_user_version": 0,
"last_backfill": "MAX",
"purged_snaps": "[]",
"history": { "epoch_created": 0,
"last_epoch_started": 0,
"last_epoch_clean": 0,
"last_epoch_split": 0,
"same_up_since": 0,
"same_interval_since": 0,
"same_primary_since": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "0.000000",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "0.000000",
"last_clean_scrub_stamp": "0.000000"},
"stats": { "version": "0'0",
"reported_seq": "0",
"reported_epoch": "0",
"state": "inactive",
"last_fresh": "0.000000",
"last_change": "0.000000",
"last_active": "0.000000",
"last_clean": "0.000000",
"last_became_active": "0.000000",
"last_unstale": "0.000000",
"mapping_epoch": 0,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 0,
"last_epoch_clean": 0,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "0.000000",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "0.000000",
"last_clean_scrub_stamp": "0.000000",
"log_size": 0,
"ondisk_log_size": 0,
"stats_invalid": "0",
"stat_sum": { "num_bytes": 0,
"num_objects": 0,
"num_object_clones": 0,
"num_object_copies": 0,
"num_objects_missing_on_primary": 0,
"num_objects_degraded": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 0,
"num_whiteouts": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 0,
"num_write_kb": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0},
"stat_cat_sum": {},
"up": [],
"acting": [],
"up_primary": -1,
"acting_primary": -1},
"empty": 1,
"dne": 1,
"incomplete": 0,
"last_epoch_started": 0,
"hit_set_history": { "current_last_update": "0'0",
"current_last_stamp": "0.000000",
"current_info": { "begin": "0.000000",
"end": "0.000000",
"version": "0'0"},
"history": []}}],
"recovery_state": [
{ "name": "Started\/Primary\/Peering\/GetInfo",
"enter_time": "2015-03-30 19:44:18.709317",
"requested_info_from": [
{ "osd": "0"},
{ "osd": "5"},
{ "osd": "10"},
{ "osd": "22"},
{ "osd": "54"},
{ "osd": "91"},
{ "osd": "92"},
{ "osd": "113"},
{ "osd": "114"}]},
{ "name": "Started\/Primary\/Peering",
"enter_time": "2015-03-30 19:44:18.709316",
"past_intervals": [
{ "first": 342,
"last": 346,
"maybe_went_rw": 1,
"up": [
40,
92,
114],
"acting": [
40,
92,
114,
40,
40]},
{ "first": 347,
"last": 353,
"maybe_went_rw": 1,
"up": [
40,
92,
48],
"acting": [
40,
92,
48,
40,
40]},
{ "first": 354,
"last": 356,
"maybe_went_rw": 1,
"up": [
92,
48],
"acting": [
92,
48,
92,
92]},
{ "first": 357,
"last": 359,
"maybe_went_rw": 1,
"up": [
113,
48,
114],
"acting": [
113,
48,
114,
113,
113]},
{ "first": 360,
"last": 361,
"maybe_went_rw": 1,
"up": [
40,
92,
48],
"acting": [
40,
92,
48,
40,
40]},
{ "first": 362,
"last": 364,
"maybe_went_rw": 1,
"up": [
40,
92],
"acting": [
40,
92,
40,
40]},
{ "first": 365,
"last": 369,
"maybe_went_rw": 1,
"up": [
40,
92,
114],
"acting": [
40,
92,
114,
40,
40]},
{ "first": 370,
"last": 379,
"maybe_went_rw": 1,
"up": [
40,
92,
48],
"acting": [
40,
92,
48,
40,
40]},
{ "first": 380,
"last": 400,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 401,
"last": 409,
"maybe_went_rw": 1,
"up": [
92,
48,
91],
"acting": [
92,
48,
91,
92,
92]},
{ "first": 410,
"last": 414,
"maybe_went_rw": 1,
"up": [
113,
48,
114,
0],
"acting": [
113,
48,
114,
0,
113,
113]},
{ "first": 415,
"last": 435,
"maybe_went_rw": 1,
"up": [
113,
48,
114,
10],
"acting": [
113,
48,
114,
10,
113,
113]},
{ "first": 436,
"last": 442,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 443,
"last": 446,
"maybe_went_rw": 1,
"up": [
40,
92,
48],
"acting": [
40,
92,
48,
40,
40]},
{ "first": 447,
"last": 457,
"maybe_went_rw": 1,
"up": [
40,
48],
"acting": [
40,
48,
40,
40]},
{ "first": 458,
"last": 460,
"maybe_went_rw": 1,
"up": [
40,
48,
10],
"acting": [
40,
48,
10,
40,
40]},
{ "first": 461,
"last": 466,
"maybe_went_rw": 1,
"up": [
40,
48,
22],
"acting": [
40,
48,
22,
40,
40]},
{ "first": 467,
"last": 478,
"maybe_went_rw": 1,
"up": [
40,
48,
22,
5],
"acting": [
40,
48,
22,
5,
40,
40]},
{ "first": 479,
"last": 489,
"maybe_went_rw": 1,
"up": [
40,
48,
22,
110],
"acting": [
40,
48,
22,
110,
40,
40]},
{ "first": 490,
"last": 496,
"maybe_went_rw": 1,
"up": [
40,
48,
22,
0],
"acting": [
40,
48,
22,
0,
40,
40]},
{ "first": 497,
"last": 507,
"maybe_went_rw": 1,
"up": [
40,
48,
114,
10],
"acting": [
40,
48,
114,
10,
40,
40]},
{ "first": 508,
"last": 511,
"maybe_went_rw": 1,
"up": [
40,
48,
54,
91],
"acting": [
40,
48,
54,
91,
40,
40]},
{ "first": 512,
"last": 579,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 580,
"last": 580,
"maybe_went_rw": 0,
"up": [
40,
92,
91],
"acting": [
40,
92,
91,
40,
40]},
{ "first": 581,
"last": 591,
"maybe_went_rw": 1,
"up": [
92,
91],
"acting": [
92,
91,
92,
92]},
{ "first": 592,
"last": 595,
"maybe_went_rw": 1,
"up": [
113,
114,
22,
0],
"acting": [
113,
114,
22,
0,
113,
113]},
{ "first": 596,
"last": 599,
"maybe_went_rw": 1,
"up": [
113,
48,
114,
10],
"acting": [
113,
48,
114,
10,
113,
113]},
{ "first": 600,
"last": 606,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 607,
"last": 607,
"maybe_went_rw": 0,
"up": [
92,
91],
"acting": [
92,
91,
92,
92]},
{ "first": 608,
"last": 616,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 617,
"last": 625,
"maybe_went_rw": 1,
"up": [
40,
92,
91],
"acting": [
40,
92,
91,
40,
40]},
{ "first": 626,
"last": 632,
"maybe_went_rw": 1,
"up": [
40,
92,
114,
10],
"acting": [
40,
92,
114,
10,
40,
40]},
{ "first": 633,
"last": 639,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 640,
"last": 643,
"maybe_went_rw": 1,
"up": [
40,
92,
91],
"acting": [
40,
92,
91,
40,
40]},
{ "first": 644,
"last": 662,
"maybe_went_rw": 1,
"up": [
40,
92,
114,
10],
"acting": [
40,
92,
114,
10,
40,
40]},
{ "first": 663,
"last": 679,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 680,
"last": 682,
"maybe_went_rw": 1,
"up": [
40,
92,
48],
"acting": [
40,
92,
48,
40,
40]},
{ "first": 683,
"last": 687,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
10],
"acting": [
40,
92,
48,
10,
40,
40]}],
"probing_osds": [
"0",
"5",
"10",
"22",
"40",
"48",
"54",
"91",
"92",
"110",
"113",
"114"],
"down_osds_we_would_probe": [],
"peering_blocked_by": []},
{ "name": "Started",
"enter_time": "2015-03-30 19:44:18.709312"}],
"agent_state": {}}
[-- Attachment #1.2: Type: text/html, Size: 51118 bytes --]
[-- Attachment #2: Type: text/plain, Size: 178 bytes --]
_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Fwd: Force an OSD to try to peer
2015-03-31 2:15 Force an OSD to try to peer Robert LeBlanc
@ 2015-03-31 2:16 ` Robert LeBlanc
[not found] ` <CAANLjFowYybKKueFeHHT4Ug2eTW_-RGVQRtRE7vfgF-1XXJwkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 0 replies; 8+ messages in thread
From: Robert LeBlanc @ 2015-03-31 2:16 UTC (permalink / raw)
To: Ceph-User, ceph-devel
Sorry HTML snuck in somewhere.
---------- Forwarded message ----------
From: Robert LeBlanc <robert@leblancnet.us>
Date: Mon, Mar 30, 2015 at 8:15 PM
Subject: Force an OSD to try to peer
To: Ceph-User <ceph-users@ceph.com>, ceph-devel <ceph-devel@vger.kernel.org>
I've been working at this peering problem all day. I've done a lot of
testing at the network layer and I just don't believe that we have a
problem that would prevent OSDs from peering. When looking though
osd_debug 20/20 logs, it just doesn't look like the OSDs are trying to
peer. I don't know if it is because there are so many outstanding
creations or what. OSDs will peer with OSDs on other hosts, but for
reason only chooses a certain number and not one that it needs to
finish the peering process.
I've check: firewall, open files, number of threads allowed. These
usually have given me an error in the logs that helped me fix the
problem.
I can't find a configuration item that specifies how many peers an OSD
should contact or anything that would be artificially limiting the
peering connections. I've restarted the OSDs a number of times, as
well as rebooting the hosts. I beleive if the OSDs finish peering
everything will clear up. I can't find anything in pg query that would
help me figure out what is blocking it (peering blocked by is empty).
The PGs are scattered across all the hosts so we can't pin it down to
a specific host.
Any ideas on what to try would be appreciated.
[ulhglive-root@ceph9 ~]# ceph --version
ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
[ulhglive-root@ceph9 ~]# ceph status
cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs stuck
inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
monmap e2: 3 mons at
{mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0},
election epoch 30, quorum 0,1,2 mon1,mon2,mon3
osdmap e704: 120 osds: 120 up, 120 in
pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
11447 MB used, 436 TB / 436 TB avail
727 active+clean
990 peering
37 creating+peering
1 down+peering
290 remapped+peering
3 creating+remapped+peering
{ "state": "peering",
"epoch": 707,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91],
"info": { "pgid": "7.171",
"last_update": "0'0",
"last_complete": "0'0",
"log_tail": "0'0",
"last_user_version": 0,
"last_backfill": "MAX",
"purged_snaps": "[]",
"history": { "epoch_created": 293,
"last_epoch_started": 343,
"last_epoch_clean": 343,
"last_epoch_split": 0,
"same_up_since": 688,
"same_interval_since": 688,
"same_primary_since": 608,
"last_scrub": "0'0",
"last_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_clean_scrub_stamp": "0.000000"},
"stats": { "version": "0'0",
"reported_seq": "326",
"reported_epoch": "707",
"state": "peering",
"last_fresh": "2015-03-30 20:10:39.509855",
"last_change": "2015-03-30 19:44:17.361601",
"last_active": "2015-03-30 11:37:56.956417",
"last_clean": "2015-03-30 11:37:56.956417",
"last_became_active": "0.000000",
"last_unstale": "2015-03-30 20:10:39.509855",
"mapping_epoch": 683,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 293,
"last_epoch_clean": 343,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_clean_scrub_stamp": "0.000000",
"log_size": 0,
"ondisk_log_size": 0,
"stats_invalid": "0",
"stat_sum": { "num_bytes": 0,
"num_objects": 0,
"num_object_clones": 0,
"num_object_copies": 0,
"num_objects_missing_on_primary": 0,
"num_objects_degraded": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 0,
"num_whiteouts": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 0,
"num_write_kb": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0},
"stat_cat_sum": {},
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91],
"up_primary": 40,
"acting_primary": 40},
"empty": 1,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 348,
"hit_set_history": { "current_last_update": "0'0",
"current_last_stamp": "0.000000",
"current_info": { "begin": "0.000000",
"end": "0.000000",
"version": "0'0"},
"history": []}},
"peer_info": [
{ "peer": "48",
"pgid": "7.171",
"last_update": "0'0",
"last_complete": "0'0",
"log_tail": "0'0",
"last_user_version": 0,
"last_backfill": "MAX",
"purged_snaps": "[]",
"history": { "epoch_created": 293,
"last_epoch_started": 343,
"last_epoch_clean": 343,
"last_epoch_split": 0,
"same_up_since": 688,
"same_interval_since": 688,
"same_primary_since": 608,
"last_scrub": "0'0",
"last_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_clean_scrub_stamp": "0.000000"},
"stats": { "version": "0'0",
"reported_seq": "24",
"reported_epoch": "348",
"state": "peering",
"last_fresh": "2015-03-30 11:39:02.979742",
"last_change": "2015-03-30 11:39:01.650897",
"last_active": "2015-03-30 11:37:56.956417",
"last_clean": "2015-03-30 11:37:56.956417",
"last_became_active": "0.000000",
"last_unstale": "2015-03-30 11:39:02.979742",
"mapping_epoch": 683,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 293,
"last_epoch_clean": 343,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
"last_clean_scrub_stamp": "0.000000",
"log_size": 0,
"ondisk_log_size": 0,
"stats_invalid": "0",
"stat_sum": { "num_bytes": 0,
"num_objects": 0,
"num_object_clones": 0,
"num_object_copies": 0,
"num_objects_missing_on_primary": 0,
"num_objects_degraded": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 0,
"num_whiteouts": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 0,
"num_write_kb": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0},
"stat_cat_sum": {},
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91],
"up_primary": 40,
"acting_primary": 40},
"empty": 1,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 348,
"hit_set_history": { "current_last_update": "0'0",
"current_last_stamp": "0.000000",
"current_info": { "begin": "0.000000",
"end": "0.000000",
"version": "0'0"},
"history": []}},
{ "peer": "110",
"pgid": "7.171",
"last_update": "0'0",
"last_complete": "0'0",
"log_tail": "0'0",
"last_user_version": 0,
"last_backfill": "MAX",
"purged_snaps": "[]",
"history": { "epoch_created": 0,
"last_epoch_started": 0,
"last_epoch_clean": 0,
"last_epoch_split": 0,
"same_up_since": 0,
"same_interval_since": 0,
"same_primary_since": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "0.000000",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "0.000000",
"last_clean_scrub_stamp": "0.000000"},
"stats": { "version": "0'0",
"reported_seq": "0",
"reported_epoch": "0",
"state": "inactive",
"last_fresh": "0.000000",
"last_change": "0.000000",
"last_active": "0.000000",
"last_clean": "0.000000",
"last_became_active": "0.000000",
"last_unstale": "0.000000",
"mapping_epoch": 0,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 0,
"last_epoch_clean": 0,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "0.000000",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "0.000000",
"last_clean_scrub_stamp": "0.000000",
"log_size": 0,
"ondisk_log_size": 0,
"stats_invalid": "0",
"stat_sum": { "num_bytes": 0,
"num_objects": 0,
"num_object_clones": 0,
"num_object_copies": 0,
"num_objects_missing_on_primary": 0,
"num_objects_degraded": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 0,
"num_whiteouts": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 0,
"num_write_kb": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0},
"stat_cat_sum": {},
"up": [],
"acting": [],
"up_primary": -1,
"acting_primary": -1},
"empty": 1,
"dne": 1,
"incomplete": 0,
"last_epoch_started": 0,
"hit_set_history": { "current_last_update": "0'0",
"current_last_stamp": "0.000000",
"current_info": { "begin": "0.000000",
"end": "0.000000",
"version": "0'0"},
"history": []}}],
"recovery_state": [
{ "name": "Started\/Primary\/Peering\/GetInfo",
"enter_time": "2015-03-30 19:44:18.709317",
"requested_info_from": [
{ "osd": "0"},
{ "osd": "5"},
{ "osd": "10"},
{ "osd": "22"},
{ "osd": "54"},
{ "osd": "91"},
{ "osd": "92"},
{ "osd": "113"},
{ "osd": "114"}]},
{ "name": "Started\/Primary\/Peering",
"enter_time": "2015-03-30 19:44:18.709316",
"past_intervals": [
{ "first": 342,
"last": 346,
"maybe_went_rw": 1,
"up": [
40,
92,
114],
"acting": [
40,
92,
114,
40,
40]},
{ "first": 347,
"last": 353,
"maybe_went_rw": 1,
"up": [
40,
92,
48],
"acting": [
40,
92,
48,
40,
40]},
{ "first": 354,
"last": 356,
"maybe_went_rw": 1,
"up": [
92,
48],
"acting": [
92,
48,
92,
92]},
{ "first": 357,
"last": 359,
"maybe_went_rw": 1,
"up": [
113,
48,
114],
"acting": [
113,
48,
114,
113,
113]},
{ "first": 360,
"last": 361,
"maybe_went_rw": 1,
"up": [
40,
92,
48],
"acting": [
40,
92,
48,
40,
40]},
{ "first": 362,
"last": 364,
"maybe_went_rw": 1,
"up": [
40,
92],
"acting": [
40,
92,
40,
40]},
{ "first": 365,
"last": 369,
"maybe_went_rw": 1,
"up": [
40,
92,
114],
"acting": [
40,
92,
114,
40,
40]},
{ "first": 370,
"last": 379,
"maybe_went_rw": 1,
"up": [
40,
92,
48],
"acting": [
40,
92,
48,
40,
40]},
{ "first": 380,
"last": 400,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 401,
"last": 409,
"maybe_went_rw": 1,
"up": [
92,
48,
91],
"acting": [
92,
48,
91,
92,
92]},
{ "first": 410,
"last": 414,
"maybe_went_rw": 1,
"up": [
113,
48,
114,
0],
"acting": [
113,
48,
114,
0,
113,
113]},
{ "first": 415,
"last": 435,
"maybe_went_rw": 1,
"up": [
113,
48,
114,
10],
"acting": [
113,
48,
114,
10,
113,
113]},
{ "first": 436,
"last": 442,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 443,
"last": 446,
"maybe_went_rw": 1,
"up": [
40,
92,
48],
"acting": [
40,
92,
48,
40,
40]},
{ "first": 447,
"last": 457,
"maybe_went_rw": 1,
"up": [
40,
48],
"acting": [
40,
48,
40,
40]},
{ "first": 458,
"last": 460,
"maybe_went_rw": 1,
"up": [
40,
48,
10],
"acting": [
40,
48,
10,
40,
40]},
{ "first": 461,
"last": 466,
"maybe_went_rw": 1,
"up": [
40,
48,
22],
"acting": [
40,
48,
22,
40,
40]},
{ "first": 467,
"last": 478,
"maybe_went_rw": 1,
"up": [
40,
48,
22,
5],
"acting": [
40,
48,
22,
5,
40,
40]},
{ "first": 479,
"last": 489,
"maybe_went_rw": 1,
"up": [
40,
48,
22,
110],
"acting": [
40,
48,
22,
110,
40,
40]},
{ "first": 490,
"last": 496,
"maybe_went_rw": 1,
"up": [
40,
48,
22,
0],
"acting": [
40,
48,
22,
0,
40,
40]},
{ "first": 497,
"last": 507,
"maybe_went_rw": 1,
"up": [
40,
48,
114,
10],
"acting": [
40,
48,
114,
10,
40,
40]},
{ "first": 508,
"last": 511,
"maybe_went_rw": 1,
"up": [
40,
48,
54,
91],
"acting": [
40,
48,
54,
91,
40,
40]},
{ "first": 512,
"last": 579,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 580,
"last": 580,
"maybe_went_rw": 0,
"up": [
40,
92,
91],
"acting": [
40,
92,
91,
40,
40]},
{ "first": 581,
"last": 591,
"maybe_went_rw": 1,
"up": [
92,
91],
"acting": [
92,
91,
92,
92]},
{ "first": 592,
"last": 595,
"maybe_went_rw": 1,
"up": [
113,
114,
22,
0],
"acting": [
113,
114,
22,
0,
113,
113]},
{ "first": 596,
"last": 599,
"maybe_went_rw": 1,
"up": [
113,
48,
114,
10],
"acting": [
113,
48,
114,
10,
113,
113]},
{ "first": 600,
"last": 606,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 607,
"last": 607,
"maybe_went_rw": 0,
"up": [
92,
91],
"acting": [
92,
91,
92,
92]},
{ "first": 608,
"last": 616,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 617,
"last": 625,
"maybe_went_rw": 1,
"up": [
40,
92,
91],
"acting": [
40,
92,
91,
40,
40]},
{ "first": 626,
"last": 632,
"maybe_went_rw": 1,
"up": [
40,
92,
114,
10],
"acting": [
40,
92,
114,
10,
40,
40]},
{ "first": 633,
"last": 639,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 640,
"last": 643,
"maybe_went_rw": 1,
"up": [
40,
92,
91],
"acting": [
40,
92,
91,
40,
40]},
{ "first": 644,
"last": 662,
"maybe_went_rw": 1,
"up": [
40,
92,
114,
10],
"acting": [
40,
92,
114,
10,
40,
40]},
{ "first": 663,
"last": 679,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
91],
"acting": [
40,
92,
48,
91,
40,
40]},
{ "first": 680,
"last": 682,
"maybe_went_rw": 1,
"up": [
40,
92,
48],
"acting": [
40,
92,
48,
40,
40]},
{ "first": 683,
"last": 687,
"maybe_went_rw": 1,
"up": [
40,
92,
48,
10],
"acting": [
40,
92,
48,
10,
40,
40]}],
"probing_osds": [
"0",
"5",
"10",
"22",
"40",
"48",
"54",
"91",
"92",
"110",
"113",
"114"],
"down_osds_we_would_probe": [],
"peering_blocked_by": []},
{ "name": "Started",
"enter_time": "2015-03-30 19:44:18.709312"}],
"agent_state": {}}
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Force an OSD to try to peer
[not found] ` <CAANLjFowYybKKueFeHHT4Ug2eTW_-RGVQRtRE7vfgF-1XXJwkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-03-31 17:07 ` Robert LeBlanc
2015-03-31 17:36 ` Sage Weil
0 siblings, 1 reply; 8+ messages in thread
From: Robert LeBlanc @ 2015-03-31 17:07 UTC (permalink / raw)
To: Ceph-User, ceph-devel
Turns out jumbo frames was not set on all the switch ports. Once that
was resolved the cluster quickly became healthy.
On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
> I've been working at this peering problem all day. I've done a lot of
> testing at the network layer and I just don't believe that we have a problem
> that would prevent OSDs from peering. When looking though osd_debug 20/20
> logs, it just doesn't look like the OSDs are trying to peer. I don't know if
> it is because there are so many outstanding creations or what. OSDs will
> peer with OSDs on other hosts, but for reason only chooses a certain number
> and not one that it needs to finish the peering process.
>
> I've check: firewall, open files, number of threads allowed. These usually
> have given me an error in the logs that helped me fix the problem.
>
> I can't find a configuration item that specifies how many peers an OSD
> should contact or anything that would be artificially limiting the peering
> connections. I've restarted the OSDs a number of times, as well as rebooting
> the hosts. I beleive if the OSDs finish peering everything will clear up. I
> can't find anything in pg query that would help me figure out what is
> blocking it (peering blocked by is empty). The PGs are scattered across all
> the hosts so we can't pin it down to a specific host.
>
> Any ideas on what to try would be appreciated.
>
> [ulhglive-root@ceph9 ~]# ceph --version
> ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> [ulhglive-root@ceph9 ~]# ceph status
> cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
> health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs stuck
> inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
> monmap e2: 3 mons at
> {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0},
> election epoch 30, quorum 0,1,2 mon1,mon2,mon3
> osdmap e704: 120 osds: 120 up, 120 in
> pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
> 11447 MB used, 436 TB / 436 TB avail
> 727 active+clean
> 990 peering
> 37 creating+peering
> 1 down+peering
> 290 remapped+peering
> 3 creating+remapped+peering
>
> { "state": "peering",
> "epoch": 707,
> "up": [
> 40,
> 92,
> 48,
> 91],
> "acting": [
> 40,
> 92,
> 48,
> 91],
> "info": { "pgid": "7.171",
> "last_update": "0'0",
> "last_complete": "0'0",
> "log_tail": "0'0",
> "last_user_version": 0,
> "last_backfill": "MAX",
> "purged_snaps": "[]",
> "history": { "epoch_created": 293,
> "last_epoch_started": 343,
> "last_epoch_clean": 343,
> "last_epoch_split": 0,
> "same_up_since": 688,
> "same_interval_since": 688,
> "same_primary_since": 608,
> "last_scrub": "0'0",
> "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> "last_deep_scrub": "0'0",
> "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> "last_clean_scrub_stamp": "0.000000"},
> "stats": { "version": "0'0",
> "reported_seq": "326",
> "reported_epoch": "707",
> "state": "peering",
> "last_fresh": "2015-03-30 20:10:39.509855",
> "last_change": "2015-03-30 19:44:17.361601",
> "last_active": "2015-03-30 11:37:56.956417",
> "last_clean": "2015-03-30 11:37:56.956417",
> "last_became_active": "0.000000",
> "last_unstale": "2015-03-30 20:10:39.509855",
> "mapping_epoch": 683,
> "log_start": "0'0",
> "ondisk_log_start": "0'0",
> "created": 293,
> "last_epoch_clean": 343,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "0'0",
> "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> "last_deep_scrub": "0'0",
> "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> "last_clean_scrub_stamp": "0.000000",
> "log_size": 0,
> "ondisk_log_size": 0,
> "stats_invalid": "0",
> "stat_sum": { "num_bytes": 0,
> "num_objects": 0,
> "num_object_clones": 0,
> "num_object_copies": 0,
> "num_objects_missing_on_primary": 0,
> "num_objects_degraded": 0,
> "num_objects_unfound": 0,
> "num_objects_dirty": 0,
> "num_whiteouts": 0,
> "num_read": 0,
> "num_read_kb": 0,
> "num_write": 0,
> "num_write_kb": 0,
> "num_scrub_errors": 0,
> "num_shallow_scrub_errors": 0,
> "num_deep_scrub_errors": 0,
> "num_objects_recovered": 0,
> "num_bytes_recovered": 0,
> "num_keys_recovered": 0,
> "num_objects_omap": 0,
> "num_objects_hit_set_archive": 0},
> "stat_cat_sum": {},
> "up": [
> 40,
> 92,
> 48,
> 91],
> "acting": [
> 40,
> 92,
> 48,
> 91],
> "up_primary": 40,
> "acting_primary": 40},
> "empty": 1,
> "dne": 0,
> "incomplete": 0,
> "last_epoch_started": 348,
> "hit_set_history": { "current_last_update": "0'0",
> "current_last_stamp": "0.000000",
> "current_info": { "begin": "0.000000",
> "end": "0.000000",
> "version": "0'0"},
> "history": []}},
> "peer_info": [
> { "peer": "48",
> "pgid": "7.171",
> "last_update": "0'0",
> "last_complete": "0'0",
> "log_tail": "0'0",
> "last_user_version": 0,
> "last_backfill": "MAX",
> "purged_snaps": "[]",
> "history": { "epoch_created": 293,
> "last_epoch_started": 343,
> "last_epoch_clean": 343,
> "last_epoch_split": 0,
> "same_up_since": 688,
> "same_interval_since": 688,
> "same_primary_since": 608,
> "last_scrub": "0'0",
> "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> "last_deep_scrub": "0'0",
> "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> "last_clean_scrub_stamp": "0.000000"},
> "stats": { "version": "0'0",
> "reported_seq": "24",
> "reported_epoch": "348",
> "state": "peering",
> "last_fresh": "2015-03-30 11:39:02.979742",
> "last_change": "2015-03-30 11:39:01.650897",
> "last_active": "2015-03-30 11:37:56.956417",
> "last_clean": "2015-03-30 11:37:56.956417",
> "last_became_active": "0.000000",
> "last_unstale": "2015-03-30 11:39:02.979742",
> "mapping_epoch": 683,
> "log_start": "0'0",
> "ondisk_log_start": "0'0",
> "created": 293,
> "last_epoch_clean": 343,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "0'0",
> "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> "last_deep_scrub": "0'0",
> "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> "last_clean_scrub_stamp": "0.000000",
> "log_size": 0,
> "ondisk_log_size": 0,
> "stats_invalid": "0",
> "stat_sum": { "num_bytes": 0,
> "num_objects": 0,
> "num_object_clones": 0,
> "num_object_copies": 0,
> "num_objects_missing_on_primary": 0,
> "num_objects_degraded": 0,
> "num_objects_unfound": 0,
> "num_objects_dirty": 0,
> "num_whiteouts": 0,
> "num_read": 0,
> "num_read_kb": 0,
> "num_write": 0,
> "num_write_kb": 0,
> "num_scrub_errors": 0,
> "num_shallow_scrub_errors": 0,
> "num_deep_scrub_errors": 0,
> "num_objects_recovered": 0,
> "num_bytes_recovered": 0,
> "num_keys_recovered": 0,
> "num_objects_omap": 0,
> "num_objects_hit_set_archive": 0},
> "stat_cat_sum": {},
> "up": [
> 40,
> 92,
> 48,
> 91],
> "acting": [
> 40,
> 92,
> 48,
> 91],
> "up_primary": 40,
> "acting_primary": 40},
> "empty": 1,
> "dne": 0,
> "incomplete": 0,
> "last_epoch_started": 348,
> "hit_set_history": { "current_last_update": "0'0",
> "current_last_stamp": "0.000000",
> "current_info": { "begin": "0.000000",
> "end": "0.000000",
> "version": "0'0"},
> "history": []}},
> { "peer": "110",
> "pgid": "7.171",
> "last_update": "0'0",
> "last_complete": "0'0",
> "log_tail": "0'0",
> "last_user_version": 0,
> "last_backfill": "MAX",
> "purged_snaps": "[]",
> "history": { "epoch_created": 0,
> "last_epoch_started": 0,
> "last_epoch_clean": 0,
> "last_epoch_split": 0,
> "same_up_since": 0,
> "same_interval_since": 0,
> "same_primary_since": 0,
> "last_scrub": "0'0",
> "last_scrub_stamp": "0.000000",
> "last_deep_scrub": "0'0",
> "last_deep_scrub_stamp": "0.000000",
> "last_clean_scrub_stamp": "0.000000"},
> "stats": { "version": "0'0",
> "reported_seq": "0",
> "reported_epoch": "0",
> "state": "inactive",
> "last_fresh": "0.000000",
> "last_change": "0.000000",
> "last_active": "0.000000",
> "last_clean": "0.000000",
> "last_became_active": "0.000000",
> "last_unstale": "0.000000",
> "mapping_epoch": 0,
> "log_start": "0'0",
> "ondisk_log_start": "0'0",
> "created": 0,
> "last_epoch_clean": 0,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "0'0",
> "last_scrub_stamp": "0.000000",
> "last_deep_scrub": "0'0",
> "last_deep_scrub_stamp": "0.000000",
> "last_clean_scrub_stamp": "0.000000",
> "log_size": 0,
> "ondisk_log_size": 0,
> "stats_invalid": "0",
> "stat_sum": { "num_bytes": 0,
> "num_objects": 0,
> "num_object_clones": 0,
> "num_object_copies": 0,
> "num_objects_missing_on_primary": 0,
> "num_objects_degraded": 0,
> "num_objects_unfound": 0,
> "num_objects_dirty": 0,
> "num_whiteouts": 0,
> "num_read": 0,
> "num_read_kb": 0,
> "num_write": 0,
> "num_write_kb": 0,
> "num_scrub_errors": 0,
> "num_shallow_scrub_errors": 0,
> "num_deep_scrub_errors": 0,
> "num_objects_recovered": 0,
> "num_bytes_recovered": 0,
> "num_keys_recovered": 0,
> "num_objects_omap": 0,
> "num_objects_hit_set_archive": 0},
> "stat_cat_sum": {},
> "up": [],
> "acting": [],
> "up_primary": -1,
> "acting_primary": -1},
> "empty": 1,
> "dne": 1,
> "incomplete": 0,
> "last_epoch_started": 0,
> "hit_set_history": { "current_last_update": "0'0",
> "current_last_stamp": "0.000000",
> "current_info": { "begin": "0.000000",
> "end": "0.000000",
> "version": "0'0"},
> "history": []}}],
> "recovery_state": [
> { "name": "Started\/Primary\/Peering\/GetInfo",
> "enter_time": "2015-03-30 19:44:18.709317",
> "requested_info_from": [
> { "osd": "0"},
> { "osd": "5"},
> { "osd": "10"},
> { "osd": "22"},
> { "osd": "54"},
> { "osd": "91"},
> { "osd": "92"},
> { "osd": "113"},
> { "osd": "114"}]},
> { "name": "Started\/Primary\/Peering",
> "enter_time": "2015-03-30 19:44:18.709316",
> "past_intervals": [
> { "first": 342,
> "last": 346,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 114],
> "acting": [
> 40,
> 92,
> 114,
> 40,
> 40]},
> { "first": 347,
> "last": 353,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48],
> "acting": [
> 40,
> 92,
> 48,
> 40,
> 40]},
> { "first": 354,
> "last": 356,
> "maybe_went_rw": 1,
> "up": [
> 92,
> 48],
> "acting": [
> 92,
> 48,
> 92,
> 92]},
> { "first": 357,
> "last": 359,
> "maybe_went_rw": 1,
> "up": [
> 113,
> 48,
> 114],
> "acting": [
> 113,
> 48,
> 114,
> 113,
> 113]},
> { "first": 360,
> "last": 361,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48],
> "acting": [
> 40,
> 92,
> 48,
> 40,
> 40]},
> { "first": 362,
> "last": 364,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92],
> "acting": [
> 40,
> 92,
> 40,
> 40]},
> { "first": 365,
> "last": 369,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 114],
> "acting": [
> 40,
> 92,
> 114,
> 40,
> 40]},
> { "first": 370,
> "last": 379,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48],
> "acting": [
> 40,
> 92,
> 48,
> 40,
> 40]},
> { "first": 380,
> "last": 400,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48,
> 91],
> "acting": [
> 40,
> 92,
> 48,
> 91,
> 40,
> 40]},
> { "first": 401,
> "last": 409,
> "maybe_went_rw": 1,
> "up": [
> 92,
> 48,
> 91],
> "acting": [
> 92,
> 48,
> 91,
> 92,
> 92]},
> { "first": 410,
> "last": 414,
> "maybe_went_rw": 1,
> "up": [
> 113,
> 48,
> 114,
> 0],
> "acting": [
> 113,
> 48,
> 114,
> 0,
> 113,
> 113]},
> { "first": 415,
> "last": 435,
> "maybe_went_rw": 1,
> "up": [
> 113,
> 48,
> 114,
> 10],
> "acting": [
> 113,
> 48,
> 114,
> 10,
> 113,
> 113]},
> { "first": 436,
> "last": 442,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48,
> 91],
> "acting": [
> 40,
> 92,
> 48,
> 91,
> 40,
> 40]},
> { "first": 443,
> "last": 446,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48],
> "acting": [
> 40,
> 92,
> 48,
> 40,
> 40]},
> { "first": 447,
> "last": 457,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 48],
> "acting": [
> 40,
> 48,
> 40,
> 40]},
> { "first": 458,
> "last": 460,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 48,
> 10],
> "acting": [
> 40,
> 48,
> 10,
> 40,
> 40]},
> { "first": 461,
> "last": 466,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 48,
> 22],
> "acting": [
> 40,
> 48,
> 22,
> 40,
> 40]},
> { "first": 467,
> "last": 478,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 48,
> 22,
> 5],
> "acting": [
> 40,
> 48,
> 22,
> 5,
> 40,
> 40]},
> { "first": 479,
> "last": 489,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 48,
> 22,
> 110],
> "acting": [
> 40,
> 48,
> 22,
> 110,
> 40,
> 40]},
> { "first": 490,
> "last": 496,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 48,
> 22,
> 0],
> "acting": [
> 40,
> 48,
> 22,
> 0,
> 40,
> 40]},
> { "first": 497,
> "last": 507,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 48,
> 114,
> 10],
> "acting": [
> 40,
> 48,
> 114,
> 10,
> 40,
> 40]},
> { "first": 508,
> "last": 511,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 48,
> 54,
> 91],
> "acting": [
> 40,
> 48,
> 54,
> 91,
> 40,
> 40]},
> { "first": 512,
> "last": 579,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48,
> 91],
> "acting": [
> 40,
> 92,
> 48,
> 91,
> 40,
> 40]},
> { "first": 580,
> "last": 580,
> "maybe_went_rw": 0,
> "up": [
> 40,
> 92,
> 91],
> "acting": [
> 40,
> 92,
> 91,
> 40,
> 40]},
> { "first": 581,
> "last": 591,
> "maybe_went_rw": 1,
> "up": [
> 92,
> 91],
> "acting": [
> 92,
> 91,
> 92,
> 92]},
> { "first": 592,
> "last": 595,
> "maybe_went_rw": 1,
> "up": [
> 113,
> 114,
> 22,
> 0],
> "acting": [
> 113,
> 114,
> 22,
> 0,
> 113,
> 113]},
> { "first": 596,
> "last": 599,
> "maybe_went_rw": 1,
> "up": [
> 113,
> 48,
> 114,
> 10],
> "acting": [
> 113,
> 48,
> 114,
> 10,
> 113,
> 113]},
> { "first": 600,
> "last": 606,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48,
> 91],
> "acting": [
> 40,
> 92,
> 48,
> 91,
> 40,
> 40]},
> { "first": 607,
> "last": 607,
> "maybe_went_rw": 0,
> "up": [
> 92,
> 91],
> "acting": [
> 92,
> 91,
> 92,
> 92]},
> { "first": 608,
> "last": 616,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48,
> 91],
> "acting": [
> 40,
> 92,
> 48,
> 91,
> 40,
> 40]},
> { "first": 617,
> "last": 625,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 91],
> "acting": [
> 40,
> 92,
> 91,
> 40,
> 40]},
> { "first": 626,
> "last": 632,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 114,
> 10],
> "acting": [
> 40,
> 92,
> 114,
> 10,
> 40,
> 40]},
> { "first": 633,
> "last": 639,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48,
> 91],
> "acting": [
> 40,
> 92,
> 48,
> 91,
> 40,
> 40]},
> { "first": 640,
> "last": 643,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 91],
> "acting": [
> 40,
> 92,
> 91,
> 40,
> 40]},
> { "first": 644,
> "last": 662,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 114,
> 10],
> "acting": [
> 40,
> 92,
> 114,
> 10,
> 40,
> 40]},
> { "first": 663,
> "last": 679,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48,
> 91],
> "acting": [
> 40,
> 92,
> 48,
> 91,
> 40,
> 40]},
> { "first": 680,
> "last": 682,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48],
> "acting": [
> 40,
> 92,
> 48,
> 40,
> 40]},
> { "first": 683,
> "last": 687,
> "maybe_went_rw": 1,
> "up": [
> 40,
> 92,
> 48,
> 10],
> "acting": [
> 40,
> 92,
> 48,
> 10,
> 40,
> 40]}],
> "probing_osds": [
> "0",
> "5",
> "10",
> "22",
> "40",
> "48",
> "54",
> "91",
> "92",
> "110",
> "113",
> "114"],
> "down_osds_we_would_probe": [],
> "peering_blocked_by": []},
> { "name": "Started",
> "enter_time": "2015-03-30 19:44:18.709312"}],
> "agent_state": {}}
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Force an OSD to try to peer
2015-03-31 17:07 ` Robert LeBlanc
@ 2015-03-31 17:36 ` Sage Weil
2015-03-31 18:08 ` Robert LeBlanc
0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2015-03-31 17:36 UTC (permalink / raw)
To: Robert LeBlanc; +Cc: Ceph-User, ceph-devel
On Tue, 31 Mar 2015, Robert LeBlanc wrote:
> Turns out jumbo frames was not set on all the switch ports. Once that
> was resolved the cluster quickly became healthy.
I always hesitate to point the finger at the jumbo frames configuration
but almost every time that is the culprit!
Thanks for the update. :)
sage
>
> On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc <robert@leblancnet.us> wrote:
> > I've been working at this peering problem all day. I've done a lot of
> > testing at the network layer and I just don't believe that we have a problem
> > that would prevent OSDs from peering. When looking though osd_debug 20/20
> > logs, it just doesn't look like the OSDs are trying to peer. I don't know if
> > it is because there are so many outstanding creations or what. OSDs will
> > peer with OSDs on other hosts, but for reason only chooses a certain number
> > and not one that it needs to finish the peering process.
> >
> > I've check: firewall, open files, number of threads allowed. These usually
> > have given me an error in the logs that helped me fix the problem.
> >
> > I can't find a configuration item that specifies how many peers an OSD
> > should contact or anything that would be artificially limiting the peering
> > connections. I've restarted the OSDs a number of times, as well as rebooting
> > the hosts. I beleive if the OSDs finish peering everything will clear up. I
> > can't find anything in pg query that would help me figure out what is
> > blocking it (peering blocked by is empty). The PGs are scattered across all
> > the hosts so we can't pin it down to a specific host.
> >
> > Any ideas on what to try would be appreciated.
> >
> > [ulhglive-root@ceph9 ~]# ceph --version
> > ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> > [ulhglive-root@ceph9 ~]# ceph status
> > cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
> > health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs stuck
> > inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
> > monmap e2: 3 mons at
> > {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0},
> > election epoch 30, quorum 0,1,2 mon1,mon2,mon3
> > osdmap e704: 120 osds: 120 up, 120 in
> > pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
> > 11447 MB used, 436 TB / 436 TB avail
> > 727 active+clean
> > 990 peering
> > 37 creating+peering
> > 1 down+peering
> > 290 remapped+peering
> > 3 creating+remapped+peering
> >
> > { "state": "peering",
> > "epoch": 707,
> > "up": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "info": { "pgid": "7.171",
> > "last_update": "0'0",
> > "last_complete": "0'0",
> > "log_tail": "0'0",
> > "last_user_version": 0,
> > "last_backfill": "MAX",
> > "purged_snaps": "[]",
> > "history": { "epoch_created": 293,
> > "last_epoch_started": 343,
> > "last_epoch_clean": 343,
> > "last_epoch_split": 0,
> > "same_up_since": 688,
> > "same_interval_since": 688,
> > "same_primary_since": 608,
> > "last_scrub": "0'0",
> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> > "last_deep_scrub": "0'0",
> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> > "last_clean_scrub_stamp": "0.000000"},
> > "stats": { "version": "0'0",
> > "reported_seq": "326",
> > "reported_epoch": "707",
> > "state": "peering",
> > "last_fresh": "2015-03-30 20:10:39.509855",
> > "last_change": "2015-03-30 19:44:17.361601",
> > "last_active": "2015-03-30 11:37:56.956417",
> > "last_clean": "2015-03-30 11:37:56.956417",
> > "last_became_active": "0.000000",
> > "last_unstale": "2015-03-30 20:10:39.509855",
> > "mapping_epoch": 683,
> > "log_start": "0'0",
> > "ondisk_log_start": "0'0",
> > "created": 293,
> > "last_epoch_clean": 343,
> > "parent": "0.0",
> > "parent_split_bits": 0,
> > "last_scrub": "0'0",
> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> > "last_deep_scrub": "0'0",
> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> > "last_clean_scrub_stamp": "0.000000",
> > "log_size": 0,
> > "ondisk_log_size": 0,
> > "stats_invalid": "0",
> > "stat_sum": { "num_bytes": 0,
> > "num_objects": 0,
> > "num_object_clones": 0,
> > "num_object_copies": 0,
> > "num_objects_missing_on_primary": 0,
> > "num_objects_degraded": 0,
> > "num_objects_unfound": 0,
> > "num_objects_dirty": 0,
> > "num_whiteouts": 0,
> > "num_read": 0,
> > "num_read_kb": 0,
> > "num_write": 0,
> > "num_write_kb": 0,
> > "num_scrub_errors": 0,
> > "num_shallow_scrub_errors": 0,
> > "num_deep_scrub_errors": 0,
> > "num_objects_recovered": 0,
> > "num_bytes_recovered": 0,
> > "num_keys_recovered": 0,
> > "num_objects_omap": 0,
> > "num_objects_hit_set_archive": 0},
> > "stat_cat_sum": {},
> > "up": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "up_primary": 40,
> > "acting_primary": 40},
> > "empty": 1,
> > "dne": 0,
> > "incomplete": 0,
> > "last_epoch_started": 348,
> > "hit_set_history": { "current_last_update": "0'0",
> > "current_last_stamp": "0.000000",
> > "current_info": { "begin": "0.000000",
> > "end": "0.000000",
> > "version": "0'0"},
> > "history": []}},
> > "peer_info": [
> > { "peer": "48",
> > "pgid": "7.171",
> > "last_update": "0'0",
> > "last_complete": "0'0",
> > "log_tail": "0'0",
> > "last_user_version": 0,
> > "last_backfill": "MAX",
> > "purged_snaps": "[]",
> > "history": { "epoch_created": 293,
> > "last_epoch_started": 343,
> > "last_epoch_clean": 343,
> > "last_epoch_split": 0,
> > "same_up_since": 688,
> > "same_interval_since": 688,
> > "same_primary_since": 608,
> > "last_scrub": "0'0",
> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> > "last_deep_scrub": "0'0",
> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> > "last_clean_scrub_stamp": "0.000000"},
> > "stats": { "version": "0'0",
> > "reported_seq": "24",
> > "reported_epoch": "348",
> > "state": "peering",
> > "last_fresh": "2015-03-30 11:39:02.979742",
> > "last_change": "2015-03-30 11:39:01.650897",
> > "last_active": "2015-03-30 11:37:56.956417",
> > "last_clean": "2015-03-30 11:37:56.956417",
> > "last_became_active": "0.000000",
> > "last_unstale": "2015-03-30 11:39:02.979742",
> > "mapping_epoch": 683,
> > "log_start": "0'0",
> > "ondisk_log_start": "0'0",
> > "created": 293,
> > "last_epoch_clean": 343,
> > "parent": "0.0",
> > "parent_split_bits": 0,
> > "last_scrub": "0'0",
> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> > "last_deep_scrub": "0'0",
> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> > "last_clean_scrub_stamp": "0.000000",
> > "log_size": 0,
> > "ondisk_log_size": 0,
> > "stats_invalid": "0",
> > "stat_sum": { "num_bytes": 0,
> > "num_objects": 0,
> > "num_object_clones": 0,
> > "num_object_copies": 0,
> > "num_objects_missing_on_primary": 0,
> > "num_objects_degraded": 0,
> > "num_objects_unfound": 0,
> > "num_objects_dirty": 0,
> > "num_whiteouts": 0,
> > "num_read": 0,
> > "num_read_kb": 0,
> > "num_write": 0,
> > "num_write_kb": 0,
> > "num_scrub_errors": 0,
> > "num_shallow_scrub_errors": 0,
> > "num_deep_scrub_errors": 0,
> > "num_objects_recovered": 0,
> > "num_bytes_recovered": 0,
> > "num_keys_recovered": 0,
> > "num_objects_omap": 0,
> > "num_objects_hit_set_archive": 0},
> > "stat_cat_sum": {},
> > "up": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "up_primary": 40,
> > "acting_primary": 40},
> > "empty": 1,
> > "dne": 0,
> > "incomplete": 0,
> > "last_epoch_started": 348,
> > "hit_set_history": { "current_last_update": "0'0",
> > "current_last_stamp": "0.000000",
> > "current_info": { "begin": "0.000000",
> > "end": "0.000000",
> > "version": "0'0"},
> > "history": []}},
> > { "peer": "110",
> > "pgid": "7.171",
> > "last_update": "0'0",
> > "last_complete": "0'0",
> > "log_tail": "0'0",
> > "last_user_version": 0,
> > "last_backfill": "MAX",
> > "purged_snaps": "[]",
> > "history": { "epoch_created": 0,
> > "last_epoch_started": 0,
> > "last_epoch_clean": 0,
> > "last_epoch_split": 0,
> > "same_up_since": 0,
> > "same_interval_since": 0,
> > "same_primary_since": 0,
> > "last_scrub": "0'0",
> > "last_scrub_stamp": "0.000000",
> > "last_deep_scrub": "0'0",
> > "last_deep_scrub_stamp": "0.000000",
> > "last_clean_scrub_stamp": "0.000000"},
> > "stats": { "version": "0'0",
> > "reported_seq": "0",
> > "reported_epoch": "0",
> > "state": "inactive",
> > "last_fresh": "0.000000",
> > "last_change": "0.000000",
> > "last_active": "0.000000",
> > "last_clean": "0.000000",
> > "last_became_active": "0.000000",
> > "last_unstale": "0.000000",
> > "mapping_epoch": 0,
> > "log_start": "0'0",
> > "ondisk_log_start": "0'0",
> > "created": 0,
> > "last_epoch_clean": 0,
> > "parent": "0.0",
> > "parent_split_bits": 0,
> > "last_scrub": "0'0",
> > "last_scrub_stamp": "0.000000",
> > "last_deep_scrub": "0'0",
> > "last_deep_scrub_stamp": "0.000000",
> > "last_clean_scrub_stamp": "0.000000",
> > "log_size": 0,
> > "ondisk_log_size": 0,
> > "stats_invalid": "0",
> > "stat_sum": { "num_bytes": 0,
> > "num_objects": 0,
> > "num_object_clones": 0,
> > "num_object_copies": 0,
> > "num_objects_missing_on_primary": 0,
> > "num_objects_degraded": 0,
> > "num_objects_unfound": 0,
> > "num_objects_dirty": 0,
> > "num_whiteouts": 0,
> > "num_read": 0,
> > "num_read_kb": 0,
> > "num_write": 0,
> > "num_write_kb": 0,
> > "num_scrub_errors": 0,
> > "num_shallow_scrub_errors": 0,
> > "num_deep_scrub_errors": 0,
> > "num_objects_recovered": 0,
> > "num_bytes_recovered": 0,
> > "num_keys_recovered": 0,
> > "num_objects_omap": 0,
> > "num_objects_hit_set_archive": 0},
> > "stat_cat_sum": {},
> > "up": [],
> > "acting": [],
> > "up_primary": -1,
> > "acting_primary": -1},
> > "empty": 1,
> > "dne": 1,
> > "incomplete": 0,
> > "last_epoch_started": 0,
> > "hit_set_history": { "current_last_update": "0'0",
> > "current_last_stamp": "0.000000",
> > "current_info": { "begin": "0.000000",
> > "end": "0.000000",
> > "version": "0'0"},
> > "history": []}}],
> > "recovery_state": [
> > { "name": "Started\/Primary\/Peering\/GetInfo",
> > "enter_time": "2015-03-30 19:44:18.709317",
> > "requested_info_from": [
> > { "osd": "0"},
> > { "osd": "5"},
> > { "osd": "10"},
> > { "osd": "22"},
> > { "osd": "54"},
> > { "osd": "91"},
> > { "osd": "92"},
> > { "osd": "113"},
> > { "osd": "114"}]},
> > { "name": "Started\/Primary\/Peering",
> > "enter_time": "2015-03-30 19:44:18.709316",
> > "past_intervals": [
> > { "first": 342,
> > "last": 346,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 114],
> > "acting": [
> > 40,
> > 92,
> > 114,
> > 40,
> > 40]},
> > { "first": 347,
> > "last": 353,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 40,
> > 40]},
> > { "first": 354,
> > "last": 356,
> > "maybe_went_rw": 1,
> > "up": [
> > 92,
> > 48],
> > "acting": [
> > 92,
> > 48,
> > 92,
> > 92]},
> > { "first": 357,
> > "last": 359,
> > "maybe_went_rw": 1,
> > "up": [
> > 113,
> > 48,
> > 114],
> > "acting": [
> > 113,
> > 48,
> > 114,
> > 113,
> > 113]},
> > { "first": 360,
> > "last": 361,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 40,
> > 40]},
> > { "first": 362,
> > "last": 364,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92],
> > "acting": [
> > 40,
> > 92,
> > 40,
> > 40]},
> > { "first": 365,
> > "last": 369,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 114],
> > "acting": [
> > 40,
> > 92,
> > 114,
> > 40,
> > 40]},
> > { "first": 370,
> > "last": 379,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 40,
> > 40]},
> > { "first": 380,
> > "last": 400,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 91,
> > 40,
> > 40]},
> > { "first": 401,
> > "last": 409,
> > "maybe_went_rw": 1,
> > "up": [
> > 92,
> > 48,
> > 91],
> > "acting": [
> > 92,
> > 48,
> > 91,
> > 92,
> > 92]},
> > { "first": 410,
> > "last": 414,
> > "maybe_went_rw": 1,
> > "up": [
> > 113,
> > 48,
> > 114,
> > 0],
> > "acting": [
> > 113,
> > 48,
> > 114,
> > 0,
> > 113,
> > 113]},
> > { "first": 415,
> > "last": 435,
> > "maybe_went_rw": 1,
> > "up": [
> > 113,
> > 48,
> > 114,
> > 10],
> > "acting": [
> > 113,
> > 48,
> > 114,
> > 10,
> > 113,
> > 113]},
> > { "first": 436,
> > "last": 442,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 91,
> > 40,
> > 40]},
> > { "first": 443,
> > "last": 446,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 40,
> > 40]},
> > { "first": 447,
> > "last": 457,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 48],
> > "acting": [
> > 40,
> > 48,
> > 40,
> > 40]},
> > { "first": 458,
> > "last": 460,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 48,
> > 10],
> > "acting": [
> > 40,
> > 48,
> > 10,
> > 40,
> > 40]},
> > { "first": 461,
> > "last": 466,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 48,
> > 22],
> > "acting": [
> > 40,
> > 48,
> > 22,
> > 40,
> > 40]},
> > { "first": 467,
> > "last": 478,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 48,
> > 22,
> > 5],
> > "acting": [
> > 40,
> > 48,
> > 22,
> > 5,
> > 40,
> > 40]},
> > { "first": 479,
> > "last": 489,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 48,
> > 22,
> > 110],
> > "acting": [
> > 40,
> > 48,
> > 22,
> > 110,
> > 40,
> > 40]},
> > { "first": 490,
> > "last": 496,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 48,
> > 22,
> > 0],
> > "acting": [
> > 40,
> > 48,
> > 22,
> > 0,
> > 40,
> > 40]},
> > { "first": 497,
> > "last": 507,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 48,
> > 114,
> > 10],
> > "acting": [
> > 40,
> > 48,
> > 114,
> > 10,
> > 40,
> > 40]},
> > { "first": 508,
> > "last": 511,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 48,
> > 54,
> > 91],
> > "acting": [
> > 40,
> > 48,
> > 54,
> > 91,
> > 40,
> > 40]},
> > { "first": 512,
> > "last": 579,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 91,
> > 40,
> > 40]},
> > { "first": 580,
> > "last": 580,
> > "maybe_went_rw": 0,
> > "up": [
> > 40,
> > 92,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 91,
> > 40,
> > 40]},
> > { "first": 581,
> > "last": 591,
> > "maybe_went_rw": 1,
> > "up": [
> > 92,
> > 91],
> > "acting": [
> > 92,
> > 91,
> > 92,
> > 92]},
> > { "first": 592,
> > "last": 595,
> > "maybe_went_rw": 1,
> > "up": [
> > 113,
> > 114,
> > 22,
> > 0],
> > "acting": [
> > 113,
> > 114,
> > 22,
> > 0,
> > 113,
> > 113]},
> > { "first": 596,
> > "last": 599,
> > "maybe_went_rw": 1,
> > "up": [
> > 113,
> > 48,
> > 114,
> > 10],
> > "acting": [
> > 113,
> > 48,
> > 114,
> > 10,
> > 113,
> > 113]},
> > { "first": 600,
> > "last": 606,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 91,
> > 40,
> > 40]},
> > { "first": 607,
> > "last": 607,
> > "maybe_went_rw": 0,
> > "up": [
> > 92,
> > 91],
> > "acting": [
> > 92,
> > 91,
> > 92,
> > 92]},
> > { "first": 608,
> > "last": 616,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 91,
> > 40,
> > 40]},
> > { "first": 617,
> > "last": 625,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 91,
> > 40,
> > 40]},
> > { "first": 626,
> > "last": 632,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 114,
> > 10],
> > "acting": [
> > 40,
> > 92,
> > 114,
> > 10,
> > 40,
> > 40]},
> > { "first": 633,
> > "last": 639,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 91,
> > 40,
> > 40]},
> > { "first": 640,
> > "last": 643,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 91,
> > 40,
> > 40]},
> > { "first": 644,
> > "last": 662,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 114,
> > 10],
> > "acting": [
> > 40,
> > 92,
> > 114,
> > 10,
> > 40,
> > 40]},
> > { "first": 663,
> > "last": 679,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48,
> > 91],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 91,
> > 40,
> > 40]},
> > { "first": 680,
> > "last": 682,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 40,
> > 40]},
> > { "first": 683,
> > "last": 687,
> > "maybe_went_rw": 1,
> > "up": [
> > 40,
> > 92,
> > 48,
> > 10],
> > "acting": [
> > 40,
> > 92,
> > 48,
> > 10,
> > 40,
> > 40]}],
> > "probing_osds": [
> > "0",
> > "5",
> > "10",
> > "22",
> > "40",
> > "48",
> > "54",
> > "91",
> > "92",
> > "110",
> > "113",
> > "114"],
> > "down_osds_we_would_probe": [],
> > "peering_blocked_by": []},
> > { "name": "Started",
> > "enter_time": "2015-03-30 19:44:18.709312"}],
> > "agent_state": {}}
> >
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Force an OSD to try to peer
2015-03-31 17:36 ` Sage Weil
@ 2015-03-31 18:08 ` Robert LeBlanc
[not found] ` <CAANLjFp0pStF0iBw21XSv8a03YX4iy74rZSr_MD8e1mGR0KCAQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Robert LeBlanc @ 2015-03-31 18:08 UTC (permalink / raw)
To: Sage Weil; +Cc: Ceph-User, ceph-devel
I was desperate for anything after exhausting every other possibility
I could think of. Maybe I should put a checklist in the Ceph docs of
things to look for.
Thanks,
On Tue, Mar 31, 2015 at 11:36 AM, Sage Weil <sage@newdream.net> wrote:
> On Tue, 31 Mar 2015, Robert LeBlanc wrote:
>> Turns out jumbo frames was not set on all the switch ports. Once that
>> was resolved the cluster quickly became healthy.
>
> I always hesitate to point the finger at the jumbo frames configuration
> but almost every time that is the culprit!
>
> Thanks for the update. :)
> sage
>
>
>
>>
>> On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc <robert@leblancnet.us> wrote:
>> > I've been working at this peering problem all day. I've done a lot of
>> > testing at the network layer and I just don't believe that we have a problem
>> > that would prevent OSDs from peering. When looking though osd_debug 20/20
>> > logs, it just doesn't look like the OSDs are trying to peer. I don't know if
>> > it is because there are so many outstanding creations or what. OSDs will
>> > peer with OSDs on other hosts, but for reason only chooses a certain number
>> > and not one that it needs to finish the peering process.
>> >
>> > I've check: firewall, open files, number of threads allowed. These usually
>> > have given me an error in the logs that helped me fix the problem.
>> >
>> > I can't find a configuration item that specifies how many peers an OSD
>> > should contact or anything that would be artificially limiting the peering
>> > connections. I've restarted the OSDs a number of times, as well as rebooting
>> > the hosts. I beleive if the OSDs finish peering everything will clear up. I
>> > can't find anything in pg query that would help me figure out what is
>> > blocking it (peering blocked by is empty). The PGs are scattered across all
>> > the hosts so we can't pin it down to a specific host.
>> >
>> > Any ideas on what to try would be appreciated.
>> >
>> > [ulhglive-root@ceph9 ~]# ceph --version
>> > ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>> > [ulhglive-root@ceph9 ~]# ceph status
>> > cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
>> > health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs stuck
>> > inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
>> > monmap e2: 3 mons at
>> > {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0},
>> > election epoch 30, quorum 0,1,2 mon1,mon2,mon3
>> > osdmap e704: 120 osds: 120 up, 120 in
>> > pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
>> > 11447 MB used, 436 TB / 436 TB avail
>> > 727 active+clean
>> > 990 peering
>> > 37 creating+peering
>> > 1 down+peering
>> > 290 remapped+peering
>> > 3 creating+remapped+peering
>> >
>> > { "state": "peering",
>> > "epoch": 707,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "info": { "pgid": "7.171",
>> > "last_update": "0'0",
>> > "last_complete": "0'0",
>> > "log_tail": "0'0",
>> > "last_user_version": 0,
>> > "last_backfill": "MAX",
>> > "purged_snaps": "[]",
>> > "history": { "epoch_created": 293,
>> > "last_epoch_started": 343,
>> > "last_epoch_clean": 343,
>> > "last_epoch_split": 0,
>> > "same_up_since": 688,
>> > "same_interval_since": 688,
>> > "same_primary_since": 608,
>> > "last_scrub": "0'0",
>> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_deep_scrub": "0'0",
>> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_clean_scrub_stamp": "0.000000"},
>> > "stats": { "version": "0'0",
>> > "reported_seq": "326",
>> > "reported_epoch": "707",
>> > "state": "peering",
>> > "last_fresh": "2015-03-30 20:10:39.509855",
>> > "last_change": "2015-03-30 19:44:17.361601",
>> > "last_active": "2015-03-30 11:37:56.956417",
>> > "last_clean": "2015-03-30 11:37:56.956417",
>> > "last_became_active": "0.000000",
>> > "last_unstale": "2015-03-30 20:10:39.509855",
>> > "mapping_epoch": 683,
>> > "log_start": "0'0",
>> > "ondisk_log_start": "0'0",
>> > "created": 293,
>> > "last_epoch_clean": 343,
>> > "parent": "0.0",
>> > "parent_split_bits": 0,
>> > "last_scrub": "0'0",
>> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_deep_scrub": "0'0",
>> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_clean_scrub_stamp": "0.000000",
>> > "log_size": 0,
>> > "ondisk_log_size": 0,
>> > "stats_invalid": "0",
>> > "stat_sum": { "num_bytes": 0,
>> > "num_objects": 0,
>> > "num_object_clones": 0,
>> > "num_object_copies": 0,
>> > "num_objects_missing_on_primary": 0,
>> > "num_objects_degraded": 0,
>> > "num_objects_unfound": 0,
>> > "num_objects_dirty": 0,
>> > "num_whiteouts": 0,
>> > "num_read": 0,
>> > "num_read_kb": 0,
>> > "num_write": 0,
>> > "num_write_kb": 0,
>> > "num_scrub_errors": 0,
>> > "num_shallow_scrub_errors": 0,
>> > "num_deep_scrub_errors": 0,
>> > "num_objects_recovered": 0,
>> > "num_bytes_recovered": 0,
>> > "num_keys_recovered": 0,
>> > "num_objects_omap": 0,
>> > "num_objects_hit_set_archive": 0},
>> > "stat_cat_sum": {},
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "up_primary": 40,
>> > "acting_primary": 40},
>> > "empty": 1,
>> > "dne": 0,
>> > "incomplete": 0,
>> > "last_epoch_started": 348,
>> > "hit_set_history": { "current_last_update": "0'0",
>> > "current_last_stamp": "0.000000",
>> > "current_info": { "begin": "0.000000",
>> > "end": "0.000000",
>> > "version": "0'0"},
>> > "history": []}},
>> > "peer_info": [
>> > { "peer": "48",
>> > "pgid": "7.171",
>> > "last_update": "0'0",
>> > "last_complete": "0'0",
>> > "log_tail": "0'0",
>> > "last_user_version": 0,
>> > "last_backfill": "MAX",
>> > "purged_snaps": "[]",
>> > "history": { "epoch_created": 293,
>> > "last_epoch_started": 343,
>> > "last_epoch_clean": 343,
>> > "last_epoch_split": 0,
>> > "same_up_since": 688,
>> > "same_interval_since": 688,
>> > "same_primary_since": 608,
>> > "last_scrub": "0'0",
>> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_deep_scrub": "0'0",
>> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_clean_scrub_stamp": "0.000000"},
>> > "stats": { "version": "0'0",
>> > "reported_seq": "24",
>> > "reported_epoch": "348",
>> > "state": "peering",
>> > "last_fresh": "2015-03-30 11:39:02.979742",
>> > "last_change": "2015-03-30 11:39:01.650897",
>> > "last_active": "2015-03-30 11:37:56.956417",
>> > "last_clean": "2015-03-30 11:37:56.956417",
>> > "last_became_active": "0.000000",
>> > "last_unstale": "2015-03-30 11:39:02.979742",
>> > "mapping_epoch": 683,
>> > "log_start": "0'0",
>> > "ondisk_log_start": "0'0",
>> > "created": 293,
>> > "last_epoch_clean": 343,
>> > "parent": "0.0",
>> > "parent_split_bits": 0,
>> > "last_scrub": "0'0",
>> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_deep_scrub": "0'0",
>> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_clean_scrub_stamp": "0.000000",
>> > "log_size": 0,
>> > "ondisk_log_size": 0,
>> > "stats_invalid": "0",
>> > "stat_sum": { "num_bytes": 0,
>> > "num_objects": 0,
>> > "num_object_clones": 0,
>> > "num_object_copies": 0,
>> > "num_objects_missing_on_primary": 0,
>> > "num_objects_degraded": 0,
>> > "num_objects_unfound": 0,
>> > "num_objects_dirty": 0,
>> > "num_whiteouts": 0,
>> > "num_read": 0,
>> > "num_read_kb": 0,
>> > "num_write": 0,
>> > "num_write_kb": 0,
>> > "num_scrub_errors": 0,
>> > "num_shallow_scrub_errors": 0,
>> > "num_deep_scrub_errors": 0,
>> > "num_objects_recovered": 0,
>> > "num_bytes_recovered": 0,
>> > "num_keys_recovered": 0,
>> > "num_objects_omap": 0,
>> > "num_objects_hit_set_archive": 0},
>> > "stat_cat_sum": {},
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "up_primary": 40,
>> > "acting_primary": 40},
>> > "empty": 1,
>> > "dne": 0,
>> > "incomplete": 0,
>> > "last_epoch_started": 348,
>> > "hit_set_history": { "current_last_update": "0'0",
>> > "current_last_stamp": "0.000000",
>> > "current_info": { "begin": "0.000000",
>> > "end": "0.000000",
>> > "version": "0'0"},
>> > "history": []}},
>> > { "peer": "110",
>> > "pgid": "7.171",
>> > "last_update": "0'0",
>> > "last_complete": "0'0",
>> > "log_tail": "0'0",
>> > "last_user_version": 0,
>> > "last_backfill": "MAX",
>> > "purged_snaps": "[]",
>> > "history": { "epoch_created": 0,
>> > "last_epoch_started": 0,
>> > "last_epoch_clean": 0,
>> > "last_epoch_split": 0,
>> > "same_up_since": 0,
>> > "same_interval_since": 0,
>> > "same_primary_since": 0,
>> > "last_scrub": "0'0",
>> > "last_scrub_stamp": "0.000000",
>> > "last_deep_scrub": "0'0",
>> > "last_deep_scrub_stamp": "0.000000",
>> > "last_clean_scrub_stamp": "0.000000"},
>> > "stats": { "version": "0'0",
>> > "reported_seq": "0",
>> > "reported_epoch": "0",
>> > "state": "inactive",
>> > "last_fresh": "0.000000",
>> > "last_change": "0.000000",
>> > "last_active": "0.000000",
>> > "last_clean": "0.000000",
>> > "last_became_active": "0.000000",
>> > "last_unstale": "0.000000",
>> > "mapping_epoch": 0,
>> > "log_start": "0'0",
>> > "ondisk_log_start": "0'0",
>> > "created": 0,
>> > "last_epoch_clean": 0,
>> > "parent": "0.0",
>> > "parent_split_bits": 0,
>> > "last_scrub": "0'0",
>> > "last_scrub_stamp": "0.000000",
>> > "last_deep_scrub": "0'0",
>> > "last_deep_scrub_stamp": "0.000000",
>> > "last_clean_scrub_stamp": "0.000000",
>> > "log_size": 0,
>> > "ondisk_log_size": 0,
>> > "stats_invalid": "0",
>> > "stat_sum": { "num_bytes": 0,
>> > "num_objects": 0,
>> > "num_object_clones": 0,
>> > "num_object_copies": 0,
>> > "num_objects_missing_on_primary": 0,
>> > "num_objects_degraded": 0,
>> > "num_objects_unfound": 0,
>> > "num_objects_dirty": 0,
>> > "num_whiteouts": 0,
>> > "num_read": 0,
>> > "num_read_kb": 0,
>> > "num_write": 0,
>> > "num_write_kb": 0,
>> > "num_scrub_errors": 0,
>> > "num_shallow_scrub_errors": 0,
>> > "num_deep_scrub_errors": 0,
>> > "num_objects_recovered": 0,
>> > "num_bytes_recovered": 0,
>> > "num_keys_recovered": 0,
>> > "num_objects_omap": 0,
>> > "num_objects_hit_set_archive": 0},
>> > "stat_cat_sum": {},
>> > "up": [],
>> > "acting": [],
>> > "up_primary": -1,
>> > "acting_primary": -1},
>> > "empty": 1,
>> > "dne": 1,
>> > "incomplete": 0,
>> > "last_epoch_started": 0,
>> > "hit_set_history": { "current_last_update": "0'0",
>> > "current_last_stamp": "0.000000",
>> > "current_info": { "begin": "0.000000",
>> > "end": "0.000000",
>> > "version": "0'0"},
>> > "history": []}}],
>> > "recovery_state": [
>> > { "name": "Started\/Primary\/Peering\/GetInfo",
>> > "enter_time": "2015-03-30 19:44:18.709317",
>> > "requested_info_from": [
>> > { "osd": "0"},
>> > { "osd": "5"},
>> > { "osd": "10"},
>> > { "osd": "22"},
>> > { "osd": "54"},
>> > { "osd": "91"},
>> > { "osd": "92"},
>> > { "osd": "113"},
>> > { "osd": "114"}]},
>> > { "name": "Started\/Primary\/Peering",
>> > "enter_time": "2015-03-30 19:44:18.709316",
>> > "past_intervals": [
>> > { "first": 342,
>> > "last": 346,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 114],
>> > "acting": [
>> > 40,
>> > 92,
>> > 114,
>> > 40,
>> > 40]},
>> > { "first": 347,
>> > "last": 353,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 40,
>> > 40]},
>> > { "first": 354,
>> > "last": 356,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 92,
>> > 48],
>> > "acting": [
>> > 92,
>> > 48,
>> > 92,
>> > 92]},
>> > { "first": 357,
>> > "last": 359,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 113,
>> > 48,
>> > 114],
>> > "acting": [
>> > 113,
>> > 48,
>> > 114,
>> > 113,
>> > 113]},
>> > { "first": 360,
>> > "last": 361,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 40,
>> > 40]},
>> > { "first": 362,
>> > "last": 364,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92],
>> > "acting": [
>> > 40,
>> > 92,
>> > 40,
>> > 40]},
>> > { "first": 365,
>> > "last": 369,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 114],
>> > "acting": [
>> > 40,
>> > 92,
>> > 114,
>> > 40,
>> > 40]},
>> > { "first": 370,
>> > "last": 379,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 40,
>> > 40]},
>> > { "first": 380,
>> > "last": 400,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 401,
>> > "last": 409,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 92,
>> > 48,
>> > 91,
>> > 92,
>> > 92]},
>> > { "first": 410,
>> > "last": 414,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 113,
>> > 48,
>> > 114,
>> > 0],
>> > "acting": [
>> > 113,
>> > 48,
>> > 114,
>> > 0,
>> > 113,
>> > 113]},
>> > { "first": 415,
>> > "last": 435,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 113,
>> > 48,
>> > 114,
>> > 10],
>> > "acting": [
>> > 113,
>> > 48,
>> > 114,
>> > 10,
>> > 113,
>> > 113]},
>> > { "first": 436,
>> > "last": 442,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 443,
>> > "last": 446,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 40,
>> > 40]},
>> > { "first": 447,
>> > "last": 457,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48],
>> > "acting": [
>> > 40,
>> > 48,
>> > 40,
>> > 40]},
>> > { "first": 458,
>> > "last": 460,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 10],
>> > "acting": [
>> > 40,
>> > 48,
>> > 10,
>> > 40,
>> > 40]},
>> > { "first": 461,
>> > "last": 466,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 22],
>> > "acting": [
>> > 40,
>> > 48,
>> > 22,
>> > 40,
>> > 40]},
>> > { "first": 467,
>> > "last": 478,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 22,
>> > 5],
>> > "acting": [
>> > 40,
>> > 48,
>> > 22,
>> > 5,
>> > 40,
>> > 40]},
>> > { "first": 479,
>> > "last": 489,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 22,
>> > 110],
>> > "acting": [
>> > 40,
>> > 48,
>> > 22,
>> > 110,
>> > 40,
>> > 40]},
>> > { "first": 490,
>> > "last": 496,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 22,
>> > 0],
>> > "acting": [
>> > 40,
>> > 48,
>> > 22,
>> > 0,
>> > 40,
>> > 40]},
>> > { "first": 497,
>> > "last": 507,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 114,
>> > 10],
>> > "acting": [
>> > 40,
>> > 48,
>> > 114,
>> > 10,
>> > 40,
>> > 40]},
>> > { "first": 508,
>> > "last": 511,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 54,
>> > 91],
>> > "acting": [
>> > 40,
>> > 48,
>> > 54,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 512,
>> > "last": 579,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 580,
>> > "last": 580,
>> > "maybe_went_rw": 0,
>> > "up": [
>> > 40,
>> > 92,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 581,
>> > "last": 591,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 92,
>> > 91],
>> > "acting": [
>> > 92,
>> > 91,
>> > 92,
>> > 92]},
>> > { "first": 592,
>> > "last": 595,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 113,
>> > 114,
>> > 22,
>> > 0],
>> > "acting": [
>> > 113,
>> > 114,
>> > 22,
>> > 0,
>> > 113,
>> > 113]},
>> > { "first": 596,
>> > "last": 599,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 113,
>> > 48,
>> > 114,
>> > 10],
>> > "acting": [
>> > 113,
>> > 48,
>> > 114,
>> > 10,
>> > 113,
>> > 113]},
>> > { "first": 600,
>> > "last": 606,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 607,
>> > "last": 607,
>> > "maybe_went_rw": 0,
>> > "up": [
>> > 92,
>> > 91],
>> > "acting": [
>> > 92,
>> > 91,
>> > 92,
>> > 92]},
>> > { "first": 608,
>> > "last": 616,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 617,
>> > "last": 625,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 626,
>> > "last": 632,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 114,
>> > 10],
>> > "acting": [
>> > 40,
>> > 92,
>> > 114,
>> > 10,
>> > 40,
>> > 40]},
>> > { "first": 633,
>> > "last": 639,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 640,
>> > "last": 643,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 644,
>> > "last": 662,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 114,
>> > 10],
>> > "acting": [
>> > 40,
>> > 92,
>> > 114,
>> > 10,
>> > 40,
>> > 40]},
>> > { "first": 663,
>> > "last": 679,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 680,
>> > "last": 682,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 40,
>> > 40]},
>> > { "first": 683,
>> > "last": 687,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 10],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 10,
>> > 40,
>> > 40]}],
>> > "probing_osds": [
>> > "0",
>> > "5",
>> > "10",
>> > "22",
>> > "40",
>> > "48",
>> > "54",
>> > "91",
>> > "92",
>> > "110",
>> > "113",
>> > "114"],
>> > "down_osds_we_would_probe": [],
>> > "peering_blocked_by": []},
>> > { "name": "Started",
>> > "enter_time": "2015-03-30 19:44:18.709312"}],
>> > "agent_state": {}}
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Force an OSD to try to peer
[not found] ` <CAANLjFp0pStF0iBw21XSv8a03YX4iy74rZSr_MD8e1mGR0KCAQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-03-31 18:10 ` Somnath Roy
2015-03-31 18:20 ` [ceph-users] " Robert LeBlanc
[not found] ` <755F6B91B3BE364F9BCA11EA3F9E0C6F28CB737E-cXZ6iGhjG0il5HHZYNR2WTJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
0 siblings, 2 replies; 8+ messages in thread
From: Somnath Roy @ 2015-03-31 18:10 UTC (permalink / raw)
To: Robert LeBlanc, Sage Weil; +Cc: ceph-devel, Ceph-User
But, do we know why Jumbo frames may have an impact on peering ?
In our setup so far, we haven't enabled jumbo frames other than performance reason (if at all).
Thanks & Regards
Somnath
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org] On Behalf Of Robert LeBlanc
Sent: Tuesday, March 31, 2015 11:08 AM
To: Sage Weil
Cc: ceph-devel; Ceph-User
Subject: Re: [ceph-users] Force an OSD to try to peer
I was desperate for anything after exhausting every other possibility I could think of. Maybe I should put a checklist in the Ceph docs of things to look for.
Thanks,
On Tue, Mar 31, 2015 at 11:36 AM, Sage Weil <sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org> wrote:
> On Tue, 31 Mar 2015, Robert LeBlanc wrote:
>> Turns out jumbo frames was not set on all the switch ports. Once that
>> was resolved the cluster quickly became healthy.
>
> I always hesitate to point the finger at the jumbo frames
> configuration but almost every time that is the culprit!
>
> Thanks for the update. :)
> sage
>
>
>
>>
>> On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
>> > I've been working at this peering problem all day. I've done a lot
>> > of testing at the network layer and I just don't believe that we
>> > have a problem that would prevent OSDs from peering. When looking
>> > though osd_debug 20/20 logs, it just doesn't look like the OSDs are
>> > trying to peer. I don't know if it is because there are so many
>> > outstanding creations or what. OSDs will peer with OSDs on other
>> > hosts, but for reason only chooses a certain number and not one that it needs to finish the peering process.
>> >
>> > I've check: firewall, open files, number of threads allowed. These
>> > usually have given me an error in the logs that helped me fix the problem.
>> >
>> > I can't find a configuration item that specifies how many peers an
>> > OSD should contact or anything that would be artificially limiting
>> > the peering connections. I've restarted the OSDs a number of times,
>> > as well as rebooting the hosts. I beleive if the OSDs finish
>> > peering everything will clear up. I can't find anything in pg query
>> > that would help me figure out what is blocking it (peering blocked
>> > by is empty). The PGs are scattered across all the hosts so we can't pin it down to a specific host.
>> >
>> > Any ideas on what to try would be appreciated.
>> >
>> > [ulhglive-root@ceph9 ~]# ceph --version ceph version 0.80.7
>> > (6c0127fcb58008793d3c8b62d925bc91963672a3)
>> > [ulhglive-root@ceph9 ~]# ceph status
>> > cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
>> > health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs
>> > stuck inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
>> > monmap e2: 3 mons at
>> > {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.2
>> > 9:6789/0}, election epoch 30, quorum 0,1,2 mon1,mon2,mon3
>> > osdmap e704: 120 osds: 120 up, 120 in
>> > pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
>> > 11447 MB used, 436 TB / 436 TB avail
>> > 727 active+clean
>> > 990 peering
>> > 37 creating+peering
>> > 1 down+peering
>> > 290 remapped+peering
>> > 3 creating+remapped+peering
>> >
>> > { "state": "peering",
>> > "epoch": 707,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "info": { "pgid": "7.171",
>> > "last_update": "0'0",
>> > "last_complete": "0'0",
>> > "log_tail": "0'0",
>> > "last_user_version": 0,
>> > "last_backfill": "MAX",
>> > "purged_snaps": "[]",
>> > "history": { "epoch_created": 293,
>> > "last_epoch_started": 343,
>> > "last_epoch_clean": 343,
>> > "last_epoch_split": 0,
>> > "same_up_since": 688,
>> > "same_interval_since": 688,
>> > "same_primary_since": 608,
>> > "last_scrub": "0'0",
>> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_deep_scrub": "0'0",
>> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_clean_scrub_stamp": "0.000000"},
>> > "stats": { "version": "0'0",
>> > "reported_seq": "326",
>> > "reported_epoch": "707",
>> > "state": "peering",
>> > "last_fresh": "2015-03-30 20:10:39.509855",
>> > "last_change": "2015-03-30 19:44:17.361601",
>> > "last_active": "2015-03-30 11:37:56.956417",
>> > "last_clean": "2015-03-30 11:37:56.956417",
>> > "last_became_active": "0.000000",
>> > "last_unstale": "2015-03-30 20:10:39.509855",
>> > "mapping_epoch": 683,
>> > "log_start": "0'0",
>> > "ondisk_log_start": "0'0",
>> > "created": 293,
>> > "last_epoch_clean": 343,
>> > "parent": "0.0",
>> > "parent_split_bits": 0,
>> > "last_scrub": "0'0",
>> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_deep_scrub": "0'0",
>> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_clean_scrub_stamp": "0.000000",
>> > "log_size": 0,
>> > "ondisk_log_size": 0,
>> > "stats_invalid": "0",
>> > "stat_sum": { "num_bytes": 0,
>> > "num_objects": 0,
>> > "num_object_clones": 0,
>> > "num_object_copies": 0,
>> > "num_objects_missing_on_primary": 0,
>> > "num_objects_degraded": 0,
>> > "num_objects_unfound": 0,
>> > "num_objects_dirty": 0,
>> > "num_whiteouts": 0,
>> > "num_read": 0,
>> > "num_read_kb": 0,
>> > "num_write": 0,
>> > "num_write_kb": 0,
>> > "num_scrub_errors": 0,
>> > "num_shallow_scrub_errors": 0,
>> > "num_deep_scrub_errors": 0,
>> > "num_objects_recovered": 0,
>> > "num_bytes_recovered": 0,
>> > "num_keys_recovered": 0,
>> > "num_objects_omap": 0,
>> > "num_objects_hit_set_archive": 0},
>> > "stat_cat_sum": {},
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "up_primary": 40,
>> > "acting_primary": 40},
>> > "empty": 1,
>> > "dne": 0,
>> > "incomplete": 0,
>> > "last_epoch_started": 348,
>> > "hit_set_history": { "current_last_update": "0'0",
>> > "current_last_stamp": "0.000000",
>> > "current_info": { "begin": "0.000000",
>> > "end": "0.000000",
>> > "version": "0'0"},
>> > "history": []}},
>> > "peer_info": [
>> > { "peer": "48",
>> > "pgid": "7.171",
>> > "last_update": "0'0",
>> > "last_complete": "0'0",
>> > "log_tail": "0'0",
>> > "last_user_version": 0,
>> > "last_backfill": "MAX",
>> > "purged_snaps": "[]",
>> > "history": { "epoch_created": 293,
>> > "last_epoch_started": 343,
>> > "last_epoch_clean": 343,
>> > "last_epoch_split": 0,
>> > "same_up_since": 688,
>> > "same_interval_since": 688,
>> > "same_primary_since": 608,
>> > "last_scrub": "0'0",
>> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_deep_scrub": "0'0",
>> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_clean_scrub_stamp": "0.000000"},
>> > "stats": { "version": "0'0",
>> > "reported_seq": "24",
>> > "reported_epoch": "348",
>> > "state": "peering",
>> > "last_fresh": "2015-03-30 11:39:02.979742",
>> > "last_change": "2015-03-30 11:39:01.650897",
>> > "last_active": "2015-03-30 11:37:56.956417",
>> > "last_clean": "2015-03-30 11:37:56.956417",
>> > "last_became_active": "0.000000",
>> > "last_unstale": "2015-03-30 11:39:02.979742",
>> > "mapping_epoch": 683,
>> > "log_start": "0'0",
>> > "ondisk_log_start": "0'0",
>> > "created": 293,
>> > "last_epoch_clean": 343,
>> > "parent": "0.0",
>> > "parent_split_bits": 0,
>> > "last_scrub": "0'0",
>> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_deep_scrub": "0'0",
>> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>> > "last_clean_scrub_stamp": "0.000000",
>> > "log_size": 0,
>> > "ondisk_log_size": 0,
>> > "stats_invalid": "0",
>> > "stat_sum": { "num_bytes": 0,
>> > "num_objects": 0,
>> > "num_object_clones": 0,
>> > "num_object_copies": 0,
>> > "num_objects_missing_on_primary": 0,
>> > "num_objects_degraded": 0,
>> > "num_objects_unfound": 0,
>> > "num_objects_dirty": 0,
>> > "num_whiteouts": 0,
>> > "num_read": 0,
>> > "num_read_kb": 0,
>> > "num_write": 0,
>> > "num_write_kb": 0,
>> > "num_scrub_errors": 0,
>> > "num_shallow_scrub_errors": 0,
>> > "num_deep_scrub_errors": 0,
>> > "num_objects_recovered": 0,
>> > "num_bytes_recovered": 0,
>> > "num_keys_recovered": 0,
>> > "num_objects_omap": 0,
>> > "num_objects_hit_set_archive": 0},
>> > "stat_cat_sum": {},
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "up_primary": 40,
>> > "acting_primary": 40},
>> > "empty": 1,
>> > "dne": 0,
>> > "incomplete": 0,
>> > "last_epoch_started": 348,
>> > "hit_set_history": { "current_last_update": "0'0",
>> > "current_last_stamp": "0.000000",
>> > "current_info": { "begin": "0.000000",
>> > "end": "0.000000",
>> > "version": "0'0"},
>> > "history": []}},
>> > { "peer": "110",
>> > "pgid": "7.171",
>> > "last_update": "0'0",
>> > "last_complete": "0'0",
>> > "log_tail": "0'0",
>> > "last_user_version": 0,
>> > "last_backfill": "MAX",
>> > "purged_snaps": "[]",
>> > "history": { "epoch_created": 0,
>> > "last_epoch_started": 0,
>> > "last_epoch_clean": 0,
>> > "last_epoch_split": 0,
>> > "same_up_since": 0,
>> > "same_interval_since": 0,
>> > "same_primary_since": 0,
>> > "last_scrub": "0'0",
>> > "last_scrub_stamp": "0.000000",
>> > "last_deep_scrub": "0'0",
>> > "last_deep_scrub_stamp": "0.000000",
>> > "last_clean_scrub_stamp": "0.000000"},
>> > "stats": { "version": "0'0",
>> > "reported_seq": "0",
>> > "reported_epoch": "0",
>> > "state": "inactive",
>> > "last_fresh": "0.000000",
>> > "last_change": "0.000000",
>> > "last_active": "0.000000",
>> > "last_clean": "0.000000",
>> > "last_became_active": "0.000000",
>> > "last_unstale": "0.000000",
>> > "mapping_epoch": 0,
>> > "log_start": "0'0",
>> > "ondisk_log_start": "0'0",
>> > "created": 0,
>> > "last_epoch_clean": 0,
>> > "parent": "0.0",
>> > "parent_split_bits": 0,
>> > "last_scrub": "0'0",
>> > "last_scrub_stamp": "0.000000",
>> > "last_deep_scrub": "0'0",
>> > "last_deep_scrub_stamp": "0.000000",
>> > "last_clean_scrub_stamp": "0.000000",
>> > "log_size": 0,
>> > "ondisk_log_size": 0,
>> > "stats_invalid": "0",
>> > "stat_sum": { "num_bytes": 0,
>> > "num_objects": 0,
>> > "num_object_clones": 0,
>> > "num_object_copies": 0,
>> > "num_objects_missing_on_primary": 0,
>> > "num_objects_degraded": 0,
>> > "num_objects_unfound": 0,
>> > "num_objects_dirty": 0,
>> > "num_whiteouts": 0,
>> > "num_read": 0,
>> > "num_read_kb": 0,
>> > "num_write": 0,
>> > "num_write_kb": 0,
>> > "num_scrub_errors": 0,
>> > "num_shallow_scrub_errors": 0,
>> > "num_deep_scrub_errors": 0,
>> > "num_objects_recovered": 0,
>> > "num_bytes_recovered": 0,
>> > "num_keys_recovered": 0,
>> > "num_objects_omap": 0,
>> > "num_objects_hit_set_archive": 0},
>> > "stat_cat_sum": {},
>> > "up": [],
>> > "acting": [],
>> > "up_primary": -1,
>> > "acting_primary": -1},
>> > "empty": 1,
>> > "dne": 1,
>> > "incomplete": 0,
>> > "last_epoch_started": 0,
>> > "hit_set_history": { "current_last_update": "0'0",
>> > "current_last_stamp": "0.000000",
>> > "current_info": { "begin": "0.000000",
>> > "end": "0.000000",
>> > "version": "0'0"},
>> > "history": []}}],
>> > "recovery_state": [
>> > { "name": "Started\/Primary\/Peering\/GetInfo",
>> > "enter_time": "2015-03-30 19:44:18.709317",
>> > "requested_info_from": [
>> > { "osd": "0"},
>> > { "osd": "5"},
>> > { "osd": "10"},
>> > { "osd": "22"},
>> > { "osd": "54"},
>> > { "osd": "91"},
>> > { "osd": "92"},
>> > { "osd": "113"},
>> > { "osd": "114"}]},
>> > { "name": "Started\/Primary\/Peering",
>> > "enter_time": "2015-03-30 19:44:18.709316",
>> > "past_intervals": [
>> > { "first": 342,
>> > "last": 346,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 114],
>> > "acting": [
>> > 40,
>> > 92,
>> > 114,
>> > 40,
>> > 40]},
>> > { "first": 347,
>> > "last": 353,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 40,
>> > 40]},
>> > { "first": 354,
>> > "last": 356,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 92,
>> > 48],
>> > "acting": [
>> > 92,
>> > 48,
>> > 92,
>> > 92]},
>> > { "first": 357,
>> > "last": 359,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 113,
>> > 48,
>> > 114],
>> > "acting": [
>> > 113,
>> > 48,
>> > 114,
>> > 113,
>> > 113]},
>> > { "first": 360,
>> > "last": 361,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 40,
>> > 40]},
>> > { "first": 362,
>> > "last": 364,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92],
>> > "acting": [
>> > 40,
>> > 92,
>> > 40,
>> > 40]},
>> > { "first": 365,
>> > "last": 369,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 114],
>> > "acting": [
>> > 40,
>> > 92,
>> > 114,
>> > 40,
>> > 40]},
>> > { "first": 370,
>> > "last": 379,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 40,
>> > 40]},
>> > { "first": 380,
>> > "last": 400,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 401,
>> > "last": 409,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 92,
>> > 48,
>> > 91,
>> > 92,
>> > 92]},
>> > { "first": 410,
>> > "last": 414,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 113,
>> > 48,
>> > 114,
>> > 0],
>> > "acting": [
>> > 113,
>> > 48,
>> > 114,
>> > 0,
>> > 113,
>> > 113]},
>> > { "first": 415,
>> > "last": 435,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 113,
>> > 48,
>> > 114,
>> > 10],
>> > "acting": [
>> > 113,
>> > 48,
>> > 114,
>> > 10,
>> > 113,
>> > 113]},
>> > { "first": 436,
>> > "last": 442,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 443,
>> > "last": 446,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 40,
>> > 40]},
>> > { "first": 447,
>> > "last": 457,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48],
>> > "acting": [
>> > 40,
>> > 48,
>> > 40,
>> > 40]},
>> > { "first": 458,
>> > "last": 460,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 10],
>> > "acting": [
>> > 40,
>> > 48,
>> > 10,
>> > 40,
>> > 40]},
>> > { "first": 461,
>> > "last": 466,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 22],
>> > "acting": [
>> > 40,
>> > 48,
>> > 22,
>> > 40,
>> > 40]},
>> > { "first": 467,
>> > "last": 478,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 22,
>> > 5],
>> > "acting": [
>> > 40,
>> > 48,
>> > 22,
>> > 5,
>> > 40,
>> > 40]},
>> > { "first": 479,
>> > "last": 489,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 22,
>> > 110],
>> > "acting": [
>> > 40,
>> > 48,
>> > 22,
>> > 110,
>> > 40,
>> > 40]},
>> > { "first": 490,
>> > "last": 496,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 22,
>> > 0],
>> > "acting": [
>> > 40,
>> > 48,
>> > 22,
>> > 0,
>> > 40,
>> > 40]},
>> > { "first": 497,
>> > "last": 507,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 114,
>> > 10],
>> > "acting": [
>> > 40,
>> > 48,
>> > 114,
>> > 10,
>> > 40,
>> > 40]},
>> > { "first": 508,
>> > "last": 511,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 48,
>> > 54,
>> > 91],
>> > "acting": [
>> > 40,
>> > 48,
>> > 54,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 512,
>> > "last": 579,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 580,
>> > "last": 580,
>> > "maybe_went_rw": 0,
>> > "up": [
>> > 40,
>> > 92,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 581,
>> > "last": 591,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 92,
>> > 91],
>> > "acting": [
>> > 92,
>> > 91,
>> > 92,
>> > 92]},
>> > { "first": 592,
>> > "last": 595,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 113,
>> > 114,
>> > 22,
>> > 0],
>> > "acting": [
>> > 113,
>> > 114,
>> > 22,
>> > 0,
>> > 113,
>> > 113]},
>> > { "first": 596,
>> > "last": 599,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 113,
>> > 48,
>> > 114,
>> > 10],
>> > "acting": [
>> > 113,
>> > 48,
>> > 114,
>> > 10,
>> > 113,
>> > 113]},
>> > { "first": 600,
>> > "last": 606,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 607,
>> > "last": 607,
>> > "maybe_went_rw": 0,
>> > "up": [
>> > 92,
>> > 91],
>> > "acting": [
>> > 92,
>> > 91,
>> > 92,
>> > 92]},
>> > { "first": 608,
>> > "last": 616,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 617,
>> > "last": 625,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 626,
>> > "last": 632,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 114,
>> > 10],
>> > "acting": [
>> > 40,
>> > 92,
>> > 114,
>> > 10,
>> > 40,
>> > 40]},
>> > { "first": 633,
>> > "last": 639,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 640,
>> > "last": 643,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 644,
>> > "last": 662,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 114,
>> > 10],
>> > "acting": [
>> > 40,
>> > 92,
>> > 114,
>> > 10,
>> > 40,
>> > 40]},
>> > { "first": 663,
>> > "last": 679,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 91],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 91,
>> > 40,
>> > 40]},
>> > { "first": 680,
>> > "last": 682,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 40,
>> > 40]},
>> > { "first": 683,
>> > "last": 687,
>> > "maybe_went_rw": 1,
>> > "up": [
>> > 40,
>> > 92,
>> > 48,
>> > 10],
>> > "acting": [
>> > 40,
>> > 92,
>> > 48,
>> > 10,
>> > 40,
>> > 40]}],
>> > "probing_osds": [
>> > "0",
>> > "5",
>> > "10",
>> > "22",
>> > "40",
>> > "48",
>> > "54",
>> > "91",
>> > "92",
>> > "110",
>> > "113",
>> > "114"],
>> > "down_osds_we_would_probe": [],
>> > "peering_blocked_by": []},
>> > { "name": "Started",
>> > "enter_time": "2015-03-30 19:44:18.709312"}],
>> > "agent_state": {}}
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo
>> info at http://vger.kernel.org/majordomo-info.html
>>
>>
_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
________________________________
PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [ceph-users] Force an OSD to try to peer
2015-03-31 18:10 ` Somnath Roy
@ 2015-03-31 18:20 ` Robert LeBlanc
[not found] ` <755F6B91B3BE364F9BCA11EA3F9E0C6F28CB737E-cXZ6iGhjG0il5HHZYNR2WTJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
1 sibling, 0 replies; 8+ messages in thread
From: Robert LeBlanc @ 2015-03-31 18:20 UTC (permalink / raw)
To: Somnath Roy; +Cc: Sage Weil, ceph-devel, Ceph-User
At the L2 level, if the hosts and switches don't accept jumbo frames,
they just drop them because they are too big. They are not fragmented
because they don't go through a router. My problem is that OSDs were
able to peer with other OSDs on the host, but my guess is that they
never sent/received packets larger than 1500 bytes. Then other OSD
processes tried to peer but sent packets larger than 1500 bytes
causing the packets to be dropped and peering to stall.
On Tue, Mar 31, 2015 at 12:10 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> But, do we know why Jumbo frames may have an impact on peering ?
> In our setup so far, we haven't enabled jumbo frames other than performance reason (if at all).
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Robert LeBlanc
> Sent: Tuesday, March 31, 2015 11:08 AM
> To: Sage Weil
> Cc: ceph-devel; Ceph-User
> Subject: Re: [ceph-users] Force an OSD to try to peer
>
> I was desperate for anything after exhausting every other possibility I could think of. Maybe I should put a checklist in the Ceph docs of things to look for.
>
> Thanks,
>
> On Tue, Mar 31, 2015 at 11:36 AM, Sage Weil <sage@newdream.net> wrote:
>> On Tue, 31 Mar 2015, Robert LeBlanc wrote:
>>> Turns out jumbo frames was not set on all the switch ports. Once that
>>> was resolved the cluster quickly became healthy.
>>
>> I always hesitate to point the finger at the jumbo frames
>> configuration but almost every time that is the culprit!
>>
>> Thanks for the update. :)
>> sage
>>
>>
>>
>>>
>>> On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc <robert@leblancnet.us> wrote:
>>> > I've been working at this peering problem all day. I've done a lot
>>> > of testing at the network layer and I just don't believe that we
>>> > have a problem that would prevent OSDs from peering. When looking
>>> > though osd_debug 20/20 logs, it just doesn't look like the OSDs are
>>> > trying to peer. I don't know if it is because there are so many
>>> > outstanding creations or what. OSDs will peer with OSDs on other
>>> > hosts, but for reason only chooses a certain number and not one that it needs to finish the peering process.
>>> >
>>> > I've check: firewall, open files, number of threads allowed. These
>>> > usually have given me an error in the logs that helped me fix the problem.
>>> >
>>> > I can't find a configuration item that specifies how many peers an
>>> > OSD should contact or anything that would be artificially limiting
>>> > the peering connections. I've restarted the OSDs a number of times,
>>> > as well as rebooting the hosts. I beleive if the OSDs finish
>>> > peering everything will clear up. I can't find anything in pg query
>>> > that would help me figure out what is blocking it (peering blocked
>>> > by is empty). The PGs are scattered across all the hosts so we can't pin it down to a specific host.
>>> >
>>> > Any ideas on what to try would be appreciated.
>>> >
>>> > [ulhglive-root@ceph9 ~]# ceph --version ceph version 0.80.7
>>> > (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>> > [ulhglive-root@ceph9 ~]# ceph status
>>> > cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
>>> > health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs
>>> > stuck inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
>>> > monmap e2: 3 mons at
>>> > {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.2
>>> > 9:6789/0}, election epoch 30, quorum 0,1,2 mon1,mon2,mon3
>>> > osdmap e704: 120 osds: 120 up, 120 in
>>> > pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
>>> > 11447 MB used, 436 TB / 436 TB avail
>>> > 727 active+clean
>>> > 990 peering
>>> > 37 creating+peering
>>> > 1 down+peering
>>> > 290 remapped+peering
>>> > 3 creating+remapped+peering
>>> >
>>> > { "state": "peering",
>>> > "epoch": 707,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "info": { "pgid": "7.171",
>>> > "last_update": "0'0",
>>> > "last_complete": "0'0",
>>> > "log_tail": "0'0",
>>> > "last_user_version": 0,
>>> > "last_backfill": "MAX",
>>> > "purged_snaps": "[]",
>>> > "history": { "epoch_created": 293,
>>> > "last_epoch_started": 343,
>>> > "last_epoch_clean": 343,
>>> > "last_epoch_split": 0,
>>> > "same_up_since": 688,
>>> > "same_interval_since": 688,
>>> > "same_primary_since": 608,
>>> > "last_scrub": "0'0",
>>> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>>> > "last_deep_scrub": "0'0",
>>> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>>> > "last_clean_scrub_stamp": "0.000000"},
>>> > "stats": { "version": "0'0",
>>> > "reported_seq": "326",
>>> > "reported_epoch": "707",
>>> > "state": "peering",
>>> > "last_fresh": "2015-03-30 20:10:39.509855",
>>> > "last_change": "2015-03-30 19:44:17.361601",
>>> > "last_active": "2015-03-30 11:37:56.956417",
>>> > "last_clean": "2015-03-30 11:37:56.956417",
>>> > "last_became_active": "0.000000",
>>> > "last_unstale": "2015-03-30 20:10:39.509855",
>>> > "mapping_epoch": 683,
>>> > "log_start": "0'0",
>>> > "ondisk_log_start": "0'0",
>>> > "created": 293,
>>> > "last_epoch_clean": 343,
>>> > "parent": "0.0",
>>> > "parent_split_bits": 0,
>>> > "last_scrub": "0'0",
>>> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>>> > "last_deep_scrub": "0'0",
>>> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>>> > "last_clean_scrub_stamp": "0.000000",
>>> > "log_size": 0,
>>> > "ondisk_log_size": 0,
>>> > "stats_invalid": "0",
>>> > "stat_sum": { "num_bytes": 0,
>>> > "num_objects": 0,
>>> > "num_object_clones": 0,
>>> > "num_object_copies": 0,
>>> > "num_objects_missing_on_primary": 0,
>>> > "num_objects_degraded": 0,
>>> > "num_objects_unfound": 0,
>>> > "num_objects_dirty": 0,
>>> > "num_whiteouts": 0,
>>> > "num_read": 0,
>>> > "num_read_kb": 0,
>>> > "num_write": 0,
>>> > "num_write_kb": 0,
>>> > "num_scrub_errors": 0,
>>> > "num_shallow_scrub_errors": 0,
>>> > "num_deep_scrub_errors": 0,
>>> > "num_objects_recovered": 0,
>>> > "num_bytes_recovered": 0,
>>> > "num_keys_recovered": 0,
>>> > "num_objects_omap": 0,
>>> > "num_objects_hit_set_archive": 0},
>>> > "stat_cat_sum": {},
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "up_primary": 40,
>>> > "acting_primary": 40},
>>> > "empty": 1,
>>> > "dne": 0,
>>> > "incomplete": 0,
>>> > "last_epoch_started": 348,
>>> > "hit_set_history": { "current_last_update": "0'0",
>>> > "current_last_stamp": "0.000000",
>>> > "current_info": { "begin": "0.000000",
>>> > "end": "0.000000",
>>> > "version": "0'0"},
>>> > "history": []}},
>>> > "peer_info": [
>>> > { "peer": "48",
>>> > "pgid": "7.171",
>>> > "last_update": "0'0",
>>> > "last_complete": "0'0",
>>> > "log_tail": "0'0",
>>> > "last_user_version": 0,
>>> > "last_backfill": "MAX",
>>> > "purged_snaps": "[]",
>>> > "history": { "epoch_created": 293,
>>> > "last_epoch_started": 343,
>>> > "last_epoch_clean": 343,
>>> > "last_epoch_split": 0,
>>> > "same_up_since": 688,
>>> > "same_interval_since": 688,
>>> > "same_primary_since": 608,
>>> > "last_scrub": "0'0",
>>> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>>> > "last_deep_scrub": "0'0",
>>> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>>> > "last_clean_scrub_stamp": "0.000000"},
>>> > "stats": { "version": "0'0",
>>> > "reported_seq": "24",
>>> > "reported_epoch": "348",
>>> > "state": "peering",
>>> > "last_fresh": "2015-03-30 11:39:02.979742",
>>> > "last_change": "2015-03-30 11:39:01.650897",
>>> > "last_active": "2015-03-30 11:37:56.956417",
>>> > "last_clean": "2015-03-30 11:37:56.956417",
>>> > "last_became_active": "0.000000",
>>> > "last_unstale": "2015-03-30 11:39:02.979742",
>>> > "mapping_epoch": 683,
>>> > "log_start": "0'0",
>>> > "ondisk_log_start": "0'0",
>>> > "created": 293,
>>> > "last_epoch_clean": 343,
>>> > "parent": "0.0",
>>> > "parent_split_bits": 0,
>>> > "last_scrub": "0'0",
>>> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
>>> > "last_deep_scrub": "0'0",
>>> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
>>> > "last_clean_scrub_stamp": "0.000000",
>>> > "log_size": 0,
>>> > "ondisk_log_size": 0,
>>> > "stats_invalid": "0",
>>> > "stat_sum": { "num_bytes": 0,
>>> > "num_objects": 0,
>>> > "num_object_clones": 0,
>>> > "num_object_copies": 0,
>>> > "num_objects_missing_on_primary": 0,
>>> > "num_objects_degraded": 0,
>>> > "num_objects_unfound": 0,
>>> > "num_objects_dirty": 0,
>>> > "num_whiteouts": 0,
>>> > "num_read": 0,
>>> > "num_read_kb": 0,
>>> > "num_write": 0,
>>> > "num_write_kb": 0,
>>> > "num_scrub_errors": 0,
>>> > "num_shallow_scrub_errors": 0,
>>> > "num_deep_scrub_errors": 0,
>>> > "num_objects_recovered": 0,
>>> > "num_bytes_recovered": 0,
>>> > "num_keys_recovered": 0,
>>> > "num_objects_omap": 0,
>>> > "num_objects_hit_set_archive": 0},
>>> > "stat_cat_sum": {},
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "up_primary": 40,
>>> > "acting_primary": 40},
>>> > "empty": 1,
>>> > "dne": 0,
>>> > "incomplete": 0,
>>> > "last_epoch_started": 348,
>>> > "hit_set_history": { "current_last_update": "0'0",
>>> > "current_last_stamp": "0.000000",
>>> > "current_info": { "begin": "0.000000",
>>> > "end": "0.000000",
>>> > "version": "0'0"},
>>> > "history": []}},
>>> > { "peer": "110",
>>> > "pgid": "7.171",
>>> > "last_update": "0'0",
>>> > "last_complete": "0'0",
>>> > "log_tail": "0'0",
>>> > "last_user_version": 0,
>>> > "last_backfill": "MAX",
>>> > "purged_snaps": "[]",
>>> > "history": { "epoch_created": 0,
>>> > "last_epoch_started": 0,
>>> > "last_epoch_clean": 0,
>>> > "last_epoch_split": 0,
>>> > "same_up_since": 0,
>>> > "same_interval_since": 0,
>>> > "same_primary_since": 0,
>>> > "last_scrub": "0'0",
>>> > "last_scrub_stamp": "0.000000",
>>> > "last_deep_scrub": "0'0",
>>> > "last_deep_scrub_stamp": "0.000000",
>>> > "last_clean_scrub_stamp": "0.000000"},
>>> > "stats": { "version": "0'0",
>>> > "reported_seq": "0",
>>> > "reported_epoch": "0",
>>> > "state": "inactive",
>>> > "last_fresh": "0.000000",
>>> > "last_change": "0.000000",
>>> > "last_active": "0.000000",
>>> > "last_clean": "0.000000",
>>> > "last_became_active": "0.000000",
>>> > "last_unstale": "0.000000",
>>> > "mapping_epoch": 0,
>>> > "log_start": "0'0",
>>> > "ondisk_log_start": "0'0",
>>> > "created": 0,
>>> > "last_epoch_clean": 0,
>>> > "parent": "0.0",
>>> > "parent_split_bits": 0,
>>> > "last_scrub": "0'0",
>>> > "last_scrub_stamp": "0.000000",
>>> > "last_deep_scrub": "0'0",
>>> > "last_deep_scrub_stamp": "0.000000",
>>> > "last_clean_scrub_stamp": "0.000000",
>>> > "log_size": 0,
>>> > "ondisk_log_size": 0,
>>> > "stats_invalid": "0",
>>> > "stat_sum": { "num_bytes": 0,
>>> > "num_objects": 0,
>>> > "num_object_clones": 0,
>>> > "num_object_copies": 0,
>>> > "num_objects_missing_on_primary": 0,
>>> > "num_objects_degraded": 0,
>>> > "num_objects_unfound": 0,
>>> > "num_objects_dirty": 0,
>>> > "num_whiteouts": 0,
>>> > "num_read": 0,
>>> > "num_read_kb": 0,
>>> > "num_write": 0,
>>> > "num_write_kb": 0,
>>> > "num_scrub_errors": 0,
>>> > "num_shallow_scrub_errors": 0,
>>> > "num_deep_scrub_errors": 0,
>>> > "num_objects_recovered": 0,
>>> > "num_bytes_recovered": 0,
>>> > "num_keys_recovered": 0,
>>> > "num_objects_omap": 0,
>>> > "num_objects_hit_set_archive": 0},
>>> > "stat_cat_sum": {},
>>> > "up": [],
>>> > "acting": [],
>>> > "up_primary": -1,
>>> > "acting_primary": -1},
>>> > "empty": 1,
>>> > "dne": 1,
>>> > "incomplete": 0,
>>> > "last_epoch_started": 0,
>>> > "hit_set_history": { "current_last_update": "0'0",
>>> > "current_last_stamp": "0.000000",
>>> > "current_info": { "begin": "0.000000",
>>> > "end": "0.000000",
>>> > "version": "0'0"},
>>> > "history": []}}],
>>> > "recovery_state": [
>>> > { "name": "Started\/Primary\/Peering\/GetInfo",
>>> > "enter_time": "2015-03-30 19:44:18.709317",
>>> > "requested_info_from": [
>>> > { "osd": "0"},
>>> > { "osd": "5"},
>>> > { "osd": "10"},
>>> > { "osd": "22"},
>>> > { "osd": "54"},
>>> > { "osd": "91"},
>>> > { "osd": "92"},
>>> > { "osd": "113"},
>>> > { "osd": "114"}]},
>>> > { "name": "Started\/Primary\/Peering",
>>> > "enter_time": "2015-03-30 19:44:18.709316",
>>> > "past_intervals": [
>>> > { "first": 342,
>>> > "last": 346,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 114],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 114,
>>> > 40,
>>> > 40]},
>>> > { "first": 347,
>>> > "last": 353,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 40,
>>> > 40]},
>>> > { "first": 354,
>>> > "last": 356,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 92,
>>> > 48],
>>> > "acting": [
>>> > 92,
>>> > 48,
>>> > 92,
>>> > 92]},
>>> > { "first": 357,
>>> > "last": 359,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 113,
>>> > 48,
>>> > 114],
>>> > "acting": [
>>> > 113,
>>> > 48,
>>> > 114,
>>> > 113,
>>> > 113]},
>>> > { "first": 360,
>>> > "last": 361,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 40,
>>> > 40]},
>>> > { "first": 362,
>>> > "last": 364,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 40,
>>> > 40]},
>>> > { "first": 365,
>>> > "last": 369,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 114],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 114,
>>> > 40,
>>> > 40]},
>>> > { "first": 370,
>>> > "last": 379,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 40,
>>> > 40]},
>>> > { "first": 380,
>>> > "last": 400,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91,
>>> > 40,
>>> > 40]},
>>> > { "first": 401,
>>> > "last": 409,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "acting": [
>>> > 92,
>>> > 48,
>>> > 91,
>>> > 92,
>>> > 92]},
>>> > { "first": 410,
>>> > "last": 414,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 113,
>>> > 48,
>>> > 114,
>>> > 0],
>>> > "acting": [
>>> > 113,
>>> > 48,
>>> > 114,
>>> > 0,
>>> > 113,
>>> > 113]},
>>> > { "first": 415,
>>> > "last": 435,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 113,
>>> > 48,
>>> > 114,
>>> > 10],
>>> > "acting": [
>>> > 113,
>>> > 48,
>>> > 114,
>>> > 10,
>>> > 113,
>>> > 113]},
>>> > { "first": 436,
>>> > "last": 442,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91,
>>> > 40,
>>> > 40]},
>>> > { "first": 443,
>>> > "last": 446,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 40,
>>> > 40]},
>>> > { "first": 447,
>>> > "last": 457,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 48],
>>> > "acting": [
>>> > 40,
>>> > 48,
>>> > 40,
>>> > 40]},
>>> > { "first": 458,
>>> > "last": 460,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 48,
>>> > 10],
>>> > "acting": [
>>> > 40,
>>> > 48,
>>> > 10,
>>> > 40,
>>> > 40]},
>>> > { "first": 461,
>>> > "last": 466,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 48,
>>> > 22],
>>> > "acting": [
>>> > 40,
>>> > 48,
>>> > 22,
>>> > 40,
>>> > 40]},
>>> > { "first": 467,
>>> > "last": 478,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 48,
>>> > 22,
>>> > 5],
>>> > "acting": [
>>> > 40,
>>> > 48,
>>> > 22,
>>> > 5,
>>> > 40,
>>> > 40]},
>>> > { "first": 479,
>>> > "last": 489,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 48,
>>> > 22,
>>> > 110],
>>> > "acting": [
>>> > 40,
>>> > 48,
>>> > 22,
>>> > 110,
>>> > 40,
>>> > 40]},
>>> > { "first": 490,
>>> > "last": 496,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 48,
>>> > 22,
>>> > 0],
>>> > "acting": [
>>> > 40,
>>> > 48,
>>> > 22,
>>> > 0,
>>> > 40,
>>> > 40]},
>>> > { "first": 497,
>>> > "last": 507,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 48,
>>> > 114,
>>> > 10],
>>> > "acting": [
>>> > 40,
>>> > 48,
>>> > 114,
>>> > 10,
>>> > 40,
>>> > 40]},
>>> > { "first": 508,
>>> > "last": 511,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 48,
>>> > 54,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 48,
>>> > 54,
>>> > 91,
>>> > 40,
>>> > 40]},
>>> > { "first": 512,
>>> > "last": 579,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91,
>>> > 40,
>>> > 40]},
>>> > { "first": 580,
>>> > "last": 580,
>>> > "maybe_went_rw": 0,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 91,
>>> > 40,
>>> > 40]},
>>> > { "first": 581,
>>> > "last": 591,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 92,
>>> > 91],
>>> > "acting": [
>>> > 92,
>>> > 91,
>>> > 92,
>>> > 92]},
>>> > { "first": 592,
>>> > "last": 595,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 113,
>>> > 114,
>>> > 22,
>>> > 0],
>>> > "acting": [
>>> > 113,
>>> > 114,
>>> > 22,
>>> > 0,
>>> > 113,
>>> > 113]},
>>> > { "first": 596,
>>> > "last": 599,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 113,
>>> > 48,
>>> > 114,
>>> > 10],
>>> > "acting": [
>>> > 113,
>>> > 48,
>>> > 114,
>>> > 10,
>>> > 113,
>>> > 113]},
>>> > { "first": 600,
>>> > "last": 606,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91,
>>> > 40,
>>> > 40]},
>>> > { "first": 607,
>>> > "last": 607,
>>> > "maybe_went_rw": 0,
>>> > "up": [
>>> > 92,
>>> > 91],
>>> > "acting": [
>>> > 92,
>>> > 91,
>>> > 92,
>>> > 92]},
>>> > { "first": 608,
>>> > "last": 616,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91,
>>> > 40,
>>> > 40]},
>>> > { "first": 617,
>>> > "last": 625,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 91,
>>> > 40,
>>> > 40]},
>>> > { "first": 626,
>>> > "last": 632,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 114,
>>> > 10],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 114,
>>> > 10,
>>> > 40,
>>> > 40]},
>>> > { "first": 633,
>>> > "last": 639,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91,
>>> > 40,
>>> > 40]},
>>> > { "first": 640,
>>> > "last": 643,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 91,
>>> > 40,
>>> > 40]},
>>> > { "first": 644,
>>> > "last": 662,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 114,
>>> > 10],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 114,
>>> > 10,
>>> > 40,
>>> > 40]},
>>> > { "first": 663,
>>> > "last": 679,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 91,
>>> > 40,
>>> > 40]},
>>> > { "first": 680,
>>> > "last": 682,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 40,
>>> > 40]},
>>> > { "first": 683,
>>> > "last": 687,
>>> > "maybe_went_rw": 1,
>>> > "up": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 10],
>>> > "acting": [
>>> > 40,
>>> > 92,
>>> > 48,
>>> > 10,
>>> > 40,
>>> > 40]}],
>>> > "probing_osds": [
>>> > "0",
>>> > "5",
>>> > "10",
>>> > "22",
>>> > "40",
>>> > "48",
>>> > "54",
>>> > "91",
>>> > "92",
>>> > "110",
>>> > "113",
>>> > "114"],
>>> > "down_osds_we_would_probe": [],
>>> > "peering_blocked_by": []},
>>> > { "name": "Started",
>>> > "enter_time": "2015-03-30 19:44:18.709312"}],
>>> > "agent_state": {}}
>>> >
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>> info at http://vger.kernel.org/majordomo-info.html
>>>
>>>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Force an OSD to try to peer
[not found] ` <755F6B91B3BE364F9BCA11EA3F9E0C6F28CB737E-cXZ6iGhjG0il5HHZYNR2WTJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
@ 2015-03-31 18:23 ` Sage Weil
0 siblings, 0 replies; 8+ messages in thread
From: Sage Weil @ 2015-03-31 18:23 UTC (permalink / raw)
To: Somnath Roy; +Cc: ceph-devel, Ceph-User
On Tue, 31 Mar 2015, Somnath Roy wrote:
> But, do we know why Jumbo frames may have an impact on peering ?
> In our setup so far, we haven't enabled jumbo frames other than performance reason (if at all).
It's nothing specific to peering (or ceph). The symptom we've seen is
just that byte stop passing across a TCP connection, usually when there is
some largish messages being sent. The ping/heartbeat messages get through
because they are small and we disable nagle so they never end up in large
frames.
It's a pain to diagnose.
sage
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org] On Behalf Of Robert LeBlanc
> Sent: Tuesday, March 31, 2015 11:08 AM
> To: Sage Weil
> Cc: ceph-devel; Ceph-User
> Subject: Re: [ceph-users] Force an OSD to try to peer
>
> I was desperate for anything after exhausting every other possibility I could think of. Maybe I should put a checklist in the Ceph docs of things to look for.
>
> Thanks,
>
> On Tue, Mar 31, 2015 at 11:36 AM, Sage Weil <sage-BnTBU8nroG7k1uMJSBkQmQ@public.gmane.org> wrote:
> > On Tue, 31 Mar 2015, Robert LeBlanc wrote:
> >> Turns out jumbo frames was not set on all the switch ports. Once that
> >> was resolved the cluster quickly became healthy.
> >
> > I always hesitate to point the finger at the jumbo frames
> > configuration but almost every time that is the culprit!
> >
> > Thanks for the update. :)
> > sage
> >
> >
> >
> >>
> >> On Mon, Mar 30, 2015 at 8:15 PM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
> >> > I've been working at this peering problem all day. I've done a lot
> >> > of testing at the network layer and I just don't believe that we
> >> > have a problem that would prevent OSDs from peering. When looking
> >> > though osd_debug 20/20 logs, it just doesn't look like the OSDs are
> >> > trying to peer. I don't know if it is because there are so many
> >> > outstanding creations or what. OSDs will peer with OSDs on other
> >> > hosts, but for reason only chooses a certain number and not one that it needs to finish the peering process.
> >> >
> >> > I've check: firewall, open files, number of threads allowed. These
> >> > usually have given me an error in the logs that helped me fix the problem.
> >> >
> >> > I can't find a configuration item that specifies how many peers an
> >> > OSD should contact or anything that would be artificially limiting
> >> > the peering connections. I've restarted the OSDs a number of times,
> >> > as well as rebooting the hosts. I beleive if the OSDs finish
> >> > peering everything will clear up. I can't find anything in pg query
> >> > that would help me figure out what is blocking it (peering blocked
> >> > by is empty). The PGs are scattered across all the hosts so we can't pin it down to a specific host.
> >> >
> >> > Any ideas on what to try would be appreciated.
> >> >
> >> > [ulhglive-root@ceph9 ~]# ceph --version ceph version 0.80.7
> >> > (6c0127fcb58008793d3c8b62d925bc91963672a3)
> >> > [ulhglive-root@ceph9 ~]# ceph status
> >> > cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
> >> > health HEALTH_WARN 1 pgs down; 1321 pgs peering; 1321 pgs
> >> > stuck inactive; 1321 pgs stuck unclean; too few pgs per osd (17 < min 20)
> >> > monmap e2: 3 mons at
> >> > {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.2
> >> > 9:6789/0}, election epoch 30, quorum 0,1,2 mon1,mon2,mon3
> >> > osdmap e704: 120 osds: 120 up, 120 in
> >> > pgmap v1895: 2048 pgs, 1 pools, 0 bytes data, 0 objects
> >> > 11447 MB used, 436 TB / 436 TB avail
> >> > 727 active+clean
> >> > 990 peering
> >> > 37 creating+peering
> >> > 1 down+peering
> >> > 290 remapped+peering
> >> > 3 creating+remapped+peering
> >> >
> >> > { "state": "peering",
> >> > "epoch": 707,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "info": { "pgid": "7.171",
> >> > "last_update": "0'0",
> >> > "last_complete": "0'0",
> >> > "log_tail": "0'0",
> >> > "last_user_version": 0,
> >> > "last_backfill": "MAX",
> >> > "purged_snaps": "[]",
> >> > "history": { "epoch_created": 293,
> >> > "last_epoch_started": 343,
> >> > "last_epoch_clean": 343,
> >> > "last_epoch_split": 0,
> >> > "same_up_since": 688,
> >> > "same_interval_since": 688,
> >> > "same_primary_since": 608,
> >> > "last_scrub": "0'0",
> >> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> >> > "last_deep_scrub": "0'0",
> >> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> >> > "last_clean_scrub_stamp": "0.000000"},
> >> > "stats": { "version": "0'0",
> >> > "reported_seq": "326",
> >> > "reported_epoch": "707",
> >> > "state": "peering",
> >> > "last_fresh": "2015-03-30 20:10:39.509855",
> >> > "last_change": "2015-03-30 19:44:17.361601",
> >> > "last_active": "2015-03-30 11:37:56.956417",
> >> > "last_clean": "2015-03-30 11:37:56.956417",
> >> > "last_became_active": "0.000000",
> >> > "last_unstale": "2015-03-30 20:10:39.509855",
> >> > "mapping_epoch": 683,
> >> > "log_start": "0'0",
> >> > "ondisk_log_start": "0'0",
> >> > "created": 293,
> >> > "last_epoch_clean": 343,
> >> > "parent": "0.0",
> >> > "parent_split_bits": 0,
> >> > "last_scrub": "0'0",
> >> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> >> > "last_deep_scrub": "0'0",
> >> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> >> > "last_clean_scrub_stamp": "0.000000",
> >> > "log_size": 0,
> >> > "ondisk_log_size": 0,
> >> > "stats_invalid": "0",
> >> > "stat_sum": { "num_bytes": 0,
> >> > "num_objects": 0,
> >> > "num_object_clones": 0,
> >> > "num_object_copies": 0,
> >> > "num_objects_missing_on_primary": 0,
> >> > "num_objects_degraded": 0,
> >> > "num_objects_unfound": 0,
> >> > "num_objects_dirty": 0,
> >> > "num_whiteouts": 0,
> >> > "num_read": 0,
> >> > "num_read_kb": 0,
> >> > "num_write": 0,
> >> > "num_write_kb": 0,
> >> > "num_scrub_errors": 0,
> >> > "num_shallow_scrub_errors": 0,
> >> > "num_deep_scrub_errors": 0,
> >> > "num_objects_recovered": 0,
> >> > "num_bytes_recovered": 0,
> >> > "num_keys_recovered": 0,
> >> > "num_objects_omap": 0,
> >> > "num_objects_hit_set_archive": 0},
> >> > "stat_cat_sum": {},
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "up_primary": 40,
> >> > "acting_primary": 40},
> >> > "empty": 1,
> >> > "dne": 0,
> >> > "incomplete": 0,
> >> > "last_epoch_started": 348,
> >> > "hit_set_history": { "current_last_update": "0'0",
> >> > "current_last_stamp": "0.000000",
> >> > "current_info": { "begin": "0.000000",
> >> > "end": "0.000000",
> >> > "version": "0'0"},
> >> > "history": []}},
> >> > "peer_info": [
> >> > { "peer": "48",
> >> > "pgid": "7.171",
> >> > "last_update": "0'0",
> >> > "last_complete": "0'0",
> >> > "log_tail": "0'0",
> >> > "last_user_version": 0,
> >> > "last_backfill": "MAX",
> >> > "purged_snaps": "[]",
> >> > "history": { "epoch_created": 293,
> >> > "last_epoch_started": 343,
> >> > "last_epoch_clean": 343,
> >> > "last_epoch_split": 0,
> >> > "same_up_since": 688,
> >> > "same_interval_since": 688,
> >> > "same_primary_since": 608,
> >> > "last_scrub": "0'0",
> >> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> >> > "last_deep_scrub": "0'0",
> >> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> >> > "last_clean_scrub_stamp": "0.000000"},
> >> > "stats": { "version": "0'0",
> >> > "reported_seq": "24",
> >> > "reported_epoch": "348",
> >> > "state": "peering",
> >> > "last_fresh": "2015-03-30 11:39:02.979742",
> >> > "last_change": "2015-03-30 11:39:01.650897",
> >> > "last_active": "2015-03-30 11:37:56.956417",
> >> > "last_clean": "2015-03-30 11:37:56.956417",
> >> > "last_became_active": "0.000000",
> >> > "last_unstale": "2015-03-30 11:39:02.979742",
> >> > "mapping_epoch": 683,
> >> > "log_start": "0'0",
> >> > "ondisk_log_start": "0'0",
> >> > "created": 293,
> >> > "last_epoch_clean": 343,
> >> > "parent": "0.0",
> >> > "parent_split_bits": 0,
> >> > "last_scrub": "0'0",
> >> > "last_scrub_stamp": "2015-03-30 11:11:18.872851",
> >> > "last_deep_scrub": "0'0",
> >> > "last_deep_scrub_stamp": "2015-03-30 11:11:18.872851",
> >> > "last_clean_scrub_stamp": "0.000000",
> >> > "log_size": 0,
> >> > "ondisk_log_size": 0,
> >> > "stats_invalid": "0",
> >> > "stat_sum": { "num_bytes": 0,
> >> > "num_objects": 0,
> >> > "num_object_clones": 0,
> >> > "num_object_copies": 0,
> >> > "num_objects_missing_on_primary": 0,
> >> > "num_objects_degraded": 0,
> >> > "num_objects_unfound": 0,
> >> > "num_objects_dirty": 0,
> >> > "num_whiteouts": 0,
> >> > "num_read": 0,
> >> > "num_read_kb": 0,
> >> > "num_write": 0,
> >> > "num_write_kb": 0,
> >> > "num_scrub_errors": 0,
> >> > "num_shallow_scrub_errors": 0,
> >> > "num_deep_scrub_errors": 0,
> >> > "num_objects_recovered": 0,
> >> > "num_bytes_recovered": 0,
> >> > "num_keys_recovered": 0,
> >> > "num_objects_omap": 0,
> >> > "num_objects_hit_set_archive": 0},
> >> > "stat_cat_sum": {},
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "up_primary": 40,
> >> > "acting_primary": 40},
> >> > "empty": 1,
> >> > "dne": 0,
> >> > "incomplete": 0,
> >> > "last_epoch_started": 348,
> >> > "hit_set_history": { "current_last_update": "0'0",
> >> > "current_last_stamp": "0.000000",
> >> > "current_info": { "begin": "0.000000",
> >> > "end": "0.000000",
> >> > "version": "0'0"},
> >> > "history": []}},
> >> > { "peer": "110",
> >> > "pgid": "7.171",
> >> > "last_update": "0'0",
> >> > "last_complete": "0'0",
> >> > "log_tail": "0'0",
> >> > "last_user_version": 0,
> >> > "last_backfill": "MAX",
> >> > "purged_snaps": "[]",
> >> > "history": { "epoch_created": 0,
> >> > "last_epoch_started": 0,
> >> > "last_epoch_clean": 0,
> >> > "last_epoch_split": 0,
> >> > "same_up_since": 0,
> >> > "same_interval_since": 0,
> >> > "same_primary_since": 0,
> >> > "last_scrub": "0'0",
> >> > "last_scrub_stamp": "0.000000",
> >> > "last_deep_scrub": "0'0",
> >> > "last_deep_scrub_stamp": "0.000000",
> >> > "last_clean_scrub_stamp": "0.000000"},
> >> > "stats": { "version": "0'0",
> >> > "reported_seq": "0",
> >> > "reported_epoch": "0",
> >> > "state": "inactive",
> >> > "last_fresh": "0.000000",
> >> > "last_change": "0.000000",
> >> > "last_active": "0.000000",
> >> > "last_clean": "0.000000",
> >> > "last_became_active": "0.000000",
> >> > "last_unstale": "0.000000",
> >> > "mapping_epoch": 0,
> >> > "log_start": "0'0",
> >> > "ondisk_log_start": "0'0",
> >> > "created": 0,
> >> > "last_epoch_clean": 0,
> >> > "parent": "0.0",
> >> > "parent_split_bits": 0,
> >> > "last_scrub": "0'0",
> >> > "last_scrub_stamp": "0.000000",
> >> > "last_deep_scrub": "0'0",
> >> > "last_deep_scrub_stamp": "0.000000",
> >> > "last_clean_scrub_stamp": "0.000000",
> >> > "log_size": 0,
> >> > "ondisk_log_size": 0,
> >> > "stats_invalid": "0",
> >> > "stat_sum": { "num_bytes": 0,
> >> > "num_objects": 0,
> >> > "num_object_clones": 0,
> >> > "num_object_copies": 0,
> >> > "num_objects_missing_on_primary": 0,
> >> > "num_objects_degraded": 0,
> >> > "num_objects_unfound": 0,
> >> > "num_objects_dirty": 0,
> >> > "num_whiteouts": 0,
> >> > "num_read": 0,
> >> > "num_read_kb": 0,
> >> > "num_write": 0,
> >> > "num_write_kb": 0,
> >> > "num_scrub_errors": 0,
> >> > "num_shallow_scrub_errors": 0,
> >> > "num_deep_scrub_errors": 0,
> >> > "num_objects_recovered": 0,
> >> > "num_bytes_recovered": 0,
> >> > "num_keys_recovered": 0,
> >> > "num_objects_omap": 0,
> >> > "num_objects_hit_set_archive": 0},
> >> > "stat_cat_sum": {},
> >> > "up": [],
> >> > "acting": [],
> >> > "up_primary": -1,
> >> > "acting_primary": -1},
> >> > "empty": 1,
> >> > "dne": 1,
> >> > "incomplete": 0,
> >> > "last_epoch_started": 0,
> >> > "hit_set_history": { "current_last_update": "0'0",
> >> > "current_last_stamp": "0.000000",
> >> > "current_info": { "begin": "0.000000",
> >> > "end": "0.000000",
> >> > "version": "0'0"},
> >> > "history": []}}],
> >> > "recovery_state": [
> >> > { "name": "Started\/Primary\/Peering\/GetInfo",
> >> > "enter_time": "2015-03-30 19:44:18.709317",
> >> > "requested_info_from": [
> >> > { "osd": "0"},
> >> > { "osd": "5"},
> >> > { "osd": "10"},
> >> > { "osd": "22"},
> >> > { "osd": "54"},
> >> > { "osd": "91"},
> >> > { "osd": "92"},
> >> > { "osd": "113"},
> >> > { "osd": "114"}]},
> >> > { "name": "Started\/Primary\/Peering",
> >> > "enter_time": "2015-03-30 19:44:18.709316",
> >> > "past_intervals": [
> >> > { "first": 342,
> >> > "last": 346,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 114],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 114,
> >> > 40,
> >> > 40]},
> >> > { "first": 347,
> >> > "last": 353,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 40,
> >> > 40]},
> >> > { "first": 354,
> >> > "last": 356,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 92,
> >> > 48],
> >> > "acting": [
> >> > 92,
> >> > 48,
> >> > 92,
> >> > 92]},
> >> > { "first": 357,
> >> > "last": 359,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 113,
> >> > 48,
> >> > 114],
> >> > "acting": [
> >> > 113,
> >> > 48,
> >> > 114,
> >> > 113,
> >> > 113]},
> >> > { "first": 360,
> >> > "last": 361,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 40,
> >> > 40]},
> >> > { "first": 362,
> >> > "last": 364,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 40,
> >> > 40]},
> >> > { "first": 365,
> >> > "last": 369,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 114],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 114,
> >> > 40,
> >> > 40]},
> >> > { "first": 370,
> >> > "last": 379,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 40,
> >> > 40]},
> >> > { "first": 380,
> >> > "last": 400,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91,
> >> > 40,
> >> > 40]},
> >> > { "first": 401,
> >> > "last": 409,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "acting": [
> >> > 92,
> >> > 48,
> >> > 91,
> >> > 92,
> >> > 92]},
> >> > { "first": 410,
> >> > "last": 414,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 113,
> >> > 48,
> >> > 114,
> >> > 0],
> >> > "acting": [
> >> > 113,
> >> > 48,
> >> > 114,
> >> > 0,
> >> > 113,
> >> > 113]},
> >> > { "first": 415,
> >> > "last": 435,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 113,
> >> > 48,
> >> > 114,
> >> > 10],
> >> > "acting": [
> >> > 113,
> >> > 48,
> >> > 114,
> >> > 10,
> >> > 113,
> >> > 113]},
> >> > { "first": 436,
> >> > "last": 442,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91,
> >> > 40,
> >> > 40]},
> >> > { "first": 443,
> >> > "last": 446,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 40,
> >> > 40]},
> >> > { "first": 447,
> >> > "last": 457,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 48],
> >> > "acting": [
> >> > 40,
> >> > 48,
> >> > 40,
> >> > 40]},
> >> > { "first": 458,
> >> > "last": 460,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 48,
> >> > 10],
> >> > "acting": [
> >> > 40,
> >> > 48,
> >> > 10,
> >> > 40,
> >> > 40]},
> >> > { "first": 461,
> >> > "last": 466,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 48,
> >> > 22],
> >> > "acting": [
> >> > 40,
> >> > 48,
> >> > 22,
> >> > 40,
> >> > 40]},
> >> > { "first": 467,
> >> > "last": 478,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 48,
> >> > 22,
> >> > 5],
> >> > "acting": [
> >> > 40,
> >> > 48,
> >> > 22,
> >> > 5,
> >> > 40,
> >> > 40]},
> >> > { "first": 479,
> >> > "last": 489,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 48,
> >> > 22,
> >> > 110],
> >> > "acting": [
> >> > 40,
> >> > 48,
> >> > 22,
> >> > 110,
> >> > 40,
> >> > 40]},
> >> > { "first": 490,
> >> > "last": 496,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 48,
> >> > 22,
> >> > 0],
> >> > "acting": [
> >> > 40,
> >> > 48,
> >> > 22,
> >> > 0,
> >> > 40,
> >> > 40]},
> >> > { "first": 497,
> >> > "last": 507,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 48,
> >> > 114,
> >> > 10],
> >> > "acting": [
> >> > 40,
> >> > 48,
> >> > 114,
> >> > 10,
> >> > 40,
> >> > 40]},
> >> > { "first": 508,
> >> > "last": 511,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 48,
> >> > 54,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 48,
> >> > 54,
> >> > 91,
> >> > 40,
> >> > 40]},
> >> > { "first": 512,
> >> > "last": 579,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91,
> >> > 40,
> >> > 40]},
> >> > { "first": 580,
> >> > "last": 580,
> >> > "maybe_went_rw": 0,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 91,
> >> > 40,
> >> > 40]},
> >> > { "first": 581,
> >> > "last": 591,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 92,
> >> > 91],
> >> > "acting": [
> >> > 92,
> >> > 91,
> >> > 92,
> >> > 92]},
> >> > { "first": 592,
> >> > "last": 595,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 113,
> >> > 114,
> >> > 22,
> >> > 0],
> >> > "acting": [
> >> > 113,
> >> > 114,
> >> > 22,
> >> > 0,
> >> > 113,
> >> > 113]},
> >> > { "first": 596,
> >> > "last": 599,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 113,
> >> > 48,
> >> > 114,
> >> > 10],
> >> > "acting": [
> >> > 113,
> >> > 48,
> >> > 114,
> >> > 10,
> >> > 113,
> >> > 113]},
> >> > { "first": 600,
> >> > "last": 606,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91,
> >> > 40,
> >> > 40]},
> >> > { "first": 607,
> >> > "last": 607,
> >> > "maybe_went_rw": 0,
> >> > "up": [
> >> > 92,
> >> > 91],
> >> > "acting": [
> >> > 92,
> >> > 91,
> >> > 92,
> >> > 92]},
> >> > { "first": 608,
> >> > "last": 616,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91,
> >> > 40,
> >> > 40]},
> >> > { "first": 617,
> >> > "last": 625,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 91,
> >> > 40,
> >> > 40]},
> >> > { "first": 626,
> >> > "last": 632,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 114,
> >> > 10],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 114,
> >> > 10,
> >> > 40,
> >> > 40]},
> >> > { "first": 633,
> >> > "last": 639,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91,
> >> > 40,
> >> > 40]},
> >> > { "first": 640,
> >> > "last": 643,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 91,
> >> > 40,
> >> > 40]},
> >> > { "first": 644,
> >> > "last": 662,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 114,
> >> > 10],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 114,
> >> > 10,
> >> > 40,
> >> > 40]},
> >> > { "first": 663,
> >> > "last": 679,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 91,
> >> > 40,
> >> > 40]},
> >> > { "first": 680,
> >> > "last": 682,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 40,
> >> > 40]},
> >> > { "first": 683,
> >> > "last": 687,
> >> > "maybe_went_rw": 1,
> >> > "up": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 10],
> >> > "acting": [
> >> > 40,
> >> > 92,
> >> > 48,
> >> > 10,
> >> > 40,
> >> > 40]}],
> >> > "probing_osds": [
> >> > "0",
> >> > "5",
> >> > "10",
> >> > "22",
> >> > "40",
> >> > "48",
> >> > "54",
> >> > "91",
> >> > "92",
> >> > "110",
> >> > "113",
> >> > "114"],
> >> > "down_osds_we_would_probe": [],
> >> > "peering_blocked_by": []},
> >> > { "name": "Started",
> >> > "enter_time": "2015-03-30 19:44:18.709312"}],
> >> > "agent_state": {}}
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> >> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo
> >> info at http://vger.kernel.org/majordomo-info.html
> >>
> >>
> _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-03-31 18:23 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-31 2:15 Force an OSD to try to peer Robert LeBlanc
2015-03-31 2:16 ` Fwd: " Robert LeBlanc
[not found] ` <CAANLjFowYybKKueFeHHT4Ug2eTW_-RGVQRtRE7vfgF-1XXJwkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-03-31 17:07 ` Robert LeBlanc
2015-03-31 17:36 ` Sage Weil
2015-03-31 18:08 ` Robert LeBlanc
[not found] ` <CAANLjFp0pStF0iBw21XSv8a03YX4iy74rZSr_MD8e1mGR0KCAQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-03-31 18:10 ` Somnath Roy
2015-03-31 18:20 ` [ceph-users] " Robert LeBlanc
[not found] ` <755F6B91B3BE364F9BCA11EA3F9E0C6F28CB737E-cXZ6iGhjG0il5HHZYNR2WTJ2aSJ780jGSxCzGc5ayCJWk0Htik3J/w@public.gmane.org>
2015-03-31 18:23 ` Sage Weil
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.