* clearing unfound objects
From: Two Spirit @ 2017-09-12 22:20 UTC (permalink / raw)
To: Sage Weil; +Cc: John Spray, ceph-devel
[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]
>On Tue, 12 Sep 2017, Two Spirit wrote:
>> I don't have any OSDs that are down, so I think the 1 unfound object
>> needs to be manually cleared. I ran across a webpage a while ago that
>> talked about how to clear it, but if you have a reference, it would
>> save me a little time.
>
>http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#failures-osd-unfound
Thanks. That was the page I had read earlier.
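For anyone following along, the sequence that page walks through is
roughly this (pg 6.2 substituted from this cluster; mark_unfound_lost
is a last resort, only after every OSD listed under might_have_unfound
has been probed):
# ceph health detail
# ceph pg 6.2 list_missing
# ceph pg 6.2 query
# ceph pg 6.2 mark_unfound_lost revert|delete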
I've attached the full outputs to this mail and show just clips below.
# ceph health detail
OBJECT_UNFOUND 1/731529 unfound (0.000%)
pg 6.2 has 1 unfound objects
There is one number that looks like it shouldn't be there...
# ceph pg 6.2 list_missing
{
"offset": {
...
"pool": -9223372036854775808,
"namespace": ""
},
...
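That huge negative "pool" value is -2^63, i.e. INT64_MIN, which
appears to be the sentinel Ceph uses for the minimum/unset offset
cursor rather than a real pool id. A quick check that the two match:
# python3 -c 'print(-(2**63))'
-9223372036854775808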
# ceph -s
osd: 6 osds: 6 up, 6 in; 10 remapped pgs
The pg query below shows that something believes osd "2" is down,
even though all OSDs are up, as seen in the ceph -s output above.
# ceph pg 6.2 query
"recovery_state": [
{
"name": "Started/Primary/Active",
"enter_time": "2017-09-12 10:33:11.193486",
"might_have_unfound": [
{
"osd": "0",
"status": "already probed"
},
{
"osd": "1",
"status": "already probed"
},
{
"osd": "2",
"status": "osd is down"
},
{
"osd": "4",
"status": "already probed"
},
{
"osd": "5",
"status": "already probed"
}
If I go to a couple of other OSDs and run the same command, osd "2"
is listed as "already probed". They are not in sync. I double-checked
that all the OSDs were up each of the three times I ran the command.
Now, my question, to debug this and figure out whether I want to
"revert|delete": what in the heck are the file(s)/object(s)
associated with this pg? I assume this might be in the MDS, but I'd
like to see a file name associated with it so I can make a further
determination of what to do. I don't have enough information at this
point to figure out how I should recover.
[-- Attachment #2: ceph_pg_6.2_list_missing.out --]
[-- Type: application/octet-stream, Size: 662 bytes --]
{
"offset": {
"oid": "",
"key": "",
"snapid": 0,
"hash": 0,
"max": 0,
"pool": -9223372036854775808,
"namespace": ""
},
"num_missing": 1,
"num_unfound": 1,
"objects": [
{
"oid": {
"oid": "200.0000052d",
"key": "",
"snapid": -2,
"hash": 2728386690,
"max": 0,
"pool": 6,
"namespace": ""
},
"need": "1496'15853",
"have": "0'0",
"flags": "none",
"locations": []
}
],
"more": false
}
[-- Attachment #3: ceph_pg_6.2_query.out --]
[-- Type: application/octet-stream, Size: 26148 bytes --]
{
"state": "active+recovery_wait+degraded+remapped",
"snap_trimq": "[]",
"epoch": 1692,
"up": [
4,
3
],
"acting": [
3,
1
],
"backfill_targets": [
"4"
],
"actingbackfill": [
"1",
"3",
"4"
],
"info": {
"pgid": "6.2",
"last_update": "1496'15853",
"last_complete": "1496'15852",
"log_tail": "1473'13058",
"last_user_version": 15853,
"last_backfill": "MAX",
"last_backfill_bitwise": 1,
"purged_snaps": [],
"history": {
"epoch_created": 22,
"epoch_pool_created": 22,
"last_epoch_started": 1654,
"last_interval_started": 1653,
"last_epoch_clean": 1482,
"last_interval_clean": 1479,
"last_epoch_split": 1191,
"last_epoch_marked_full": 0,
"same_up_since": 1653,
"same_interval_since": 1653,
"same_primary_since": 1651,
"last_scrub": "1184'12948",
"last_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_deep_scrub": "1184'12948",
"last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_clean_scrub_stamp": "2017-09-02 08:14:28.065869"
},
"stats": {
"version": "1496'15853",
"reported_seq": "37466",
"reported_epoch": "1692",
"state": "active+recovery_wait+degraded+remapped",
"last_fresh": "2017-09-12 14:40:44.555997",
"last_change": "2017-09-12 14:40:44.555997",
"last_active": "2017-09-12 14:40:44.555997",
"last_peered": "2017-09-12 14:40:44.555997",
"last_clean": "2017-09-02 23:51:42.040480",
"last_became_active": "2017-09-12 10:33:11.231301",
"last_became_peered": "2017-09-12 10:33:11.231301",
"last_unstale": "2017-09-12 14:40:44.555997",
"last_undegraded": "2017-09-12 10:33:11.193432",
"last_fullsized": "2017-09-12 14:40:44.555997",
"mapping_epoch": 1653,
"log_start": "1473'13058",
"ondisk_log_start": "1473'13058",
"created": 22,
"last_epoch_clean": 1482,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "1184'12948",
"last_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_deep_scrub": "1184'12948",
"last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_clean_scrub_stamp": "2017-09-02 08:14:28.065869",
"log_size": 2795,
"ondisk_log_size": 2795,
"stats_invalid": true,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"stat_sum": {
"num_bytes": 9177517,
"num_objects": 649,
"num_object_clones": 0,
"num_object_copies": 1298,
"num_objects_missing_on_primary": 1,
"num_objects_missing": 0,
"num_objects_degraded": 2,
"num_objects_misplaced": 648,
"num_objects_unfound": 1,
"num_objects_dirty": 649,
"num_whiteouts": 0,
"num_read": 15129,
"num_read_kb": 168601,
"num_write": 10611,
"num_write_kb": 152359,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 946,
"num_bytes_recovered": 4454578,
"num_keys_recovered": 25320,
"num_objects_omap": 646,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0
},
"up": [
4,
3
],
"acting": [
3,
1
],
"blocked_by": [],
"up_primary": 4,
"acting_primary": 3
},
"empty": 0,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 1654,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
},
"peer_info": [
{
"peer": "0",
"pgid": "6.2",
"last_update": "1496'15853",
"last_complete": "1496'15853",
"log_tail": "1473'14353",
"last_user_version": 0,
"last_backfill": "MIN",
"last_backfill_bitwise": 1,
"purged_snaps": [],
"history": {
"epoch_created": 22,
"epoch_pool_created": 22,
"last_epoch_started": 1652,
"last_interval_started": 1651,
"last_epoch_clean": 1482,
"last_interval_clean": 1479,
"last_epoch_split": 1191,
"last_epoch_marked_full": 0,
"same_up_since": 1653,
"same_interval_since": 1653,
"same_primary_since": 1651,
"last_scrub": "1184'12948",
"last_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_deep_scrub": "1184'12948",
"last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_clean_scrub_stamp": "2017-09-02 08:14:28.065869"
},
"stats": {
"version": "0'0",
"reported_seq": "0",
"reported_epoch": "0",
"state": "unknown",
"last_fresh": "0.000000",
"last_change": "0.000000",
"last_active": "0.000000",
"last_peered": "0.000000",
"last_clean": "0.000000",
"last_became_active": "0.000000",
"last_became_peered": "0.000000",
"last_unstale": "0.000000",
"last_undegraded": "0.000000",
"last_fullsized": "0.000000",
"mapping_epoch": 1653,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 0,
"last_epoch_clean": 0,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "0.000000",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "0.000000",
"last_clean_scrub_stamp": "0.000000",
"log_size": 0,
"ondisk_log_size": 0,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"stat_sum": {
"num_bytes": 0,
"num_objects": 0,
"num_object_clones": 0,
"num_object_copies": 0,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 0,
"num_whiteouts": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 0,
"num_write_kb": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0
},
"up": [
4,
3
],
"acting": [
3,
1
],
"blocked_by": [],
"up_primary": 4,
"acting_primary": 3
},
"empty": 0,
"dne": 0,
"incomplete": 1,
"last_epoch_started": 1607,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
},
{
"peer": "1",
"pgid": "6.2",
"last_update": "1496'15853",
"last_complete": "1496'15852",
"log_tail": "1473'13058",
"last_user_version": 15853,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": [],
"history": {
"epoch_created": 22,
"epoch_pool_created": 22,
"last_epoch_started": 1654,
"last_interval_started": 1653,
"last_epoch_clean": 1482,
"last_interval_clean": 1479,
"last_epoch_split": 1191,
"last_epoch_marked_full": 0,
"same_up_since": 1653,
"same_interval_since": 1653,
"same_primary_since": 1651,
"last_scrub": "1184'12948",
"last_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_deep_scrub": "1184'12948",
"last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_clean_scrub_stamp": "2017-09-02 08:14:28.065869"
},
"stats": {
"version": "1496'15853",
"reported_seq": "37756",
"reported_epoch": "1650",
"state": "remapped",
"last_fresh": "2017-09-12 10:31:48.226105",
"last_change": "2017-09-12 10:31:48.226105",
"last_active": "2017-09-12 10:31:48.155329",
"last_peered": "2017-09-12 10:31:05.287521",
"last_clean": "2017-09-02 23:51:42.040480",
"last_became_active": "2017-09-12 10:31:05.287078",
"last_became_peered": "2017-09-12 10:31:05.287078",
"last_unstale": "2017-09-12 10:31:48.226105",
"last_undegraded": "2017-09-12 10:31:48.226105",
"last_fullsized": "2017-09-12 10:31:48.226105",
"mapping_epoch": 1653,
"log_start": "1473'13058",
"ondisk_log_start": "1473'13058",
"created": 22,
"last_epoch_clean": 1482,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "1184'12948",
"last_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_deep_scrub": "1184'12948",
"last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_clean_scrub_stamp": "2017-09-02 08:14:28.065869",
"log_size": 2795,
"ondisk_log_size": 2795,
"stats_invalid": true,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"stat_sum": {
"num_bytes": 9177517,
"num_objects": 649,
"num_object_clones": 0,
"num_object_copies": 1298,
"num_objects_missing_on_primary": 1,
"num_objects_missing": 1,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 649,
"num_whiteouts": 0,
"num_read": 15129,
"num_read_kb": 168601,
"num_write": 10611,
"num_write_kb": 152359,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 946,
"num_bytes_recovered": 4454578,
"num_keys_recovered": 25320,
"num_objects_omap": 646,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0
},
"up": [
4,
3
],
"acting": [
3,
1
],
"blocked_by": [],
"up_primary": 4,
"acting_primary": 3
},
"empty": 0,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 1654,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
},
{
"peer": "4",
"pgid": "6.2",
"last_update": "1496'15853",
"last_complete": "1496'15853",
"log_tail": "1473'14353",
"last_user_version": 0,
"last_backfill": "MIN",
"last_backfill_bitwise": 1,
"purged_snaps": [],
"history": {
"epoch_created": 22,
"epoch_pool_created": 22,
"last_epoch_started": 1654,
"last_interval_started": 1653,
"last_epoch_clean": 1482,
"last_interval_clean": 1479,
"last_epoch_split": 1191,
"last_epoch_marked_full": 0,
"same_up_since": 1653,
"same_interval_since": 1653,
"same_primary_since": 1651,
"last_scrub": "1184'12948",
"last_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_deep_scrub": "1184'12948",
"last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_clean_scrub_stamp": "2017-09-02 08:14:28.065869"
},
"stats": {
"version": "0'0",
"reported_seq": "0",
"reported_epoch": "0",
"state": "unknown",
"last_fresh": "0.000000",
"last_change": "0.000000",
"last_active": "0.000000",
"last_peered": "0.000000",
"last_clean": "0.000000",
"last_became_active": "0.000000",
"last_became_peered": "0.000000",
"last_unstale": "0.000000",
"last_undegraded": "0.000000",
"last_fullsized": "0.000000",
"mapping_epoch": 1653,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 0,
"last_epoch_clean": 0,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "0.000000",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "0.000000",
"last_clean_scrub_stamp": "0.000000",
"log_size": 0,
"ondisk_log_size": 0,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"stat_sum": {
"num_bytes": 0,
"num_objects": 0,
"num_object_clones": 0,
"num_object_copies": 0,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 0,
"num_whiteouts": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 0,
"num_write_kb": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0
},
"up": [
4,
3
],
"acting": [
3,
1
],
"blocked_by": [],
"up_primary": 4,
"acting_primary": 3
},
"empty": 0,
"dne": 0,
"incomplete": 1,
"last_epoch_started": 1654,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
},
{
"peer": "5",
"pgid": "6.2",
"last_update": "1496'15853",
"last_complete": "1496'15853",
"log_tail": "1473'14353",
"last_user_version": 0,
"last_backfill": "MIN",
"last_backfill_bitwise": 1,
"purged_snaps": [],
"history": {
"epoch_created": 22,
"epoch_pool_created": 22,
"last_epoch_started": 1652,
"last_interval_started": 1651,
"last_epoch_clean": 1482,
"last_interval_clean": 1479,
"last_epoch_split": 1191,
"last_epoch_marked_full": 0,
"same_up_since": 1653,
"same_interval_since": 1653,
"same_primary_since": 1651,
"last_scrub": "1184'12948",
"last_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_deep_scrub": "1184'12948",
"last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
"last_clean_scrub_stamp": "2017-09-02 08:14:28.065869"
},
"stats": {
"version": "0'0",
"reported_seq": "0",
"reported_epoch": "0",
"state": "unknown",
"last_fresh": "0.000000",
"last_change": "0.000000",
"last_active": "0.000000",
"last_peered": "0.000000",
"last_clean": "0.000000",
"last_became_active": "0.000000",
"last_became_peered": "0.000000",
"last_unstale": "0.000000",
"last_undegraded": "0.000000",
"last_fullsized": "0.000000",
"mapping_epoch": 1653,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 0,
"last_epoch_clean": 0,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "0'0",
"last_scrub_stamp": "0.000000",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "0.000000",
"last_clean_scrub_stamp": "0.000000",
"log_size": 0,
"ondisk_log_size": 0,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"stat_sum": {
"num_bytes": 0,
"num_objects": 0,
"num_object_clones": 0,
"num_object_copies": 0,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 0,
"num_whiteouts": 0,
"num_read": 0,
"num_read_kb": 0,
"num_write": 0,
"num_write_kb": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0
},
"up": [
4,
3
],
"acting": [
3,
1
],
"blocked_by": [],
"up_primary": 4,
"acting_primary": 3
},
"empty": 0,
"dne": 0,
"incomplete": 1,
"last_epoch_started": 1516,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
}
],
"recovery_state": [
{
"name": "Started/Primary/Active",
"enter_time": "2017-09-12 10:33:11.193486",
"might_have_unfound": [
{
"osd": "0",
"status": "already probed"
},
{
"osd": "1",
"status": "already probed"
},
{
"osd": "2",
"status": "osd is down"
},
{
"osd": "4",
"status": "already probed"
},
{
"osd": "5",
"status": "already probed"
}
],
"recovery_progress": {
"backfill_targets": [
"4"
],
"waiting_on_backfill": [],
"last_backfill_started": "MIN",
"backfill_info": {
"begin": "MIN",
"end": "MIN",
"objects": []
},
"peer_backfill_info": [],
"backfills_in_flight": [],
"recovering": [],
"pg_backend": {
"pull_from_peer": [],
"pushing": []
}
},
"scrub": {
"scrubber.epoch_start": "0",
"scrubber.active": false,
"scrubber.state": "INACTIVE",
"scrubber.start": "MIN",
"scrubber.end": "MIN",
"scrubber.subset_last_update": "0'0",
"scrubber.deep": false,
"scrubber.seed": 0,
"scrubber.waiting_on": 0,
"scrubber.waiting_on_whom": []
}
},
{
"name": "Started",
"enter_time": "2017-09-12 10:33:10.326608"
}
],
"agent_state": {}
}
* Re: clearing unfound objects
From: Sage Weil @ 2017-09-12 22:48 UTC (permalink / raw)
To: Two Spirit; +Cc: John Spray, ceph-devel
On Tue, 12 Sep 2017, Two Spirit wrote:
> >On Tue, 12 Sep 2017, Two Spirit wrote:
> >> I don't have any OSDs that are down, so I think the 1 unfound object
> >> needs to be manually cleared. I ran across a webpage a while ago that
> >> talked about how to clear it, but if you have a reference, it would
> >> save me a little time.
> >
> >http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#failures-osd-unfound
>
> Thanks. That was the page I had read earlier.
>
> I've attached the full outputs to this mail and show just clips below.
>
> # ceph health detail
> OBJECT_UNFOUND 1/731529 unfound (0.000%)
> pg 6.2 has 1 unfound objects
>
> There is one number that looks like it shouldn't be there...
> # ceph pg 6.2 list_missing
> {
> "offset": {
> ...
> "pool": -9223372036854775808,
> "namespace": ""
> },
> ...
I think you've snipped out the bit that has the name of the unfound
object?
sage
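(The object name shows up under "objects" in the list_missing JSON;
one way to pull out just the oid, assuming jq is available:
# ceph pg 6.2 list_missing | jq -r '.objects[].oid.oid'
200.0000052d
The oid above is taken from the attachment in the previous mail.)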
>
> # ceph -s
> osd: 6 osds: 6 up, 6 in; 10 remapped pgs
>
> The pg query below shows that something believes osd "2" is down,
> even though all OSDs are up, as seen in the ceph -s output above.
> # ceph pg 6.2 query
> "recovery_state": [
> {
> "name": "Started/Primary/Active",
> "enter_time": "2017-09-12 10:33:11.193486",
> "might_have_unfound": [
> {
> "osd": "0",
> "status": "already probed"
> },
> {
> "osd": "1",
> "status": "already probed"
> },
> {
> "osd": "2",
> "status": "osd is down"
> },
> {
> "osd": "4",
> "status": "already probed"
> },
> {
> "osd": "5",
> "status": "already probed"
> }
>
>
> If I go to a couple of other OSDs and run the same command, osd "2"
> is listed as "already probed". They are not in sync. I double-checked
> that all the OSDs were up each of the three times I ran the command.
>
> Now, my question, to debug this and figure out whether I want to
> "revert|delete": what in the heck are the file(s)/object(s)
> associated with this pg? I assume this might be in the MDS, but I'd
> like to see a file name associated with it so I can make a further
> determination of what to do. I don't have enough information at this
> point to figure out how I should recover.
>
* Re: clearing unfound objects
From: Two Spirit @ 2017-09-13 0:07 UTC (permalink / raw)
To: Sage Weil; +Cc: John Spray, ceph-devel
I attached the complete output with the previous email.
...
"objects": [
{
"oid": {
"oid": "200.0000052d",
"key": "",
"snapid": -2,
"hash": 2728386690,
"max": 0,
"pool": 6,
"namespace": ""
},
"need": "1496'15853",
"have": "0'0",
"flags": "none",
"locations": []
}
So it goes Filename -> OID -> PG -> OSD? So if I trace down
"200.0000052d" I should be able to clear the problem? I seem to get
files in the lost+found directory, I think from fsck. Does deep
scrubbing eventually clear these after a week, or will they always
require manual intervention?
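Each hop in that chain can be checked from the CLI; a sketch, with
the pool name for pool id 6 left as a placeholder to look up first:
# ceph osd lspools
# ceph osd map <pool-name-for-id-6> 200.0000052d
ceph osd map should print pg 6.2 plus the up/acting OSD sets that
the object maps to.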
On Tue, Sep 12, 2017 at 3:48 PM, Sage Weil <sweil@redhat.com> wrote:
> On Tue, 12 Sep 2017, Two Spirit wrote:
>> >On Tue, 12 Sep 2017, Two Spirit wrote:
>> >> I don't have any OSDs that are down, so I think the 1 unfound object
>> >> needs to be manually cleared. I ran across a webpage a while ago that
>> >> talked about how to clear it, but if you have a reference, it would
>> >> save me a little time.
>> >
>> >http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#failures-osd-unfound
>>
>> Thanks. That was the page I had read earlier.
>>
>> I've attached the full outputs to this mail and show just clips below.
>>
>> # ceph health detail
>> OBJECT_UNFOUND 1/731529 unfound (0.000%)
>> pg 6.2 has 1 unfound objects
>>
>> There is one number that looks like it shouldn't be there...
>> # ceph pg 6.2 list_missing
>> {
>> "offset": {
>> ...
>> "pool": -9223372036854775808,
>> "namespace": ""
>> },
>> ...
>
> I think you've snipped out the bit that has the name of the unfound
> object?
>
> sage
>
>>
>> # ceph -s
>> osd: 6 osds: 6 up, 6 in; 10 remapped pgs
>>
>> The pg query below shows that something believes osd "2" is down,
>> even though all OSDs are up, as seen in the ceph -s output above.
>> # ceph pg 6.2 query
>> "recovery_state": [
>> {
>> "name": "Started/Primary/Active",
>> "enter_time": "2017-09-12 10:33:11.193486",
>> "might_have_unfound": [
>> {
>> "osd": "0",
>> "status": "already probed"
>> },
>> {
>> "osd": "1",
>> "status": "already probed"
>> },
>> {
>> "osd": "2",
>> "status": "osd is down"
>> },
>> {
>> "osd": "4",
>> "status": "already probed"
>> },
>> {
>> "osd": "5",
>> "status": "already probed"
>> }
>>
>>
>> If I go to a couple of other OSDs and run the same command, osd "2"
>> is listed as "already probed". They are not in sync. I double-checked
>> that all the OSDs were up each of the three times I ran the command.
>>
>> Now, my question, to debug this and figure out whether I want to
>> "revert|delete": what in the heck are the file(s)/object(s)
>> associated with this pg? I assume this might be in the MDS, but I'd
>> like to see a file name associated with it so I can make a further
>> determination of what to do. I don't have enough information at this
>> point to figure out how I should recover.
>>
* Re: clearing unfound objects
From: Sage Weil @ 2017-09-13 1:54 UTC (permalink / raw)
To: Two Spirit; +Cc: John Spray, ceph-devel
On Tue, 12 Sep 2017, Two Spirit wrote:
> I attached the complete output with the previous email.
>
> ...
> "objects": [
> {
> "oid": {
> "oid": "200.0000052d",
This is an MDS journal object... so the MDS is stuck replaying its
journal because it is unfound.
In this case I would do 'revert'.
sage
> "key": "",
> "snapid": -2,
> "hash": 2728386690,
> "max": 0,
> "pool": 6,
> "namespace": ""
> },
> "need": "1496'15853",
> "have": "0'0",
> "flags": "none",
> "locations": []
> }
>
>
> So it goes Filename -> OID -> PG -> OSD? So if I trace down
> "200.0000052d" I should be able to clear the problem? I seem to get
> files in the lost+found directory, I think from fsck. Does deep
> scrubbing eventually clear these after a week, or will they always
> require manual intervention?
>
> On Tue, Sep 12, 2017 at 3:48 PM, Sage Weil <sweil@redhat.com> wrote:
> > On Tue, 12 Sep 2017, Two Spirit wrote:
> >> >On Tue, 12 Sep 2017, Two Spirit wrote:
> >> >> I don't have any OSDs that are down, so I think the 1 unfound object
> >> >> needs to be manually cleared. I ran across a webpage a while ago that
> >> >> talked about how to clear it, but if you have a reference, it would
> >> >> save me a little time.
> >> >
> >> >http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#failures-osd-unfound
> >>
> >> Thanks. That was the page I had read earlier.
> >>
> >> I've attached the full outputs to this mail and show just clips below.
> >>
> >> # ceph health detail
> >> OBJECT_UNFOUND 1/731529 unfound (0.000%)
> >> pg 6.2 has 1 unfound objects
> >>
> >> There is one number that looks like it shouldn't be there...
> >> # ceph pg 6.2 list_missing
> >> {
> >> "offset": {
> >> ...
> >> "pool": -9223372036854775808,
> >> "namespace": ""
> >> },
> >> ...
> >
> > I think you've snipped out the bit that has the name of the unfound
> > object?
> >
> > sage
> >
> >>
> >> # ceph -s
> >> osd: 6 osds: 6 up, 6 in; 10 remapped pgs
> >>
> >> The pg query below shows that something believes osd "2" is down,
> >> even though all OSDs are up, as seen in the ceph -s output above.
> >> # ceph pg 6.2 query
> >> "recovery_state": [
> >> {
> >> "name": "Started/Primary/Active",
> >> "enter_time": "2017-09-12 10:33:11.193486",
> >> "might_have_unfound": [
> >> {
> >> "osd": "0",
> >> "status": "already probed"
> >> },
> >> {
> >> "osd": "1",
> >> "status": "already probed"
> >> },
> >> {
> >> "osd": "2",
> >> "status": "osd is down"
> >> },
> >> {
> >> "osd": "4",
> >> "status": "already probed"
> >> },
> >> {
> >> "osd": "5",
> >> "status": "already probed"
> >> }
> >>
> >>
> >> If I go to a couple of other OSDs and run the same command, osd "2"
> >> is listed as "already probed". They are not in sync. I double-checked
> >> that all the OSDs were up each of the three times I ran the command.
> >>
> >> Now, my question, to debug this and figure out whether I want to
> >> "revert|delete": what in the heck are the file(s)/object(s)
> >> associated with this pg? I assume this might be in the MDS, but I'd
> >> like to see a file name associated with it so I can make a further
> >> determination of what to do. I don't have enough information at this
> >> point to figure out how I should recover.
> >>
* Re: clearing unfound objects
From: Two Spirit @ 2017-09-13 15:46 UTC (permalink / raw)
To: Sage Weil; +Cc: John Spray, ceph-devel
You're the man. I'm not sure how you figured that out yet; I've got
a little reading to do. Is it considered a bug that the MDS gets
stuck and is unable to self-heal?
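For the record, the naming trick seems to be: CephFS names its RADOS
objects <inode-hex>.<chunk-hex>, and inode 0x200 is the reserved
journal inode for MDS rank 0, so 200.0000052d is chunk 0x52d of the
rank-0 journal, which is why there is no regular file behind it. For
an object in the data pool the prefix is an ordinary file's inode, so
a filename can be recovered with something like this (the mount point
and inode below are made up for illustration):
# printf '%d\n' 0x10000000abc
1099511630524
# find /mnt/cephfs -inum 1099511630524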
On Tue, Sep 12, 2017 at 6:54 PM, Sage Weil <sweil@redhat.com> wrote:
> On Tue, 12 Sep 2017, Two Spirit wrote:
>> I attached the complete output with the previous email.
>>
>> ...
>> "objects": [
>> {
>> "oid": {
>> "oid": "200.0000052d",
>
> This is an MDS journal object... so the MDS is stuck replaying its
> journal because it is unfound.
>
> In this case I would do 'revert'.
>
> sage
>
>
>> "key": "",
>> "snapid": -2,
>> "hash": 2728386690,
>> "max": 0,
>> "pool": 6,
>> "namespace": ""
>> },
>> "need": "1496'15853",
>> "have": "0'0",
>> "flags": "none",
>> "locations": []
>> }
>>
>>
>> So it goes Filename -> OID -> PG -> OSD? So if I trace down
>> "200.0000052d" I should be able to clear the problem? I seem to get
>> files in the lost+found directory, I think from fsck. Does deep
>> scrubbing eventually clear these after a week, or will they always
>> require manual intervention?
>>
>> On Tue, Sep 12, 2017 at 3:48 PM, Sage Weil <sweil@redhat.com> wrote:
>> > On Tue, 12 Sep 2017, Two Spirit wrote:
>> >> >On Tue, 12 Sep 2017, Two Spirit wrote:
>> >> >> I don't have any OSDs that are down, so I think the 1 unfound object
>> >> >> needs to be manually cleared. I ran across a webpage a while ago that
>> >> >> talked about how to clear it, but if you have a reference, it would
>> >> >> save me a little time.
>> >> >
>> >> >http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#failures-osd-unfound
>> >>
>> >> Thanks. That was the page I had read earlier.
>> >>
>> >> I've attached the full outputs to this mail and show just clips below.
>> >>
>> >> # ceph health detail
>> >> OBJECT_UNFOUND 1/731529 unfound (0.000%)
>> >> pg 6.2 has 1 unfound objects
>> >>
>> >> There is one number that looks like it shouldn't be there...
>> >> # ceph pg 6.2 list_missing
>> >> {
>> >> "offset": {
>> >> ...
>> >> "pool": -9223372036854775808,
>> >> "namespace": ""
>> >> },
>> >> ...
>> >
>> > I think you've snipped out the bit that has the name of the unfound
>> > object?
>> >
>> > sage
>> >
>> >>
>> >> # ceph -s
>> >> osd: 6 osds: 6 up, 6 in; 10 remapped pgs
>> >>
>> >> The pg query below shows that something believes osd "2" is down,
>> >> even though all OSDs are up, as seen in the ceph -s output above.
>> >> # ceph pg 6.2 query
>> >> "recovery_state": [
>> >> {
>> >> "name": "Started/Primary/Active",
>> >> "enter_time": "2017-09-12 10:33:11.193486",
>> >> "might_have_unfound": [
>> >> {
>> >> "osd": "0",
>> >> "status": "already probed"
>> >> },
>> >> {
>> >> "osd": "1",
>> >> "status": "already probed"
>> >> },
>> >> {
>> >> "osd": "2",
>> >> "status": "osd is down"
>> >> },
>> >> {
>> >> "osd": "4",
>> >> "status": "already probed"
>> >> },
>> >> {
>> >> "osd": "5",
>> >> "status": "already probed"
>> >> }
>> >>
>> >>
>> >> If I go to a couple of other OSDs and run the same command, osd "2"
>> >> is listed as "already probed". They are not in sync. I double-checked
>> >> that all the OSDs were up each of the three times I ran the command.
>> >>
>> >> Now, my question, to debug this and figure out whether I want to
>> >> "revert|delete": what in the heck are the file(s)/object(s)
>> >> associated with this pg? I assume this might be in the MDS, but I'd
>> >> like to see a file name associated with it so I can make a further
>> >> determination of what to do. I don't have enough information at this
>> >> point to figure out how I should recover.
>> >>
* Re: clearing unfound objects
From: Two Spirit @ 2017-09-13 21:34 UTC (permalink / raw)
To: Sage Weil; +Cc: John Spray, ceph-devel
I did not see that one coming. What do I do?
# ceph health detail
FS_DEGRADED 1 filesystem is degraded
fs cephfs is degraded
OBJECT_UNFOUND 1/731509 unfound (0.000%)
pg 6.2 has 1 unfound objects
# ceph pg 6.2 query
"recovery_state": [
{
"name": "Started/Primary/Active",
"enter_time": "2017-09-13 14:22:26.384418",
"might_have_unfound": [
{
"osd": "0",
"status": "already probed"
},
{
"osd": "2",
"status": "already probed"
},
{
"osd": "3",
"status": "already probed"
},
{
"osd": "4",
"status": "already probed"
},
{
"osd": "5",
"status": "already probed"
}
# ceph pg 6.2 mark_unfound_lost revert
pg has no unfound objects
I wasn't expecting that error message; I think it is not correct.
I am a little crazy, so I tried the same thing, expecting a different
result:
# ceph pg 6.2 mark_unfound_lost revert
pg has 1 objects unfound and apparently lost marking
and this time I got one, but my filesystem is still degraded. I was
expecting my filesystem to be good again. Do I have to wait for
active+clean, or what do I do now?
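A couple of things that seem worth checking at this point (fs name
cephfs taken from the health output above; whether this is the right
next step is a guess on my part):
# ceph -s
# ceph mds stat
# ceph fs status cephfs
If the revert unblocked journal replay, the MDS should move from
replay toward active on its own; if it stays stuck in replay, that
would be the next thing to chase.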