* clearing unfound objects
@ 2017-09-12 22:20 Two Spirit
  2017-09-12 22:48 ` Sage Weil
  0 siblings, 1 reply; 6+ messages in thread
From: Two Spirit @ 2017-09-12 22:20 UTC
  To: Sage Weil; +Cc: John Spray, ceph-devel

[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]

>On Tue, 12 Sep 2017, Two Spirit wrote:
>> I don't have any OSDs that are down, so the 1 unfound object I think
>> needs to be manually cleared. I ran across a webpage a while ago that
>> talked about how to clear it, but if you have a reference, would save
>> me a little time.
>
>http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#failures-osd-unfound

Thanks. That was the page I had read earlier.

I've attached the full outputs to this mail and show just clips below.

# ceph health detail
OBJECT_UNFOUND 1/731529 unfound (0.000%)
    pg 6.2 has 1 unfound objects

There is one number that looks like it shouldn't be there...
# ceph pg 6.2 list_missing
{
    "offset": {
...
        "pool": -9223372036854775808,
        "namespace": ""
    },
...
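
A note on that number: -9223372036854775808 is exactly the smallest signed 64-bit integer, so it may be a "no pool set" placeholder in the empty "offset" cursor and not a sign of corruption. That reading is an assumption; the output itself does not confirm it. The arithmetic, at least, can be checked with a short Python sketch (the names POOL_VALUE and is_int64_min are made up for illustration):

```python
# The "pool" value copied from the list_missing output above.
POOL_VALUE = -9223372036854775808

# Smallest value representable in a signed 64-bit integer.
INT64_MIN = -(2 ** 63)


def is_int64_min(value: int) -> bool:
    """Return True when value equals the signed 64-bit minimum."""
    return value == INT64_MIN


print(is_int64_min(POOL_VALUE))
```

The object entries in the same output carry an ordinary pool id, so only the "offset" block shows this extreme value.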

# ceph -s
    osd: 6 osds: 6 up, 6 in; 10 remapped pgs

This shows under the pg query that something believes that osd "2" is
down, but all OSDs are up, as seen in the previous ceph -s command.
# ceph pg 6.2 query
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "enter_time": "2017-09-12 10:33:11.193486",
            "might_have_unfound": [
                {
                    "osd": "0",
                    "status": "already probed"
                },
                {
                    "osd": "1",
                    "status": "already probed"
                },
                {
                    "osd": "2",
                    "status": "osd is down"
                },
                {
                    "osd": "4",
                    "status": "already probed"
                },
                {
                    "osd": "5",
                    "status": "already probed"
                }


If I go to a couple of other OSDs and run the same command,
osd "2" is listed as "already probed". They are not in sync. I
double-checked that all the OSDs were up all 3 times I ran the
command.
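
To compare the answers from different OSDs without reading the JSON by eye, something like the following Python sketch could help. It assumes the query output has been saved or piped as JSON with the same "recovery_state" / "might_have_unfound" layout shown above; SAMPLE and unprobed_osds are hypothetical names used only for this illustration.

```python
import json

# Trimmed sample shaped like the "recovery_state" part of `ceph pg 6.2 query`.
SAMPLE = '''
{"recovery_state": [
  {"name": "Started/Primary/Active",
   "might_have_unfound": [
     {"osd": "0", "status": "already probed"},
     {"osd": "1", "status": "already probed"},
     {"osd": "2", "status": "osd is down"},
     {"osd": "4", "status": "already probed"},
     {"osd": "5", "status": "already probed"}]},
  {"name": "Started"}]}
'''


def unprobed_osds(pg_query: dict) -> list:
    """Return (osd, status) pairs whose status is not 'already probed'."""
    pending = []
    for state in pg_query.get("recovery_state", []):
        # Only some recovery states carry a might_have_unfound list.
        for entry in state.get("might_have_unfound", []):
            if entry["status"] != "already probed":
                pending.append((entry["osd"], entry["status"]))
    return pending


print(unprobed_osds(json.loads(SAMPLE)))
```

Running the same function over the output captured from each OSD would show which ones still report osd "2" as down.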

Now, my question: to decide whether I want to
"revert|delete", what are the file(s)/object(s)
associated with this pg? I assume this might be in the MDS, but I'd
like to see a file name associated with it to make a further
determination of what I should do. I don't have enough information at
this point to figure out how I should recover.
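
The object names are in the "objects" array of the full list_missing output (attachment #2). A minimal Python sketch for pulling them out is below. It assumes the JSON layout of that attachment and treats an empty "locations" list as the mark of an unfound object, which is an assumption and not documented behaviour; SAMPLE and unfound_objects are hypothetical names.

```python
import json

# Trimmed sample shaped like the attached `ceph pg 6.2 list_missing` output.
SAMPLE = '''
{"num_missing": 1, "num_unfound": 1,
 "objects": [
   {"oid": {"oid": "200.0000052d", "snapid": -2, "pool": 6, "namespace": ""},
    "need": "1496'15853", "have": "0'0", "locations": []}],
 "more": false}
'''


def unfound_objects(list_missing: dict) -> list:
    """Return (pool, object name, needed version) for entries with no location."""
    result = []
    for obj in list_missing.get("objects", []):
        # Assumption: an empty "locations" list means no OSD has a copy.
        if not obj.get("locations"):
            oid = obj["oid"]
            result.append((oid["pool"], oid["oid"], obj["need"]))
    return result


print(unfound_objects(json.loads(SAMPLE)))
```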

[-- Attachment #2: ceph_pg_6.2_list_missing.out --]
[-- Type: application/octet-stream, Size: 662 bytes --]

{
    "offset": {
        "oid": "",
        "key": "",
        "snapid": 0,
        "hash": 0,
        "max": 0,
        "pool": -9223372036854775808,
        "namespace": ""
    },
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
        {
            "oid": {
                "oid": "200.0000052d",
                "key": "",
                "snapid": -2,
                "hash": 2728386690,
                "max": 0,
                "pool": 6,
                "namespace": ""
            },
            "need": "1496'15853",
            "have": "0'0",
            "flags": "none",
            "locations": []
        }
    ],
    "more": false
}

[-- Attachment #3: ceph_pg_6.2_query.out --]
[-- Type: application/octet-stream, Size: 26148 bytes --]

{
    "state": "active+recovery_wait+degraded+remapped",
    "snap_trimq": "[]",
    "epoch": 1692,
    "up": [
        4,
        3
    ],
    "acting": [
        3,
        1
    ],
    "backfill_targets": [
        "4"
    ],
    "actingbackfill": [
        "1",
        "3",
        "4"
    ],
    "info": {
        "pgid": "6.2",
        "last_update": "1496'15853",
        "last_complete": "1496'15852",
        "log_tail": "1473'13058",
        "last_user_version": 15853,
        "last_backfill": "MAX",
        "last_backfill_bitwise": 1,
        "purged_snaps": [],
        "history": {
            "epoch_created": 22,
            "epoch_pool_created": 22,
            "last_epoch_started": 1654,
            "last_interval_started": 1653,
            "last_epoch_clean": 1482,
            "last_interval_clean": 1479,
            "last_epoch_split": 1191,
            "last_epoch_marked_full": 0,
            "same_up_since": 1653,
            "same_interval_since": 1653,
            "same_primary_since": 1651,
            "last_scrub": "1184'12948",
            "last_scrub_stamp": "2017-09-02 08:14:28.065869",
            "last_deep_scrub": "1184'12948",
            "last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
            "last_clean_scrub_stamp": "2017-09-02 08:14:28.065869"
        },
        "stats": {
            "version": "1496'15853",
            "reported_seq": "37466",
            "reported_epoch": "1692",
            "state": "active+recovery_wait+degraded+remapped",
            "last_fresh": "2017-09-12 14:40:44.555997",
            "last_change": "2017-09-12 14:40:44.555997",
            "last_active": "2017-09-12 14:40:44.555997",
            "last_peered": "2017-09-12 14:40:44.555997",
            "last_clean": "2017-09-02 23:51:42.040480",
            "last_became_active": "2017-09-12 10:33:11.231301",
            "last_became_peered": "2017-09-12 10:33:11.231301",
            "last_unstale": "2017-09-12 14:40:44.555997",
            "last_undegraded": "2017-09-12 10:33:11.193432",
            "last_fullsized": "2017-09-12 14:40:44.555997",
            "mapping_epoch": 1653,
            "log_start": "1473'13058",
            "ondisk_log_start": "1473'13058",
            "created": 22,
            "last_epoch_clean": 1482,
            "parent": "0.0",
            "parent_split_bits": 0,
            "last_scrub": "1184'12948",
            "last_scrub_stamp": "2017-09-02 08:14:28.065869",
            "last_deep_scrub": "1184'12948",
            "last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
            "last_clean_scrub_stamp": "2017-09-02 08:14:28.065869",
            "log_size": 2795,
            "ondisk_log_size": 2795,
            "stats_invalid": true,
            "dirty_stats_invalid": false,
            "omap_stats_invalid": false,
            "hitset_stats_invalid": false,
            "hitset_bytes_stats_invalid": false,
            "pin_stats_invalid": false,
            "stat_sum": {
                "num_bytes": 9177517,
                "num_objects": 649,
                "num_object_clones": 0,
                "num_object_copies": 1298,
                "num_objects_missing_on_primary": 1,
                "num_objects_missing": 0,
                "num_objects_degraded": 2,
                "num_objects_misplaced": 648,
                "num_objects_unfound": 1,
                "num_objects_dirty": 649,
                "num_whiteouts": 0,
                "num_read": 15129,
                "num_read_kb": 168601,
                "num_write": 10611,
                "num_write_kb": 152359,
                "num_scrub_errors": 0,
                "num_shallow_scrub_errors": 0,
                "num_deep_scrub_errors": 0,
                "num_objects_recovered": 946,
                "num_bytes_recovered": 4454578,
                "num_keys_recovered": 25320,
                "num_objects_omap": 646,
                "num_objects_hit_set_archive": 0,
                "num_bytes_hit_set_archive": 0,
                "num_flush": 0,
                "num_flush_kb": 0,
                "num_evict": 0,
                "num_evict_kb": 0,
                "num_promote": 0,
                "num_flush_mode_high": 0,
                "num_flush_mode_low": 0,
                "num_evict_mode_some": 0,
                "num_evict_mode_full": 0,
                "num_objects_pinned": 0,
                "num_legacy_snapsets": 0
            },
            "up": [
                4,
                3
            ],
            "acting": [
                3,
                1
            ],
            "blocked_by": [],
            "up_primary": 4,
            "acting_primary": 3
        },
        "empty": 0,
        "dne": 0,
        "incomplete": 0,
        "last_epoch_started": 1654,
        "hit_set_history": {
            "current_last_update": "0'0",
            "history": []
        }
    },
    "peer_info": [
        {
            "peer": "0",
            "pgid": "6.2",
            "last_update": "1496'15853",
            "last_complete": "1496'15853",
            "log_tail": "1473'14353",
            "last_user_version": 0,
            "last_backfill": "MIN",
            "last_backfill_bitwise": 1,
            "purged_snaps": [],
            "history": {
                "epoch_created": 22,
                "epoch_pool_created": 22,
                "last_epoch_started": 1652,
                "last_interval_started": 1651,
                "last_epoch_clean": 1482,
                "last_interval_clean": 1479,
                "last_epoch_split": 1191,
                "last_epoch_marked_full": 0,
                "same_up_since": 1653,
                "same_interval_since": 1653,
                "same_primary_since": 1651,
                "last_scrub": "1184'12948",
                "last_scrub_stamp": "2017-09-02 08:14:28.065869",
                "last_deep_scrub": "1184'12948",
                "last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
                "last_clean_scrub_stamp": "2017-09-02 08:14:28.065869"
            },
            "stats": {
                "version": "0'0",
                "reported_seq": "0",
                "reported_epoch": "0",
                "state": "unknown",
                "last_fresh": "0.000000",
                "last_change": "0.000000",
                "last_active": "0.000000",
                "last_peered": "0.000000",
                "last_clean": "0.000000",
                "last_became_active": "0.000000",
                "last_became_peered": "0.000000",
                "last_unstale": "0.000000",
                "last_undegraded": "0.000000",
                "last_fullsized": "0.000000",
                "mapping_epoch": 1653,
                "log_start": "0'0",
                "ondisk_log_start": "0'0",
                "created": 0,
                "last_epoch_clean": 0,
                "parent": "0.0",
                "parent_split_bits": 0,
                "last_scrub": "0'0",
                "last_scrub_stamp": "0.000000",
                "last_deep_scrub": "0'0",
                "last_deep_scrub_stamp": "0.000000",
                "last_clean_scrub_stamp": "0.000000",
                "log_size": 0,
                "ondisk_log_size": 0,
                "stats_invalid": false,
                "dirty_stats_invalid": false,
                "omap_stats_invalid": false,
                "hitset_stats_invalid": false,
                "hitset_bytes_stats_invalid": false,
                "pin_stats_invalid": false,
                "stat_sum": {
                    "num_bytes": 0,
                    "num_objects": 0,
                    "num_object_clones": 0,
                    "num_object_copies": 0,
                    "num_objects_missing_on_primary": 0,
                    "num_objects_missing": 0,
                    "num_objects_degraded": 0,
                    "num_objects_misplaced": 0,
                    "num_objects_unfound": 0,
                    "num_objects_dirty": 0,
                    "num_whiteouts": 0,
                    "num_read": 0,
                    "num_read_kb": 0,
                    "num_write": 0,
                    "num_write_kb": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_objects_recovered": 0,
                    "num_bytes_recovered": 0,
                    "num_keys_recovered": 0,
                    "num_objects_omap": 0,
                    "num_objects_hit_set_archive": 0,
                    "num_bytes_hit_set_archive": 0,
                    "num_flush": 0,
                    "num_flush_kb": 0,
                    "num_evict": 0,
                    "num_evict_kb": 0,
                    "num_promote": 0,
                    "num_flush_mode_high": 0,
                    "num_flush_mode_low": 0,
                    "num_evict_mode_some": 0,
                    "num_evict_mode_full": 0,
                    "num_objects_pinned": 0,
                    "num_legacy_snapsets": 0
                },
                "up": [
                    4,
                    3
                ],
                "acting": [
                    3,
                    1
                ],
                "blocked_by": [],
                "up_primary": 4,
                "acting_primary": 3
            },
            "empty": 0,
            "dne": 0,
            "incomplete": 1,
            "last_epoch_started": 1607,
            "hit_set_history": {
                "current_last_update": "0'0",
                "history": []
            }
        },
        {
            "peer": "1",
            "pgid": "6.2",
            "last_update": "1496'15853",
            "last_complete": "1496'15852",
            "log_tail": "1473'13058",
            "last_user_version": 15853,
            "last_backfill": "MAX",
            "last_backfill_bitwise": 0,
            "purged_snaps": [],
            "history": {
                "epoch_created": 22,
                "epoch_pool_created": 22,
                "last_epoch_started": 1654,
                "last_interval_started": 1653,
                "last_epoch_clean": 1482,
                "last_interval_clean": 1479,
                "last_epoch_split": 1191,
                "last_epoch_marked_full": 0,
                "same_up_since": 1653,
                "same_interval_since": 1653,
                "same_primary_since": 1651,
                "last_scrub": "1184'12948",
                "last_scrub_stamp": "2017-09-02 08:14:28.065869",
                "last_deep_scrub": "1184'12948",
                "last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
                "last_clean_scrub_stamp": "2017-09-02 08:14:28.065869"
            },
            "stats": {
                "version": "1496'15853",
                "reported_seq": "37756",
                "reported_epoch": "1650",
                "state": "remapped",
                "last_fresh": "2017-09-12 10:31:48.226105",
                "last_change": "2017-09-12 10:31:48.226105",
                "last_active": "2017-09-12 10:31:48.155329",
                "last_peered": "2017-09-12 10:31:05.287521",
                "last_clean": "2017-09-02 23:51:42.040480",
                "last_became_active": "2017-09-12 10:31:05.287078",
                "last_became_peered": "2017-09-12 10:31:05.287078",
                "last_unstale": "2017-09-12 10:31:48.226105",
                "last_undegraded": "2017-09-12 10:31:48.226105",
                "last_fullsized": "2017-09-12 10:31:48.226105",
                "mapping_epoch": 1653,
                "log_start": "1473'13058",
                "ondisk_log_start": "1473'13058",
                "created": 22,
                "last_epoch_clean": 1482,
                "parent": "0.0",
                "parent_split_bits": 0,
                "last_scrub": "1184'12948",
                "last_scrub_stamp": "2017-09-02 08:14:28.065869",
                "last_deep_scrub": "1184'12948",
                "last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
                "last_clean_scrub_stamp": "2017-09-02 08:14:28.065869",
                "log_size": 2795,
                "ondisk_log_size": 2795,
                "stats_invalid": true,
                "dirty_stats_invalid": false,
                "omap_stats_invalid": false,
                "hitset_stats_invalid": false,
                "hitset_bytes_stats_invalid": false,
                "pin_stats_invalid": false,
                "stat_sum": {
                    "num_bytes": 9177517,
                    "num_objects": 649,
                    "num_object_clones": 0,
                    "num_object_copies": 1298,
                    "num_objects_missing_on_primary": 1,
                    "num_objects_missing": 1,
                    "num_objects_degraded": 0,
                    "num_objects_misplaced": 0,
                    "num_objects_unfound": 0,
                    "num_objects_dirty": 649,
                    "num_whiteouts": 0,
                    "num_read": 15129,
                    "num_read_kb": 168601,
                    "num_write": 10611,
                    "num_write_kb": 152359,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_objects_recovered": 946,
                    "num_bytes_recovered": 4454578,
                    "num_keys_recovered": 25320,
                    "num_objects_omap": 646,
                    "num_objects_hit_set_archive": 0,
                    "num_bytes_hit_set_archive": 0,
                    "num_flush": 0,
                    "num_flush_kb": 0,
                    "num_evict": 0,
                    "num_evict_kb": 0,
                    "num_promote": 0,
                    "num_flush_mode_high": 0,
                    "num_flush_mode_low": 0,
                    "num_evict_mode_some": 0,
                    "num_evict_mode_full": 0,
                    "num_objects_pinned": 0,
                    "num_legacy_snapsets": 0
                },
                "up": [
                    4,
                    3
                ],
                "acting": [
                    3,
                    1
                ],
                "blocked_by": [],
                "up_primary": 4,
                "acting_primary": 3
            },
            "empty": 0,
            "dne": 0,
            "incomplete": 0,
            "last_epoch_started": 1654,
            "hit_set_history": {
                "current_last_update": "0'0",
                "history": []
            }
        },
        {
            "peer": "4",
            "pgid": "6.2",
            "last_update": "1496'15853",
            "last_complete": "1496'15853",
            "log_tail": "1473'14353",
            "last_user_version": 0,
            "last_backfill": "MIN",
            "last_backfill_bitwise": 1,
            "purged_snaps": [],
            "history": {
                "epoch_created": 22,
                "epoch_pool_created": 22,
                "last_epoch_started": 1654,
                "last_interval_started": 1653,
                "last_epoch_clean": 1482,
                "last_interval_clean": 1479,
                "last_epoch_split": 1191,
                "last_epoch_marked_full": 0,
                "same_up_since": 1653,
                "same_interval_since": 1653,
                "same_primary_since": 1651,
                "last_scrub": "1184'12948",
                "last_scrub_stamp": "2017-09-02 08:14:28.065869",
                "last_deep_scrub": "1184'12948",
                "last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
                "last_clean_scrub_stamp": "2017-09-02 08:14:28.065869"
            },
            "stats": {
                "version": "0'0",
                "reported_seq": "0",
                "reported_epoch": "0",
                "state": "unknown",
                "last_fresh": "0.000000",
                "last_change": "0.000000",
                "last_active": "0.000000",
                "last_peered": "0.000000",
                "last_clean": "0.000000",
                "last_became_active": "0.000000",
                "last_became_peered": "0.000000",
                "last_unstale": "0.000000",
                "last_undegraded": "0.000000",
                "last_fullsized": "0.000000",
                "mapping_epoch": 1653,
                "log_start": "0'0",
                "ondisk_log_start": "0'0",
                "created": 0,
                "last_epoch_clean": 0,
                "parent": "0.0",
                "parent_split_bits": 0,
                "last_scrub": "0'0",
                "last_scrub_stamp": "0.000000",
                "last_deep_scrub": "0'0",
                "last_deep_scrub_stamp": "0.000000",
                "last_clean_scrub_stamp": "0.000000",
                "log_size": 0,
                "ondisk_log_size": 0,
                "stats_invalid": false,
                "dirty_stats_invalid": false,
                "omap_stats_invalid": false,
                "hitset_stats_invalid": false,
                "hitset_bytes_stats_invalid": false,
                "pin_stats_invalid": false,
                "stat_sum": {
                    "num_bytes": 0,
                    "num_objects": 0,
                    "num_object_clones": 0,
                    "num_object_copies": 0,
                    "num_objects_missing_on_primary": 0,
                    "num_objects_missing": 0,
                    "num_objects_degraded": 0,
                    "num_objects_misplaced": 0,
                    "num_objects_unfound": 0,
                    "num_objects_dirty": 0,
                    "num_whiteouts": 0,
                    "num_read": 0,
                    "num_read_kb": 0,
                    "num_write": 0,
                    "num_write_kb": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_objects_recovered": 0,
                    "num_bytes_recovered": 0,
                    "num_keys_recovered": 0,
                    "num_objects_omap": 0,
                    "num_objects_hit_set_archive": 0,
                    "num_bytes_hit_set_archive": 0,
                    "num_flush": 0,
                    "num_flush_kb": 0,
                    "num_evict": 0,
                    "num_evict_kb": 0,
                    "num_promote": 0,
                    "num_flush_mode_high": 0,
                    "num_flush_mode_low": 0,
                    "num_evict_mode_some": 0,
                    "num_evict_mode_full": 0,
                    "num_objects_pinned": 0,
                    "num_legacy_snapsets": 0
                },
                "up": [
                    4,
                    3
                ],
                "acting": [
                    3,
                    1
                ],
                "blocked_by": [],
                "up_primary": 4,
                "acting_primary": 3
            },
            "empty": 0,
            "dne": 0,
            "incomplete": 1,
            "last_epoch_started": 1654,
            "hit_set_history": {
                "current_last_update": "0'0",
                "history": []
            }
        },
        {
            "peer": "5",
            "pgid": "6.2",
            "last_update": "1496'15853",
            "last_complete": "1496'15853",
            "log_tail": "1473'14353",
            "last_user_version": 0,
            "last_backfill": "MIN",
            "last_backfill_bitwise": 1,
            "purged_snaps": [],
            "history": {
                "epoch_created": 22,
                "epoch_pool_created": 22,
                "last_epoch_started": 1652,
                "last_interval_started": 1651,
                "last_epoch_clean": 1482,
                "last_interval_clean": 1479,
                "last_epoch_split": 1191,
                "last_epoch_marked_full": 0,
                "same_up_since": 1653,
                "same_interval_since": 1653,
                "same_primary_since": 1651,
                "last_scrub": "1184'12948",
                "last_scrub_stamp": "2017-09-02 08:14:28.065869",
                "last_deep_scrub": "1184'12948",
                "last_deep_scrub_stamp": "2017-09-02 08:14:28.065869",
                "last_clean_scrub_stamp": "2017-09-02 08:14:28.065869"
            },
            "stats": {
                "version": "0'0",
                "reported_seq": "0",
                "reported_epoch": "0",
                "state": "unknown",
                "last_fresh": "0.000000",
                "last_change": "0.000000",
                "last_active": "0.000000",
                "last_peered": "0.000000",
                "last_clean": "0.000000",
                "last_became_active": "0.000000",
                "last_became_peered": "0.000000",
                "last_unstale": "0.000000",
                "last_undegraded": "0.000000",
                "last_fullsized": "0.000000",
                "mapping_epoch": 1653,
                "log_start": "0'0",
                "ondisk_log_start": "0'0",
                "created": 0,
                "last_epoch_clean": 0,
                "parent": "0.0",
                "parent_split_bits": 0,
                "last_scrub": "0'0",
                "last_scrub_stamp": "0.000000",
                "last_deep_scrub": "0'0",
                "last_deep_scrub_stamp": "0.000000",
                "last_clean_scrub_stamp": "0.000000",
                "log_size": 0,
                "ondisk_log_size": 0,
                "stats_invalid": false,
                "dirty_stats_invalid": false,
                "omap_stats_invalid": false,
                "hitset_stats_invalid": false,
                "hitset_bytes_stats_invalid": false,
                "pin_stats_invalid": false,
                "stat_sum": {
                    "num_bytes": 0,
                    "num_objects": 0,
                    "num_object_clones": 0,
                    "num_object_copies": 0,
                    "num_objects_missing_on_primary": 0,
                    "num_objects_missing": 0,
                    "num_objects_degraded": 0,
                    "num_objects_misplaced": 0,
                    "num_objects_unfound": 0,
                    "num_objects_dirty": 0,
                    "num_whiteouts": 0,
                    "num_read": 0,
                    "num_read_kb": 0,
                    "num_write": 0,
                    "num_write_kb": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_objects_recovered": 0,
                    "num_bytes_recovered": 0,
                    "num_keys_recovered": 0,
                    "num_objects_omap": 0,
                    "num_objects_hit_set_archive": 0,
                    "num_bytes_hit_set_archive": 0,
                    "num_flush": 0,
                    "num_flush_kb": 0,
                    "num_evict": 0,
                    "num_evict_kb": 0,
                    "num_promote": 0,
                    "num_flush_mode_high": 0,
                    "num_flush_mode_low": 0,
                    "num_evict_mode_some": 0,
                    "num_evict_mode_full": 0,
                    "num_objects_pinned": 0,
                    "num_legacy_snapsets": 0
                },
                "up": [
                    4,
                    3
                ],
                "acting": [
                    3,
                    1
                ],
                "blocked_by": [],
                "up_primary": 4,
                "acting_primary": 3
            },
            "empty": 0,
            "dne": 0,
            "incomplete": 1,
            "last_epoch_started": 1516,
            "hit_set_history": {
                "current_last_update": "0'0",
                "history": []
            }
        }
    ],
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "enter_time": "2017-09-12 10:33:11.193486",
            "might_have_unfound": [
                {
                    "osd": "0",
                    "status": "already probed"
                },
                {
                    "osd": "1",
                    "status": "already probed"
                },
                {
                    "osd": "2",
                    "status": "osd is down"
                },
                {
                    "osd": "4",
                    "status": "already probed"
                },
                {
                    "osd": "5",
                    "status": "already probed"
                }
            ],
            "recovery_progress": {
                "backfill_targets": [
                    "4"
                ],
                "waiting_on_backfill": [],
                "last_backfill_started": "MIN",
                "backfill_info": {
                    "begin": "MIN",
                    "end": "MIN",
                    "objects": []
                },
                "peer_backfill_info": [],
                "backfills_in_flight": [],
                "recovering": [],
                "pg_backend": {
                    "pull_from_peer": [],
                    "pushing": []
                }
            },
            "scrub": {
                "scrubber.epoch_start": "0",
                "scrubber.active": false,
                "scrubber.state": "INACTIVE",
                "scrubber.start": "MIN",
                "scrubber.end": "MIN",
                "scrubber.subset_last_update": "0'0",
                "scrubber.deep": false,
                "scrubber.seed": 0,
                "scrubber.waiting_on": 0,
                "scrubber.waiting_on_whom": []
            }
        },
        {
            "name": "Started",
            "enter_time": "2017-09-12 10:33:10.326608"
        }
    ],
    "agent_state": {}
}


* Re: clearing unfound objects
  2017-09-12 22:20 clearing unfound objects Two Spirit
@ 2017-09-12 22:48 ` Sage Weil
  2017-09-13  0:07   ` Two Spirit
  0 siblings, 1 reply; 6+ messages in thread
From: Sage Weil @ 2017-09-12 22:48 UTC
  To: Two Spirit; +Cc: John Spray, ceph-devel

On Tue, 12 Sep 2017, Two Spirit wrote:
> >On Tue, 12 Sep 2017, Two Spirit wrote:
> >> I don't have any OSDs that are down, so the 1 unfound object I think
> >> needs to be manually cleared. I ran across a webpage a while ago that
> >> talked about how to clear it, but if you have a reference, would save
> >> me a little time.
> >
> >http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#failures-osd-unfound
> 
> Thanks. That was the page I had read earlier.
> 
> I've attached the full outputs to this mail and show just clips below.
> 
> # ceph health detail
> OBJECT_UNFOUND 1/731529 unfound (0.000%)
>     pg 6.2 has 1 unfound objects
> 
> There is one number that looks like it shouldn't be there...
> # ceph pg 6.2 list_missing
> {
>     "offset": {
> ...
>         "pool": -9223372036854775808,
>         "namespace": ""
>     },
> ...

I think you've snipped out the bit that has the name of the unfound 
object?

sage

> 
> # ceph -s
>     osd: 6 osds: 6 up, 6 in; 10 remapped pgs
> 
> This shows under the pg query that something believes that osd "2" is
> down, but all OSDs are up, as seen in the previous ceph -s command.
> # ceph pg 6.2 query
>     "recovery_state": [
>         {
>             "name": "Started/Primary/Active",
>             "enter_time": "2017-09-12 10:33:11.193486",
>             "might_have_unfound": [
>                 {
>                     "osd": "0",
>                     "status": "already probed"
>                 },
>                 {
>                     "osd": "1",
>                     "status": "already probed"
>                 },
>                 {
>                     "osd": "2",
>                     "status": "osd is down"
>                 },
>                 {
>                     "osd": "4",
>                     "status": "already probed"
>                 },
>                 {
>                     "osd": "5",
>                     "status": "already probed"
>                 }
> 
> 
> If i go to a couple other OSDs, and run the same command,
> the osd "2" is listed as "already probed". They are not in sync. I
> double checked that all the OSDs were up on all 3 times I ran the
> command.
> 
> Now. my question to debug this to figure out if I want to
> "revert|delete", is what in the heck are these file(s)/object(s)
> associated with the pg? I assume this might be in the MDS, but I'd
> like to see a file name associated with this to make a further
> determination of what I should do.  I don't have enough information at
> this point to figure out how I should recover.
> 


* Re: clearing unfound objects
  2017-09-12 22:48 ` Sage Weil
@ 2017-09-13  0:07   ` Two Spirit
  2017-09-13  1:54     ` Sage Weil
  0 siblings, 1 reply; 6+ messages in thread
From: Two Spirit @ 2017-09-13  0:07 UTC (permalink / raw)
  To: Sage Weil; +Cc: John Spray, ceph-devel

I attached the complete output with the previous email.

...
    "objects": [
        {
            "oid": {
                "oid": "200.0000052d",
                "key": "",
                "snapid": -2,
                "hash": 2728386690,
                "max": 0,
                "pool": 6,
                "namespace": ""
            },
            "need": "1496'15853",
            "have": "0'0",
            "flags": "none",
            "locations": []
        }


So the mapping goes filename -> OID -> PG -> OSD? If I trace down
"200.0000052d", should I be able to clear the problem? I also seem to
get files in the lost+found directory, I think from fsck. Does deep
scrubbing eventually clear these after a week, or will they always
require manual intervention?
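
The object name can be pulled out of the list_missing JSON mechanically. The following is a minimal Python sketch, assuming the output has been trimmed to a well-formed document containing only the "objects" array shown above; `unfound_objects` is a hypothetical helper, not part of any Ceph tooling.

```python
import json

# Sample trimmed from the `ceph pg 6.2 list_missing` output shown above.
# Assumption: the real output has more top-level fields (e.g. "offset");
# only "objects" is used here.
SAMPLE = """
{
    "objects": [
        {
            "oid": {
                "oid": "200.0000052d",
                "key": "",
                "snapid": -2,
                "hash": 2728386690,
                "max": 0,
                "pool": 6,
                "namespace": ""
            },
            "need": "1496'15853",
            "have": "0'0",
            "flags": "none",
            "locations": []
        }
    ]
}
"""


def unfound_objects(list_missing_json):
    """Return (pool, object name, need, have) for every listed object
    that has no known location."""
    doc = json.loads(list_missing_json)
    result = []
    for entry in doc.get("objects", []):
        if entry.get("locations"):
            continue  # at least one location is known for this object
        oid = entry["oid"]
        result.append((oid["pool"], oid["oid"], entry["need"], entry["have"]))
    return result


if __name__ == "__main__":
    for pool, name, need, have in unfound_objects(SAMPLE):
        print(f"pool {pool}: {name} need={need} have={have}")
```

The object name and pool ID are the two values needed to work out what the object belongs to before choosing between revert and delete.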



* Re: clearing unfound objects
  2017-09-13  0:07   ` Two Spirit
@ 2017-09-13  1:54     ` Sage Weil
  2017-09-13 15:46       ` Two Spirit
  0 siblings, 1 reply; 6+ messages in thread
From: Sage Weil @ 2017-09-13  1:54 UTC (permalink / raw)
  To: Two Spirit; +Cc: John Spray, ceph-devel

On Tue, 12 Sep 2017, Two Spirit wrote:
> I attached the complete output with the previous email.
> 
> ...
>     "objects": [
>         {
>             "oid": {
>                 "oid": "200.0000052d",

This is an MDS journal object, so the MDS is stuck replaying its journal
because that object is unfound.

In this case I would do 'revert'.

sage
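
The naming convention behind that identification can be sketched as follows. CephFS names its RADOS objects as the inode number in hex, a dot, then an 8-digit hex index, and the journal of MDS rank N lives at inode 0x200 + N. The constants below reflect the MAX_MDS-based offsets in the Ceph MDS sources as far as they are known here; treat them as an assumption, and `classify_cephfs_object` as a hypothetical helper written in Python for illustration.

```python
# Assumption: mirrors MAX_MDS and the journal inode offset (2 * MAX_MDS)
# used by the Ceph MDS; check against your Ceph version's sources.
MAX_MDS = 0x100
LOG_BASE = 2 * MAX_MDS  # 0x200: journal inode of MDS rank 0


def classify_cephfs_object(name):
    """Split '<ino hex>.<index hex>' and say whether the inode falls in
    the MDS journal range."""
    ino_hex, _, index_hex = name.partition(".")
    ino = int(ino_hex, 16)
    index = int(index_hex, 16)
    if LOG_BASE <= ino < LOG_BASE + MAX_MDS:
        kind = f"journal of MDS rank {ino - LOG_BASE}"
    else:
        kind = "not in the MDS journal inode range"
    return ino, index, kind


if __name__ == "__main__":
    ino, index, kind = classify_cephfs_object("200.0000052d")
    print(f"inode {ino:#x}, object index {index}: {kind}")
```

Under those assumptions, "200.0000052d" decodes to inode 0x200 and object index 0x52d, which falls in the journal of MDS rank 0.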


>                 "key": "",
>                 "snapid": -2,
>                 "hash": 2728386690,
>                 "max": 0,
>                 "pool": 6,
>                 "namespace": ""
>             },
>             "need": "1496'15853",
>             "have": "0'0",
>             "flags": "none",
>             "locations": []
>         }


* Re: clearing unfound objects
  2017-09-13  1:54     ` Sage Weil
@ 2017-09-13 15:46       ` Two Spirit
  2017-09-13 21:34         ` Two Spirit
  0 siblings, 1 reply; 6+ messages in thread
From: Two Spirit @ 2017-09-13 15:46 UTC (permalink / raw)
  To: Sage Weil; +Cc: John Spray, ceph-devel

You're the man. I'm not sure yet how you figured that out; I've got a
little reading to do. Is it considered a bug that the MDS is stuck
and unable to self-heal?



* Re: clearing unfound objects
  2017-09-13 15:46       ` Two Spirit
@ 2017-09-13 21:34         ` Two Spirit
  0 siblings, 0 replies; 6+ messages in thread
From: Two Spirit @ 2017-09-13 21:34 UTC (permalink / raw)
  To: Sage Weil; +Cc: John Spray, ceph-devel

Did not see that one coming. What to do?

# ceph health detail

FS_DEGRADED 1 filesystem is degraded
    fs cephfs is degraded
OBJECT_UNFOUND 1/731509 unfound (0.000%)
    pg 6.2 has 1 unfound objects

# ceph pg 6.2 query
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "enter_time": "2017-09-13 14:22:26.384418",
            "might_have_unfound": [
                {
                    "osd": "0",
                    "status": "already probed"
                },
                {
                    "osd": "2",
                    "status": "already probed"
                },
                {
                    "osd": "3",
                    "status": "already probed"
                },
                {
                    "osd": "4",
                    "status": "already probed"
                },
                {
                    "osd": "5",
                    "status": "already probed"
                }

# ceph pg 6.2 mark_unfound_lost revert
pg has no unfound objects

I wasn't expecting that message, and I don't think it is correct.
I am a little crazy, so I ran the same command again, expecting a
different result

# ceph pg 6.2 mark_unfound_lost revert
pg has 1 objects unfound and apparently lost marking

and I got one, but my filesystem is still degraded. I was expecting
my filesystem to be healthy again. Do I have to wait for active+clean,
or is there something else I should do now?
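
The two pg query outputs in this thread differ in one respect: earlier, osd 2 was reported as "osd is down", while in the output above every candidate is "already probed". The troubleshooting page linked at the start of the thread treats marking objects lost as a step for when all possible locations have been queried. Below is a minimal Python sketch of that pre-check, using entries copied from the output above; `unprobed_osds` is a hypothetical helper, and the sample is only the might_have_unfound list, not a full query result.

```python
import json

# Copied from the "might_have_unfound" section of `ceph pg 6.2 query`
# above. Assumption: the list has been cut out of the full query output.
MIGHT_HAVE_UNFOUND = json.loads("""
[
    {"osd": "0", "status": "already probed"},
    {"osd": "2", "status": "already probed"},
    {"osd": "3", "status": "already probed"},
    {"osd": "4", "status": "already probed"},
    {"osd": "5", "status": "already probed"}
]
""")


def unprobed_osds(entries):
    """Return the OSD ids whose status is anything other than
    'already probed'."""
    return [e["osd"] for e in entries if e["status"] != "already probed"]


if __name__ == "__main__":
    pending = unprobed_osds(MIGHT_HAVE_UNFOUND)
    if pending:
        print("still waiting on OSDs:", ", ".join(pending))
    else:
        print("all candidate OSDs have been probed")
```

Run against the earlier output, the same check would flag osd 2, which is consistent with the primary not yet having probed every candidate at that point.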


end of thread, other threads:[~2017-09-13 21:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-12 22:20 clearing unfound objects Two Spirit
2017-09-12 22:48 ` Sage Weil
2017-09-13  0:07   ` Two Spirit
2017-09-13  1:54     ` Sage Weil
2017-09-13 15:46       ` Two Spirit
2017-09-13 21:34         ` Two Spirit
