* Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap
@ 2017-11-28  2:46 Cary
  2017-11-28  3:09 ` Sage Weil
  0 siblings, 1 reply; 12+ messages in thread
From: Cary @ 2017-11-28  2:46 UTC (permalink / raw)
  To: ceph-devel

Hello,

 Could someone please help me complete my botched upgrade from Jewel
10.2.3-r1 to Luminous 12.2.1? I have 9 Gentoo servers, 4 of which have
2 OSDs each.

 My OSD servers were accidentally rebooted before the monitor servers,
causing them to run Luminous before the monitors. All services have
been restarted, and running "ceph versions" gives the following:

# ceph versions
2017-11-27 21:27:24.356940 7fed67efe700 -1 WARNING: the following
dangerous and experimental features are enabled: btrfs
2017-11-27 21:27:24.368469 7fed67efe700 -1 WARNING: the following
dangerous and experimental features are enabled: btrfs

    "mon": {
        "ceph version 12.2.1
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
    },
    "mgr": {
        "ceph version 12.2.1
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
    },
    "osd": {},
    "mds": {
        "ceph version 12.2.1
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
    },
    "overall": {
        "ceph version 12.2.1
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 8



For some reason the OSDs do not show what version they are running,
and "ceph osd tree" shows all of the OSDs as down.

 # ceph osd tree
2017-11-27 21:32:51.969335 7f483d9c2700 -1 WARNING: the following
dangerous and experimental features are enabled: btrfs
2017-11-27 21:32:51.980976 7f483d9c2700 -1 WARNING: the following
dangerous and experimental features are enabled: btrfs
ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
-1       27.77998 root default
-3       27.77998     datacenter DC1
-6       27.77998         rack 1B06
-5        6.48000             host ceph3
 1        1.84000                 osd.1    down        0 1.00000
 3        4.64000                 osd.3    down        0 1.00000
-2        5.53999             host ceph4
 5        4.64000                 osd.5    down        0 1.00000
 8        0.89999                 osd.8    down        0 1.00000
-4        9.28000             host ceph6
 0        4.64000                 osd.0    down        0 1.00000
 2        4.64000                 osd.2    down        0 1.00000
-7        6.48000             host ceph7
 6        4.64000                 osd.6    down        0 1.00000
 7        1.84000                 osd.7    down        0 1.00000

The OSD logs all have this message:

20235 osdmap REQUIRE_JEWEL OSDMap flag is NOT set; please set it.


When I try to set it with "ceph osd set require_jewel_osds" I get this error:

Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature



A "ceph features" returns:

    "mon": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 4
        }
    },
    "mds": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 1
        }
    },
    "osd": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 8
        }
    },
    "client": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 3

 # ceph tell osd.* versions
2017-11-28 02:29:28.565943 7f99c6aee700 -1 WARNING: the following
dangerous and experimental features are enabled: btrfs
2017-11-28 02:29:28.578956 7f99c6aee700 -1 WARNING: the following
dangerous and experimental features are enabled: btrfs
Error ENXIO: problem getting command descriptions from osd.0
osd.0: problem getting command descriptions from osd.0
Error ENXIO: problem getting command descriptions from osd.1
osd.1: problem getting command descriptions from osd.1
Error ENXIO: problem getting command descriptions from osd.2
osd.2: problem getting command descriptions from osd.2
Error ENXIO: problem getting command descriptions from osd.3
osd.3: problem getting command descriptions from osd.3
Error ENXIO: problem getting command descriptions from osd.5
osd.5: problem getting command descriptions from osd.5
Error ENXIO: problem getting command descriptions from osd.6
osd.6: problem getting command descriptions from osd.6
Error ENXIO: problem getting command descriptions from osd.7
osd.7: problem getting command descriptions from osd.7
Error ENXIO: problem getting command descriptions from osd.8
osd.8: problem getting command descriptions from osd.8

 # ceph daemon osd.1 status

    "cluster_fsid": "CENSORED",
    "osd_fsid": "CENSORED",
    "whoami": 1,
    "state": "preboot",
    "oldest_map": 19482,
    "newest_map": 20235,
    "num_pgs": 141

 # ceph -s
2017-11-27 22:04:10.372471 7f89a3935700 -1 WARNING: the following
dangerous and experimental features are enabled: btrfs
2017-11-27 22:04:10.375709 7f89a3935700 -1 WARNING: the following
dangerous and experimental features are enabled: btrfs
  cluster:
    id:     CENSORED
    health: HEALTH_ERR
            513 pgs are stuck inactive for more than 60 seconds
            126 pgs backfill_wait
            52 pgs backfilling
            435 pgs degraded
            513 pgs stale
            435 pgs stuck degraded
            513 pgs stuck stale
            435 pgs stuck unclean
            435 pgs stuck undersized
            435 pgs undersized
            recovery 854719/3688140 objects degraded (23.175%)
            recovery 838607/3688140 objects misplaced (22.738%)
            mds cluster is degraded
            crush map has straw_calc_version=0

  services:
    mon: 4 daemons, quorum 0,1,3,2
    mgr: 0(active), standbys: 1, 5
    mds: cephfs-1/1/1 up  {0=a=up:replay}, 1 up:standby
    osd: 8 osds: 0 up, 0 in

  data:
    pools:   7 pools, 513 pgs
    objects: 1199k objects, 4510 GB
    usage:   13669 GB used, 15150 GB / 28876 GB avail
    pgs:     854719/3688140 objects degraded (23.175%)
             838607/3688140 objects misplaced (22.738%)
             257 stale+active+undersized+degraded
             126 stale+active+undersized+degraded+remapped+backfill_wait
             78  stale+active+clean
             52  stale+active+undersized+degraded+remapped+backfilling


I ran "ceph auth list", and client.admin has the following permissions.
auid: 0
caps: [mds] allow
caps: [mgr] allow *
caps: [mon] allow *
caps: [osd] allow *

Thank you for your time.

Is there any way I can get these OSDs to join the cluster now, or
recover my data?

Cary
-Dynamic


* Re: Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap
  2017-11-28  2:46 Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap Cary
@ 2017-11-28  3:09 ` Sage Weil
  2017-11-28  3:45   ` Cary
  0 siblings, 1 reply; 12+ messages in thread
From: Sage Weil @ 2017-11-28  3:09 UTC (permalink / raw)
  To: Cary; +Cc: ceph-devel

On Tue, 28 Nov 2017, Cary wrote:
> Hello,
> 
>  Could someone please help me complete my botched upgrade from Jewel
> 10.2.3-r1 to Luminous 12.2.1. I have 9 Gentoo servers, 4 of which have
> 2 OSDs each.
> 
>  My OSD servers were accidentally rebooted before the monitor servers
> causing them to be running Luminous before the monitors. All services
> have been restarted and running ceph versions gives the following:
> 
> # ceph versions
> 2017-11-27 21:27:24.356940 7fed67efe700 -1 WARNING: the following
> dangerous and experimental features are enabled: btrfs
> 2017-11-27 21:27:24.368469 7fed67efe700 -1 WARNING: the following
> dangerous and experimental features are enabled: btrfs
> 
>     "mon": {
>         "ceph version 12.2.1
> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
>     },
>     "mgr": {
>         "ceph version 12.2.1
> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
>     },
>     "osd": {},
>     "mds": {
>         "ceph version 12.2.1
> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
>     },
>     "overall": {
>         "ceph version 12.2.1
> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 8
> 
> 
> 
> For some reason the OSDs do not show what version they are running,
> and a ceph osd tree shows all of the OSD as being down.
> 
>  # ceph osd tree
> 2017-11-27 21:32:51.969335 7f483d9c2700 -1 WARNING: the following
> dangerous and experimental features are enabled: btrfs
> 2017-11-27 21:32:51.980976 7f483d9c2700 -1 WARNING: the following
> dangerous and experimental features are enabled: btrfs
> ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
> -1       27.77998 root default
> -3       27.77998     datacenter DC1
> -6       27.77998         rack 1B06
> -5        6.48000             host ceph3
>  1        1.84000                 osd.1    down        0 1.00000
>  3        4.64000                 osd.3    down        0 1.00000
> -2        5.53999             host ceph4
>  5        4.64000                 osd.5    down        0 1.00000
>  8        0.89999                 osd.8    down        0 1.00000
> -4        9.28000             host ceph6
>  0        4.64000                 osd.0    down        0 1.00000
>  2        4.64000                 osd.2    down        0 1.00000
> -7        6.48000             host ceph7
>  6        4.64000                 osd.6    down        0 1.00000
>  7        1.84000                 osd.7    down        0 1.00000
> 
> The OSD logs all have this message:
> 
> 20235 osdmap REQUIRE_JEWEL OSDMap flag is NOT set; please set it.

This is an annoying corner condition.  12.2.2 (out soon!) will have a 
--force option to set the flag even though no OSDs are up.  Until then, the 
workaround is to downgrade one host to jewel, start one jewel OSD, then 
set the flag.  Then upgrade to luminous again and restart all OSDs.
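
Roughly, on Gentoo that sequence might look like the following (the
package atoms, versions and init-script names are assumptions based on
the rest of this thread; osd.3 is just an example):

 # emerge -av "=sys-cluster/ceph-10.2.3-r1"   # downgrade one OSD host to jewel
 # /etc/init.d/ceph-osd.3 start               # bring up a single jewel OSD
 # ceph osd set require_jewel_osds            # should now be accepted
 # emerge -av "=sys-cluster/ceph-12.2.1"      # return that host to luminous
 # /etc/init.d/ceph-osd.3 restart             # restart the OSDs on luminous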

sage


> 
> When I try to set it with "ceph osd set require_jewel_osds" I get this error:
> 
> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
> 
> 
> 
> A "ceph features" returns:
> 
>     "mon": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 4
>         }
>     },
>     "mds": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 1
>         }
>     },
>     "osd": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 8
>         }
>     },
>     "client": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 3
> 
>  # ceph tell osd.* versions
> 2017-11-28 02:29:28.565943 7f99c6aee700 -1 WARNING: the following
> dangerous and experimental features are enabled: btrfs
> 2017-11-28 02:29:28.578956 7f99c6aee700 -1 WARNING: the following
> dangerous and experimental features are enabled: btrfs
> Error ENXIO: problem getting command descriptions from osd.0
> osd.0: problem getting command descriptions from osd.0
> Error ENXIO: problem getting command descriptions from osd.1
> osd.1: problem getting command descriptions from osd.1
> Error ENXIO: problem getting command descriptions from osd.2
> osd.2: problem getting command descriptions from osd.2
> Error ENXIO: problem getting command descriptions from osd.3
> osd.3: problem getting command descriptions from osd.3
> Error ENXIO: problem getting command descriptions from osd.5
> osd.5: problem getting command descriptions from osd.5
> Error ENXIO: problem getting command descriptions from osd.6
> osd.6: problem getting command descriptions from osd.6
> Error ENXIO: problem getting command descriptions from osd.7
> osd.7: problem getting command descriptions from osd.7
> Error ENXIO: problem getting command descriptions from osd.8
> osd.8: problem getting command descriptions from osd.8
> 
>  # ceph daemon osd.1 status
> 
>     "cluster_fsid": "CENSORED",
>     "osd_fsid": "CENSORED",
>     "whoami": 1,
>     "state": "preboot",
>     "oldest_map": 19482,
>     "newest_map": 20235,
>     "num_pgs": 141
> 
>  # ceph -s
> 2017-11-27 22:04:10.372471 7f89a3935700 -1 WARNING: the following
> dangerous and experimental features are enabled: btrfs
> 2017-11-27 22:04:10.375709 7f89a3935700 -1 WARNING: the following
> dangerous and experimental features are enabled: btrfs
>   cluster:
>     id:     CENSORED
>     health: HEALTH_ERR
>             513 pgs are stuck inactive for more than 60 seconds
>             126 pgs backfill_wait
>             52 pgs backfilling
>             435 pgs degraded
>             513 pgs stale
>             435 pgs stuck degraded
>             513 pgs stuck stale
>             435 pgs stuck unclean
>             435 pgs stuck undersized
>             435 pgs undersized
>             recovery 854719/3688140 objects degraded (23.175%)
>             recovery 838607/3688140 objects misplaced (22.738%)
>             mds cluster is degraded
>             crush map has straw_calc_version=0
> 
>   services:
>     mon: 4 daemons, quorum 0,1,3,2
>     mgr: 0(active), standbys: 1, 5
>     mds: cephfs-1/1/1 up  {0=a=up:replay}, 1 up:standby
>     osd: 8 osds: 0 up, 0 in
> 
>   data:
>     pools:   7 pools, 513 pgs
>     objects: 1199k objects, 4510 GB
>     usage:   13669 GB used, 15150 GB / 28876 GB avail
>     pgs:     854719/3688140 objects degraded (23.175%)
>              838607/3688140 objects misplaced (22.738%)
>              257 stale+active+undersized+degraded
>              126 stale+active+undersized+degraded+remapped+backfill_wait
>              78  stale+active+clean
>              52  stale+active+undersized+degraded+remapped+backfilling
> 
> 
> I ran "ceph auth list", and client.admin has the following permissions.
> auid: 0
> caps: [mds] allow
> caps: [mgr] allow *
> caps: [mon] allow *
> caps: [osd] allow *
> 
> Thank you for your time.
> 
> Is there any way I can get these OSDs to join the cluster now, or
> recover my data?
> 
> Cary
> -Dynamic
> 
> 


* Re: Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap
  2017-11-28  3:09 ` Sage Weil
@ 2017-11-28  3:45   ` Cary
  2017-11-28 13:09     ` Sage Weil
  0 siblings, 1 reply; 12+ messages in thread
From: Cary @ 2017-11-28  3:45 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

I get this error when I try to start the OSD that has been downgraded
to 10.2.3-r2.

2017-11-28 03:42:35.989754 7fa5e6429940  1
filestore(/var/lib/ceph/osd/ceph-3) upgrade
2017-11-28 03:42:35.989788 7fa5e6429940  2 osd.3 0 boot
2017-11-28 03:42:35.990132 7fa5e6429940 -1 osd.3 0 The disk uses
features unsupported by the executable.
2017-11-28 03:42:35.990142 7fa5e6429940 -1 osd.3 0  ondisk features
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
attr,16=deletes in missing set}
2017-11-28 03:42:35.990150 7fa5e6429940 -1 osd.3 0  daemon features
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints,13=pg meta object}
2017-11-28 03:42:35.990160 7fa5e6429940 -1 osd.3 0 Cannot write to
disk! Missing features: compat={},rocompat={},incompat={14=explicit
missing set,15=fastinfo pg attr,16=deletes in missing set}
2017-11-28 03:42:35.990775 7fa5e6429940  1 journal close
/dev/disk/by-partlabel/ceph-3
2017-11-28 03:42:35.992960 7fa5e6429940 -1  ** ERROR: osd init failed:
(95) Operation not supported

Cary

On Tue, Nov 28, 2017 at 3:09 AM, Sage Weil <sage@newdream.net> wrote:
> On Tue, 28 Nov 2017, Cary wrote:
>> Hello,
>>
>>  Could someone please help me complete my botched upgrade from Jewel
>> 10.2.3-r1 to Luminous 12.2.1. I have 9 Gentoo servers, 4 of which have
>> 2 OSDs each.
>>
>>  My OSD servers were accidentally rebooted before the monitor servers
>> causing them to be running Luminous before the monitors. All services
>> have been restarted and running ceph versions gives the following:
>>
>> # ceph versions
>> 2017-11-27 21:27:24.356940 7fed67efe700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> 2017-11-27 21:27:24.368469 7fed67efe700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>>
>>     "mon": {
>>         "ceph version 12.2.1
>> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
>>     },
>>     "mgr": {
>>         "ceph version 12.2.1
>> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
>>     },
>>     "osd": {},
>>     "mds": {
>>         "ceph version 12.2.1
>> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
>>     },
>>     "overall": {
>>         "ceph version 12.2.1
>> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 8
>>
>>
>>
>> For some reason the OSDs do not show what version they are running,
>> and a ceph osd tree shows all of the OSD as being down.
>>
>>  # ceph osd tree
>> 2017-11-27 21:32:51.969335 7f483d9c2700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> 2017-11-27 21:32:51.980976 7f483d9c2700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
>> -1       27.77998 root default
>> -3       27.77998     datacenter DC1
>> -6       27.77998         rack 1B06
>> -5        6.48000             host ceph3
>>  1        1.84000                 osd.1    down        0 1.00000
>>  3        4.64000                 osd.3    down        0 1.00000
>> -2        5.53999             host ceph4
>>  5        4.64000                 osd.5    down        0 1.00000
>>  8        0.89999                 osd.8    down        0 1.00000
>> -4        9.28000             host ceph6
>>  0        4.64000                 osd.0    down        0 1.00000
>>  2        4.64000                 osd.2    down        0 1.00000
>> -7        6.48000             host ceph7
>>  6        4.64000                 osd.6    down        0 1.00000
>>  7        1.84000                 osd.7    down        0 1.00000
>>
>> The OSD logs all have this message:
>>
>> 20235 osdmap REQUIRE_JEWEL OSDMap flag is NOT set; please set it.
>
> THis is an annoying corner condition.  12.2.2 (out soon!)  will have a
> --force option to set the flag even tho no osds are up.  Until then, the
> workaround is to downgrade one host to jewel, start one jewel osd, then
> set the flag.  Then upgrade to luminous again and restart all osds.
>
> sage
>
>
>>
>> When I try to set it with "ceph osd set require_jewel_osds" I get this error:
>>
>> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
>>
>>
>>
>> A "ceph features" returns:
>>
>>     "mon": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 4
>>         }
>>     },
>>     "mds": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 1
>>         }
>>     },
>>     "osd": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 8
>>         }
>>     },
>>     "client": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 3
>>
>>  # ceph tell osd.* versions
>> 2017-11-28 02:29:28.565943 7f99c6aee700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> 2017-11-28 02:29:28.578956 7f99c6aee700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> Error ENXIO: problem getting command descriptions from osd.0
>> osd.0: problem getting command descriptions from osd.0
>> Error ENXIO: problem getting command descriptions from osd.1
>> osd.1: problem getting command descriptions from osd.1
>> Error ENXIO: problem getting command descriptions from osd.2
>> osd.2: problem getting command descriptions from osd.2
>> Error ENXIO: problem getting command descriptions from osd.3
>> osd.3: problem getting command descriptions from osd.3
>> Error ENXIO: problem getting command descriptions from osd.5
>> osd.5: problem getting command descriptions from osd.5
>> Error ENXIO: problem getting command descriptions from osd.6
>> osd.6: problem getting command descriptions from osd.6
>> Error ENXIO: problem getting command descriptions from osd.7
>> osd.7: problem getting command descriptions from osd.7
>> Error ENXIO: problem getting command descriptions from osd.8
>> osd.8: problem getting command descriptions from osd.8
>>
>>  # ceph daemon osd.1 status
>>
>>     "cluster_fsid": "CENSORED",
>>     "osd_fsid": "CENSORED",
>>     "whoami": 1,
>>     "state": "preboot",
>>     "oldest_map": 19482,
>>     "newest_map": 20235,
>>     "num_pgs": 141
>>
>>  # ceph -s
>> 2017-11-27 22:04:10.372471 7f89a3935700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> 2017-11-27 22:04:10.375709 7f89a3935700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>>   cluster:
>>     id:     CENSORED
>>     health: HEALTH_ERR
>>             513 pgs are stuck inactive for more than 60 seconds
>>             126 pgs backfill_wait
>>             52 pgs backfilling
>>             435 pgs degraded
>>             513 pgs stale
>>             435 pgs stuck degraded
>>             513 pgs stuck stale
>>             435 pgs stuck unclean
>>             435 pgs stuck undersized
>>             435 pgs undersized
>>             recovery 854719/3688140 objects degraded (23.175%)
>>             recovery 838607/3688140 objects misplaced (22.738%)
>>             mds cluster is degraded
>>             crush map has straw_calc_version=0
>>
>>   services:
>>     mon: 4 daemons, quorum 0,1,3,2
>>     mgr: 0(active), standbys: 1, 5
>>     mds: cephfs-1/1/1 up  {0=a=up:replay}, 1 up:standby
>>     osd: 8 osds: 0 up, 0 in
>>
>>   data:
>>     pools:   7 pools, 513 pgs
>>     objects: 1199k objects, 4510 GB
>>     usage:   13669 GB used, 15150 GB / 28876 GB avail
>>     pgs:     854719/3688140 objects degraded (23.175%)
>>              838607/3688140 objects misplaced (22.738%)
>>              257 stale+active+undersized+degraded
>>              126 stale+active+undersized+degraded+remapped+backfill_wait
>>              78  stale+active+clean
>>              52  stale+active+undersized+degraded+remapped+backfilling
>>
>>
>> I ran "ceph auth list", and client.admin has the following permissions.
>> auid: 0
>> caps: [mds] allow
>> caps: [mgr] allow *
>> caps: [mon] allow *
>> caps: [osd] allow *
>>
>> Thank you for your time.
>>
>> Is there any way I can get these OSDs to join the cluster now, or
>> recover my data?
>>
>> Cary
>> -Dynamic
>>
>>


* Re: Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap
  2017-11-28  3:45   ` Cary
@ 2017-11-28 13:09     ` Sage Weil
  2017-11-28 18:11       ` Cary
  0 siblings, 1 reply; 12+ messages in thread
From: Sage Weil @ 2017-11-28 13:09 UTC (permalink / raw)
  To: Cary; +Cc: ceph-devel

On Tue, 28 Nov 2017, Cary wrote:
> I get this error when I try to start the OSD that has been downgraded
> to 10.2.3-r2.
> 
> 2017-11-28 03:42:35.989754 7fa5e6429940  1
> filestore(/var/lib/ceph/osd/ceph-3) upgrade
> 2017-11-28 03:42:35.989788 7fa5e6429940  2 osd.3 0 boot
> 2017-11-28 03:42:35.990132 7fa5e6429940 -1 osd.3 0 The disk uses
> features unsupported by the executable.
> 2017-11-28 03:42:35.990142 7fa5e6429940 -1 osd.3 0  ondisk features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object
> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
> attr,16=deletes in missing set}
> 2017-11-28 03:42:35.990150 7fa5e6429940 -1 osd.3 0  daemon features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object
> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
> objects,12=transaction hints,13=pg meta object}
> 2017-11-28 03:42:35.990160 7fa5e6429940 -1 osd.3 0 Cannot write to
> disk! Missing features: compat={},rocompat={},incompat={14=explicit
> missing set,15=fastinfo pg attr,16=deletes in missing set}
> 2017-11-28 03:42:35.990775 7fa5e6429940  1 journal close
> /dev/disk/by-partlabel/ceph-3
> 2017-11-28 03:42:35.992960 7fa5e6429940 -1  ** ERROR: osd init failed:
> (95) Operation not supported

Oh, right.  In that case, install the 'luminous' branch[1] on the monitors 
(or just the primary monitor if you're being conservative), restart it, 
and you'll be able to do:

 ceph osd set require_jewel_osds --yes-i-really-mean-it

sage


[1] ceph-deploy install --dev luminous HOST
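
A rough sketch of that sequence (the monitor hostname and mon id are
placeholders, and the ceph-mon init-script name is an assumption; this
thread only shows the ceph-osd scripts):

 # ceph-deploy install --dev luminous mon-host1    # repeat for each monitor host
 # /etc/init.d/ceph-mon.0 restart                  # restart the mon daemon(s)
 # ceph osd set require_jewel_osds --yes-i-really-mean-it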




> Cary
> 
> On Tue, Nov 28, 2017 at 3:09 AM, Sage Weil <sage@newdream.net> wrote:
> > On Tue, 28 Nov 2017, Cary wrote:
> >> Hello,
> >>
> >>  Could someone please help me complete my botched upgrade from Jewel
> >> 10.2.3-r1 to Luminous 12.2.1. I have 9 Gentoo servers, 4 of which have
> >> 2 OSDs each.
> >>
> >>  My OSD servers were accidentally rebooted before the monitor servers
> >> causing them to be running Luminous before the monitors. All services
> >> have been restarted and running ceph versions gives the following:
> >>
> >> # ceph versions
> >> 2017-11-27 21:27:24.356940 7fed67efe700 -1 WARNING: the following
> >> dangerous and experimental features are enabled: btrfs
> >> 2017-11-27 21:27:24.368469 7fed67efe700 -1 WARNING: the following
> >> dangerous and experimental features are enabled: btrfs
> >>
> >>     "mon": {
> >>         "ceph version 12.2.1
> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
> >>     },
> >>     "mgr": {
> >>         "ceph version 12.2.1
> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
> >>     },
> >>     "osd": {},
> >>     "mds": {
> >>         "ceph version 12.2.1
> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
> >>     },
> >>     "overall": {
> >>         "ceph version 12.2.1
> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 8
> >>
> >>
> >>
> >> For some reason the OSDs do not show what version they are running,
> >> and a ceph osd tree shows all of the OSD as being down.
> >>
> >>  # ceph osd tree
> >> 2017-11-27 21:32:51.969335 7f483d9c2700 -1 WARNING: the following
> >> dangerous and experimental features are enabled: btrfs
> >> 2017-11-27 21:32:51.980976 7f483d9c2700 -1 WARNING: the following
> >> dangerous and experimental features are enabled: btrfs
> >> ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
> >> -1       27.77998 root default
> >> -3       27.77998     datacenter DC1
> >> -6       27.77998         rack 1B06
> >> -5        6.48000             host ceph3
> >>  1        1.84000                 osd.1    down        0 1.00000
> >>  3        4.64000                 osd.3    down        0 1.00000
> >> -2        5.53999             host ceph4
> >>  5        4.64000                 osd.5    down        0 1.00000
> >>  8        0.89999                 osd.8    down        0 1.00000
> >> -4        9.28000             host ceph6
> >>  0        4.64000                 osd.0    down        0 1.00000
> >>  2        4.64000                 osd.2    down        0 1.00000
> >> -7        6.48000             host ceph7
> >>  6        4.64000                 osd.6    down        0 1.00000
> >>  7        1.84000                 osd.7    down        0 1.00000
> >>
> >> The OSD logs all have this message:
> >>
> >> 20235 osdmap REQUIRE_JEWEL OSDMap flag is NOT set; please set it.
> >
> > THis is an annoying corner condition.  12.2.2 (out soon!)  will have a
> > --force option to set the flag even tho no osds are up.  Until then, the
> > workaround is to downgrade one host to jewel, start one jewel osd, then
> > set the flag.  Then upgrade to luminous again and restart all osds.
> >
> > sage
> >
> >
> >>
> >> When I try to set it with "ceph osd set require_jewel_osds" I get this error:
> >>
> >> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
> >>
> >>
> >>
> >> A "ceph features" returns:
> >>
> >>     "mon": {
> >>         "group": {
> >>             "features": "0x1ffddff8eea4fffb",
> >>             "release": "luminous",
> >>             "num": 4
> >>         }
> >>     },
> >>     "mds": {
> >>         "group": {
> >>             "features": "0x1ffddff8eea4fffb",
> >>             "release": "luminous",
> >>             "num": 1
> >>         }
> >>     },
> >>     "osd": {
> >>         "group": {
> >>             "features": "0x1ffddff8eea4fffb",
> >>             "release": "luminous",
> >>             "num": 8
> >>         }
> >>     },
> >>     "client": {
> >>         "group": {
> >>             "features": "0x1ffddff8eea4fffb",
> >>             "release": "luminous",
> >>             "num": 3
> >>
> >>  # ceph tell osd.* versions
> >> 2017-11-28 02:29:28.565943 7f99c6aee700 -1 WARNING: the following
> >> dangerous and experimental features are enabled: btrfs
> >> 2017-11-28 02:29:28.578956 7f99c6aee700 -1 WARNING: the following
> >> dangerous and experimental features are enabled: btrfs
> >> Error ENXIO: problem getting command descriptions from osd.0
> >> osd.0: problem getting command descriptions from osd.0
> >> Error ENXIO: problem getting command descriptions from osd.1
> >> osd.1: problem getting command descriptions from osd.1
> >> Error ENXIO: problem getting command descriptions from osd.2
> >> osd.2: problem getting command descriptions from osd.2
> >> Error ENXIO: problem getting command descriptions from osd.3
> >> osd.3: problem getting command descriptions from osd.3
> >> Error ENXIO: problem getting command descriptions from osd.5
> >> osd.5: problem getting command descriptions from osd.5
> >> Error ENXIO: problem getting command descriptions from osd.6
> >> osd.6: problem getting command descriptions from osd.6
> >> Error ENXIO: problem getting command descriptions from osd.7
> >> osd.7: problem getting command descriptions from osd.7
> >> Error ENXIO: problem getting command descriptions from osd.8
> >> osd.8: problem getting command descriptions from osd.8
> >>
> >>  # ceph daemon osd.1 status
> >>
> >>     "cluster_fsid": "CENSORED",
> >>     "osd_fsid": "CENSORED",
> >>     "whoami": 1,
> >>     "state": "preboot",
> >>     "oldest_map": 19482,
> >>     "newest_map": 20235,
> >>     "num_pgs": 141
> >>
> >>  # ceph -s
> >> 2017-11-27 22:04:10.372471 7f89a3935700 -1 WARNING: the following
> >> dangerous and experimental features are enabled: btrfs
> >> 2017-11-27 22:04:10.375709 7f89a3935700 -1 WARNING: the following
> >> dangerous and experimental features are enabled: btrfs
> >>   cluster:
> >>     id:     CENSORED
> >>     health: HEALTH_ERR
> >>             513 pgs are stuck inactive for more than 60 seconds
> >>             126 pgs backfill_wait
> >>             52 pgs backfilling
> >>             435 pgs degraded
> >>             513 pgs stale
> >>             435 pgs stuck degraded
> >>             513 pgs stuck stale
> >>             435 pgs stuck unclean
> >>             435 pgs stuck undersized
> >>             435 pgs undersized
> >>             recovery 854719/3688140 objects degraded (23.175%)
> >>             recovery 838607/3688140 objects misplaced (22.738%)
> >>             mds cluster is degraded
> >>             crush map has straw_calc_version=0
> >>
> >>   services:
> >>     mon: 4 daemons, quorum 0,1,3,2
> >>     mgr: 0(active), standbys: 1, 5
> >>     mds: cephfs-1/1/1 up  {0=a=up:replay}, 1 up:standby
> >>     osd: 8 osds: 0 up, 0 in
> >>
> >>   data:
> >>     pools:   7 pools, 513 pgs
> >>     objects: 1199k objects, 4510 GB
> >>     usage:   13669 GB used, 15150 GB / 28876 GB avail
> >>     pgs:     854719/3688140 objects degraded (23.175%)
> >>              838607/3688140 objects misplaced (22.738%)
> >>              257 stale+active+undersized+degraded
> >>              126 stale+active+undersized+degraded+remapped+backfill_wait
> >>              78  stale+active+clean
> >>              52  stale+active+undersized+degraded+remapped+backfilling
> >>
> >>
> >> I ran "ceph auth list", and client.admin has the following permissions.
> >> auid: 0
> >> caps: [mds] allow
> >> caps: [mgr] allow *
> >> caps: [mon] allow *
> >> caps: [osd] allow *
> >>
> >> Thank you for your time.
> >>
> >> Is there any way I can get these OSDs to join the cluster now, or
> >> recover my data?
> >>
> >> Cary
> >> -Dynamic
> >>
> >>
> 
> 


* Re: Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap
  2017-11-28 13:09     ` Sage Weil
@ 2017-11-28 18:11       ` Cary
  2017-11-28 18:45         ` Sage Weil
  0 siblings, 1 reply; 12+ messages in thread
From: Cary @ 2017-11-28 18:11 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

 I am getting an error when I run "ceph osd set require_jewel_osds
--yes-i-really-mean-it".

Error ENOENT: unknown feature '--yes-i-really-mean-it'

 So I ran "ceph osd set require_jewel_osds" and got this error:

Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature

 I verified all OSDs were stopped with "/etc/init.d/ceph-osd.N stop",
then verified each was down with "ceph osd down N". When setting them
down, each replied "osd.N is already down". I started one of the OSDs
on a host that was downgraded to 10.2.3-r2, then attempted "ceph osd
set require_jewel_osds" again and got the same error.


 The log for the OSD is showing this error:

2017-11-28 17:40:08.928446 7f47b082f940  1
filestore(/var/lib/ceph/osd/ceph-1) upgrade
2017-11-28 17:40:08.928475 7f47b082f940  2 osd.1 0 boot
2017-11-28 17:40:08.928788 7f47b082f940 -1 osd.1 0 The disk uses
features unsupported by the executable.
2017-11-28 17:40:08.928810 7f47b082f940 -1 osd.1 0  ondisk features
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
attr,16=deletes in missing set}
2017-11-28 17:40:08.928818 7f47b082f940 -1 osd.1 0  daemon features
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints,13=pg meta object}
2017-11-28 17:40:08.928827 7f47b082f940 -1 osd.1 0 Cannot write to
disk! Missing features: compat={},rocompat={},incompat={14=explicit
missing set,15=fastinfo pg attr,16=deletes in missing set}
2017-11-28 17:40:08.929353 7f47b082f940  1 journal close
/dev/disk/by-partlabel/ceph-1
2017-11-28 17:40:08.930488 7f47b082f940 -1  ** ERROR: osd init failed:
(95) Operation not supported

So the OSD is not starting because of missing features. It does not
show up in "ceph features" output.

 Ceph features output:
ceph features
2017-11-28 17:51:31.213636 7f6a2140a700 -1 WARNING: the following
dangerous and experimental features are enabled: btrfs
2017-11-28 17:51:31.223068 7f6a2140a700 -1 WARNING: the following
dangerous and experimental features are enabled: btrfs

    "mon": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 4
        }
    },
    "mds": {
        "group": {
            "features": "0x7fddff8ee84bffb",
            "release": "jewel",
            "num": 1
        }
    },
    "client": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 4

I attempted to set require_jewel_osds with the MGRs stopped, and had
the same results.

 Output from "ceph tell osd.1 versions". I get the same error from all OSDs.

# ceph tell osd.1 versions
Error ENXIO: problem getting command descriptions from osd.1

Any thoughts?

Cary
-Dynamic

On Tue, Nov 28, 2017 at 1:09 PM, Sage Weil <sage@newdream.net> wrote:
> On Tue, 28 Nov 2017, Cary wrote:
>> I get this error when I try to start the OSD that has been downgraded
>> to 10.2.3-r2.
>>
>> 2017-11-28 03:42:35.989754 7fa5e6429940  1
>> filestore(/var/lib/ceph/osd/ceph-3) upgrade
>> 2017-11-28 03:42:35.989788 7fa5e6429940  2 osd.3 0 boot
>> 2017-11-28 03:42:35.990132 7fa5e6429940 -1 osd.3 0 The disk uses
>> features unsupported by the executable.
>> 2017-11-28 03:42:35.990142 7fa5e6429940 -1 osd.3 0  ondisk features
>> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
>> object,3=object
>> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
>> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
>> attr,16=deletes in missing set}
>> 2017-11-28 03:42:35.990150 7fa5e6429940 -1 osd.3 0  daemon features
>> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
>> object,3=object
>> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
>> objects,12=transaction hints,13=pg meta object}
>> 2017-11-28 03:42:35.990160 7fa5e6429940 -1 osd.3 0 Cannot write to
>> disk! Missing features: compat={},rocompat={},incompat={14=explicit
>> missing set,15=fastinfo pg attr,16=deletes in missing set}
>> 2017-11-28 03:42:35.990775 7fa5e6429940  1 journal close
>> /dev/disk/by-partlabel/ceph-3
>> 2017-11-28 03:42:35.992960 7fa5e6429940 -1  ** ERROR: osd init failed:
>> (95) Operation not supported
>
> Oh, right.  In that case, install the 'luminous' branch[1] on the monitors
> (or just the primary monitor if you're being conservative), restrart it,
> and you'll be able to do
>
>  ceph osd set require_jewel_osds --yes-i-really-mean-it
>
> sage
>
>
> [1] ceph-deploy install --dev luminous HOST
>
>
>
>
>> Cary
>>
>> On Tue, Nov 28, 2017 at 3:09 AM, Sage Weil <sage@newdream.net> wrote:
>> > On Tue, 28 Nov 2017, Cary wrote:
>> >> Hello,
>> >>
>> >>  Could someone please help me complete my botched upgrade from Jewel
>> >> 10.2.3-r1 to Luminous 12.2.1. I have 9 Gentoo servers, 4 of which have
>> >> 2 OSDs each.
>> >>
>> >>  My OSD servers were accidentally rebooted before the monitor servers
>> >> causing them to be running Luminous before the monitors. All services
>> >> have been restarted and running ceph versions gives the following:
>> >>
>> >> # ceph versions
>> >> 2017-11-27 21:27:24.356940 7fed67efe700 -1 WARNING: the following
>> >> dangerous and experimental features are enabled: btrfs
>> >> 2017-11-27 21:27:24.368469 7fed67efe700 -1 WARNING: the following
>> >> dangerous and experimental features are enabled: btrfs
>> >>
>> >>     "mon": {
>> >>         "ceph version 12.2.1
>> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
>> >>     },
>> >>     "mgr": {
>> >>         "ceph version 12.2.1
>> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
>> >>     },
>> >>     "osd": {},
>> >>     "mds": {
>> >>         "ceph version 12.2.1
>> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
>> >>     },
>> >>     "overall": {
>> >>         "ceph version 12.2.1
>> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 8
>> >>
>> >>
>> >>
>> >> For some reason the OSDs do not show what version they are running,
>> >> and a ceph osd tree shows all of the OSD as being down.
>> >>
>> >>  # ceph osd tree
>> >> 2017-11-27 21:32:51.969335 7f483d9c2700 -1 WARNING: the following
>> >> dangerous and experimental features are enabled: btrfs
>> >> 2017-11-27 21:32:51.980976 7f483d9c2700 -1 WARNING: the following
>> >> dangerous and experimental features are enabled: btrfs
>> >> ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
>> >> -1       27.77998 root default
>> >> -3       27.77998     datacenter DC1
>> >> -6       27.77998         rack 1B06
>> >> -5        6.48000             host ceph3
>> >>  1        1.84000                 osd.1    down        0 1.00000
>> >>  3        4.64000                 osd.3    down        0 1.00000
>> >> -2        5.53999             host ceph4
>> >>  5        4.64000                 osd.5    down        0 1.00000
>> >>  8        0.89999                 osd.8    down        0 1.00000
>> >> -4        9.28000             host ceph6
>> >>  0        4.64000                 osd.0    down        0 1.00000
>> >>  2        4.64000                 osd.2    down        0 1.00000
>> >> -7        6.48000             host ceph7
>> >>  6        4.64000                 osd.6    down        0 1.00000
>> >>  7        1.84000                 osd.7    down        0 1.00000
>> >>
>> >> The OSD logs all have this message:
>> >>
>> >> 20235 osdmap REQUIRE_JEWEL OSDMap flag is NOT set; please set it.
>> >
>> > THis is an annoying corner condition.  12.2.2 (out soon!)  will have a
>> > --force option to set the flag even tho no osds are up.  Until then, the
>> > workaround is to downgrade one host to jewel, start one jewel osd, then
>> > set the flag.  Then upgrade to luminous again and restart all osds.
>> >
>> > sage
>> >
>> >
>> >>
>> >> When I try to set it with "ceph osd set require_jewel_osds" I get this error:
>> >>
>> >> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
>> >>
>> >>
>> >>
>> >> A "ceph features" returns:
>> >>
>> >>     "mon": {
>> >>         "group": {
>> >>             "features": "0x1ffddff8eea4fffb",
>> >>             "release": "luminous",
>> >>             "num": 4
>> >>         }
>> >>     },
>> >>     "mds": {
>> >>         "group": {
>> >>             "features": "0x1ffddff8eea4fffb",
>> >>             "release": "luminous",
>> >>             "num": 1
>> >>         }
>> >>     },
>> >>     "osd": {
>> >>         "group": {
>> >>             "features": "0x1ffddff8eea4fffb",
>> >>             "release": "luminous",
>> >>             "num": 8
>> >>         }
>> >>     },
>> >>     "client": {
>> >>         "group": {
>> >>             "features": "0x1ffddff8eea4fffb",
>> >>             "release": "luminous",
>> >>             "num": 3
>> >>
>> >>  # ceph tell osd.* versions
>> >> 2017-11-28 02:29:28.565943 7f99c6aee700 -1 WARNING: the following
>> >> dangerous and experimental features are enabled: btrfs
>> >> 2017-11-28 02:29:28.578956 7f99c6aee700 -1 WARNING: the following
>> >> dangerous and experimental features are enabled: btrfs
>> >> Error ENXIO: problem getting command descriptions from osd.0
>> >> osd.0: problem getting command descriptions from osd.0
>> >> Error ENXIO: problem getting command descriptions from osd.1
>> >> osd.1: problem getting command descriptions from osd.1
>> >> Error ENXIO: problem getting command descriptions from osd.2
>> >> osd.2: problem getting command descriptions from osd.2
>> >> Error ENXIO: problem getting command descriptions from osd.3
>> >> osd.3: problem getting command descriptions from osd.3
>> >> Error ENXIO: problem getting command descriptions from osd.5
>> >> osd.5: problem getting command descriptions from osd.5
>> >> Error ENXIO: problem getting command descriptions from osd.6
>> >> osd.6: problem getting command descriptions from osd.6
>> >> Error ENXIO: problem getting command descriptions from osd.7
>> >> osd.7: problem getting command descriptions from osd.7
>> >> Error ENXIO: problem getting command descriptions from osd.8
>> >> osd.8: problem getting command descriptions from osd.8
>> >>
>> >>  # ceph daemon osd.1 status
>> >>
>> >>     "cluster_fsid": "CENSORED",
>> >>     "osd_fsid": "CENSORED",
>> >>     "whoami": 1,
>> >>     "state": "preboot",
>> >>     "oldest_map": 19482,
>> >>     "newest_map": 20235,
>> >>     "num_pgs": 141
>> >>
>> >>  # ceph -s
>> >> 2017-11-27 22:04:10.372471 7f89a3935700 -1 WARNING: the following
>> >> dangerous and experimental features are enabled: btrfs
>> >> 2017-11-27 22:04:10.375709 7f89a3935700 -1 WARNING: the following
>> >> dangerous and experimental features are enabled: btrfs
>> >>   cluster:
>> >>     id:     CENSORED
>> >>     health: HEALTH_ERR
>> >>             513 pgs are stuck inactive for more than 60 seconds
>> >>             126 pgs backfill_wait
>> >>             52 pgs backfilling
>> >>             435 pgs degraded
>> >>             513 pgs stale
>> >>             435 pgs stuck degraded
>> >>             513 pgs stuck stale
>> >>             435 pgs stuck unclean
>> >>             435 pgs stuck undersized
>> >>             435 pgs undersized
>> >>             recovery 854719/3688140 objects degraded (23.175%)
>> >>             recovery 838607/3688140 objects misplaced (22.738%)
>> >>             mds cluster is degraded
>> >>             crush map has straw_calc_version=0
>> >>
>> >>   services:
>> >>     mon: 4 daemons, quorum 0,1,3,2
>> >>     mgr: 0(active), standbys: 1, 5
>> >>     mds: cephfs-1/1/1 up  {0=a=up:replay}, 1 up:standby
>> >>     osd: 8 osds: 0 up, 0 in
>> >>
>> >>   data:
>> >>     pools:   7 pools, 513 pgs
>> >>     objects: 1199k objects, 4510 GB
>> >>     usage:   13669 GB used, 15150 GB / 28876 GB avail
>> >>     pgs:     854719/3688140 objects degraded (23.175%)
>> >>              838607/3688140 objects misplaced (22.738%)
>> >>              257 stale+active+undersized+degraded
>> >>              126 stale+active+undersized+degraded+remapped+backfill_wait
>> >>              78  stale+active+clean
>> >>              52  stale+active+undersized+degraded+remapped+backfilling
>> >>
>> >>
>> >> I ran "ceph auth list", and client.admin has the following permissions.
>> >> auid: 0
>> >> caps: [mds] allow
>> >> caps: [mgr] allow *
>> >> caps: [mon] allow *
>> >> caps: [osd] allow *
>> >>
>> >> Thank you for your time.
>> >>
>> >> Is there any way I can get these OSDs to join the cluster now, or
>> >> recover my data?
>> >>
>> >> Cary
>> >> -Dynamic
>> >>
>> >>
>>
>>


* Re: Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap
  2017-11-28 18:11       ` Cary
@ 2017-11-28 18:45         ` Sage Weil
  2017-11-30  0:48           ` Cary
  0 siblings, 1 reply; 12+ messages in thread
From: Sage Weil @ 2017-11-28 18:45 UTC (permalink / raw)
  To: Cary; +Cc: ceph-devel

On Tue, 28 Nov 2017, Cary wrote:
> Hello,
> 
>  I am getting an error when I run "ceph osd set require_jewel_osds
> --yes-i-really-mean-it".
> 
> Error ENOENT: unknown feature '--yes-i-really-mean-it'

I just tested on the latest luminous branch and this works.  Did you 
upgrade the mons to the latest luminous build and restart them?  
(ceph-deploy install --dev luminous HOST, then restart mon daemon(s)).
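
One quick way to confirm the mons actually picked up the new build (the
mon id is a placeholder; run the first command on the monitor host itself):

 # ceph daemon mon.0 version   # via the admin socket on the mon host
 # ceph versions               # "mon" section should show the new build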

sage


> 
>  So I ran, "ceph osd set require_jewel_osds", and got this error:
> 
> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
> 
>  I verified all OSDs were stopped with "/etc/init.d/ceph-osd.N stop".
> Then verified each was down with "ceph osd down N". When setting them
> down, each replied "osd.N is already down".  I started one of the OSDs
> on a host that was downgraded to 10.2.3-r2 I then attempted to set
> "ceph osd set require_jewel_osds", and get the same error.
> 
> 
>  The log for the OSD is showing this error:
> 
> 2017-11-28 17:40:08.928446 7f47b082f940  1
> filestore(/var/lib/ceph/osd/ceph-1) upgrade
> 2017-11-28 17:40:08.928475 7f47b082f940  2 osd.1 0 boot
> 2017-11-28 17:40:08.928788 7f47b082f940 -1 osd.1 0 The disk uses
> features unsupported by the executable.
> 2017-11-28 17:40:08.928810 7f47b082f940 -1 osd.1 0  ondisk features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object
> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
> attr,16=deletes in missing set}
> 2017-11-28 17:40:08.928818 7f47b082f940 -1 osd.1 0  daemon features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object
> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
> objects,12=transaction hints,13=pg meta object}
> 2017-11-28 17:40:08.928827 7f47b082f940 -1 osd.1 0 Cannot write to
> disk! Missing features: compat={},rocompat={},incompat={14=explicit
> missing set,15=fastinfo pg attr,16=deletes in missing set}
> 2017-11-28 17:40:08.929353 7f47b082f940  1 journal close
> /dev/disk/by-partlabel/ceph-1
> 2017-11-28 17:40:08.930488 7f47b082f940 -1  ** ERROR: osd init failed:
> (95) Operation not supported
> 
> So the OSD is not starting because of missing features. It does not
> show up in "ceph features" output.
> 
>  Ceph features output:
> ceph features
> 2017-11-28 17:51:31.213636 7f6a2140a700 -1 WARNING: the following
> dangerous and experimental features are enabled: btrfs
> 2017-11-28 17:51:31.223068 7f6a2140a700 -1 WARNING: the following
> dangerous and experimental features are enabled: btrfs
> 
>     "mon": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 4
>         }
>     },
>     "mds": {
>         "group": {
>             "features": "0x7fddff8ee84bffb",
>             "release": "jewel",
>             "num": 1
>         }
>     },
>     "client": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 4
> 
> I attempted to set require_jewel_osds with the MGRs stopped, and had
> the same results.
> 
>  Output from ceph tell osd.1 version. I get the same error from all OSDs.
> 
> # ceph tell osd.1 versions
> Error ENXIO: problem getting command descriptions from osd.1
> 
> Any thoughts?
> 
> Cary
> -Dynamic
> 
> On Tue, Nov 28, 2017 at 1:09 PM, Sage Weil <sage@newdream.net> wrote:
> > On Tue, 28 Nov 2017, Cary wrote:
> >> I get this error when I try to start the OSD that has been downgraded
> >> to 10.2.3-r2.
> >>
> >> 2017-11-28 03:42:35.989754 7fa5e6429940  1
> >> filestore(/var/lib/ceph/osd/ceph-3) upgrade
> >> 2017-11-28 03:42:35.989788 7fa5e6429940  2 osd.3 0 boot
> >> 2017-11-28 03:42:35.990132 7fa5e6429940 -1 osd.3 0 The disk uses
> >> features unsupported by the executable.
> >> 2017-11-28 03:42:35.990142 7fa5e6429940 -1 osd.3 0  ondisk features
> >> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> >> object,3=object
> >> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
> >> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
> >> attr,16=deletes in missing set}
> >> 2017-11-28 03:42:35.990150 7fa5e6429940 -1 osd.3 0  daemon features
> >> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> >> object,3=object
> >> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
> >> objects,12=transaction hints,13=pg meta object}
> >> 2017-11-28 03:42:35.990160 7fa5e6429940 -1 osd.3 0 Cannot write to
> >> disk! Missing features: compat={},rocompat={},incompat={14=explicit
> >> missing set,15=fastinfo pg attr,16=deletes in missing set}
> >> 2017-11-28 03:42:35.990775 7fa5e6429940  1 journal close
> >> /dev/disk/by-partlabel/ceph-3
> >> 2017-11-28 03:42:35.992960 7fa5e6429940 -1  ** ERROR: osd init failed:
> >> (95) Operation not supported
> >
> > Oh, right.  In that case, install the 'luminous' branch[1] on the monitors
> > (or just the primary monitor if you're being conservative), restrart it,
> > and you'll be able to do
> >
> >  ceph osd set require_jewel_osds --yes-i-really-mean-it
> >
> > sage
> >
> >
> > [1] ceph-deploy install --dev luminous HOST
> >
> >
> >
> >
> >> Cary
> >>
> >> On Tue, Nov 28, 2017 at 3:09 AM, Sage Weil <sage@newdream.net> wrote:
> >> > On Tue, 28 Nov 2017, Cary wrote:
> >> >> Hello,
> >> >>
> >> >>  Could someone please help me complete my botched upgrade from Jewel
> >> >> 10.2.3-r1 to Luminous 12.2.1. I have 9 Gentoo servers, 4 of which have
> >> >> 2 OSDs each.
> >> >>
> >> >>  My OSD servers were accidentally rebooted before the monitor servers
> >> >> causing them to be running Luminous before the monitors. All services
> >> >> have been restarted and running ceph versions gives the following:
> >> >>
> >> >> # ceph versions
> >> >> 2017-11-27 21:27:24.356940 7fed67efe700 -1 WARNING: the following
> >> >> dangerous and experimental features are enabled: btrfs
> >> >> 2017-11-27 21:27:24.368469 7fed67efe700 -1 WARNING: the following
> >> >> dangerous and experimental features are enabled: btrfs
> >> >>
> >> >>     "mon": {
> >> >>         "ceph version 12.2.1
> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
> >> >>     },
> >> >>     "mgr": {
> >> >>         "ceph version 12.2.1
> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
> >> >>     },
> >> >>     "osd": {},
> >> >>     "mds": {
> >> >>         "ceph version 12.2.1
> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
> >> >>     },
> >> >>     "overall": {
> >> >>         "ceph version 12.2.1
> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 8
> >> >>
> >> >>
> >> >>
> >> >> For some reason the OSDs do not show what version they are running,
> >> >> and a ceph osd tree shows all of the OSD as being down.
> >> >>
> >> >>  # ceph osd tree
> >> >> 2017-11-27 21:32:51.969335 7f483d9c2700 -1 WARNING: the following
> >> >> dangerous and experimental features are enabled: btrfs
> >> >> 2017-11-27 21:32:51.980976 7f483d9c2700 -1 WARNING: the following
> >> >> dangerous and experimental features are enabled: btrfs
> >> >> ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
> >> >> -1       27.77998 root default
> >> >> -3       27.77998     datacenter DC1
> >> >> -6       27.77998         rack 1B06
> >> >> -5        6.48000             host ceph3
> >> >>  1        1.84000                 osd.1    down        0 1.00000
> >> >>  3        4.64000                 osd.3    down        0 1.00000
> >> >> -2        5.53999             host ceph4
> >> >>  5        4.64000                 osd.5    down        0 1.00000
> >> >>  8        0.89999                 osd.8    down        0 1.00000
> >> >> -4        9.28000             host ceph6
> >> >>  0        4.64000                 osd.0    down        0 1.00000
> >> >>  2        4.64000                 osd.2    down        0 1.00000
> >> >> -7        6.48000             host ceph7
> >> >>  6        4.64000                 osd.6    down        0 1.00000
> >> >>  7        1.84000                 osd.7    down        0 1.00000
> >> >>
> >> >> The OSD logs all have this message:
> >> >>
> >> >> 20235 osdmap REQUIRE_JEWEL OSDMap flag is NOT set; please set it.
> >> >
> >> > THis is an annoying corner condition.  12.2.2 (out soon!)  will have a
> >> > --force option to set the flag even tho no osds are up.  Until then, the
> >> > workaround is to downgrade one host to jewel, start one jewel osd, then
> >> > set the flag.  Then upgrade to luminous again and restart all osds.
> >> >
> >> > sage
> >> >
> >> >
> >> >>
> >> >> When I try to set it with "ceph osd set require_jewel_osds" I get this error:
> >> >>
> >> >> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
> >> >>
> >> >>
> >> >>
> >> >> A "ceph features" returns:
> >> >>
> >> >>     "mon": {
> >> >>         "group": {
> >> >>             "features": "0x1ffddff8eea4fffb",
> >> >>             "release": "luminous",
> >> >>             "num": 4
> >> >>         }
> >> >>     },
> >> >>     "mds": {
> >> >>         "group": {
> >> >>             "features": "0x1ffddff8eea4fffb",
> >> >>             "release": "luminous",
> >> >>             "num": 1
> >> >>         }
> >> >>     },
> >> >>     "osd": {
> >> >>         "group": {
> >> >>             "features": "0x1ffddff8eea4fffb",
> >> >>             "release": "luminous",
> >> >>             "num": 8
> >> >>         }
> >> >>     },
> >> >>     "client": {
> >> >>         "group": {
> >> >>             "features": "0x1ffddff8eea4fffb",
> >> >>             "release": "luminous",
> >> >>             "num": 3
> >> >>
> >> >>  # ceph tell osd.* versions
> >> >> 2017-11-28 02:29:28.565943 7f99c6aee700 -1 WARNING: the following
> >> >> dangerous and experimental features are enabled: btrfs
> >> >> 2017-11-28 02:29:28.578956 7f99c6aee700 -1 WARNING: the following
> >> >> dangerous and experimental features are enabled: btrfs
> >> >> Error ENXIO: problem getting command descriptions from osd.0
> >> >> osd.0: problem getting command descriptions from osd.0
> >> >> Error ENXIO: problem getting command descriptions from osd.1
> >> >> osd.1: problem getting command descriptions from osd.1
> >> >> Error ENXIO: problem getting command descriptions from osd.2
> >> >> osd.2: problem getting command descriptions from osd.2
> >> >> Error ENXIO: problem getting command descriptions from osd.3
> >> >> osd.3: problem getting command descriptions from osd.3
> >> >> Error ENXIO: problem getting command descriptions from osd.5
> >> >> osd.5: problem getting command descriptions from osd.5
> >> >> Error ENXIO: problem getting command descriptions from osd.6
> >> >> osd.6: problem getting command descriptions from osd.6
> >> >> Error ENXIO: problem getting command descriptions from osd.7
> >> >> osd.7: problem getting command descriptions from osd.7
> >> >> Error ENXIO: problem getting command descriptions from osd.8
> >> >> osd.8: problem getting command descriptions from osd.8
> >> >>
> >> >>  # ceph daemon osd.1 status
> >> >>
> >> >>     "cluster_fsid": "CENSORED",
> >> >>     "osd_fsid": "CENSORED",
> >> >>     "whoami": 1,
> >> >>     "state": "preboot",
> >> >>     "oldest_map": 19482,
> >> >>     "newest_map": 20235,
> >> >>     "num_pgs": 141
> >> >>
> >> >>  # ceph -s
> >> >> 2017-11-27 22:04:10.372471 7f89a3935700 -1 WARNING: the following
> >> >> dangerous and experimental features are enabled: btrfs
> >> >> 2017-11-27 22:04:10.375709 7f89a3935700 -1 WARNING: the following
> >> >> dangerous and experimental features are enabled: btrfs
> >> >>   cluster:
> >> >>     id:     CENSORED
> >> >>     health: HEALTH_ERR
> >> >>             513 pgs are stuck inactive for more than 60 seconds
> >> >>             126 pgs backfill_wait
> >> >>             52 pgs backfilling
> >> >>             435 pgs degraded
> >> >>             513 pgs stale
> >> >>             435 pgs stuck degraded
> >> >>             513 pgs stuck stale
> >> >>             435 pgs stuck unclean
> >> >>             435 pgs stuck undersized
> >> >>             435 pgs undersized
> >> >>             recovery 854719/3688140 objects degraded (23.175%)
> >> >>             recovery 838607/3688140 objects misplaced (22.738%)
> >> >>             mds cluster is degraded
> >> >>             crush map has straw_calc_version=0
> >> >>
> >> >>   services:
> >> >>     mon: 4 daemons, quorum 0,1,3,2
> >> >>     mgr: 0(active), standbys: 1, 5
> >> >>     mds: cephfs-1/1/1 up  {0=a=up:replay}, 1 up:standby
> >> >>     osd: 8 osds: 0 up, 0 in
> >> >>
> >> >>   data:
> >> >>     pools:   7 pools, 513 pgs
> >> >>     objects: 1199k objects, 4510 GB
> >> >>     usage:   13669 GB used, 15150 GB / 28876 GB avail
> >> >>     pgs:     854719/3688140 objects degraded (23.175%)
> >> >>              838607/3688140 objects misplaced (22.738%)
> >> >>              257 stale+active+undersized+degraded
> >> >>              126 stale+active+undersized+degraded+remapped+backfill_wait
> >> >>              78  stale+active+clean
> >> >>              52  stale+active+undersized+degraded+remapped+backfilling
> >> >>
> >> >>
> >> >> I ran "ceph auth list", and client.admin has the following permissions.
> >> >> auid: 0
> >> >> caps: [mds] allow
> >> >> caps: [mgr] allow *
> >> >> caps: [mon] allow *
> >> >> caps: [osd] allow *
> >> >>
> >> >> Thank you for your time.
> >> >>
> >> >> Is there any way I can get these OSDs to join the cluster now, or
> >> >> recover my data?
> >> >>
> >> >> Cary
> >> >> -Dynamic
> >> >> --
> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >>
> >> >>
> >>
> >>
> 
> 


* Re: Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap
  2017-11-28 18:45         ` Sage Weil
@ 2017-11-30  0:48           ` Cary
  2017-11-30  0:50             ` Sage Weil
  0 siblings, 1 reply; 12+ messages in thread
From: Cary @ 2017-11-30  0:48 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Hello,

 I have emerged a 9999 (live) build of Luminous 12.2.1 on one of my monitor
nodes and made sure only one Jewel OSD was being started. The log for
that OSD shows:
2017-11-30 00:30:27.786793 7f9200a598c0  1
filestore(/var/lib/ceph/osd/ceph-1) upgrade
2017-11-30 00:30:27.786821 7f9200a598c0  2 osd.1 0 boot
2017-11-30 00:30:27.787101 7f9200a598c0 -1 osd.1 0 The disk uses
features unsupported by the executable.
2017-11-30 00:30:27.787110 7f9200a598c0 -1 osd.1 0  ondisk features
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
attr,16=deletes in missing set}
2017-11-30 00:30:27.787120 7f9200a598c0 -1 osd.1 0  daemon features
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints,13=pg meta object}
2017-11-30 00:30:27.787129 7f9200a598c0 -1 osd.1 0 Cannot write to
disk! Missing features: compat={},rocompat={},incompat={14=explicit
missing set,15=fastinfo pg attr,16=deletes in missing set}
2017-11-30 00:30:27.787355 7f9200a598c0  1 journal close
/dev/disk/by-partlabel/ceph-1
2017-11-30 00:30:27.795077 7f9200a598c0 -1  ** ERROR: osd init failed:
(95) Operation not supported

 The OSD is not starting because of the missing on-disk features, so the
following command still fails.

 "ceph osd set require_jewel_osds --yes-i-really-really-mean-it"
returns the error

Invalid command:  unused arguments: [u'--yes-i-really-really-mean-it']
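
(For reference, the form Sage gave earlier in the thread, quoted below,
has only a single "really":

 ceph osd set require_jewel_osds --yes-i-really-mean-it

so the extra "really" above may simply be why the monitor rejects it as
an unused argument.)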

I guess ceph-dencoder may be needed to change the on-disk features. Does
anyone know what needs to be done here? Thank you,


Cary
-Dynamic

On Tue, Nov 28, 2017 at 6:45 PM, Sage Weil <sage@newdream.net> wrote:
> On Tue, 28 Nov 2017, Cary wrote:
>> Hello,
>>
>>  I am getting an error when I run "ceph osd set require_jewel_osds
>> --yes-i-really-mean-it".
>>
>> Error ENOENT: unknown feature '--yes-i-really-mean-it'
>
> I just tested on the latest luminous branch and this works.  Did you
> upgrade the mons to the latest luminous build and restart them?
> (ceph-deploy install --dev luminous HOST, then restart mon daemon(s)).
>
> sage
>
>
>  >
>>  So I ran, "ceph osd set require_jewel_osds", and got this error:
>>
>> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
>>
>>  I verified all OSDs were stopped with "/etc/init.d/ceph-osd.N stop".
>> Then verified each was down with "ceph osd down N". When setting them
>> down, each replied "osd.N is already down".  I started one of the OSDs
>> on a host that was downgraded to 10.2.3-r2 I then attempted to set
>> "ceph osd set require_jewel_osds", and get the same error.
>>
>>
>>  The log for the OSD is showing this error:
>>
>> 2017-11-28 17:40:08.928446 7f47b082f940  1
>> filestore(/var/lib/ceph/osd/ceph-1) upgrade
>> 2017-11-28 17:40:08.928475 7f47b082f940  2 osd.1 0 boot
>> 2017-11-28 17:40:08.928788 7f47b082f940 -1 osd.1 0 The disk uses
>> features unsupported by the executable.
>> 2017-11-28 17:40:08.928810 7f47b082f940 -1 osd.1 0  ondisk features
>> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
>> object,3=object
>> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
>> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
>> attr,16=deletes in missing set}
>> 2017-11-28 17:40:08.928818 7f47b082f940 -1 osd.1 0  daemon features
>> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
>> object,3=object
>> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
>> objects,12=transaction hints,13=pg meta object}
>> 2017-11-28 17:40:08.928827 7f47b082f940 -1 osd.1 0 Cannot write to
>> disk! Missing features: compat={},rocompat={},incompat={14=explicit
>> missing set,15=fastinfo pg attr,16=deletes in missing set}
>> 2017-11-28 17:40:08.929353 7f47b082f940  1 journal close
>> /dev/disk/by-partlabel/ceph-1
>> 2017-11-28 17:40:08.930488 7f47b082f940 -1  ** ERROR: osd init failed:
>> (95) Operation not supported
>>
>> So the OSD is not starting because of missing features. It does not
>> show up in "ceph features" output.
>>
>>  Ceph features output:
>> ceph features
>> 2017-11-28 17:51:31.213636 7f6a2140a700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>> 2017-11-28 17:51:31.223068 7f6a2140a700 -1 WARNING: the following
>> dangerous and experimental features are enabled: btrfs
>>
>>     "mon": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 4
>>         }
>>     },
>>     "mds": {
>>         "group": {
>>             "features": "0x7fddff8ee84bffb",
>>             "release": "jewel",
>>             "num": 1
>>         }
>>     },
>>     "client": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 4
>>
>> I attempted to set require_jewel_osds with the MGRs stopped, and had
>> the same results.
>>
>>  Output from ceph tell osd.1 version. I get the same error from all OSDs.
>>
>> # ceph tell osd.1 versions
>> Error ENXIO: problem getting command descriptions from osd.1
>>
>> Any thoughts?
>>
>> Cary
>> -Dynamic
>>
>> On Tue, Nov 28, 2017 at 1:09 PM, Sage Weil <sage@newdream.net> wrote:
>> > On Tue, 28 Nov 2017, Cary wrote:
>> >> I get this error when I try to start the OSD that has been downgraded
>> >> to 10.2.3-r2.
>> >>
>> >> 2017-11-28 03:42:35.989754 7fa5e6429940  1
>> >> filestore(/var/lib/ceph/osd/ceph-3) upgrade
>> >> 2017-11-28 03:42:35.989788 7fa5e6429940  2 osd.3 0 boot
>> >> 2017-11-28 03:42:35.990132 7fa5e6429940 -1 osd.3 0 The disk uses
>> >> features unsupported by the executable.
>> >> 2017-11-28 03:42:35.990142 7fa5e6429940 -1 osd.3 0  ondisk features
>> >> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
>> >> object,3=object
>> >> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
>> >> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
>> >> attr,16=deletes in missing set}
>> >> 2017-11-28 03:42:35.990150 7fa5e6429940 -1 osd.3 0  daemon features
>> >> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
>> >> object,3=object
>> >> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
>> >> objects,12=transaction hints,13=pg meta object}
>> >> 2017-11-28 03:42:35.990160 7fa5e6429940 -1 osd.3 0 Cannot write to
>> >> disk! Missing features: compat={},rocompat={},incompat={14=explicit
>> >> missing set,15=fastinfo pg attr,16=deletes in missing set}
>> >> 2017-11-28 03:42:35.990775 7fa5e6429940  1 journal close
>> >> /dev/disk/by-partlabel/ceph-3
>> >> 2017-11-28 03:42:35.992960 7fa5e6429940 -1  ** ERROR: osd init failed:
>> >> (95) Operation not supported
>> >
>> > Oh, right.  In that case, install the 'luminous' branch[1] on the monitors
>> > (or just the primary monitor if you're being conservative), restrart it,
>> > and you'll be able to do
>> >
>> >  ceph osd set require_jewel_osds --yes-i-really-mean-it
>> >
>> > sage
>> >
>> >
>> > [1] ceph-deploy install --dev luminous HOST
>> >
>> >
>> >
>> >
>> >> Cary
>> >>
>> >> On Tue, Nov 28, 2017 at 3:09 AM, Sage Weil <sage@newdream.net> wrote:
>> >> > On Tue, 28 Nov 2017, Cary wrote:
>> >> >> Hello,
>> >> >>
>> >> >>  Could someone please help me complete my botched upgrade from Jewel
>> >> >> 10.2.3-r1 to Luminous 12.2.1. I have 9 Gentoo servers, 4 of which have
>> >> >> 2 OSDs each.
>> >> >>
>> >> >>  My OSD servers were accidentally rebooted before the monitor servers
>> >> >> causing them to be running Luminous before the monitors. All services
>> >> >> have been restarted and running ceph versions gives the following:
>> >> >>
>> >> >> # ceph versions
>> >> >> 2017-11-27 21:27:24.356940 7fed67efe700 -1 WARNING: the following
>> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> 2017-11-27 21:27:24.368469 7fed67efe700 -1 WARNING: the following
>> >> >> dangerous and experimental features are enabled: btrfs
>> >> >>
>> >> >>     "mon": {
>> >> >>         "ceph version 12.2.1
>> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
>> >> >>     },
>> >> >>     "mgr": {
>> >> >>         "ceph version 12.2.1
>> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
>> >> >>     },
>> >> >>     "osd": {},
>> >> >>     "mds": {
>> >> >>         "ceph version 12.2.1
>> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
>> >> >>     },
>> >> >>     "overall": {
>> >> >>         "ceph version 12.2.1
>> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 8
>> >> >>
>> >> >>
>> >> >>
>> >> >> For some reason the OSDs do not show what version they are running,
>> >> >> and a ceph osd tree shows all of the OSD as being down.
>> >> >>
>> >> >>  # ceph osd tree
>> >> >> 2017-11-27 21:32:51.969335 7f483d9c2700 -1 WARNING: the following
>> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> 2017-11-27 21:32:51.980976 7f483d9c2700 -1 WARNING: the following
>> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
>> >> >> -1       27.77998 root default
>> >> >> -3       27.77998     datacenter DC1
>> >> >> -6       27.77998         rack 1B06
>> >> >> -5        6.48000             host ceph3
>> >> >>  1        1.84000                 osd.1    down        0 1.00000
>> >> >>  3        4.64000                 osd.3    down        0 1.00000
>> >> >> -2        5.53999             host ceph4
>> >> >>  5        4.64000                 osd.5    down        0 1.00000
>> >> >>  8        0.89999                 osd.8    down        0 1.00000
>> >> >> -4        9.28000             host ceph6
>> >> >>  0        4.64000                 osd.0    down        0 1.00000
>> >> >>  2        4.64000                 osd.2    down        0 1.00000
>> >> >> -7        6.48000             host ceph7
>> >> >>  6        4.64000                 osd.6    down        0 1.00000
>> >> >>  7        1.84000                 osd.7    down        0 1.00000
>> >> >>
>> >> >> The OSD logs all have this message:
>> >> >>
>> >> >> 20235 osdmap REQUIRE_JEWEL OSDMap flag is NOT set; please set it.
>> >> >
>> >> > THis is an annoying corner condition.  12.2.2 (out soon!)  will have a
>> >> > --force option to set the flag even tho no osds are up.  Until then, the
>> >> > workaround is to downgrade one host to jewel, start one jewel osd, then
>> >> > set the flag.  Then upgrade to luminous again and restart all osds.
>> >> >
>> >> > sage
>> >> >
>> >> >
>> >> >>
>> >> >> When I try to set it with "ceph osd set require_jewel_osds" I get this error:
>> >> >>
>> >> >> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
>> >> >>
>> >> >>
>> >> >>
>> >> >> A "ceph features" returns:
>> >> >>
>> >> >>     "mon": {
>> >> >>         "group": {
>> >> >>             "features": "0x1ffddff8eea4fffb",
>> >> >>             "release": "luminous",
>> >> >>             "num": 4
>> >> >>         }
>> >> >>     },
>> >> >>     "mds": {
>> >> >>         "group": {
>> >> >>             "features": "0x1ffddff8eea4fffb",
>> >> >>             "release": "luminous",
>> >> >>             "num": 1
>> >> >>         }
>> >> >>     },
>> >> >>     "osd": {
>> >> >>         "group": {
>> >> >>             "features": "0x1ffddff8eea4fffb",
>> >> >>             "release": "luminous",
>> >> >>             "num": 8
>> >> >>         }
>> >> >>     },
>> >> >>     "client": {
>> >> >>         "group": {
>> >> >>             "features": "0x1ffddff8eea4fffb",
>> >> >>             "release": "luminous",
>> >> >>             "num": 3
>> >> >>
>> >> >>  # ceph tell osd.* versions
>> >> >> 2017-11-28 02:29:28.565943 7f99c6aee700 -1 WARNING: the following
>> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> 2017-11-28 02:29:28.578956 7f99c6aee700 -1 WARNING: the following
>> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> Error ENXIO: problem getting command descriptions from osd.0
>> >> >> osd.0: problem getting command descriptions from osd.0
>> >> >> Error ENXIO: problem getting command descriptions from osd.1
>> >> >> osd.1: problem getting command descriptions from osd.1
>> >> >> Error ENXIO: problem getting command descriptions from osd.2
>> >> >> osd.2: problem getting command descriptions from osd.2
>> >> >> Error ENXIO: problem getting command descriptions from osd.3
>> >> >> osd.3: problem getting command descriptions from osd.3
>> >> >> Error ENXIO: problem getting command descriptions from osd.5
>> >> >> osd.5: problem getting command descriptions from osd.5
>> >> >> Error ENXIO: problem getting command descriptions from osd.6
>> >> >> osd.6: problem getting command descriptions from osd.6
>> >> >> Error ENXIO: problem getting command descriptions from osd.7
>> >> >> osd.7: problem getting command descriptions from osd.7
>> >> >> Error ENXIO: problem getting command descriptions from osd.8
>> >> >> osd.8: problem getting command descriptions from osd.8
>> >> >>
>> >> >>  # ceph daemon osd.1 status
>> >> >>
>> >> >>     "cluster_fsid": "CENSORED",
>> >> >>     "osd_fsid": "CENSORED",
>> >> >>     "whoami": 1,
>> >> >>     "state": "preboot",
>> >> >>     "oldest_map": 19482,
>> >> >>     "newest_map": 20235,
>> >> >>     "num_pgs": 141
>> >> >>
>> >> >>  # ceph -s
>> >> >> 2017-11-27 22:04:10.372471 7f89a3935700 -1 WARNING: the following
>> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> 2017-11-27 22:04:10.375709 7f89a3935700 -1 WARNING: the following
>> >> >> dangerous and experimental features are enabled: btrfs
>> >> >>   cluster:
>> >> >>     id:     CENSORED
>> >> >>     health: HEALTH_ERR
>> >> >>             513 pgs are stuck inactive for more than 60 seconds
>> >> >>             126 pgs backfill_wait
>> >> >>             52 pgs backfilling
>> >> >>             435 pgs degraded
>> >> >>             513 pgs stale
>> >> >>             435 pgs stuck degraded
>> >> >>             513 pgs stuck stale
>> >> >>             435 pgs stuck unclean
>> >> >>             435 pgs stuck undersized
>> >> >>             435 pgs undersized
>> >> >>             recovery 854719/3688140 objects degraded (23.175%)
>> >> >>             recovery 838607/3688140 objects misplaced (22.738%)
>> >> >>             mds cluster is degraded
>> >> >>             crush map has straw_calc_version=0
>> >> >>
>> >> >>   services:
>> >> >>     mon: 4 daemons, quorum 0,1,3,2
>> >> >>     mgr: 0(active), standbys: 1, 5
>> >> >>     mds: cephfs-1/1/1 up  {0=a=up:replay}, 1 up:standby
>> >> >>     osd: 8 osds: 0 up, 0 in
>> >> >>
>> >> >>   data:
>> >> >>     pools:   7 pools, 513 pgs
>> >> >>     objects: 1199k objects, 4510 GB
>> >> >>     usage:   13669 GB used, 15150 GB / 28876 GB avail
>> >> >>     pgs:     854719/3688140 objects degraded (23.175%)
>> >> >>              838607/3688140 objects misplaced (22.738%)
>> >> >>              257 stale+active+undersized+degraded
>> >> >>              126 stale+active+undersized+degraded+remapped+backfill_wait
>> >> >>              78  stale+active+clean
>> >> >>              52  stale+active+undersized+degraded+remapped+backfilling
>> >> >>
>> >> >>
>> >> >> I ran "ceph auth list", and client.admin has the following permissions.
>> >> >> auid: 0
>> >> >> caps: [mds] allow
>> >> >> caps: [mgr] allow *
>> >> >> caps: [mon] allow *
>> >> >> caps: [osd] allow *
>> >> >>
>> >> >> Thank you for your time.
>> >> >>
>> >> >> Is there any way I can get these OSDs to join the cluster now, or
>> >> >> recover my data?
>> >> >>
>> >> >> Cary
>> >> >> -Dynamic
>> >> >> --
>> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> >> >> the body of a message to majordomo@vger.kernel.org
>> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>


* Re: Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap
  2017-11-30  0:48           ` Cary
@ 2017-11-30  0:50             ` Sage Weil
  2017-11-30  1:13               ` Cary
  0 siblings, 1 reply; 12+ messages in thread
From: Sage Weil @ 2017-11-30  0:50 UTC (permalink / raw)
  To: Cary; +Cc: ceph-devel

On Thu, 30 Nov 2017, Cary wrote:
> Hello,
> 
>  I have emerged a 9999 build of Luminous 2.2.1 on one of my monitor

The latest luminous mon will allow you to do the

 ceph osd set require_jewel_osds --yes-i-really-mean-it

command without starting old osds.  Once the flag is set the luminous osds 
will start normally.
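
Concretely, that would look roughly like this on your hosts (the OSD ids
and init-script names are taken from your earlier messages, so adjust as
needed):

 # on a node with the upgraded luminous mon and the admin keyring
 ceph osd set require_jewel_osds --yes-i-really-mean-it

 # then restart the luminous osds on each host
 /etc/init.d/ceph-osd.0 restart
 /etc/init.d/ceph-osd.1 restart
 ...and so on for osd.2 through osd.8.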

s


> nodes. I made sure only one Jewel OSD was being started. The log for
> the OSD:
> 017-11-30 00:30:27.786793 7f9200a598c0  1
> filestore(/var/lib/ceph/osd/ceph-1) upgrade
> 2017-11-30 00:30:27.786821 7f9200a598c0  2 osd.1 0 boot
> 2017-11-30 00:30:27.787101 7f9200a598c0 -1 osd.1 0 The disk uses
> features unsupported by the executable.
> 2017-11-30 00:30:27.787110 7f9200a598c0 -1 osd.1 0  ondisk features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object
> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
> attr,16=deletes in missing set}
> 2017-11-30 00:30:27.787120 7f9200a598c0 -1 osd.1 0  daemon features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object
> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
> objects,12=transaction hints,13=pg meta object}
> 2017-11-30 00:30:27.787129 7f9200a598c0 -1 osd.1 0 Cannot write to
> disk! Missing features: compat={},rocompat={},incompat={14=explicit
> missing set,15=fastinfo pg attr,16=deletes in missing set}
> 2017-11-30 00:30:27.787355 7f9200a598c0  1 journal close
> /dev/disk/by-partlabel/ceph-1
> 2017-11-30 00:30:27.795077 7f9200a598c0 -1  ** ERROR: osd init failed:
> (95) Operation not supported
> 
>  The OSD is not starting because of missing features. So the next
> command still fails.
> 
>  "ceph osd set require_jewel_osds --yes-i-really-really-mean-it"
> returns the error
> 
> Invalid command:  unused arguments: [u'--yes-i-really-really-mean-it']
> 
> I guess ceph-dencoder may be needed tp change disk features. Does
> anyone know what may need done here? Thank you,
> 
> 
> Cary
> -Dynamic
> 
> On Tue, Nov 28, 2017 at 6:45 PM, Sage Weil <sage@newdream.net> wrote:
> > On Tue, 28 Nov 2017, Cary wrote:
> >> Hello,
> >>
> >>  I am getting an error when I run "ceph osd set require_jewel_osds
> >> --yes-i-really-mean-it".
> >>
> >> Error ENOENT: unknown feature '--yes-i-really-mean-it'
> >
> > I just tested on the latest luminous branch and this works.  Did you
> > upgrade the mons to the latest luminous build and restart them?
> > (ceph-deploy install --dev luminous HOST, then restart mon daemon(s)).
> >
> > sage
> >
> >
> >  >
> >>  So I ran, "ceph osd set require_jewel_osds", and got this error:
> >>
> >> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
> >>
> >>  I verified all OSDs were stopped with "/etc/init.d/ceph-osd.N stop".
> >> Then verified each was down with "ceph osd down N". When setting them
> >> down, each replied "osd.N is already down".  I started one of the OSDs
> >> on a host that was downgraded to 10.2.3-r2 I then attempted to set
> >> "ceph osd set require_jewel_osds", and get the same error.
> >>
> >>
> >>  The log for the OSD is showing this error:
> >>
> >> 2017-11-28 17:40:08.928446 7f47b082f940  1
> >> filestore(/var/lib/ceph/osd/ceph-1) upgrade
> >> 2017-11-28 17:40:08.928475 7f47b082f940  2 osd.1 0 boot
> >> 2017-11-28 17:40:08.928788 7f47b082f940 -1 osd.1 0 The disk uses
> >> features unsupported by the executable.
> >> 2017-11-28 17:40:08.928810 7f47b082f940 -1 osd.1 0  ondisk features
> >> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> >> object,3=object
> >> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
> >> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
> >> attr,16=deletes in missing set}
> >> 2017-11-28 17:40:08.928818 7f47b082f940 -1 osd.1 0  daemon features
> >> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> >> object,3=object
> >> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
> >> objects,12=transaction hints,13=pg meta object}
> >> 2017-11-28 17:40:08.928827 7f47b082f940 -1 osd.1 0 Cannot write to
> >> disk! Missing features: compat={},rocompat={},incompat={14=explicit
> >> missing set,15=fastinfo pg attr,16=deletes in missing set}
> >> 2017-11-28 17:40:08.929353 7f47b082f940  1 journal close
> >> /dev/disk/by-partlabel/ceph-1
> >> 2017-11-28 17:40:08.930488 7f47b082f940 -1  ** ERROR: osd init failed:
> >> (95) Operation not supported
> >>
> >> So the OSD is not starting because of missing features. It does not
> >> show up in "ceph features" output.
> >>
> >>  Ceph features output:
> >> ceph features
> >> 2017-11-28 17:51:31.213636 7f6a2140a700 -1 WARNING: the following
> >> dangerous and experimental features are enabled: btrfs
> >> 2017-11-28 17:51:31.223068 7f6a2140a700 -1 WARNING: the following
> >> dangerous and experimental features are enabled: btrfs
> >>
> >>     "mon": {
> >>         "group": {
> >>             "features": "0x1ffddff8eea4fffb",
> >>             "release": "luminous",
> >>             "num": 4
> >>         }
> >>     },
> >>     "mds": {
> >>         "group": {
> >>             "features": "0x7fddff8ee84bffb",
> >>             "release": "jewel",
> >>             "num": 1
> >>         }
> >>     },
> >>     "client": {
> >>         "group": {
> >>             "features": "0x1ffddff8eea4fffb",
> >>             "release": "luminous",
> >>             "num": 4
> >>
> >> I attempted to set require_jewel_osds with the MGRs stopped, and had
> >> the same results.
> >>
> >>  Output from ceph tell osd.1 version. I get the same error from all OSDs.
> >>
> >> # ceph tell osd.1 versions
> >> Error ENXIO: problem getting command descriptions from osd.1
> >>
> >> Any thoughts?
> >>
> >> Cary
> >> -Dynamic
> >>
> >> On Tue, Nov 28, 2017 at 1:09 PM, Sage Weil <sage@newdream.net> wrote:
> >> > On Tue, 28 Nov 2017, Cary wrote:
> >> >> I get this error when I try to start the OSD that has been downgraded
> >> >> to 10.2.3-r2.
> >> >>
> >> >> 2017-11-28 03:42:35.989754 7fa5e6429940  1
> >> >> filestore(/var/lib/ceph/osd/ceph-3) upgrade
> >> >> 2017-11-28 03:42:35.989788 7fa5e6429940  2 osd.3 0 boot
> >> >> 2017-11-28 03:42:35.990132 7fa5e6429940 -1 osd.3 0 The disk uses
> >> >> features unsupported by the executable.
> >> >> 2017-11-28 03:42:35.990142 7fa5e6429940 -1 osd.3 0  ondisk features
> >> >> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> >> >> object,3=object
> >> >> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
> >> >> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
> >> >> attr,16=deletes in missing set}
> >> >> 2017-11-28 03:42:35.990150 7fa5e6429940 -1 osd.3 0  daemon features
> >> >> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> >> >> object,3=object
> >> >> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
> >> >> objects,12=transaction hints,13=pg meta object}
> >> >> 2017-11-28 03:42:35.990160 7fa5e6429940 -1 osd.3 0 Cannot write to
> >> >> disk! Missing features: compat={},rocompat={},incompat={14=explicit
> >> >> missing set,15=fastinfo pg attr,16=deletes in missing set}
> >> >> 2017-11-28 03:42:35.990775 7fa5e6429940  1 journal close
> >> >> /dev/disk/by-partlabel/ceph-3
> >> >> 2017-11-28 03:42:35.992960 7fa5e6429940 -1  ** ERROR: osd init failed:
> >> >> (95) Operation not supported
> >> >
> >> > Oh, right.  In that case, install the 'luminous' branch[1] on the monitors
> >> > (or just the primary monitor if you're being conservative), restrart it,
> >> > and you'll be able to do
> >> >
> >> >  ceph osd set require_jewel_osds --yes-i-really-mean-it
> >> >
> >> > sage
> >> >
> >> >
> >> > [1] ceph-deploy install --dev luminous HOST
> >> >
> >> >
> >> >
> >> >
> >> >> Cary
> >> >>
> >> >> On Tue, Nov 28, 2017 at 3:09 AM, Sage Weil <sage@newdream.net> wrote:
> >> >> > On Tue, 28 Nov 2017, Cary wrote:
> >> >> >> Hello,
> >> >> >>
> >> >> >>  Could someone please help me complete my botched upgrade from Jewel
> >> >> >> 10.2.3-r1 to Luminous 12.2.1. I have 9 Gentoo servers, 4 of which have
> >> >> >> 2 OSDs each.
> >> >> >>
> >> >> >>  My OSD servers were accidentally rebooted before the monitor servers
> >> >> >> causing them to be running Luminous before the monitors. All services
> >> >> >> have been restarted and running ceph versions gives the following:
> >> >> >>
> >> >> >> # ceph versions
> >> >> >> 2017-11-27 21:27:24.356940 7fed67efe700 -1 WARNING: the following
> >> >> >> dangerous and experimental features are enabled: btrfs
> >> >> >> 2017-11-27 21:27:24.368469 7fed67efe700 -1 WARNING: the following
> >> >> >> dangerous and experimental features are enabled: btrfs
> >> >> >>
> >> >> >>     "mon": {
> >> >> >>         "ceph version 12.2.1
> >> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
> >> >> >>     },
> >> >> >>     "mgr": {
> >> >> >>         "ceph version 12.2.1
> >> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
> >> >> >>     },
> >> >> >>     "osd": {},
> >> >> >>     "mds": {
> >> >> >>         "ceph version 12.2.1
> >> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
> >> >> >>     },
> >> >> >>     "overall": {
> >> >> >>         "ceph version 12.2.1
> >> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 8
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> For some reason the OSDs do not show what version they are running,
> >> >> >> and a ceph osd tree shows all of the OSD as being down.
> >> >> >>
> >> >> >>  # ceph osd tree
> >> >> >> 2017-11-27 21:32:51.969335 7f483d9c2700 -1 WARNING: the following
> >> >> >> dangerous and experimental features are enabled: btrfs
> >> >> >> 2017-11-27 21:32:51.980976 7f483d9c2700 -1 WARNING: the following
> >> >> >> dangerous and experimental features are enabled: btrfs
> >> >> >> ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
> >> >> >> -1       27.77998 root default
> >> >> >> -3       27.77998     datacenter DC1
> >> >> >> -6       27.77998         rack 1B06
> >> >> >> -5        6.48000             host ceph3
> >> >> >>  1        1.84000                 osd.1    down        0 1.00000
> >> >> >>  3        4.64000                 osd.3    down        0 1.00000
> >> >> >> -2        5.53999             host ceph4
> >> >> >>  5        4.64000                 osd.5    down        0 1.00000
> >> >> >>  8        0.89999                 osd.8    down        0 1.00000
> >> >> >> -4        9.28000             host ceph6
> >> >> >>  0        4.64000                 osd.0    down        0 1.00000
> >> >> >>  2        4.64000                 osd.2    down        0 1.00000
> >> >> >> -7        6.48000             host ceph7
> >> >> >>  6        4.64000                 osd.6    down        0 1.00000
> >> >> >>  7        1.84000                 osd.7    down        0 1.00000
> >> >> >>
> >> >> >> The OSD logs all have this message:
> >> >> >>
> >> >> >> 20235 osdmap REQUIRE_JEWEL OSDMap flag is NOT set; please set it.
> >> >> >
> >> >> > THis is an annoying corner condition.  12.2.2 (out soon!)  will have a
> >> >> > --force option to set the flag even tho no osds are up.  Until then, the
> >> >> > workaround is to downgrade one host to jewel, start one jewel osd, then
> >> >> > set the flag.  Then upgrade to luminous again and restart all osds.
> >> >> >
> >> >> > sage
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> When I try to set it with "ceph osd set require_jewel_osds" I get this error:
> >> >> >>
> >> >> >> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> A "ceph features" returns:
> >> >> >>
> >> >> >>     "mon": {
> >> >> >>         "group": {
> >> >> >>             "features": "0x1ffddff8eea4fffb",
> >> >> >>             "release": "luminous",
> >> >> >>             "num": 4
> >> >> >>         }
> >> >> >>     },
> >> >> >>     "mds": {
> >> >> >>         "group": {
> >> >> >>             "features": "0x1ffddff8eea4fffb",
> >> >> >>             "release": "luminous",
> >> >> >>             "num": 1
> >> >> >>         }
> >> >> >>     },
> >> >> >>     "osd": {
> >> >> >>         "group": {
> >> >> >>             "features": "0x1ffddff8eea4fffb",
> >> >> >>             "release": "luminous",
> >> >> >>             "num": 8
> >> >> >>         }
> >> >> >>     },
> >> >> >>     "client": {
> >> >> >>         "group": {
> >> >> >>             "features": "0x1ffddff8eea4fffb",
> >> >> >>             "release": "luminous",
> >> >> >>             "num": 3
> >> >> >>
> >> >> >>  # ceph tell osd.* versions
> >> >> >> 2017-11-28 02:29:28.565943 7f99c6aee700 -1 WARNING: the following
> >> >> >> dangerous and experimental features are enabled: btrfs
> >> >> >> 2017-11-28 02:29:28.578956 7f99c6aee700 -1 WARNING: the following
> >> >> >> dangerous and experimental features are enabled: btrfs
> >> >> >> Error ENXIO: problem getting command descriptions from osd.0
> >> >> >> osd.0: problem getting command descriptions from osd.0
> >> >> >> Error ENXIO: problem getting command descriptions from osd.1
> >> >> >> osd.1: problem getting command descriptions from osd.1
> >> >> >> Error ENXIO: problem getting command descriptions from osd.2
> >> >> >> osd.2: problem getting command descriptions from osd.2
> >> >> >> Error ENXIO: problem getting command descriptions from osd.3
> >> >> >> osd.3: problem getting command descriptions from osd.3
> >> >> >> Error ENXIO: problem getting command descriptions from osd.5
> >> >> >> osd.5: problem getting command descriptions from osd.5
> >> >> >> Error ENXIO: problem getting command descriptions from osd.6
> >> >> >> osd.6: problem getting command descriptions from osd.6
> >> >> >> Error ENXIO: problem getting command descriptions from osd.7
> >> >> >> osd.7: problem getting command descriptions from osd.7
> >> >> >> Error ENXIO: problem getting command descriptions from osd.8
> >> >> >> osd.8: problem getting command descriptions from osd.8
> >> >> >>
> >> >> >>  # ceph daemon osd.1 status
> >> >> >>
> >> >> >>     "cluster_fsid": "CENSORED",
> >> >> >>     "osd_fsid": "CENSORED",
> >> >> >>     "whoami": 1,
> >> >> >>     "state": "preboot",
> >> >> >>     "oldest_map": 19482,
> >> >> >>     "newest_map": 20235,
> >> >> >>     "num_pgs": 141
> >> >> >>
> >> >> >>  # ceph -s
> >> >> >> 2017-11-27 22:04:10.372471 7f89a3935700 -1 WARNING: the following
> >> >> >> dangerous and experimental features are enabled: btrfs
> >> >> >> 2017-11-27 22:04:10.375709 7f89a3935700 -1 WARNING: the following
> >> >> >> dangerous and experimental features are enabled: btrfs
> >> >> >>   cluster:
> >> >> >>     id:     CENSORED
> >> >> >>     health: HEALTH_ERR
> >> >> >>             513 pgs are stuck inactive for more than 60 seconds
> >> >> >>             126 pgs backfill_wait
> >> >> >>             52 pgs backfilling
> >> >> >>             435 pgs degraded
> >> >> >>             513 pgs stale
> >> >> >>             435 pgs stuck degraded
> >> >> >>             513 pgs stuck stale
> >> >> >>             435 pgs stuck unclean
> >> >> >>             435 pgs stuck undersized
> >> >> >>             435 pgs undersized
> >> >> >>             recovery 854719/3688140 objects degraded (23.175%)
> >> >> >>             recovery 838607/3688140 objects misplaced (22.738%)
> >> >> >>             mds cluster is degraded
> >> >> >>             crush map has straw_calc_version=0
> >> >> >>
> >> >> >>   services:
> >> >> >>     mon: 4 daemons, quorum 0,1,3,2
> >> >> >>     mgr: 0(active), standbys: 1, 5
> >> >> >>     mds: cephfs-1/1/1 up  {0=a=up:replay}, 1 up:standby
> >> >> >>     osd: 8 osds: 0 up, 0 in
> >> >> >>
> >> >> >>   data:
> >> >> >>     pools:   7 pools, 513 pgs
> >> >> >>     objects: 1199k objects, 4510 GB
> >> >> >>     usage:   13669 GB used, 15150 GB / 28876 GB avail
> >> >> >>     pgs:     854719/3688140 objects degraded (23.175%)
> >> >> >>              838607/3688140 objects misplaced (22.738%)
> >> >> >>              257 stale+active+undersized+degraded
> >> >> >>              126 stale+active+undersized+degraded+remapped+backfill_wait
> >> >> >>              78  stale+active+clean
> >> >> >>              52  stale+active+undersized+degraded+remapped+backfilling
> >> >> >>
> >> >> >>
> >> >> >> I ran "ceph auth list", and client.admin has the following permissions.
> >> >> >> auid: 0
> >> >> >> caps: [mds] allow
> >> >> >> caps: [mgr] allow *
> >> >> >> caps: [mon] allow *
> >> >> >> caps: [osd] allow *
> >> >> >>
> >> >> >> Thank you for your time.
> >> >> >>
> >> >> >> Is there any way I can get these OSDs to join the cluster now, or
> >> >> >> recover my data?
> >> >> >>
> >> >> >> Cary
> >> >> >> -Dynamic
> >> >> >> --
> >> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >> >>
> >> >> >>
> >> >>
> >> >>
> >>
> >>
> 
> 


* Re: Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap
  2017-11-30  0:50             ` Sage Weil
@ 2017-11-30  1:13               ` Cary
  2017-11-30  3:10                 ` Sage Weil
  0 siblings, 1 reply; 12+ messages in thread
From: Cary @ 2017-11-30  1:13 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

I believe I see what I was doing wrong. I had to run "ceph-osd set
require_jewel_osds --yes-i-really-mean-it". This is the error I am
getting now:
2017-11-30 01:11:19.691 7fc171dbd5c0 -1 unrecognized arg set
src/tcmalloc.cc:284] Attempt to free invalid pointer 0x56262bacf4c0
*** Caught signal (Aborted) **
 in thread 7fc171dbd5c0 thread_name:ceph-osd
 ceph version 13.0.0-3574-gb1378b343a
(b1378b343add5134ab881b38a93f47f3f9cb40bb) mimic (dev)
 1: (()+0xa6be0e) [0x56262159be0e]
 2: (()+0x13a40) [0x7fc16f5aca40]
 3: (gsignal()+0x145) [0x7fc16e8ede95]
 4: (abort()+0x17a) [0x7fc16e8efb9a]
 5: (tcmalloc::Log(tcmalloc::LogMode, char const*, int,
tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem,
tcmalloc::LogItem)+0x234) [0x7fc170cf3084]
 6: (()+0x1784b) [0x7fc170ce784b]
 7: (rocksdb::LRUCache::~LRUCache()+0x65) [0x562621909795]
 8: (std::_Sp_counted_ptr<rocksdb::BlockBasedTableFactory*,
(__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x1ca) [0x5626219fdb7a]
 9: (rocksdb::ColumnFamilyOptions::~ColumnFamilyOptions()+0x385)
[0x5626214d6685]
 10: (()+0x382f0) [0x7fc16e8f12f0]
 11: (()+0x3835a) [0x7fc16e8f135a]
 12: (()+0xbad9c8) [0x5626216dd9c8]
 13: (main()+0x3b5) [0x562620ece9f5]
 14: (__libc_start_main()+0xf0) [0x7fc16e8d94f0]
 15: (_start()+0x2a) [0x562620faa3fa]
2017-11-30 01:11:19.694 7fc171dbd5c0 -1 *** Caught signal (Aborted) **
 in thread 7fc171dbd5c0 thread_name:ceph-osd

 ceph version 13.0.0-3574-gb1378b343a
(b1378b343add5134ab881b38a93f47f3f9cb40bb) mimic (dev)
 1: (()+0xa6be0e) [0x56262159be0e]
 2: (()+0x13a40) [0x7fc16f5aca40]
 3: (gsignal()+0x145) [0x7fc16e8ede95]
 4: (abort()+0x17a) [0x7fc16e8efb9a]
 5: (tcmalloc::Log(tcmalloc::LogMode, char const*, int,
tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem,
tcmalloc::LogItem)+0x234) [0x7fc170cf3084]
 6: (()+0x1784b) [0x7fc170ce784b]
 7: (rocksdb::LRUCache::~LRUCache()+0x65) [0x562621909795]
 8: (std::_Sp_counted_ptr<rocksdb::BlockBasedTableFactory*,
(__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x1ca) [0x5626219fdb7a]
 9: (rocksdb::ColumnFamilyOptions::~ColumnFamilyOptions()+0x385)
[0x5626214d6685]
 10: (()+0x382f0) [0x7fc16e8f12f0]
 11: (()+0x3835a) [0x7fc16e8f135a]
 12: (()+0xbad9c8) [0x5626216dd9c8]
 13: (main()+0x3b5) [0x562620ece9f5]
 14: (__libc_start_main()+0xf0) [0x7fc16e8d94f0]
 15: (_start()+0x2a) [0x562620faa3fa]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
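
Looking at that trace again, the command above was run against the
ceph-osd daemon binary (note thread_name:ceph-osd and the 13.0.0 mimic
dev version string) rather than through the ceph CLI, which would explain
the "unrecognized arg set" right before the abort. Presumably the flag is
meant to be set via the client against the monitors instead, i.e.

 ceph osd set require_jewel_osds --yes-i-really-mean-it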

Cary
-Dynamic

On Thu, Nov 30, 2017 at 12:50 AM, Sage Weil <sage@newdream.net> wrote:
> On Thu, 30 Nov 2017, Cary wrote:
>> Hello,
>>
>>  I have emerged a 9999 build of Luminous 2.2.1 on one of my monitor
>
> The latest luminous mon will allow you to do the
>
>  ceph osd set require_jewel_osds --yes-i-really-mean-it
>
> command without starting old osds.  Once the flag is set the luminous osds
> will start normally..
>
> s
>
>
>> nodes. I made sure only one Jewel OSD was being started. The log for
>> the OSD:
>> 017-11-30 00:30:27.786793 7f9200a598c0  1
>> filestore(/var/lib/ceph/osd/ceph-1) upgrade
>> 2017-11-30 00:30:27.786821 7f9200a598c0  2 osd.1 0 boot
>> 2017-11-30 00:30:27.787101 7f9200a598c0 -1 osd.1 0 The disk uses
>> features unsupported by the executable.
>> 2017-11-30 00:30:27.787110 7f9200a598c0 -1 osd.1 0  ondisk features
>> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
>> object,3=object
>> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
>> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
>> attr,16=deletes in missing set}
>> 2017-11-30 00:30:27.787120 7f9200a598c0 -1 osd.1 0  daemon features
>> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
>> object,3=object
>> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
>> objects,12=transaction hints,13=pg meta object}
>> 2017-11-30 00:30:27.787129 7f9200a598c0 -1 osd.1 0 Cannot write to
>> disk! Missing features: compat={},rocompat={},incompat={14=explicit
>> missing set,15=fastinfo pg attr,16=deletes in missing set}
>> 2017-11-30 00:30:27.787355 7f9200a598c0  1 journal close
>> /dev/disk/by-partlabel/ceph-1
>> 2017-11-30 00:30:27.795077 7f9200a598c0 -1  ** ERROR: osd init failed:
>> (95) Operation not supported
>>
>>  The OSD is not starting because of missing features. So the next
>> command still fails.
>>
>>  "ceph osd set require_jewel_osds --yes-i-really-really-mean-it"
>> returns the error
>>
>> Invalid command:  unused arguments: [u'--yes-i-really-really-mean-it']
>>
>> I guess ceph-dencoder may be needed tp change disk features. Does
>> anyone know what may need done here? Thank you,
>>
>>
>> Cary
>> -Dynamic
>>
>> On Tue, Nov 28, 2017 at 6:45 PM, Sage Weil <sage@newdream.net> wrote:
>> > On Tue, 28 Nov 2017, Cary wrote:
>> >> Hello,
>> >>
>> >>  I am getting an error when I run "ceph osd set require_jewel_osds
>> >> --yes-i-really-mean-it".
>> >>
>> >> Error ENOENT: unknown feature '--yes-i-really-mean-it'
>> >
>> > I just tested on the latest luminous branch and this works.  Did you
>> > upgrade the mons to the latest luminous build and restart them?
>> > (ceph-deploy install --dev luminous HOST, then restart mon daemon(s)).
>> >
>> > sage
>> >
>> >
>> >  >
>> >>  So I ran, "ceph osd set require_jewel_osds", and got this error:
>> >>
>> >> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
>> >>
>> >>  I verified all OSDs were stopped with "/etc/init.d/ceph-osd.N stop".
>> >> Then verified each was down with "ceph osd down N". When setting them
>> >> down, each replied "osd.N is already down".  I started one of the OSDs
>> >> on a host that was downgraded to 10.2.3-r2 I then attempted to set
>> >> "ceph osd set require_jewel_osds", and get the same error.
>> >>
>> >>
>> >>  The log for the OSD is showing this error:
>> >>
>> >> 2017-11-28 17:40:08.928446 7f47b082f940  1
>> >> filestore(/var/lib/ceph/osd/ceph-1) upgrade
>> >> 2017-11-28 17:40:08.928475 7f47b082f940  2 osd.1 0 boot
>> >> 2017-11-28 17:40:08.928788 7f47b082f940 -1 osd.1 0 The disk uses
>> >> features unsupported by the executable.
>> >> 2017-11-28 17:40:08.928810 7f47b082f940 -1 osd.1 0  ondisk features
>> >> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
>> >> object,3=object
>> >> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
>> >> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
>> >> attr,16=deletes in missing set}
>> >> 2017-11-28 17:40:08.928818 7f47b082f940 -1 osd.1 0  daemon features
>> >> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
>> >> object,3=object
>> >> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
>> >> objects,12=transaction hints,13=pg meta object}
>> >> 2017-11-28 17:40:08.928827 7f47b082f940 -1 osd.1 0 Cannot write to
>> >> disk! Missing features: compat={},rocompat={},incompat={14=explicit
>> >> missing set,15=fastinfo pg attr,16=deletes in missing set}
>> >> 2017-11-28 17:40:08.929353 7f47b082f940  1 journal close
>> >> /dev/disk/by-partlabel/ceph-1
>> >> 2017-11-28 17:40:08.930488 7f47b082f940 -1  ** ERROR: osd init failed:
>> >> (95) Operation not supported
>> >>
>> >> So the OSD is not starting because of missing features. It does not
>> >> show up in "ceph features" output.
>> >>
>> >>  Ceph features output:
>> >> ceph features
>> >> 2017-11-28 17:51:31.213636 7f6a2140a700 -1 WARNING: the following
>> >> dangerous and experimental features are enabled: btrfs
>> >> 2017-11-28 17:51:31.223068 7f6a2140a700 -1 WARNING: the following
>> >> dangerous and experimental features are enabled: btrfs
>> >>
>> >>     "mon": {
>> >>         "group": {
>> >>             "features": "0x1ffddff8eea4fffb",
>> >>             "release": "luminous",
>> >>             "num": 4
>> >>         }
>> >>     },
>> >>     "mds": {
>> >>         "group": {
>> >>             "features": "0x7fddff8ee84bffb",
>> >>             "release": "jewel",
>> >>             "num": 1
>> >>         }
>> >>     },
>> >>     "client": {
>> >>         "group": {
>> >>             "features": "0x1ffddff8eea4fffb",
>> >>             "release": "luminous",
>> >>             "num": 4
>> >>
>> >> I attempted to set require_jewel_osds with the MGRs stopped, and had
>> >> the same results.
>> >>
>> >>  Output from ceph tell osd.1 version. I get the same error from all OSDs.
>> >>
>> >> # ceph tell osd.1 versions
>> >> Error ENXIO: problem getting command descriptions from osd.1
>> >>
>> >> Any thoughts?
>> >>
>> >> Cary
>> >> -Dynamic
>> >>
>> >> On Tue, Nov 28, 2017 at 1:09 PM, Sage Weil <sage@newdream.net> wrote:
>> >> > On Tue, 28 Nov 2017, Cary wrote:
>> >> >> I get this error when I try to start the OSD that has been downgraded
>> >> >> to 10.2.3-r2.
>> >> >>
>> >> >> 2017-11-28 03:42:35.989754 7fa5e6429940  1
>> >> >> filestore(/var/lib/ceph/osd/ceph-3) upgrade
>> >> >> 2017-11-28 03:42:35.989788 7fa5e6429940  2 osd.3 0 boot
>> >> >> 2017-11-28 03:42:35.990132 7fa5e6429940 -1 osd.3 0 The disk uses
>> >> >> features unsupported by the executable.
>> >> >> 2017-11-28 03:42:35.990142 7fa5e6429940 -1 osd.3 0  ondisk features
>> >> >> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
>> >> >> object,3=object
>> >> >> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
>> >> >> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
>> >> >> attr,16=deletes in missing set}
>> >> >> 2017-11-28 03:42:35.990150 7fa5e6429940 -1 osd.3 0  daemon features
>> >> >> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
>> >> >> object,3=object
>> >> >> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
>> >> >> objects,12=transaction hints,13=pg meta object}
>> >> >> 2017-11-28 03:42:35.990160 7fa5e6429940 -1 osd.3 0 Cannot write to
>> >> >> disk! Missing features: compat={},rocompat={},incompat={14=explicit
>> >> >> missing set,15=fastinfo pg attr,16=deletes in missing set}
>> >> >> 2017-11-28 03:42:35.990775 7fa5e6429940  1 journal close
>> >> >> /dev/disk/by-partlabel/ceph-3
>> >> >> 2017-11-28 03:42:35.992960 7fa5e6429940 -1  ** ERROR: osd init failed:
>> >> >> (95) Operation not supported
>> >> >
>> >> > Oh, right.  In that case, install the 'luminous' branch[1] on the monitors
>> >> > (or just the primary monitor if you're being conservative), restrart it,
>> >> > and you'll be able to do
>> >> >
>> >> >  ceph osd set require_jewel_osds --yes-i-really-mean-it
>> >> >
>> >> > sage
>> >> >
>> >> >
>> >> > [1] ceph-deploy install --dev luminous HOST
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >> Cary
>> >> >>
>> >> >> On Tue, Nov 28, 2017 at 3:09 AM, Sage Weil <sage@newdream.net> wrote:
>> >> >> > On Tue, 28 Nov 2017, Cary wrote:
>> >> >> >> Hello,
>> >> >> >>
>> >> >> >>  Could someone please help me complete my botched upgrade from Jewel
>> >> >> >> 10.2.3-r1 to Luminous 12.2.1. I have 9 Gentoo servers, 4 of which have
>> >> >> >> 2 OSDs each.
>> >> >> >>
>> >> >> >>  My OSD servers were accidentally rebooted before the monitor servers
>> >> >> >> causing them to be running Luminous before the monitors. All services
>> >> >> >> have been restarted and running ceph versions gives the following:
>> >> >> >>
>> >> >> >> # ceph versions
>> >> >> >> 2017-11-27 21:27:24.356940 7fed67efe700 -1 WARNING: the following
>> >> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> >> 2017-11-27 21:27:24.368469 7fed67efe700 -1 WARNING: the following
>> >> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> >>
>> >> >> >>     "mon": {
>> >> >> >>         "ceph version 12.2.1
>> >> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 4
>> >> >> >>     },
>> >> >> >>     "mgr": {
>> >> >> >>         "ceph version 12.2.1
>> >> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
>> >> >> >>     },
>> >> >> >>     "osd": {},
>> >> >> >>     "mds": {
>> >> >> >>         "ceph version 12.2.1
>> >> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 1
>> >> >> >>     },
>> >> >> >>     "overall": {
>> >> >> >>         "ceph version 12.2.1
>> >> >> >> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 8
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> For some reason the OSDs do not show what version they are running,
>> >> >> >> and a ceph osd tree shows all of the OSD as being down.
>> >> >> >>
>> >> >> >>  # ceph osd tree
>> >> >> >> 2017-11-27 21:32:51.969335 7f483d9c2700 -1 WARNING: the following
>> >> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> >> 2017-11-27 21:32:51.980976 7f483d9c2700 -1 WARNING: the following
>> >> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> >> ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
>> >> >> >> -1       27.77998 root default
>> >> >> >> -3       27.77998     datacenter DC1
>> >> >> >> -6       27.77998         rack 1B06
>> >> >> >> -5        6.48000             host ceph3
>> >> >> >>  1        1.84000                 osd.1    down        0 1.00000
>> >> >> >>  3        4.64000                 osd.3    down        0 1.00000
>> >> >> >> -2        5.53999             host ceph4
>> >> >> >>  5        4.64000                 osd.5    down        0 1.00000
>> >> >> >>  8        0.89999                 osd.8    down        0 1.00000
>> >> >> >> -4        9.28000             host ceph6
>> >> >> >>  0        4.64000                 osd.0    down        0 1.00000
>> >> >> >>  2        4.64000                 osd.2    down        0 1.00000
>> >> >> >> -7        6.48000             host ceph7
>> >> >> >>  6        4.64000                 osd.6    down        0 1.00000
>> >> >> >>  7        1.84000                 osd.7    down        0 1.00000
>> >> >> >>
>> >> >> >> The OSD logs all have this message:
>> >> >> >>
>> >> >> >> 20235 osdmap REQUIRE_JEWEL OSDMap flag is NOT set; please set it.
>> >> >> >
>> >> >> > THis is an annoying corner condition.  12.2.2 (out soon!)  will have a
>> >> >> > --force option to set the flag even tho no osds are up.  Until then, the
>> >> >> > workaround is to downgrade one host to jewel, start one jewel osd, then
>> >> >> > set the flag.  Then upgrade to luminous again and restart all osds.
>> >> >> >
>> >> >> > sage
>> >> >> >
>> >> >> >
>> >> >> >>
>> >> >> >> When I try to set it with "ceph osd set require_jewel_osds" I get this error:
>> >> >> >>
>> >> >> >> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_JEWEL feature
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> A "ceph features" returns:
>> >> >> >>
>> >> >> >>     "mon": {
>> >> >> >>         "group": {
>> >> >> >>             "features": "0x1ffddff8eea4fffb",
>> >> >> >>             "release": "luminous",
>> >> >> >>             "num": 4
>> >> >> >>         }
>> >> >> >>     },
>> >> >> >>     "mds": {
>> >> >> >>         "group": {
>> >> >> >>             "features": "0x1ffddff8eea4fffb",
>> >> >> >>             "release": "luminous",
>> >> >> >>             "num": 1
>> >> >> >>         }
>> >> >> >>     },
>> >> >> >>     "osd": {
>> >> >> >>         "group": {
>> >> >> >>             "features": "0x1ffddff8eea4fffb",
>> >> >> >>             "release": "luminous",
>> >> >> >>             "num": 8
>> >> >> >>         }
>> >> >> >>     },
>> >> >> >>     "client": {
>> >> >> >>         "group": {
>> >> >> >>             "features": "0x1ffddff8eea4fffb",
>> >> >> >>             "release": "luminous",
>> >> >> >>             "num": 3
>> >> >> >>
>> >> >> >>  # ceph tell osd.* versions
>> >> >> >> 2017-11-28 02:29:28.565943 7f99c6aee700 -1 WARNING: the following
>> >> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> >> 2017-11-28 02:29:28.578956 7f99c6aee700 -1 WARNING: the following
>> >> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.0
>> >> >> >> osd.0: problem getting command descriptions from osd.0
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.1
>> >> >> >> osd.1: problem getting command descriptions from osd.1
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.2
>> >> >> >> osd.2: problem getting command descriptions from osd.2
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.3
>> >> >> >> osd.3: problem getting command descriptions from osd.3
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.5
>> >> >> >> osd.5: problem getting command descriptions from osd.5
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.6
>> >> >> >> osd.6: problem getting command descriptions from osd.6
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.7
>> >> >> >> osd.7: problem getting command descriptions from osd.7
>> >> >> >> Error ENXIO: problem getting command descriptions from osd.8
>> >> >> >> osd.8: problem getting command descriptions from osd.8
>> >> >> >>
>> >> >> >>  # ceph daemon osd.1 status
>> >> >> >>
>> >> >> >>     "cluster_fsid": "CENSORED",
>> >> >> >>     "osd_fsid": "CENSORED",
>> >> >> >>     "whoami": 1,
>> >> >> >>     "state": "preboot",
>> >> >> >>     "oldest_map": 19482,
>> >> >> >>     "newest_map": 20235,
>> >> >> >>     "num_pgs": 141
>> >> >> >>
>> >> >> >>  # ceph -s
>> >> >> >> 2017-11-27 22:04:10.372471 7f89a3935700 -1 WARNING: the following
>> >> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> >> 2017-11-27 22:04:10.375709 7f89a3935700 -1 WARNING: the following
>> >> >> >> dangerous and experimental features are enabled: btrfs
>> >> >> >>   cluster:
>> >> >> >>     id:     CENSORED
>> >> >> >>     health: HEALTH_ERR
>> >> >> >>             513 pgs are stuck inactive for more than 60 seconds
>> >> >> >>             126 pgs backfill_wait
>> >> >> >>             52 pgs backfilling
>> >> >> >>             435 pgs degraded
>> >> >> >>             513 pgs stale
>> >> >> >>             435 pgs stuck degraded
>> >> >> >>             513 pgs stuck stale
>> >> >> >>             435 pgs stuck unclean
>> >> >> >>             435 pgs stuck undersized
>> >> >> >>             435 pgs undersized
>> >> >> >>             recovery 854719/3688140 objects degraded (23.175%)
>> >> >> >>             recovery 838607/3688140 objects misplaced (22.738%)
>> >> >> >>             mds cluster is degraded
>> >> >> >>             crush map has straw_calc_version=0
>> >> >> >>
>> >> >> >>   services:
>> >> >> >>     mon: 4 daemons, quorum 0,1,3,2
>> >> >> >>     mgr: 0(active), standbys: 1, 5
>> >> >> >>     mds: cephfs-1/1/1 up  {0=a=up:replay}, 1 up:standby
>> >> >> >>     osd: 8 osds: 0 up, 0 in
>> >> >> >>
>> >> >> >>   data:
>> >> >> >>     pools:   7 pools, 513 pgs
>> >> >> >>     objects: 1199k objects, 4510 GB
>> >> >> >>     usage:   13669 GB used, 15150 GB / 28876 GB avail
>> >> >> >>     pgs:     854719/3688140 objects degraded (23.175%)
>> >> >> >>              838607/3688140 objects misplaced (22.738%)
>> >> >> >>              257 stale+active+undersized+degraded
>> >> >> >>              126 stale+active+undersized+degraded+remapped+backfill_wait
>> >> >> >>              78  stale+active+clean
>> >> >> >>              52  stale+active+undersized+degraded+remapped+backfilling
>> >> >> >>
>> >> >> >>
>> >> >> >> I ran "ceph auth list", and client.admin has the following permissions.
>> >> >> >> auid: 0
>> >> >> >> caps: [mds] allow
>> >> >> >> caps: [mgr] allow *
>> >> >> >> caps: [mon] allow *
>> >> >> >> caps: [osd] allow *
>> >> >> >>
>> >> >> >> Thank you for your time.
>> >> >> >>
>> >> >> >> Is there any way I can get these OSDs to join the cluster now, or
>> >> >> >> recover my data?
>> >> >> >>
>> >> >> >> Cary
>> >> >> >> -Dynamic
>> >> >> >>
>> >> >> >>
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap
  2017-11-30  1:13               ` Cary
@ 2017-11-30  3:10                 ` Sage Weil
  2017-12-04  5:36                   ` Cary
  0 siblings, 1 reply; 12+ messages in thread
From: Sage Weil @ 2017-11-30  3:10 UTC (permalink / raw)
  To: Cary; +Cc: ceph-devel

On Thu, 30 Nov 2017, Cary wrote:
> I believe I see what I was doing wrong. I had to run "ceph-osd set
> require_jewel_osds --yes-i-really-mean-it"

'ceph osd set ...', not 'ceph-osd set ...'.
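
To spell that out (a minimal sketch; 'ceph' is the client CLI that asks the
monitors to change the OSDMap, while 'ceph-osd' is the OSD daemon binary and
has no 'set' subcommand at all):

  # wrong: hands arguments to the daemon binary, which doesn't understand them
  ceph-osd set require_jewel_osds --yes-i-really-mean-it
  # right: sends the request to the monitors
  ceph osd set require_jewel_osds --yes-i-really-mean-it

Whether the flag can actually be set while no OSDs are up is a separate
question; as discussed earlier in the thread, overriding that check needs
12.2.2 (or a running Jewel OSD).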

> This is the error I am getting now.
> 2017-11-30 01:11:19.691 7fc171dbd5c0 -1 unrecognized arg set
> src/tcmalloc.cc:284] Attempt to free invalid pointer 0x56262bacf4c0
> *** Caught signal (Aborted) **
> [...]

...and that is an embarrassing error from bad arguments on the command
line!

sage


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap
  2017-11-30  3:10                 ` Sage Weil
@ 2017-12-04  5:36                   ` Cary
  2017-12-04  7:47                     ` Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap [how to avoid in Gentoo in future] Robin H. Johnson
  0 siblings, 1 reply; 12+ messages in thread
From: Cary @ 2017-12-04  5:36 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

Sage,

 I accidentally upgraded one of my monitors to Mimic when installing
the 9999 ebuild. That ebuild pulled the latest code from
https://github.com/ceph/ceph.git which was Mimic. I uninstalled Mimic
and installed Luminous 12.2.2. Then I was able to run "ceph osd set
require_jewel_osds --yes-i-really-mean-it", and get my cluster to a
healthy state running Luminous 12.2.1. I will update the rest of the
cluster to Luminous 12.2.2 later.
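
 For anyone hitting the same thing, this is roughly the sequence that worked
for me, followed by the checks I used afterwards (a sketch only; output will
of course differ):

 # ceph osd set require_jewel_osds --yes-i-really-mean-it
 # ceph osd dump | grep require_jewel_osds
 # ceph osd tree
 # ceph -s

The flag shows up in the OSDMap flags line, the OSDs come back up, and health
settles once recovery finishes.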

 Thank you for your time and for helping me with that!

Cary
-Dynamic


On Thu, Nov 30, 2017 at 3:10 AM, Sage Weil <sage@newdream.net> wrote:
> On Thu, 30 Nov 2017, Cary wrote:
>> I believe I see what I was doing wrong. I had to run "ceph-osd set
>> require_jewel_osds --yes-i-really-mean-it"
>
> 'ceph osd set ...', not 'ceph-osd set ...'.
>
>> This is the error I am getting now.
>> 2017-11-30 01:11:19.691 7fc171dbd5c0 -1 unrecognized arg set
>> src/tcmalloc.cc:284] Attempt to free invalid pointer 0x56262bacf4c0
>> *** Caught signal (Aborted) **
>> [...]
>
> ...and that is an embarrassing error from bad arguments on the command
> line!
>
> sage
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap [how to avoid in Gentoo in future]
  2017-12-04  5:36                   ` Cary
@ 2017-12-04  7:47                     ` Robin H. Johnson
  0 siblings, 0 replies; 12+ messages in thread
From: Robin H. Johnson @ 2017-12-04  7:47 UTC (permalink / raw)
  To: ceph-devel


On Mon, Dec 04, 2017 at 05:36:13AM +0000, Cary wrote:
> Sage,
> 
>  I accidentally upgraded one of my monitors to Mimic when installing
> the 9999 ebuild. That ebuild pulled the latest code from
> https://github.com/ceph/ceph.git which was Mimic. I uninstalled Mimic
> and installed Luminous 12.2.2. Then I was able to run "ceph osd set
> require_jewel_osds --yes-i-really-mean-it", and get my cluster to a
> healthy state running Luminous 12.2.1. I will update the rest of the
> cluster to Luminous 12.2.2 later.
Gentoo-specific:
Should the Gentoo maintainers restructure the generic -9999 ebuild to track the tip of each release branch?
Something like:
10.2.9999 - Jewel
12.2.9999 - Luminous
13.2.9999 - Mimic
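
Each of those would essentially be the existing live ebuild with the git
checkout pinned to the matching upstream release branch; roughly (fragment is
illustrative only, not the actual ebuild):

  inherit git-r3
  EGIT_REPO_URI="https://github.com/ceph/ceph.git"
  EGIT_BRANCH="luminous"   # or "jewel" / "mimic" for the other two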

Would this have helped you avoid accidentally running Mimic code on a
Luminous cluster?

I do have ebuilds for the above, since I worked on them...

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Asst. Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-12-04  7:47 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
2017-11-28  2:46 Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap Cary
2017-11-28  3:09 ` Sage Weil
2017-11-28  3:45   ` Cary
2017-11-28 13:09     ` Sage Weil
2017-11-28 18:11       ` Cary
2017-11-28 18:45         ` Sage Weil
2017-11-30  0:48           ` Cary
2017-11-30  0:50             ` Sage Weil
2017-11-30  1:13               ` Cary
2017-11-30  3:10                 ` Sage Weil
2017-12-04  5:36                   ` Cary
2017-12-04  7:47                     ` Upgrade from Jewel to Luminous. REQUIRE_JEWEL OSDMap [how to avoid in Gentoo in future] Robin H. Johnson
