Subject: hard disk failure monitor issue: ceph down, please help
From: Ignacio García
Date: 2021-08-17 15:01 UTC
To: ceph-devel, ceph-users

Hi friends, we are an SME that set up a Ceph storage system several months
ago as a proof of concept. As we liked it, we started to use it for
production applications and as a corporate filesystem, postponing the
measures needed for a properly deployed Ceph system (3 servers instead of 2,
3 object replicas instead of 2, 3 monitors instead of 1...). The disaster
happened before we got there, and we are desperately asking for your help to
find out whether we can recover the system, or at least the data.

In short, the boot disk of the server where the only monitor was running has
failed, and it also held the monitor daemon's data (monitor map...). We will
appreciate any help you can offer before we break something that is still
recoverable by trying non-expert solutions.

The details follow; thank you very much in advance:


* system overview:


2 commodity servers, 4 HDs each, 6 of the HDs used for Ceph OSDs

2 object replicas; only 1 monitor

server 1: 1 mon, 1 mgr, 1 mds, 3 osds

server 2: 1 mgr, 1 mds, 3 osds

Ceph Octopus 15.2.11, containerized Docker daemons, deployed with cephadm

used for libvirt VM RBD images and 1 CephFS
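
For clarity, the daemon placement above corresponds roughly to the following
cephadm orchestrator commands (the host names "server1"/"server2" and the
filesystem name "cephfs" are placeholders for our real ones; this is just a
restatement of the layout, not the exact commands we ran):

  ceph orch apply mon --placement="1 server1"
  ceph orch apply mgr --placement="2 server1 server2"
  ceph orch apply mds cephfs --placement="2 server1 server2"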


* the problems:


--> HD 1.i failed, so server 1 is down: no monitor left, the server 2 OSDs
cannot start, and the whole cluster is down

--> the client.admin keyring is lost
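
For reference, the procedure that seems to match our situation is the
"Recovery using OSDs" section of the Ceph troubleshooting guide, which
rebuilds the monitor store from the copies of the cluster map kept by the
OSDs. We have not dared to run any of it yet; the sketch below is only our
reading of the documentation, and the paths, the placeholder mon id "a" and
the cephadm/container details are illustrative assumptions, not values taken
from our cluster:

  # on each server with surviving OSDs, with those OSDs stopped, collect the
  # cluster map from every OSD data directory (paths are placeholders: under
  # cephadm the OSD dirs on the host are /var/lib/ceph/<fsid>/osd.N, and the
  # tools live inside the containers, reachable e.g. via "cephadm shell")
  ms=/root/mon-store
  mkdir -p $ms
  for osd in /var/lib/ceph/osd/ceph-*; do
      ceph-objectstore-tool --data-path $osd --no-mon-config \
          --op update-mon-db --mon-store-path $ms
  done

  # recreate a keyring with mon. and client.admin entries, since ours is lost
  ceph-authtool /root/admin.keyring --create-keyring --gen-key -n mon. \
      --cap mon 'allow *'
  ceph-authtool /root/admin.keyring --gen-key -n client.admin \
      --cap mon 'allow *' --cap osd 'allow *' \
      --cap mds 'allow *' --cap mgr 'allow *'

  # rebuild the monitor store from the collected maps ("a" = placeholder mon id)
  ceph-monstore-tool $ms rebuild -- --keyring /root/admin.keyring --mon-ids a

As far as we understand, the rebuilt store.db would then be copied into the
data directory of a (re-created) monitor before starting it, and the OSD data
on the surviving disks is untouched, so the RBD images and the CephFS contents
should in principle still be there. But we would much rather hear from someone
who has done this before touching anything.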


* hard disk structure details:


- server 1:    (device / size / MODEL / SERIAL / WWN)

1.i)    /dev/sda   1.8T    WDC_WD2002FYPS-0   WD-WCAVY7030179      0x50014ee205e40c09
        --> server 1 boot disk, root, and ceph daemon data (/var/lib/ceph, etc) --> FAILED

1.ii)   /dev/sdc   7.3T    WDC_WD80EFAX-68L   7HKG3MEF             0x5000cca257f0b152
        --> osd.2

1.iii)  /dev/sdb   7.3T    WDC_WD80EFAX-68L   7HKG6H3F             0x5000cca257f0bc0f
        --> osd.1

1.iv)   /dev/sdd   1.8T    WDC_WD2002FYPS-0   WD-WCAVY6926130      0x50014ee25b180bf3
        --> osd.0


- server 2:    (device / size / MODEL / SERIAL / WWN)

2.i)    /dev/sda   223.6G  INTEL_SSDSC2KB24   BTYF90350ENF240AGN   0x55cd2e4150390704
        --> server 2 boot disk, root, and ceph daemon data (/var/lib/ceph, etc)

2.ii)   /dev/sdb   7.3T    HGST_HUS728T8TAL   VAGUR01L             0x5000cca099cbafde
        --> osd.3

2.iii)  /dev/sdc   7.3T    HGST_HUS728T8TAL   VGG2G7LG             0x5000cca0bec11e37
        --> osd.4

2.iv)   /dev/sdd   1.8T    WDC_WD2002FYPS-0   WD-WCAVY7261411      0x50014ee2064414f2
        --> osd.5
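
In case it helps to double-check the mapping above: assuming the OSDs were
created by cephadm's default ceph-volume/LVM path, the osd id of each data
disk can still be read from its LVM tags without a working monitor, e.g. on
each server:

  cephadm ceph-volume lvm list
  # or, for a single device, from inside "cephadm shell":
  ceph-volume lvm list /dev/sdb

(the device name is just an example; this only reads local LVM metadata and
does not change anything).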


Ignacio G,

Live-Med Iberia


