* ceph rdma + IB network error
@ 2018-07-19  2:16 Will Zhao
  0 siblings, 0 replies; only message in thread
From: Will Zhao @ 2018-07-19  2:16 UTC (permalink / raw)
  To: Ceph Users, ceph-devel-u79uwXL29TY76Z2rM5mHXA



Hi all:



Following the instructions at:

(https://community.mellanox.com/docs/DOC-2721)

(https://community.mellanox.com/docs/DOC-2693)

(http://hwchiu.com/2017-05-03-ceph-with-rdma.html)



I'm trying to configure Ceph with the RDMA feature in the following environment:



CentOS Linux release 7.2.1511 (Core)

MLNX_OFED_LINUX-4.4-1.0.0.0:

Mellanox Technologies MT27500 Family [ConnectX-3]



rping works between all nodes, and I added these lines to ceph.conf to enable
RDMA:



public_network = 10.10.121.0/24

cluster_network = 10.10.121.0/24

ms_type = async+rdma

ms_async_rdma_device_name = mlx4_0

ms_async_rdma_port_num = 2



The IB network uses addresses in 10.10.121.0/24, and the "ibdev2netdev"
command shows port 2 is up.
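To rule out a typo, I sanity-check that the RDMA options in the config are internally consistent (the device name must match what ibdev2netdev/ibv_devinfo report, and the port must be a physical IB port, 1 or 2 on a dual-port ConnectX-3). A minimal sketch, assuming the options sit in the [global] section as in my ceph.conf:

```python
# Sanity-check the RDMA-related options from the ceph.conf above.
# Assumes a [global] section; the expected values (mlx4_0, port 2)
# are specific to this ConnectX-3 setup.
import configparser

CONF = """
[global]
public_network = 10.10.121.0/24
cluster_network = 10.10.121.0/24
ms_type = async+rdma
ms_async_rdma_device_name = mlx4_0
ms_async_rdma_port_num = 2
"""

cfg = configparser.ConfigParser()
cfg.read_string(CONF)
g = cfg["global"]

# The messenger type must be the async messenger with the RDMA backend.
assert g["ms_type"] == "async+rdma"
# Device name should match the verbs device reported by ibv_devinfo.
assert g["ms_async_rdma_device_name"].startswith("mlx")
# Port number must be a real physical port on the HCA.
assert g.getint("ms_async_rdma_port_num") in (1, 2)
print("RDMA options look consistent")
```

These checks pass for the config shown, so the option names and values themselves look fine.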

The error occurs when running "ceph-deploy --overwrite-conf mon
create-initial". The ceph-deploy log details:



[2018-07-12 17:53:48,943][ceph_deploy.conf][DEBUG ] found configuration
file at: /home/user1/.cephdeploy.conf

[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ] Invoked (1.5.37):
/usr/bin/ceph-deploy --overwrite-conf mon create-initial

[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ] ceph-deploy options:

[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ]
username                      : None

[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ]
verbose                       : False

[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ]
overwrite_conf                : True

[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ]
subcommand                    : create-initial

[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ]  quiet
              : False

[2018-07-12 17:53:48,945][ceph_deploy.cli][INFO  ]
cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf object at
0x27e6210>

[2018-07-12 17:53:48,945][ceph_deploy.cli][INFO  ]
cluster                       : ceph

[2018-07-12 17:53:48,945][ceph_deploy.cli][INFO  ]
func                          : <function mon at 0x2a7d2a8>

[2018-07-12 17:53:48,945][ceph_deploy.cli][INFO  ]
ceph_conf                     : None

[2018-07-12 17:53:48,945][ceph_deploy.cli][INFO  ]
default_release               : False

[2018-07-12 17:53:48,945][ceph_deploy.cli][INFO  ]
keyrings                      : None

[2018-07-12 17:53:48,947][ceph_deploy.mon][DEBUG ] Deploying mon, cluster
ceph hosts node1

[2018-07-12 17:53:48,947][ceph_deploy.mon][DEBUG ] detecting platform for
host node1 ...

[2018-07-12 17:53:49,005][node1][DEBUG ] connection detected need for sudo

[2018-07-12 17:53:49,039][node1][DEBUG ] connected to host: node1

[2018-07-12 17:53:49,040][node1][DEBUG ] detect platform information from
remote host

[2018-07-12 17:53:49,073][node1][DEBUG ] detect machine type

[2018-07-12 17:53:49,078][node1][DEBUG ] find the location of an executable

[2018-07-12 17:53:49,079][ceph_deploy.mon][INFO  ] distro info: CentOS
Linux 7.2.1511 Core

[2018-07-12 17:53:49,079][node1][DEBUG ] determining if provided host has
same hostname in remote

[2018-07-12 17:53:49,079][node1][DEBUG ] get remote short hostname

[2018-07-12 17:53:49,080][node1][DEBUG ] deploying mon to node1

[2018-07-12 17:53:49,080][node1][DEBUG ] get remote short hostname

[2018-07-12 17:53:49,081][node1][DEBUG ] remote hostname: node1

[2018-07-12 17:53:49,083][node1][DEBUG ] write cluster configuration to
/etc/ceph/{cluster}.conf

[2018-07-12 17:53:49,084][node1][DEBUG ] create the mon path if it does not
exist

[2018-07-12 17:53:49,085][node1][DEBUG ] checking for done path:
/var/lib/ceph/mon/ceph-node1/done

[2018-07-12 17:53:49,085][node1][DEBUG ] create a done file to avoid
re-doing the mon deployment

[2018-07-12 17:53:49,086][node1][DEBUG ] create the init path if it does
not exist

[2018-07-12 17:53:49,089][node1][INFO  ] Running command: sudo systemctl
enable ceph.target

[2018-07-12 17:53:49,365][node1][INFO  ] Running command: sudo systemctl
enable ceph-mon@node1

[2018-07-12 17:53:49,588][node1][INFO  ] Running command: sudo systemctl
start ceph-mon@node1

[2018-07-12 17:53:51,762][node1][INFO  ] Running command: sudo ceph
--cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok mon_status

[2018-07-12 17:53:51,979][node1][DEBUG ]
********************************************************************************

[2018-07-12 17:53:51,979][node1][DEBUG ] status for monitor: mon.node1

[2018-07-12 17:53:51,980][node1][DEBUG ] {

[2018-07-12 17:53:51,980][node1][DEBUG ]   "election_epoch": 3,

[2018-07-12 17:53:51,980][node1][DEBUG ]   "extra_probe_peers": [],

[2018-07-12 17:53:51,980][node1][DEBUG ]   "feature_map": {

[2018-07-12 17:53:51,981][node1][DEBUG ]     "mon": {

[2018-07-12 17:53:51,981][node1][DEBUG ]       "group": {

[2018-07-12 17:53:51,981][node1][DEBUG ]         "features":
"0x1ffddff8eea4fffb",

[2018-07-12 17:53:51,981][node1][DEBUG ]         "num": 1,

[2018-07-12 17:53:51,981][node1][DEBUG ]         "release": "luminous"

[2018-07-12 17:53:51,981][node1][DEBUG ]       }

[2018-07-12 17:53:51,981][node1][DEBUG ]     }

[2018-07-12 17:53:51,982][node1][DEBUG ]   },

[2018-07-12 17:53:51,982][node1][DEBUG ]   "features": {

[2018-07-12 17:53:51,982][node1][DEBUG ]     "quorum_con":
"2305244844532236283",

[2018-07-12 17:53:51,982][node1][DEBUG ]     "quorum_mon": [

[2018-07-12 17:53:51,982][node1][DEBUG ]       "kraken",

[2018-07-12 17:53:51,982][node1][DEBUG ]       "luminous"

[2018-07-12 17:53:51,982][node1][DEBUG ]     ],

[2018-07-12 17:53:51,982][node1][DEBUG ]     "required_con":
"153140804152475648",

[2018-07-12 17:53:51,983][node1][DEBUG ]     "required_mon": [

[2018-07-12 17:53:51,983][node1][DEBUG ]       "kraken",

[2018-07-12 17:53:51,983][node1][DEBUG ]       "luminous"

[2018-07-12 17:53:51,983][node1][DEBUG ]     ]

[2018-07-12 17:53:51,983][node1][DEBUG ]   },

[2018-07-12 17:53:51,983][node1][DEBUG ]   "monmap": {

[2018-07-12 17:53:51,983][node1][DEBUG ]     "created": "2018-07-12
17:41:24.243749",

[2018-07-12 17:53:51,984][node1][DEBUG ]     "epoch": 1,

[2018-07-12 17:53:51,984][node1][DEBUG ]     "features": {

[2018-07-12 17:53:51,984][node1][DEBUG ]       "optional": [],

[2018-07-12 17:53:51,984][node1][DEBUG ]       "persistent": [

[2018-07-12 17:53:51,984][node1][DEBUG ]         "kraken",

[2018-07-12 17:53:51,984][node1][DEBUG ]         "luminous"

[2018-07-12 17:53:51,984][node1][DEBUG ]       ]

[2018-07-12 17:53:51,984][node1][DEBUG ]     },

[2018-07-12 17:53:51,985][node1][DEBUG ]     "fsid":
"9317bc6a-ea20-4376-a390-52afa0b81353",

[2018-07-12 17:53:51,985][node1][DEBUG ]     "modified": "2018-07-12
17:41:24.243749",

[2018-07-12 17:53:51,985][node1][DEBUG ]     "mons": [

[2018-07-12 17:53:51,985][node1][DEBUG ]       {

[2018-07-12 17:53:51,985][node1][DEBUG ]         "addr": "
10.10.121.25:6789/0",

[2018-07-12 17:53:51,985][node1][DEBUG ]         "name": "node1",

[2018-07-12 17:53:51,985][node1][DEBUG ]         "public_addr": "
10.10.121.25:6789/0",

[2018-07-12 17:53:51,986][node1][DEBUG ]         "rank": 0

[2018-07-12 17:53:51,986][node1][DEBUG ]       }

[2018-07-12 17:53:51,986][node1][DEBUG ]     ]

[2018-07-12 17:53:51,986][node1][DEBUG ]   },

[2018-07-12 17:53:51,986][node1][DEBUG ]   "name": "node1",

[2018-07-12 17:53:51,986][node1][DEBUG ]   "outside_quorum": [],

[2018-07-12 17:53:51,986][node1][DEBUG ]   "quorum": [

[2018-07-12 17:53:51,986][node1][DEBUG ]     0

[2018-07-12 17:53:51,987][node1][DEBUG ]   ],

[2018-07-12 17:53:51,987][node1][DEBUG ]   "rank": 0,

[2018-07-12 17:53:51,987][node1][DEBUG ]   "state": "leader",

[2018-07-12 17:53:51,987][node1][DEBUG ]   "sync_provider": []

[2018-07-12 17:53:51,987][node1][DEBUG ] }

[2018-07-12 17:53:51,987][node1][DEBUG ]
********************************************************************************

[2018-07-12 17:53:51,987][node1][INFO  ] monitor: mon.node1 is running

[2018-07-12 17:53:51,989][node1][INFO  ] Running command: sudo ceph
--cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok mon_status

[2018-07-12 17:53:52,156][ceph_deploy.mon][INFO  ] processing monitor
mon.node1

[2018-07-12 17:53:52,194][node1][DEBUG ] connection detected need for sudo

[2018-07-12 17:53:52,230][node1][DEBUG ] connected to host: node1

[2018-07-12 17:53:52,231][node1][DEBUG ] detect platform information from
remote host

[2018-07-12 17:53:52,265][node1][DEBUG ] detect machine type

[2018-07-12 17:53:52,270][node1][DEBUG ] find the location of an executable

[2018-07-12 17:53:52,273][node1][INFO  ] Running command: sudo ceph
--cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok mon_status

[2018-07-12 17:53:52,439][ceph_deploy.mon][INFO  ] mon.node1 monitor has
reached quorum!

[2018-07-12 17:53:52,440][ceph_deploy.mon][INFO  ] all initial monitors are
running and have formed quorum

[2018-07-12 17:53:52,440][ceph_deploy.mon][INFO  ] Running gatherkeys...

[2018-07-12 17:53:52,441][ceph_deploy.gatherkeys][INFO  ] Storing keys in
temp directory /tmp/tmp8bdYT6

[2018-07-12 17:53:52,477][node1][DEBUG ] connection detected need for sudo

[2018-07-12 17:53:52,510][node1][DEBUG ] connected to host: node1

[2018-07-12 17:53:52,511][node1][DEBUG ] detect platform information from
remote host

[2018-07-12 17:53:52,552][node1][DEBUG ] detect machine type

[2018-07-12 17:53:52,558][node1][DEBUG ] get remote short hostname

[2018-07-12 17:53:52,559][node1][DEBUG ] fetch remote file

[2018-07-12 17:53:52,562][node1][INFO  ] Running command: sudo
/usr/bin/ceph --connect-timeout=25 --cluster=ceph
--admin-daemon=/var/run/ceph/ceph-mon.node1.asok mon_status

[2018-07-12 17:53:52,731][node1][INFO  ] Running command: sudo
/usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon.
--keyring=/var/lib/ceph/mon/ceph-node1/keyring auth get client.admin

[2018-07-12 17:54:18,059][node1][ERROR ] "ceph auth get-or-create for
keytype admin returned 1

[2018-07-12 17:54:18,059][node1][DEBUG ] Cluster connection interrupted or
timed out

[2018-07-12 17:54:18,059][node1][ERROR ] Failed to return 'admin' key from
host node1

[2018-07-12 17:54:18,059][ceph_deploy.gatherkeys][ERROR ] Failed to connect
to host:node1

[2018-07-12 17:54:18,060][ceph_deploy.gatherkeys][INFO  ] Destroy temp
directory /tmp/tmp8bdYT6

[2018-07-12 17:54:18,060][ceph_deploy][ERROR ] RuntimeError: Failed to
connect any mon



The ceph-mon service is up but cannot be reached; "ceph -s" also
returns the same type of error:



2018-07-13 10:44:21.169536 7fa570d4e700  0 monclient(hunting): authenticate
timed out after 300

2018-07-13 10:44:21.169579 7fa570d4e700  0 librados: client.admin
authentication error (110) Connection timed out

[errno 110] error connecting to the cluster
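One thing I still plan to rule out (it is mentioned in the Mellanox guides linked above, so treat this as a guess rather than a confirmed fix): with the RDMA messenger, every Ceph daemon needs to pin memory and reach /dev/infiniband, while the stock systemd units cap LimitMEMLOCK and set PrivateDevices=yes. The guides suggest a unit override along these lines (the drop-in path is my assumption; the same override would apply to ceph-osd@ and ceph-mds@):

```ini
# /etc/systemd/system/ceph-mon@.service.d/rdma.conf  (assumed path)
[Service]
LimitMEMLOCK=infinity
PrivateDevices=no
```

followed by "systemctl daemon-reload" and a restart of the daemons. I have not yet confirmed whether this changes the behavior here.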


I'm running Ceph version 12.2.4 (luminous, stable). Does anyone have any
suggestions about this issue?



Thx


_______________________________________________
ceph-users mailing list
ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
