* [lustre-devel] Auster and no facet /usr/sbin/lctl
@ 2019-07-16 12:11 Baptiste Gerondeau
  2019-07-16 21:09 ` Andreas Dilger
  0 siblings, 1 reply; 7+ messages in thread
From: Baptiste Gerondeau @ 2019-07-16 12:11 UTC (permalink / raw)
  To: lustre-devel

Hi,

I'm currently in the process of bringing up the "3 node" x86 cluster and
running "verbose=true ./auster -f multinode -rsv runtests" (on CentOS 7.6
x86 client & server, installed from repos), I keep getting "MDS: No host
defined for facet /usr/sbin/lctl".

Auster then prints out some pdsh stuff, "Failures : 0" and exits after 16s
obviously without running any tests.

Any suggestions?
Thanks a lot,


PS : My multinode config is attached
PPS: I posted to the devel list because it concerned auster, if I need to
post it elsewhere please let me know
-- 
Baptiste Gerondeau
Engineer - HPC SIG - LDCG - Linaro
#irc : BaptisteGer
-------------- next part --------------
[root@x8602 tests]# cat cfg/multinode.sh 
FSNAME=master
FSTYPE=ldiskfs
MOUNT=/mnt/lustre
MOUNT2=/mnt/master2
# MDS and MDT configuration
MDSCOUNT=1

mds_HOST="x86ohpc"
mgs_HOST="x86ohpc"
mdt_HOST="x86ohpc"
MDSDEV1="/dev/sda2"

# OSS and OST configuration
OSTCOUNT=1

ost_HOST="x8601"
OSTDEV1="/dev/sda2"

# Client configuration
CLIENTCOUNT=1
RCLIENTS="x8602"

export PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /root/.ssh/cluster"
PDSH="/usr/bin/pdsh -l root -S -Rssh -d -w"

SHARED_DIRECTORY=${SHARED_DIRECTORY:-/opt/ohpc/pub}


* [lustre-devel] Auster and no facet /usr/sbin/lctl
  2019-07-16 12:11 [lustre-devel] Auster and no facet /usr/sbin/lctl Baptiste Gerondeau
@ 2019-07-16 21:09 ` Andreas Dilger
  2019-07-18 10:29   ` Baptiste Gerondeau
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Dilger @ 2019-07-16 21:09 UTC (permalink / raw)
  To: lustre-devel

On Jul 16, 2019, at 06:11, Baptiste Gerondeau <baptiste.gerondeau@linaro.org> wrote:
> 
> Hi,
> 
> I'm currently in the process of bringing up the "3 node" x86 cluster and running "verbose=true ./auster -f multinode -rsv runtests" (on CentOS 7.6 x86 client & server, installed from repos), I keep getting "MDS: No host defined for facet /usr/sbin/lctl".
> 
> Auster then prints out some pdsh stuff, "Failures : 0" and exits after 16s obviously without running any tests.
> 
> Any suggestions?
> Thanks a lot,
> 
> 
> PS : My multinode config is attached
> PPS: I posted to the devel list because it concerned auster, if I need to post it elsewhere please let me know

Before running auster, which tries to launch a lot of tests, start with just a plain mount to see if that is working:

master.sh:
> MOUNT=/mnt/lustre
> MOUNT2=/mnt/master2

This is a bit odd for tests, which normally have e.g. /mnt/master and /mnt/master2, but I'm
not sure if there will be a problem or not.

### assume modules/utils are built
### modules/utils are installed or you are running out of the build directory
### ssh to the MDS and OSS nodes works without a password
### if you are not using @tcp0 for LNet, /etc/modprobe.d/lnet.conf is correct

all# modprobe ptlrpc		### on client and OSS and MDS to start LNet
x8602# lctl ping x86ohpc	### should print NID(s) of x86ohpc
x8602# lctl ping x8601		### should print NID(s) of x8601
x8602# export NAME=master	### get config from lustre/tests/cfg/master.sh
x8602# sh llmount.sh		### should format x86ohpc:/dev/sda2 and x8601:/dev/sda2
x8602# lfs df			### should show master-MDT0000 and master-OST0000
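
When you are done, a teardown sketch (llmountcleanup.sh is the counterpart
of llmount.sh in lustre/tests; it undoes the above):

x8602# sh llmountcleanup.sh	### unmount and clean up what llmount.sh set up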

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud


* [lustre-devel] Auster and no facet /usr/sbin/lctl
  2019-07-16 21:09 ` Andreas Dilger
@ 2019-07-18 10:29   ` Baptiste Gerondeau
  2019-07-18 18:56     ` Andreas Dilger
  0 siblings, 1 reply; 7+ messages in thread
From: Baptiste Gerondeau @ 2019-07-18 10:29 UTC (permalink / raw)
  To: lustre-devel

Thank you very much for your quick help!
I reformatted and remounted everything from scratch and can confirm that
mounting works, and that the client can communicate with the MDS (210, OSS
is 211 and client 212):

[root@x8602 tests]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.40.24.212@tcp
          status: up
          interfaces:
              0: eno1
[root@x8602 tests]# lnetctl peer show -v
peer:
    - primary nid: 10.40.24.210@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.40.24.210@tcp
          state: NA
          max_ni_tx_credits: 8
          available_tx_credits: 8
          min_tx_credits: 6
          tx_q_num_of_buf: 0
          available_rtr_credits: 8
          min_rtr_credits: 8
          refcount: 1
          statistics:
              send_count: 137546
              recv_count: 137545
              drop_count: 0
    - primary nid: 10.40.24.212@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.40.24.212@tcp
          state: NA
          max_ni_tx_credits: 8
          available_tx_credits: 8
          min_tx_credits: -84
          tx_q_num_of_buf: 0
          available_rtr_credits: 8
          min_rtr_credits: 8
          refcount: 1
          statistics:
              send_count: 291726
              recv_count: 291726
              drop_count: 0
    - primary nid: 10.40.24.211@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.40.24.211@tcp
          state: NA
          max_ni_tx_credits: 8
          available_tx_credits: 8
          min_tx_credits: 7
          tx_q_num_of_buf: 0
          available_rtr_credits: 8
          min_rtr_credits: 8
          refcount: 1
          statistics:
              send_count: 56
              recv_count: 56
              drop_count: 0
[root@x8602 tests]# lctl which_nid 10.40.24.210@tcp
10.40.24.210@tcp
[root@x8602 tests]# lfs df -ih
UUID                      Inodes       IUsed       IFree IUse% Mounted on
test-MDT0000_UUID           4.0M         272        4.0M   1% /lustre[MDT:0]
test-OST0000_UUID         640.0K         267      639.7K   0% /lustre[OST:0]

filesystem_summary:       640.0K         272      639.7K   0% /lustre

[root@x8602 tests]#  ls -lsah /lustre/
total 12K
4.0K drwxr-xr-x   3 root root 4.0K Jul 18 11:03 .
4.0K dr-xr-xr-x. 19 root root 4.0K Jun 28 11:43 ..
4.0K -rw-r--r--   1 root root   14 Jul 18 11:03 test.txt

I get the same output from auster though:
Client: Lustre version: 2.12.0
MDS: No host defined for facet /usr/sbin/lctl
OSS: Lustre version: 2.12.0

From the client I can ssh into the other nodes (and from each node I can
ssh into the others).
I had tried to debug the scripts behind the above auster output but was
unable to track down where it failed...

On Tue, 16 Jul 2019 at 23:09, Andreas Dilger <adilger@whamcloud.com> wrote:

> On Jul 16, 2019, at 06:11, Baptiste Gerondeau <baptiste.gerondeau@linaro.org> wrote:
> >
> > Hi,
> >
> > I'm currently in the process of bringing up the "3 node" x86 cluster and
> running "verbose=true ./auster -f multinode -rsv runtests" (on CentOS 7.6
> x86 client & server, installed from repos), I keep getting "MDS: No host
> defined for facet /usr/sbin/lctl".
> >
> > Auster then prints out some pdsh stuff, "Failures : 0" and exits after
> 16s obviously without running any tests.
> >
> > Any suggestions?
> > Thanks a lot,
> >
> >
> > PS : My multinode config is attached
> > PPS: I posted to the devel list because it concerned auster, if I need
> to post it elsewhere please let me know
>
> Before running auster, which tries to launch a lot of tests, start with
> just a plain mount to see if that is working:
>
> master.sh:
> > MOUNT=/mnt/lustre
> > MOUNT2=/mnt/master2
>
> This is a bit odd for tests, which normally have e.g. /mnt/master and
> /mnt/master2, but I'm
> not sure if there will be a problem or not.
>
> ### assume modules/utils are built
> ### modules/utils are installed or you are running out of the build
> directory
> ### ssh to the MDS and OSS nodes works without a password
> ### if you are not using @tcp0 for LNet, /etc/modprobe.d/lnet.conf is
> correct
>
> all# modprobe ptlrpc            ### on client and OSS and MDS to start LNet
> x8602# lctl ping x86ohpc        ### should print NID(s) of x86ohpc
> x8602# lctl ping x8601          ### should print NID(s) of x8601
> x8602# export NAME=master       ### get config from
> lustre/tests/cfg/master.sh
> x8602# sh llmount.sh            ### should format x86ohpc:/dev/sda2 and
> x8601:/dev/sda2
> x8602# lfs df                   ### should show master-MDT0000 and
> master-OST0000
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
>

-- 
Baptiste Gerondeau
Engineer - HPC SIG - LDCG - Linaro
#irc : BaptisteGer


* [lustre-devel] Auster and no facet /usr/sbin/lctl
  2019-07-18 10:29   ` Baptiste Gerondeau
@ 2019-07-18 18:56     ` Andreas Dilger
  2019-07-23  8:33       ` Baptiste Gerondeau
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Dilger @ 2019-07-18 18:56 UTC (permalink / raw)
  To: lustre-devel

On Jul 18, 2019, at 04:29, Baptiste Gerondeau <baptiste.gerondeau@linaro.org> wrote:
> 
> Thank you very much for your quick help!
> I reformatted and remounted everything from scratch and can confirm that mounting works, and that the client can communicate with the MDS (210, OSS is 211 and client 212):
[snip]
> [root@x8602 tests]# lctl which_nid 10.40.24.210@tcp
> 10.40.24.210@tcp
> [root@x8602 tests]# lfs df -ih
> UUID                      Inodes       IUsed       IFree IUse% Mounted on
> test-MDT0000_UUID           4.0M         272        4.0M   1% /lustre[MDT:0]
> test-OST0000_UUID         640.0K         267      639.7K   0% /lustre[OST:0]
> 
> filesystem_summary:       640.0K         272      639.7K   0% /lustre
> 
> [root@x8602 tests]#  ls -lsah /lustre/
> total 12K
> 4.0K drwxr-xr-x   3 root root 4.0K Jul 18 11:03 .
> 4.0K dr-xr-xr-x. 19 root root 4.0K Jun 28 11:43 ..
> 4.0K -rw-r--r--   1 root root   14 Jul 18 11:03 test.txt
> 
> I get the same output from auster though:
> Client: Lustre version: 2.12.0
> MDS: No host defined for facet /usr/sbin/lctl

This looks like some kind of problem with the test configuration file, where an environment variable is not set (e.g. mds_HOST) and it is interpreting the next argument (the lctl command) as the target facet when calling do_facet() or similar?
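
For reference, do_facet() in lustre/tests/test-framework.sh looks roughly
like this (paraphrased from memory, so the exact 2.12 code may differ):

do_facet() {
    local facet=$1                # normally "mds1", "ost1", "client", ...
    shift
    local host=$(facet_active_host $facet)
    # If the facet variable (e.g. $SINGLEMDS) expands to nothing, the
    # arguments shift left and $facet ends up being "/usr/sbin/lctl",
    # which maps to no host, producing exactly this error:
    [ -z "$host" ] && echo "No host defined for facet ${facet}" && exit 1
    do_node $host "$@"
}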

If "llmount.sh" works, then you are also able to run tests directly like:

client# cd lustre/tests
client# sh sanity.sh

I don't use auster myself (it is just a wrapper around lower-level scripts), so I can't really comment on where the problem might be.

Cheers, Andreas

> OSS: Lustre version: 2.12.0
> 
> From the client I can ssh into the other nodes (and from each node I can ssh into the others).
> I had tried to debug the scripts behind the above auster output but was unable to track down where it failed...
> 
> On Tue, 16 Jul 2019 at 23:09, Andreas Dilger <adilger@whamcloud.com> wrote:
> On Jul 16, 2019, at 06:11, Baptiste Gerondeau <baptiste.gerondeau@linaro.org> wrote:
> > 
> > Hi,
> > 
> > I'm currently in the process of bringing up the "3 node" x86 cluster and running "verbose=true ./auster -f multinode -rsv runtests" (on CentOS 7.6 x86 client & server, installed from repos), I keep getting "MDS: No host defined for facet /usr/sbin/lctl".
> > 
> > Auster then prints out some pdsh stuff, "Failures : 0" and exits after 16s obviously without running any tests.
> > 
> > Any suggestions?
> > Thanks a lot,
> > 
> > 
> > PS : My multinode config is attached
> > PPS: I posted to the devel list because it concerned auster, if I need to post it elsewhere please let me know
> 
> Before running auster, which tries to launch a lot of tests, start with just a plain mount to see if that is working:
> 
> master.sh:
> > MOUNT=/mnt/lustre
> > MOUNT2=/mnt/master2
> 
> This is a bit odd for tests, which normally have e.g. /mnt/master and /mnt/master2, but I'm
> not sure if there will be a problem or not.
> 
> ### assume modules/utils are built
> ### modules/utils are installed or you are running out of the build directory
> ### ssh to the MDS and OSS nodes works without a password
> ### if you are not using @tcp0 for LNet, /etc/modprobe.d/lnet.conf is correct
> 
> all# modprobe ptlrpc            ### on client and OSS and MDS to start LNet
> x8602# lctl ping x86ohpc        ### should print NID(s) of x86ohpc
> x8602# lctl ping x8601          ### should print NID(s) of x8601
> x8602# export NAME=master       ### get config from lustre/tests/cfg/master.sh
> x8602# sh llmount.sh            ### should format x86ohpc:/dev/sda2 and x8601:/dev/sda2
> x8602# lfs df                   ### should show master-MDT0000 and master-OST0000
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
> 
> -- 
> Baptiste Gerondeau
> Engineer - HPC SIG - LDCG - Linaro
> #irc : BaptisteGer

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud


* [lustre-devel] Auster and no facet /usr/sbin/lctl
  2019-07-18 18:56     ` Andreas Dilger
@ 2019-07-23  8:33       ` Baptiste Gerondeau
  2019-08-02 11:04         ` Andreas Dilger
  0 siblings, 1 reply; 7+ messages in thread
From: Baptiste Gerondeau @ 2019-07-23  8:33 UTC (permalink / raw)
  To: lustre-devel

After testing it out on an ARM64 client (hostname: lustrerhel, running
RHEL8, compiled from master), it seems to have the same problem.

I can *successfully* run llmount.sh and llmountcleanup.sh, and write and
read files from the client.
That said, sanity.sh is *not* working for me: it never gets to the tests
part; it just stops at 'cat /proc/mounts on OSS'.
dmesg says nothing more, and I can't seem to get more info (an error)
out of the logs.
I have confirmed that I can 'cat /proc/mounts' just fine on all the
machines.

Client: Lustre version: 2.12.0
MDS: No host defined for facet /usr/sbin/lctl
OSS: Lustre version: 2.12.0
CMD: lustrerhel,x8602 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/sbin::/sbin:/bin:/usr/sbin: NAME=local bash rpc.sh check_config_client /lustre
x8602: x8602: executing check_config_client /lustre
lustrerhel: CMD: lustrerhel /usr/sbin/lctl get_param -n version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl lustre_build_version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
lustrerhel: CMD: lustrerhel /usr/sbin/lctl get_param -n version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl lustre_build_version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
lustrerhel: CMD: lustrerhel /usr/sbin/lctl get_param -n version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl lustre_build_version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
lustrerhel: CMD: lustrerhel /usr/sbin/lctl get_param -n version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl lustre_build_version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
x8602: Checking config lustre mounted on /lustre
lustrerhel: lustrerhel: executing check_config_client /lustre
lustrerhel: Checking config lustre mounted on /lustre
Checking servers environments
[...]
CMD: x86ohpc e2label /dev/sda2 2>/dev/null
x86ohpc: Warning: Permanently added 'x86ohpc,10.40.24.210' (ECDSA) to the list of known hosts.
CMD: x86ohpc cat /proc/mounts
x86ohpc: Warning: Permanently added 'x86ohpc,10.40.24.210' (ECDSA) to the list of known hosts.
CMD: x8601 e2label /dev/sda2 2>/dev/null
CMD: x8601 cat /proc/mounts

Thanks a lot for your support,
Best regards,

On Thu, 18 Jul 2019 at 20:56, Andreas Dilger <adilger@whamcloud.com> wrote:

> On Jul 18, 2019, at 04:29, Baptiste Gerondeau <baptiste.gerondeau@linaro.org> wrote:
> >
> > Thank you very much for your quick help!
> > I reformatted and remounted everything from scratch and can confirm that
> mounting works, and that the client can communicate with the MDS (210, OSS
> is 211 and client 212):
> [snip]
> > [root@x8602 tests]# lctl which_nid 10.40.24.210@tcp
> > 10.40.24.210@tcp
> > [root@x8602 tests]# lfs df -ih
> > UUID                      Inodes       IUsed       IFree IUse% Mounted on
> > test-MDT0000_UUID           4.0M         272        4.0M   1%
> /lustre[MDT:0]
> > test-OST0000_UUID         640.0K         267      639.7K   0%
> /lustre[OST:0]
> >
> > filesystem_summary:       640.0K         272      639.7K   0% /lustre
> >
> > [root@x8602 tests]#  ls -lsah /lustre/
> > total 12K
> > 4.0K drwxr-xr-x   3 root root 4.0K Jul 18 11:03 .
> > 4.0K dr-xr-xr-x. 19 root root 4.0K Jun 28 11:43 ..
> > 4.0K -rw-r--r--   1 root root   14 Jul 18 11:03 test.txt
> >
> > I get the same output from auster though:
> > Client: Lustre version: 2.12.0
> > MDS: No host defined for facet /usr/sbin/lctl
>
> This looks like some kind of problem with the test configuration file,
> where an environment variable is not set (e.g. mds_HOST) and it is
> interpreting the next argument (the lctl command) as the target facet when
> calling do_facet() or similar?
>
> If "llmount.sh" works, then you are also able to run tests directly like:
>
> client# cd lustre/tests
> client# sh sanity.sh
>
> I don't use auster myself (it is just a wrapper around lower-level
> scripts), so I can't really comment on where the problem might be.
>
> Cheers, Andreas
>
> > OSS: Lustre version: 2.12.0
> >
> > From the client I can ssh into the other nodes (and from each node I can
> ssh into the others).
> > I had tried to debug the scripts behind the above auster output but was
> unable to track down where it failed...
> >
> > On Tue, 16 Jul 2019 at 23:09, Andreas Dilger <adilger@whamcloud.com> wrote:
> > On Jul 16, 2019, at 06:11, Baptiste Gerondeau <baptiste.gerondeau@linaro.org> wrote:
> > >
> > > Hi,
> > >
> > > I'm currently in the process of bringing up the "3 node" x86 cluster
> and running "verbose=true ./auster -f multinode -rsv runtests" (on CentOS
> 7.6 x86 client & server, installed from repos), I keep getting "MDS: No
> host defined for facet /usr/sbin/lctl".
> > >
> > > Auster then prints out some pdsh stuff, "Failures : 0" and exits after
> 16s obviously without running any tests.
> > >
> > > Any suggestions?
> > > Thanks a lot,
> > >
> > >
> > > PS : My multinode config is attached
> > > PPS: I posted to the devel list because it concerned auster, if I need
> to post it elsewhere please let me know
> >
> > Before running auster, which tries to launch a lot of tests, start with
> just a plain mount to see if that is working:
> >
> > master.sh:
> > > MOUNT=/mnt/lustre
> > > MOUNT2=/mnt/master2
> >
> > This is a bit odd for tests, which normally have e.g. /mnt/master and
> /mnt/master2, but I'm
> > not sure if there will be a problem or not.
> >
> > ### assume modules/utils are built
> > ### modules/utils are installed or you are running out of the build
> directory
> > ### ssh to the MDS and OSS nodes works without a password
> > ### if you are not using @tcp0 for LNet, /etc/modprobe.d/lnet.conf is
> correct
> >
> > all# modprobe ptlrpc            ### on client and OSS and MDS to start
> LNet
> > x8602# lctl ping x86ohpc        ### should print NID(s) of x86ohpc
> > x8602# lctl ping x8601          ### should print NID(s) of x8601
> > x8602# export NAME=master       ### get config from
> lustre/tests/cfg/master.sh
> > x8602# sh llmount.sh            ### should format x86ohpc:/dev/sda2 and
> x8601:/dev/sda2
> > x8602# lfs df                   ### should show master-MDT0000 and
> master-OST0000
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Principal Lustre Architect
> > Whamcloud
> >
> > --
> > Baptiste Gerondeau
> > Engineer - HPC SIG - LDCG - Linaro
> > #irc : BaptisteGer
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
>

-- 
Baptiste Gerondeau
Engineer - HPC SIG - LDCG - Linaro
#irc : BaptisteGer


* [lustre-devel] Auster and no facet /usr/sbin/lctl
  2019-07-23  8:33       ` Baptiste Gerondeau
@ 2019-08-02 11:04         ` Andreas Dilger
  2019-08-22  8:36           ` Baptiste Gerondeau
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Dilger @ 2019-08-02 11:04 UTC (permalink / raw)
  To: lustre-devel

I thought I replied to this email, but maybe it was lost.

It looks like you have "$SINGLEMDS" unset in your test config. It should
just be "mds1".  That is causing the error:

    MDS: No host defined for facet /usr/sbin/lctl

I don't know if that is causing your other problem or something else,
but may as well fix it and see.

You could also run with "sh -vx" to get all the gory details from bash
to see what is being executed.
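
Concretely, something like this should work (a sketch; I'm assuming the
variable is simply missing from your cfg file rather than set elsewhere):

# add to lustre/tests/cfg/multinode.sh:
SINGLEMDS=${SINGLEMDS:-mds1}

# then trace auster itself:
client# bash -vx ./auster -f multinode -rsv runtests 2>&1 | tee auster.log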

Cheers, Andreas

On Jul 23, 2019, at 02:33, Baptiste Gerondeau <baptiste.gerondeau@linaro.org> wrote:

After testing it out on an ARM64 client (hostname: lustrerhel, running RHEL8, compiled from master), it seems to have the same problem.

I can successfully run llmount.sh and llmountcleanup.sh, and write and read files from the client.
That said, sanity.sh is not working for me: it never gets to the tests part; it just stops at 'cat /proc/mounts on OSS'.
dmesg says nothing more, and I can't seem to get more info (an error) out of the logs.
I have confirmed that I can 'cat /proc/mounts' just fine on all the machines.

Client: Lustre version: 2.12.0
MDS: No host defined for facet /usr/sbin/lctl
OSS: Lustre version: 2.12.0
CMD: lustrerhel,x8602 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/sbin::/sbin:/bin:/usr/sbin: NAME=local bash rpc.sh check_config_client /lustre
x8602: x8602: executing check_config_client /lustre
lustrerhel: CMD: lustrerhel /usr/sbin/lctl get_param -n version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl lustre_build_version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
lustrerhel: CMD: lustrerhel /usr/sbin/lctl get_param -n version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl lustre_build_version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
lustrerhel: CMD: lustrerhel /usr/sbin/lctl get_param -n version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl lustre_build_version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
lustrerhel: CMD: lustrerhel /usr/sbin/lctl get_param -n version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl lustre_build_version 2>/dev/null ||
lustrerhel: /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
x8602: Checking config lustre mounted on /lustre
lustrerhel: lustrerhel: executing check_config_client /lustre
lustrerhel: Checking config lustre mounted on /lustre
Checking servers environments
[...]
CMD: x86ohpc e2label /dev/sda2 2>/dev/null
x86ohpc: Warning: Permanently added 'x86ohpc,10.40.24.210' (ECDSA) to the list of known hosts.
CMD: x86ohpc cat /proc/mounts
x86ohpc: Warning: Permanently added 'x86ohpc,10.40.24.210' (ECDSA) to the list of known hosts.
CMD: x8601 e2label /dev/sda2 2>/dev/null
CMD: x8601 cat /proc/mounts

Thanks a lot for your support,
Best regards,

On Thu, 18 Jul 2019 at 20:56, Andreas Dilger <adilger@whamcloud.com> wrote:
On Jul 18, 2019, at 04:29, Baptiste Gerondeau <baptiste.gerondeau@linaro.org> wrote:
>
> Thank you very much for your quick help!
> I reformatted and remounted everything from scratch and can confirm that mounting works, and that the client can communicate with the MDS (210, OSS is 211 and client 212):
[snip]
> [root@x8602 tests]# lctl which_nid 10.40.24.210@tcp
> 10.40.24.210@tcp
> [root@x8602 tests]# lfs df -ih
> UUID                      Inodes       IUsed       IFree IUse% Mounted on
> test-MDT0000_UUID           4.0M         272        4.0M   1% /lustre[MDT:0]
> test-OST0000_UUID         640.0K         267      639.7K   0% /lustre[OST:0]
>
> filesystem_summary:       640.0K         272      639.7K   0% /lustre
>
> [root@x8602 tests]#  ls -lsah /lustre/
> total 12K
> 4.0K drwxr-xr-x   3 root root 4.0K Jul 18 11:03 .
> 4.0K dr-xr-xr-x. 19 root root 4.0K Jun 28 11:43 ..
> 4.0K -rw-r--r--   1 root root   14 Jul 18 11:03 test.txt
>
> I get the same output from auster though:
> Client: Lustre version: 2.12.0
> MDS: No host defined for facet /usr/sbin/lctl

This looks like some kind of problem with the test configuration file, where an environment variable is not set (e.g. mds_HOST) and it is interpreting the next argument (the lctl command) as the target facet when calling do_facet() or similar?

If "llmount.sh" works, then you are also able to run tests directly like:

client# cd lustre/tests
client# sh sanity.sh

I don't use auster myself (it is just a wrapper around lower-level scripts), so I can't really comment on where the problem might be.

Cheers, Andreas

> OSS: Lustre version: 2.12.0
>
> From the client I can ssh into the other nodes (and from each node I can ssh into the others).
> I had tried to debug the scripts behind the above auster output but was unable to track down where it failed...
>
> > On Tue, 16 Jul 2019 at 23:09, Andreas Dilger <adilger@whamcloud.com> wrote:
> > On Jul 16, 2019, at 06:11, Baptiste Gerondeau <baptiste.gerondeau@linaro.org> wrote:
> >
> > Hi,
> >
> > I'm currently in the process of bringing up the "3 node" x86 cluster and running "verbose=true ./auster -f multinode -rsv runtests" (on CentOS 7.6 x86 client & server, installed from repos), I keep getting "MDS: No host defined for facet /usr/sbin/lctl".
> >
> > Auster then prints out some pdsh stuff, "Failures : 0" and exits after 16s obviously without running any tests.
> >
> > Any suggestions?
> > Thanks a lot,
> >
> >
> > PS : My multinode config is attached
> > PPS: I posted to the devel list because it concerned auster, if I need to post it elsewhere please let me know
>
> Before running auster, which tries to launch a lot of tests, start with just a plain mount to see if that is working:
>
> master.sh:
> > MOUNT=/mnt/lustre
> > MOUNT2=/mnt/master2
>
> This is a bit odd for tests, which normally have e.g. /mnt/master and /mnt/master2, but I'm
> not sure if there will be a problem or not.
>
> ### assume modules/utils are built
> ### modules/utils are installed or you are running out of the build directory
> ### ssh to the MDS and OSS nodes works without a password
> ### if you are not using @tcp0 for LNet, /etc/modprobe.d/lnet.conf is correct
>
> all# modprobe ptlrpc            ### on client and OSS and MDS to start LNet
> x8602# lctl ping x86ohpc        ### should print NID(s) of x86ohpc
> x8602# lctl ping x8601          ### should print NID(s) of x8601
> x8602# export NAME=master       ### get config from lustre/tests/cfg/master.sh
> x8602# sh llmount.sh            ### should format x86ohpc:/dev/sda2 and x8601:/dev/sda2
> x8602# lfs df                   ### should show master-MDT0000 and master-OST0000
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
>
> --
> Baptiste Gerondeau
> Engineer - HPC SIG - LDCG - Linaro
> #irc : BaptisteGer

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud

--
Baptiste Gerondeau
Engineer - HPC SIG - LDCG - Linaro
#irc : BaptisteGer


* [lustre-devel] Auster and no facet /usr/sbin/lctl
  2019-08-02 11:04         ` Andreas Dilger
@ 2019-08-22  8:36           ` Baptiste Gerondeau
  0 siblings, 0 replies; 7+ messages in thread
From: Baptiste Gerondeau @ 2019-08-22  8:36 UTC (permalink / raw)
  To: lustre-devel

Hi Andreas,

Thanks again for the help, "SINGLEMDS=mds1" does the trick!
We have now hit some issues with tainted kernel modules and hangs
(working on RHEL8 on ARM64), but we need to update to the latest kernel
and install the latest Lustre master, and then we'll see!

Sorry for the late reply, I was away from Lustre!

Cheers,


On Fri, 2 Aug 2019 at 13:04, Andreas Dilger <adilger@whamcloud.com> wrote:

> I thought I replied to this email, but maybe it was lost.
>
> It looks like you have "$SINGLEMDS" unset in your test config. It should
> just be "mds1".  That is causing the error:
>
>     MDS: No host defined for facet /usr/sbin/lctl
>
> I don't know if that is causing your other problem or something else,
> but may as well fix it and see.
>
> You could also run with "sh -vx" to get all the gory details from bash
> to see what is being executed.
>
> Cheers, Andreas
>
> On Jul 23, 2019, at 02:33, Baptiste Gerondeau <baptiste.gerondeau@linaro.org> wrote:
>
> After testing it out on an ARM64 client (hostname: lustrerhel, running
> RHEL8, compiled from master), it seems to have the same problem.
>
> I can *successfully* run llmount.sh and llmountcleanup.sh, and write and
> read files from the client.
> That said, sanity.sh is *not* working for me: it never gets to the tests
> part; it just stops at 'cat /proc/mounts on OSS'.
> dmesg says nothing more, and I can't seem to get more info (an error)
> out of the logs.
> I have confirmed that I can 'cat /proc/mounts' just fine on all the
> machines.
>
> Client: Lustre version: 2.12.0
> MDS: No host defined for facet /usr/sbin/lctl
> OSS: Lustre version: 2.12.0
> CMD: lustrerhel,x8602 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/sbin::/sbin:/bin:/usr/sbin: NAME=local bash rpc.sh check_config_client /lustre
> x8602: x8602: executing check_config_client /lustre
> lustrerhel: CMD: lustrerhel /usr/sbin/lctl get_param -n version 2>/dev/null ||
> lustrerhel: /usr/sbin/lctl lustre_build_version 2>/dev/null ||
> lustrerhel: /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
> lustrerhel: CMD: lustrerhel /usr/sbin/lctl get_param -n version 2>/dev/null ||
> lustrerhel: /usr/sbin/lctl lustre_build_version 2>/dev/null ||
> lustrerhel: /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
> lustrerhel: CMD: lustrerhel /usr/sbin/lctl get_param -n version 2>/dev/null ||
> lustrerhel: /usr/sbin/lctl lustre_build_version 2>/dev/null ||
> lustrerhel: /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
> lustrerhel: CMD: lustrerhel /usr/sbin/lctl get_param -n version 2>/dev/null ||
> lustrerhel: /usr/sbin/lctl lustre_build_version 2>/dev/null ||
> lustrerhel: /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
> x8602: Checking config lustre mounted on /lustre
> lustrerhel: lustrerhel: executing check_config_client /lustre
> lustrerhel: Checking config lustre mounted on /lustre
> Checking servers environments
> [...]
> CMD: x86ohpc e2label /dev/sda2 2>/dev/null
> x86ohpc: Warning: Permanently added 'x86ohpc,10.40.24.210' (ECDSA) to the list of known hosts.
> CMD: x86ohpc cat /proc/mounts
> x86ohpc: Warning: Permanently added 'x86ohpc,10.40.24.210' (ECDSA) to the list of known hosts.
> CMD: x8601 e2label /dev/sda2 2>/dev/null
> CMD: x8601 cat /proc/mounts
>
> Thanks a lot for your support,
> Best regards,
>
> On Thu, 18 Jul 2019 at 20:56, Andreas Dilger <adilger@whamcloud.com> wrote:
>
>> On Jul 18, 2019, at 04:29, Baptiste Gerondeau <baptiste.gerondeau@linaro.org> wrote:
>> >
>> > Thank you very much for your quick help!
>> > I reformatted and remounted everything from scratch and can confirm
>> that mounting works, and that the client can communicate with the MDS (210,
>> OSS is 211 and client 212):
>> [snip]
>> > [root@x8602 tests]# lctl which_nid 10.40.24.210@tcp
>> > 10.40.24.210@tcp
>> > [root@x8602 tests]# lfs df -ih
>> > UUID                      Inodes       IUsed       IFree IUse% Mounted
>> on
>> > test-MDT0000_UUID           4.0M         272        4.0M   1%
>> /lustre[MDT:0]
>> > test-OST0000_UUID         640.0K         267      639.7K   0%
>> /lustre[OST:0]
>> >
>> > filesystem_summary:       640.0K         272      639.7K   0% /lustre
>> >
>> > [root@x8602 tests]#  ls -lsah /lustre/
>> > total 12K
>> > 4.0K drwxr-xr-x   3 root root 4.0K Jul 18 11:03 .
>> > 4.0K dr-xr-xr-x. 19 root root 4.0K Jun 28 11:43 ..
>> > 4.0K -rw-r--r--   1 root root   14 Jul 18 11:03 test.txt
>> >
>> > I get the same output from auster though:
>> > Client: Lustre version: 2.12.0
>> > MDS: No host defined for facet /usr/sbin/lctl
>>
>> This looks like some kind of problem with the test configuration file,
>> where an environment variable is not set (e.g. mds_HOST) and it is
>> interpreting the next argument (the lctl command) as the target facet when
>> calling do_facet() or similar?
>>
>> If "llmount.sh" works, then you are also able to run tests directly like:
>>
>> client# cd lustre/tests
>> client# sh sanity.sh
>>
>> I don't use auster myself (it is just a wrapper around lower-level
>> scripts), so I can't really comment on where the problem might be.
>>
>> Cheers, Andreas
>>
>> > OSS: Lustre version: 2.12.0
>> >
>> > From the client I can ssh into the other nodes (and from each node I
>> can ssh into the others).
>> > I had tried to debug the scripts behind the above auster output but was
>> unable to track down where it failed...
>> >
>> > On Tue, 16 Jul 2019 at 23:09, Andreas Dilger <adilger@whamcloud.com> wrote:
>> > On Jul 16, 2019, at 06:11, Baptiste Gerondeau <baptiste.gerondeau@linaro.org> wrote:
>> > >
>> > > Hi,
>> > >
>> > > I'm currently in the process of bringing up the "3 node" x86 cluster
>> and running "verbose=true ./auster -f multinode -rsv runtests" (on CentOS
>> 7.6 x86 client & server, installed from repos), I keep getting "MDS: No
>> host defined for facet /usr/sbin/lctl".
>> > >
>> > > Auster then prints out some pdsh stuff, "Failures : 0" and exits
>> after 16s obviously without running any tests.
>> > >
>> > > Any suggestions?
>> > > Thanks a lot,
>> > >
>> > >
>> > > PS : My multinode config is attached
>> > > PPS: I posted to the devel list because it concerned auster, if I
>> need to post it elsewhere please let me know
>> >
>> > Before running auster, which tries to launch a lot of tests, start with
>> just a plain mount to see if that is working:
>> >
>> > master.sh:
>> > > MOUNT=/mnt/lustre
>> > > MOUNT2=/mnt/master2
>> >
>> > This is a bit odd for tests, which normally have e.g. /mnt/master and
>> /mnt/master2, but I'm
>> > not sure if there will be a problem or not.
>> >
>> > ### assume modules/utils are built
>> > ### modules/utils are installed or you are running out of the build
>> directory
>> > ### ssh to the MDS and OSS nodes works without a password
>> > ### if you are not using @tcp0 for LNet, /etc/modprobe.d/lnet.conf is
>> correct
>> >
>> > all# modprobe ptlrpc            ### on client and OSS and MDS to start
>> LNet
>> > x8602# lctl ping x86ohpc        ### should print NID(s) of x86ohpc
>> > x8602# lctl ping x8601          ### should print NID(s) of x8601
>> > x8602# export NAME=master       ### get config from
>> lustre/tests/cfg/master.sh
>> > x8602# sh llmount.sh            ### should format x86ohpc:/dev/sda2 and
>> x8601:/dev/sda2
>> > x8602# lfs df                   ### should show master-MDT0000 and
>> master-OST0000
>> >
>> > Cheers, Andreas
>> > --
>> > Andreas Dilger
>> > Principal Lustre Architect
>> > Whamcloud
>> >
>> > --
>> > Baptiste Gerondeau
>> > Engineer - HPC SIG - LDCG - Linaro
>> > #irc : BaptisteGer
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Principal Lustre Architect
>> Whamcloud
>>
>
> --
> Baptiste Gerondeau
> Engineer - HPC SIG - LDCG - Linaro
> #irc : BaptisteGer

-- 
Baptiste Gerondeau
Engineer - HPC SIG - LDCG - Linaro
#irc : BaptisteGer

