* [lustre-devel] Bug found: Missing lnetctl command on any recent daily built package
@ 2016-03-04 13:55 10000
  2016-03-04 19:24 ` Christopher J. Morrone
  0 siblings, 1 reply; 5+ messages in thread
From: 10000 @ 2016-03-04 13:55 UTC (permalink / raw)
  To: lustre-devel

Hi,
      First, I would like to answer the question I posted a month ago about llmount.sh failing to run after installing Lustre on CentOS 7.2; hopefully it will be useful. If you have the same problem, it may be because the machine has more than one network card and the IP address that the hostname resolves to is not on the interface Lustre is using. Try running "dmesg"; if you find something like "NIDs not found", add a conf file in /etc/modprobe.d/ containing the line "options lnet networks=tcp0(enp0s8)". Here 'tcp0' names the LNet network (TCP driver, network 0) and 'enp0s8' is the network interface. You can find more about this configuration in Chapter 15 of the official manual.
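
A minimal sketch of writing that options line, for anyone applying the workaround. The helper name write_lnet_conf and the file name lnet.conf are illustrative assumptions, not something Lustre ships; the interface name enp0s8 comes from the example above and will differ per machine.

```shell
# Sketch: write the lnet module option into a modprobe.d-style directory.
# write_lnet_conf and the file name "lnet.conf" are hypothetical names.
write_lnet_conf() {
    conf_dir=$1   # e.g. /etc/modprobe.d (writing there needs root)
    iface=$2      # e.g. enp0s8, the interface Lustre should use
    printf 'options lnet networks=tcp0(%s)\n' "$iface" > "$conf_dir/lnet.conf"
}

# Example (as root): write_lnet_conf /etc/modprobe.d enp0s8
```

The option only takes effect the next time the lnet module is loaded.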

      Now let's come to the bug (or perhaps something the developers have not documented). It is easy to reproduce on CentOS 7.2; just follow these steps.
      1. Install all the packages needed for compiling Lustre except libyaml-devel.
      2. Build the Lustre-patched Linux kernel; my version is 3.10.0-327.el7.x86_64. You can refer to Chapter 30 of the old manual, "Installing a Lustre File System from Source Code", although it contains some mistakes.
      3. git clone the newest lustre-release code from git://git.hpdd.intel.com/fs/lustre-release.git
      4. Run 'configure' in the root folder of the lustre-release code with the path of the modified kernel source. It may look like this:
        ./configure --with-linux=/root/rpmbuild/BUILD/kernel-3.10.0_327.el7_lustre.x86_64/ --with-o2ib=no
      5. Run 'make' in the root folder of the lustre-release code.
      6. Once it finishes, go to the folder lustre-release/lnet/utils and you will see that lnetctl does not exist while others such as 'lst' do.
      7. Check the Makefile in that folder; you will find a comment symbol '#' on line 123:
            line 121:          sbin_PROGRAMS = routerstat$(EXEEXT) lst$(EXEEXT) \
            line 122:            $(am__EXEEXT_1) $(am__EXEEXT_2)
            line 123:          #am__append_1 = lnetctl
            line 124:          am__append_2 = wirecheck
            line 125:          subdir = lnet/utils
          And also at line 176: #am__EXEEXT_1 = lnetctl$(EXEEXT)
          Also, in the folder lustre-release/lnet/utils/lnetconfig there is no 'lnetconfig.la', which should exist because lnetctl needs it to compile.
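
The check in step 7 can be sketched as a small shell helper. This is only an illustration: lnetctl_disabled is a hypothetical name, and the pattern assumes the generated Makefile comments out the assignment exactly as in the excerpt above.

```shell
# Sketch: report whether a generated Makefile has lnetctl disabled.
# lnetctl_disabled is a hypothetical helper name.
lnetctl_disabled() {
    makefile=$1
    # A disabled automake conditional appears as a commented-out assignment,
    # e.g. "#am__append_1 = lnetctl".
    grep -Eq '^#am__append_[0-9]+ = lnetctl' "$makefile"
}

# Example:
#   lnetctl_disabled lnet/utils/Makefile && echo "lnetctl will not be built"
```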

      Running 'configure' and 'make' again does not help, no matter how many times you do it. Now install libyaml-devel, either with yum or from source. I used yum, simply running: sudo yum install libyaml-devel.
      Now run 'configure' and 'make' again, and you will find that 'lnetctl' has been compiled successfully and placed in lustre-release/lnet/utils. You can also run './lnetctl' in that folder and see that it seems to work.

      Since lnetctl is an important tool for configuring LNet, and all of Chapter 9 explains how to use it to configure the network, I think that if the build depends on some package, it should print a message saying that the package is missing. However, 'configure' and 'make' both run successfully without any message and simply do not produce 'lnetctl'. That Makefile must be auto-generated by running 'configure'. I hope the developers can check this problem. The bad news is that none of the recent daily built packages at https://build.hpdd.intel.com/job/lustre-master/ , such as '#3330', contain 'lnetctl'; you can just download, install, and check. (Most likely you will get "bash: lnetctl: command not found". I add this on purpose so that search engines can find this email, which may help others; Lustre is lacking in documentation ^_^)

Yingdi Guo


* [lustre-devel] Bug found: Missing lnetctl command on any recent daily built package
  2016-03-04 13:55 [lustre-devel] Bug found: Missing lnetctl command on any recent daily built package 10000
@ 2016-03-04 19:24 ` Christopher J. Morrone
  2016-03-05  3:14   ` Drokin, Oleg
  0 siblings, 1 reply; 5+ messages in thread
From: Christopher J. Morrone @ 2016-03-04 19:24 UTC (permalink / raw)
  To: lustre-devel

Please open a bug report for this in the Lustre issue tracker:

  https://jira.hpdd.intel.com/

If you check your configure output you will likely see that it does tell
you that it could not find libyaml.  lnet/autoconf/lustre-lnet.m4
contains the LN_CONFIG_DLC function that should be printing "libyaml not
present".  Granted, it is only a warning, and the consequences of
it being missing are not going to be clear to anyone.
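
A minimal sketch of checking the configure output for that warning, assuming the output was saved to a file first (for example with "./configure --with-linux=<kernel source> 2>&1 | tee configure.out"). The helper name find_libyaml_messages and the file name configure.out are hypothetical.

```shell
# Sketch: look for libyaml-related messages in saved configure output.
# find_libyaml_messages is a hypothetical helper name.
find_libyaml_messages() {
    # Case-insensitive so both "libyaml" and "LIBYAML" style messages match.
    grep -i 'libyaml' "$1"
}

# Example:
#   find_libyaml_messages configure.out || echo "no libyaml messages found"
```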

I would not have thought that llmount.sh would require lnetctl.  If it
does, then the LN_CONFIG_DLC function should be throwing an error not a
warning.

And Intel's build farm should definitely be fixed to build lnetctl.
That is worrying if they missed that.

But like I said, open a bug report and we'll get to the bottom of it.

Chris



* [lustre-devel] Bug found: Missing lnetctl command on any recent daily built package
  2016-03-04 19:24 ` Christopher J. Morrone
@ 2016-03-05  3:14   ` Drokin, Oleg
  2016-03-05  6:08     ` [lustre-devel] Bug found: Missing lnetctl command on any recent daily " 10000
  0 siblings, 1 reply; 5+ messages in thread
From: Drokin, Oleg @ 2016-03-05  3:14 UTC (permalink / raw)
  To: lustre-devel


On Mar 4, 2016, at 2:24 PM, Christopher J. Morrone wrote:

> I would not have thought that llmount.sh would require lnetctl.  If it
> does, then the LN_CONFIG_DLC function should be throwing an error not a
> warning.

llmount.sh does not appear to need lnetctl (I use the llmount.sh,
and I do not have lnetctl built). Also grep for lnetctl in lustre/tests
has no hits.
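
The grep mentioned above can be sketched as follows, assuming it is run from the top of a lustre-release checkout. count_refs is a hypothetical helper name.

```shell
# Sketch: count lines under a directory that mention a given word.
# count_refs is a hypothetical helper name.
count_refs() {
    dir=$1
    word=$2
    # grep -r searches recursively; wc -l counts the matching lines.
    grep -r -- "$word" "$dir" | wc -l
}

# Example: count_refs lustre/tests lnetctl
```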

Bye,
    Oleg


* [lustre-devel] Bug found: Missing lnetctl command on any recent daily built package
  2016-03-05  3:14   ` Drokin, Oleg
@ 2016-03-05  6:08     ` 10000
  2016-03-05 12:16       ` Drokin, Oleg
  0 siblings, 1 reply; 5+ messages in thread
From: 10000 @ 2016-03-05  6:08 UTC (permalink / raw)
  To: lustre-devel

On Mar 5, 2016, at 11:14 AM, Drokin, Oleg wrote:

> llmount.sh does not appear to need lnetctl (I use the llmount.sh,
> and I do not have lnetctl built). 

I would say it may be needed in some situations. I can reproduce this in VirtualBox with the following steps:
1. Create a virtual machine with two network interface cards, the first set to NAT network and the second set to Host-Only network.
2. Install CentOS 7.2 on it.
3. Run "ip addr"; you may get output like the following:

	1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
	   ...
	    inet 127.0.0.1/8 scope host lo
	   ...
	2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
	   ...
	    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
	   ...
	3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
	   ...
	    inet 192.168.56.101/24 brd 192.168.56.255 scope global dynamic enp0s8
	   ...

     As this shows, the NAT network is on the first NIC and the host-only network is on the second.
     Now set /etc/hostname to a name of your choice (such as "node1") and add the host-only IP address with that hostname to /etc/hosts. You may need to reboot the machine for the change to take effect.
     After modifying these two files, running 'cat' on them should give something like the following:

	[eteced@node1 ~]$ cat /etc/hostname
	node1
	[eteced@node1 ~]$ cat /etc/hosts
	127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
	::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
	192.168.56.101  node1
	[eteced@node1 ~]$

4. Download the latest build RPMs (#3330) from https://build.hpdd.intel.com/job/lustre-master/ and install them. (You may need to reboot to use the newly installed kernel.)
5. Simply run "llmount.sh" at /lib64/lustre/tests/llmount.sh; the output looks like this:

	[root@node1 eteced]# /lib64/lustre/tests/llmount.sh
	Stopping clients: node1 /mnt/lustre (opts:)
	Stopping clients: node1 /mnt/lustre2 (opts:)
	Loading modules from /lib64/lustre/tests/..
	detected 1 online CPUs by sysfs
	libcfs will create CPU partition based on online CPUs
	debug=vfstrace rpctrace dlmtrace neterror ha config                   ioctl super lfsck
	subsystem_debug=all -lnet -lnd -pinger
	quota/lquota options: 'hash_lqs_cur_bits=3'
	Formatting mgs, mds, osts
	Format mds1: /tmp/lustre-mdt1
	Format ost1: /tmp/lustre-ost1
	Format ost2: /tmp/lustre-ost2
	Checking servers environments
	Checking clients node1 environments
	Loading modules from /lib64/lustre/tests/..
	detected 1 online CPUs by sysfs
	libcfs will create CPU partition based on online CPUs
	debug=vfstrace rpctrace dlmtrace neterror ha config                   ioctl super lfsck
	subsystem_debug=all -lnet -lnd -pinger
	Setup mgs, mdt, osts
	Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
	Started lustre-MDT0000
	Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
	mount.lustre: mount /dev/loop1 at /mnt/ost1 failed: Connection timed out

   If you then run 'dmesg', it shows:

	...
	[  134.960367] LNetError: 120-3: Refusing connection from 192.168.56.101 for 192.168.56.101@tcp: No matching NI
	[  134.960666] LNetError: 10438:0:(socklnd_cb.c:1723:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.56.101
	[  134.961040] LNetError: 11b-b: Connection to 192.168.56.101@tcp at host 192.168.56.101 on port 988 was reset: is it running a compatible version of Lustre and is 192.168.56.101@tcp one of its NIDs?
	[  139.960163] Lustre: 10446:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1457156893/real 1457156893]  req@ffff88020433a600 x1527939743088740/t0(0) o250->MGC192.168.56.101@tcp@192.168.56.101@tcp:26/25 lens 520/544 e 0 to 1 dl 1457156898 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
	[  139.960500] Lustre: lustre-MDT0000: Connection restored to 10.0.2.15@tcp (at 0@lo)
	[  139.960684] LNetError: 120-3: Refusing connection from 192.168.56.101 for 192.168.56.101@tcp: No matching NI
	[  139.961892] LNetError: 10439:0:(socklnd_cb.c:1723:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.56.101
	[  139.962902] LNetError: 11b-b: Connection to 192.168.56.101@tcp at host 192.168.56.101 on port 988 was reset: is it running a compatible version of Lustre and is 192.168.56.101@tcp one of its NIDs?
	[  144.971200] LustreError: 15f-b: lustre-OST0000: cannot register this server with the MGS: rc = -110. Is the MGS running?
	[  144.972325] LustreError: 11686:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -110
	[  144.974060] LustreError: 11686:0:(obd_mount_server.c:1512:server_put_super()) no obd lustre-OST0000
	[  144.974866] LustreError: 11686:0:(obd_mount_server.c:140:server_deregister_mount()) lustre-OST0000 not registered
	[  145.011302] Lustre: server umount lustre-OST0000 complete
	[  145.011302] LustreError: 11686:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount  (-110)

   Since there is no 'lnetctl' command-line tool, you have to add a conf file in /etc/modprobe.d/ with the line "options lnet networks=tcp0(enp0s8)", and then run 'llmountcleanup.sh' before running 'llmount.sh' again, as shown below:

	[root@node1 eteced]# /lib64/lustre/tests/llmountcleanup.sh
	Stopping clients: node1 /mnt/lustre (opts:-f)
	Stopping clients: node1 /mnt/lustre2 (opts:-f)
	Stopping /mnt/mds1 (opts:-f) on node1
	modules unloaded.
	[root@node1 eteced]# /lib64/lustre/tests/llmount.sh
	Stopping clients: node1 /mnt/lustre (opts:)
	Stopping clients: node1 /mnt/lustre2 (opts:)
	Loading modules from /lib64/lustre/tests/..
	detected 1 online CPUs by sysfs
	libcfs will create CPU partition based on online CPUs
	debug=vfstrace rpctrace dlmtrace neterror ha config                   ioctl super lfsck
	subsystem_debug=all -lnet -lnd -pinger
	quota/lquota options: 'hash_lqs_cur_bits=3'
	Formatting mgs, mds, osts
	Format mds1: /tmp/lustre-mdt1
	Format ost1: /tmp/lustre-ost1
	Format ost2: /tmp/lustre-ost2
	Checking servers environments
	Checking clients node1 environments
	Loading modules from /lib64/lustre/tests/..
	detected 1 online CPUs by sysfs
	libcfs will create CPU partition based on online CPUs
	debug=vfstrace rpctrace dlmtrace neterror ha config                   ioctl super lfsck
	subsystem_debug=all -lnet -lnd -pinger
	Setup mgs, mdt, osts
	Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
	Started lustre-MDT0000
	Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
	Started lustre-OST0000
	Starting ost2:   -o loop /tmp/lustre-ost2 /mnt/ost2
	Started lustre-OST0001
	Starting client: node1:  -o user_xattr,flock node1@tcp:/lustre /mnt/lustre
	Using TIMEOUT=20
	seting jobstats to procname_uid
	Setting lustre.sys.jobid_var from disable to procname_uid
	Waiting 90 secs for update
	Updated after 3s: wanted 'procname_uid' got 'procname_uid'
	disable quota as required
	[root@node1 eteced]#

This time "llmount.sh" runs successfully. Although these steps were reproduced on a virtual machine, I think the key to triggering the bug is that you have two network cards and the hostname resolves to the second rather than the first (via /etc/hosts or some other name resolution setting).
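
To find which interface belongs in the "networks=" option, one possible sketch is to look up the interface that carries the address the hostname resolves to. The helper name iface_for_ip is hypothetical, and it assumes the one-line-per-address format printed by "ip -o -4 addr show" (interface name in the second field, address/prefix in the fourth).

```shell
# Sketch: print the interface that carries a given IPv4 address.
# Reads "ip -o -4 addr show" style output on stdin.
# iface_for_ip is a hypothetical helper name.
iface_for_ip() {
    awk -v ip="$1" '$3 == "inet" { split($4, a, "/"); if (a[1] == ip) print $2 }'
}

# Example: resolve the hostname, then find the matching interface.
#   addr=$(getent hosts "$(hostname)" | awk '{ print $1; exit }')
#   ip -o -4 addr show | iface_for_ip "$addr"
```

On the virtual machine described above, the address 192.168.56.101 belongs to enp0s8, the interface used in the modprobe option.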

I will also post it on jira.

Yingdi Guo


* [lustre-devel] Bug found: Missing lnetctl command on any recent daily built package
  2016-03-05  6:08     ` [lustre-devel] Bug found: Missing lnetctl command on any recent daily " 10000
@ 2016-03-05 12:16       ` Drokin, Oleg
  0 siblings, 0 replies; 5+ messages in thread
From: Drokin, Oleg @ 2016-03-05 12:16 UTC (permalink / raw)
  To: lustre-devel

Hello!

On Mar 5, 2016, at 1:08 AM, 10000 wrote:

> On Mar 5, 2016, at 11:14 AM, Drokin, Oleg wrote:
> 
>> llmount.sh does not appear to need lnetctl (I use the llmount.sh,
>> and I do not have lnetctl built). 
> 
> I would say it may be needed in some situations. I can reproduce this in VirtualBox with the following steps:
...
> This time "llmount.sh" runs successfully. Although these steps were reproduced on a virtual machine, I think the key to triggering the bug is that you have two network cards and the hostname resolves to the second rather than the first (via /etc/hosts or some other name resolution setting).

You are right that in the situation you describe llmount.sh does not work.
But since it does not call lnetctl, llmount.sh still would not work in your
scenario even if lnetctl were available. Also, since llmount.sh loads and
reloads the modules, you really need those lnet module parameters.
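
A minimal sketch for checking which "networks=" value is configured before the modules are reloaded, assuming the option is set in files under /etc/modprobe.d/. The helper name lnet_networks_option is hypothetical, and the pattern only handles the simple unquoted form shown earlier in this thread.

```shell
# Sketch: print the networks= value from "options lnet ..." lines.
# lnet_networks_option is a hypothetical helper name.
lnet_networks_option() {
    # -n with the p flag prints only lines where the substitution matched.
    sed -n 's/^options[[:space:]]\{1,\}lnet[[:space:]].*networks=\([^[:space:]]*\).*/\1/p' "$@"
}

# Example: lnet_networks_option /etc/modprobe.d/*.conf
```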

> I will post it at jira also.

Thanks! The lnetctl build issue really needs to be fixed.

Bye,
    Oleg

