All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Steve Wise" <swise@opengridcomputing.com>
To: 'Leon Romanovsky' <leon@kernel.org>
Cc: dsahern@gmail.com, stephen@networkplumber.org,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: RE: [PATCH v1 iproute2-next 1/4] rdma: add helper rd_sendrecv_msg()
Date: Mon, 4 Mar 2019 08:13:04 -0600	[thread overview]
Message-ID: <007901d4d294$62d5d280$28817780$@opengridcomputing.com> (raw)
In-Reply-To: <20190303135052.GY15253@mtr-leonro.mtl.com>



> > >
> > > On 2/23/2019 3:26 AM, Leon Romanovsky wrote:
> > > > On Thu, Feb 21, 2019 at 08:19:03AM -0800, Steve Wise wrote:
> > > >> This function sends the constructed netlink message and then
> > > >> receives the response, displaying any error text.
> > > >>
> > > >> Change 'rdma dev set' to use it.
> > > >>
> > > >> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> > > >> ---
> > > >>  rdma/dev.c   |  2 +-
> > > >>  rdma/rdma.h  |  1 +
> > > >>  rdma/utils.c | 21 +++++++++++++++++++++
> > > >>  3 files changed, 23 insertions(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/rdma/dev.c b/rdma/dev.c
> > > >> index 60ff4b31e320..d2949c378f08 100644
> > > >> --- a/rdma/dev.c
> > > >> +++ b/rdma/dev.c
> > > >> @@ -273,7 +273,7 @@ static int dev_set_name(struct rd *rd)
> > > >>  	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_DEV_INDEX, rd-
> > > >dev_idx);
> > > >>  	mnl_attr_put_strz(rd->nlh, RDMA_NLDEV_ATTR_DEV_NAME,
> > > rd_argv(rd));
> > > >>
> > > >> -	return rd_send_msg(rd);
> > > >> +	return rd_sendrecv_msg(rd, seq);
> > > >>  }
> > > >>
> > > >>  static int dev_one_set(struct rd *rd)
> > > >> diff --git a/rdma/rdma.h b/rdma/rdma.h
> > > >> index 547bb5749a39..20be2f12c4f8 100644
> > > >> --- a/rdma/rdma.h
> > > >> +++ b/rdma/rdma.h
> > > >> @@ -115,6 +115,7 @@ bool rd_check_is_key_exist(struct rd *rd, const
> > > char *key);
> > > >>   */
> > > >>  int rd_send_msg(struct rd *rd);
> > > >>  int rd_recv_msg(struct rd *rd, mnl_cb_t callback, void *data,
uint32_t
> > > seq);
> > > >> +int rd_sendrecv_msg(struct rd *rd, unsigned int seq);
> > > >>  void rd_prepare_msg(struct rd *rd, uint32_t cmd, uint32_t *seq,
> > uint16_t
> > > flags);
> > > >>  int rd_dev_init_cb(const struct nlmsghdr *nlh, void *data);
> > > >>  int rd_attr_cb(const struct nlattr *attr, void *data);
> > > >> diff --git a/rdma/utils.c b/rdma/utils.c
> > > >> index 069d44fece10..a6f2826c9605 100644
> > > >> --- a/rdma/utils.c
> > > >> +++ b/rdma/utils.c
> > > >> @@ -664,6 +664,27 @@ int rd_recv_msg(struct rd *rd, mnl_cb_t
> callback,
> > > void *data, unsigned int seq)
> > > >>  	return ret;
> > > >>  }
> > > >>
> > > >> +static int null_cb(const struct nlmsghdr *nlh, void *data)
> > > >> +{
> > > >> +	return MNL_CB_OK;
> > > >> +}
> > > >> +
> > > >> +int rd_sendrecv_msg(struct rd *rd, unsigned int seq)
> > > >> +{
> > > >> +	int ret;
> > > >> +
> > > >> +	ret = rd_send_msg(rd);
> > > >> +	if (ret) {
> > > >> +		perror(NULL);
> > > > This is more or less already done in rd_send_msg() and that function
> > > > prints something in case of execution error. So the missing piece
> > > > is to update rd_recv_msg(), so all places will "magically" print
errors
> > > > and not only dev_set_name().
> > >
> > > Yea ok.
> > >
> >
> > dev_set_name() doesn't call rd_recv_msg().  So you're suggesting I fix
up
> > rd_recv_msg() to display errors and make dev_set_name() call
> rd_recv_msg()
> > with the null_cb function?  You sure that's the way to go?
> 
> I'm sure that we need to fix dev_set_name(), everything else I'm not sure.
> 
> Thanks

Hey Leon, adding this to rd_recv_msg():

@@ -693,10 +693,28 @@ int rd_recv_msg(struct rd *rd, mnl_cb_t callback, void
*data, unsigned int seq)
                ret = mnl_cb_run(buf, ret, seq, portid, callback, data);
        } while (ret > 0);

+       if (ret < 0)
+               perror(NULL);
+
        mnl_socket_close(rd->nl);
        return ret;
 }

Results in unexpected errors being logged when doing a query such as:

[root@stevo1 iproute2]# ./rdma/rdma res show qp lqpn 176
error: Invalid argument
link mlx5_0/1 lqpn 176 type UD state RTS sq-psn 0 comm [ib_core]
error: Invalid argument
error: No such file or directory
error: Invalid argument
error: No such file or directory

It appears the "invalid argument" errors are due to rdmatool sending a
RDMA_NLDEV_CMD_RES_QP_GET command using the doit kernel method to allow
querying for just a QP with lqpn = 176.  However, rdmatool isn't passing a
port index in the messages that generate the "invalid argument" error from
the kernel.  IE you must provide a device index and port index when issuing
a doit command vs a dumpit command.  I think. 

This error was not found because rd_recv_msg() never displayed any errors
previously.  Further, the RES_FUNC() massive macro has code that will retry
a failed doit call with a dumpit call.  I think _##name() should distinguish
between failures reported by the kernel doit function vs failures because no
doit function exists.  Not sure how to support that.


        static inline int _##name(struct rd *rd)
\
        {
\
                uint32_t idx;
\
                int ret;
\
                if (id) {
\
                        ret = rd_doit_index(rd, &idx);
\
                        if (ret) {
\
                                ret = _res_send_idx_msg(rd, command,
\
                                                        name##_idx_parse_cb,
\
                                                        idx, id);
\
                                if (!ret)
\
                                        return ret;
\
                                /* Fallback for old systems without .doit
callbacks */ \
                        }
\
                }
\
                return _res_send_msg(rd, command, name##_parse_cb);
\
        }
\



The "no such file or dir" errors are being returned because, in my setup,
there are 2 other links that do not have lqpn 176.   So there are 2 issues
uncovered by adding generic printing of errors in rd_recv_msg()

1) the doit code in rdmatool is generating requests for a doit method in the
kernel w/o providing a port index.
2) some paths in rdmatool should not print "benign" errors like filtering on
a GET command causing a "does not exist" error returned by the kernel doit
func.

#1 is a bug, IMO.  Can you propose a fix?
#2 could be solved by adding an error callback func passed to rd_recv_msg().
Then the RES_FUNC() functions could parse errors like "no such file or dir"
when doing a filtered query and silently drop them.  And functions like
dev_set_name() would display all errors returned because there are no
expected errors other than "success".

Steve.

WARNING: multiple messages have this Message-ID (diff)
From: "Steve Wise" <swise@opengridcomputing.com>
To: "'Leon Romanovsky'" <leon@kernel.org>
Cc: <dsahern@gmail.com>, <stephen@networkplumber.org>,
	<netdev@vger.kernel.org>, <linux-rdma@vger.kernel.org>
Subject: RE: [PATCH v1 iproute2-next 1/4] rdma: add helper rd_sendrecv_msg()
Date: Mon, 4 Mar 2019 08:13:04 -0600	[thread overview]
Message-ID: <007901d4d294$62d5d280$28817780$@opengridcomputing.com> (raw)
In-Reply-To: <20190303135052.GY15253@mtr-leonro.mtl.com>



> > >
> > > On 2/23/2019 3:26 AM, Leon Romanovsky wrote:
> > > > On Thu, Feb 21, 2019 at 08:19:03AM -0800, Steve Wise wrote:
> > > >> This function sends the constructed netlink message and then
> > > >> receives the response, displaying any error text.
> > > >>
> > > >> Change 'rdma dev set' to use it.
> > > >>
> > > >> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> > > >> ---
> > > >>  rdma/dev.c   |  2 +-
> > > >>  rdma/rdma.h  |  1 +
> > > >>  rdma/utils.c | 21 +++++++++++++++++++++
> > > >>  3 files changed, 23 insertions(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/rdma/dev.c b/rdma/dev.c
> > > >> index 60ff4b31e320..d2949c378f08 100644
> > > >> --- a/rdma/dev.c
> > > >> +++ b/rdma/dev.c
> > > >> @@ -273,7 +273,7 @@ static int dev_set_name(struct rd *rd)
> > > >>  	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_DEV_INDEX, rd-
> > > >dev_idx);
> > > >>  	mnl_attr_put_strz(rd->nlh, RDMA_NLDEV_ATTR_DEV_NAME,
> > > rd_argv(rd));
> > > >>
> > > >> -	return rd_send_msg(rd);
> > > >> +	return rd_sendrecv_msg(rd, seq);
> > > >>  }
> > > >>
> > > >>  static int dev_one_set(struct rd *rd)
> > > >> diff --git a/rdma/rdma.h b/rdma/rdma.h
> > > >> index 547bb5749a39..20be2f12c4f8 100644
> > > >> --- a/rdma/rdma.h
> > > >> +++ b/rdma/rdma.h
> > > >> @@ -115,6 +115,7 @@ bool rd_check_is_key_exist(struct rd *rd, const
> > > char *key);
> > > >>   */
> > > >>  int rd_send_msg(struct rd *rd);
> > > >>  int rd_recv_msg(struct rd *rd, mnl_cb_t callback, void *data,
uint32_t
> > > seq);
> > > >> +int rd_sendrecv_msg(struct rd *rd, unsigned int seq);
> > > >>  void rd_prepare_msg(struct rd *rd, uint32_t cmd, uint32_t *seq,
> > uint16_t
> > > flags);
> > > >>  int rd_dev_init_cb(const struct nlmsghdr *nlh, void *data);
> > > >>  int rd_attr_cb(const struct nlattr *attr, void *data);
> > > >> diff --git a/rdma/utils.c b/rdma/utils.c
> > > >> index 069d44fece10..a6f2826c9605 100644
> > > >> --- a/rdma/utils.c
> > > >> +++ b/rdma/utils.c
> > > >> @@ -664,6 +664,27 @@ int rd_recv_msg(struct rd *rd, mnl_cb_t
> callback,
> > > void *data, unsigned int seq)
> > > >>  	return ret;
> > > >>  }
> > > >>
> > > >> +static int null_cb(const struct nlmsghdr *nlh, void *data)
> > > >> +{
> > > >> +	return MNL_CB_OK;
> > > >> +}
> > > >> +
> > > >> +int rd_sendrecv_msg(struct rd *rd, unsigned int seq)
> > > >> +{
> > > >> +	int ret;
> > > >> +
> > > >> +	ret = rd_send_msg(rd);
> > > >> +	if (ret) {
> > > >> +		perror(NULL);
> > > > This is more or less already done in rd_send_msg() and that function
> > > > prints something in case of execution error. So the missing piece
> > > > is to update rd_recv_msg(), so all places will "magically" print
errors
> > > > and not only dev_set_name().
> > >
> > > Yea ok.
> > >
> >
> > dev_set_name() doesn't call rd_recv_msg().  So you're suggesting I fix
up
> > rd_recv_msg() to display errors and make dev_set_name() call
> rd_recv_msg()
> > with the null_cb function?  You sure that's the way to go?
> 
> I'm sure that we need to fix dev_set_name(), everything else I'm not sure.
> 
> Thanks

Hey Leon, adding this to rd_recv_msg():

@@ -693,10 +693,28 @@ int rd_recv_msg(struct rd *rd, mnl_cb_t callback, void
*data, unsigned int seq)
                ret = mnl_cb_run(buf, ret, seq, portid, callback, data);
        } while (ret > 0);

+       if (ret < 0)
+               perror(NULL);
+
        mnl_socket_close(rd->nl);
        return ret;
 }

Results in unexpected errors being logged when doing a query such as:

[root@stevo1 iproute2]# ./rdma/rdma res show qp lqpn 176
error: Invalid argument
link mlx5_0/1 lqpn 176 type UD state RTS sq-psn 0 comm [ib_core]
error: Invalid argument
error: No such file or directory
error: Invalid argument
error: No such file or directory

It appears the "invalid argument" errors are due to rdmatool sending a
RDMA_NLDEV_CMD_RES_QP_GET command using the doit kernel method to allow
querying for just a QP with lqpn = 176.  However, rdmatool isn't passing a
port index in the messages that generate the "invalid argument" error from
the kernel.  IE you must provide a device index and port index when issuing
a doit command vs a dumpit command.  I think. 

This error was not found because rd_recv_msg() never displayed any errors
previously.  Further, the RES_FUNC() massive macro has code that will retry
a failed doit call with a dumpit call.  I think _##name() should distinguish
between failures reported by the kernel doit function vs failures because no
doit function exists.  Not sure how to support that.


        static inline int _##name(struct rd *rd)
\
        {
\
                uint32_t idx;
\
                int ret;
\
                if (id) {
\
                        ret = rd_doit_index(rd, &idx);
\
                        if (ret) {
\
                                ret = _res_send_idx_msg(rd, command,
\
                                                        name##_idx_parse_cb,
\
                                                        idx, id);
\
                                if (!ret)
\
                                        return ret;
\
                                /* Fallback for old systems without .doit
callbacks */ \
                        }
\
                }
\
                return _res_send_msg(rd, command, name##_parse_cb);
\
        }
\



The "no such file or dir" errors are being returned because, in my setup,
there are 2 other links that do not have lqpn 176.   So there are 2 issues
uncovered by adding generic printing of errors in rd_recv_msg()

1) the doit code in rdmatool is generating requests for a doit method in the
kernel w/o providing a port index.
2) some paths in rdmatool should not print "benign" errors like filtering on
a GET command causing a "does not exist" error returned by the kernel doit
func.

#1 is a bug, IMO.  Can you propose a fix?
#2 could be solved by adding an error callback func passed to rd_recv_msg().
Then the RES_FUNC() functions could parse errors like "no such file or dir"
when doing a filtered query and silently drop them.  And functions like
dev_set_name() would display all errors returned because there are no
expected errors other than "success".

Steve.



  reply	other threads:[~2019-03-04 14:13 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-21 18:22 [PATCH v1 iproute2-next 0/4] Dynamic rdma link creation Steve Wise
2019-02-21 16:19 ` [PATCH v1 iproute2-next 1/4] rdma: add helper rd_sendrecv_msg() Steve Wise
2019-02-23  9:26   ` Leon Romanovsky
2019-02-23  9:31     ` Leon Romanovsky
2019-02-26 17:19       ` Steve Wise
2019-02-26 19:16         ` Leon Romanovsky
2019-02-26 19:55           ` Steve Wise
2019-02-26 19:55         ` David Ahern
2019-02-26 17:13     ` Steve Wise
2019-02-27 20:23       ` Steve Wise
2019-02-27 20:23         ` Steve Wise
2019-03-03 13:50         ` Leon Romanovsky
2019-03-04 14:13           ` Steve Wise [this message]
2019-03-04 14:13             ` Steve Wise
2019-03-06 21:50             ` Steve Wise
2019-03-07  8:33               ` Leon Romanovsky
2019-02-28 19:41     ` Steve Wise
2019-02-28 19:56       ` Leon Romanovsky
2019-02-28 20:10         ` Steve Wise
2019-02-21 16:19 ` [PATCH v1 iproute2-next 2/4] Sync up rdma_netlink.h Steve Wise
2019-02-21 18:56   ` Stephen Hemminger
2019-02-21 22:10     ` Leon Romanovsky
2019-02-21 23:11     ` Jason Gunthorpe
2019-02-23  9:28   ` Leon Romanovsky
2019-02-26 17:15     ` Steve Wise
2019-02-21 16:19 ` [PATCH v1 iproute2-next 3/4] rdma: add 'link add/delete' commands Steve Wise
2019-02-21 23:14   ` Jason Gunthorpe
2019-02-23  9:43   ` Leon Romanovsky
2019-02-26 17:24     ` Steve Wise
2019-02-27 21:18       ` Steve Wise
2019-02-27 21:18         ` Steve Wise
2019-02-21 16:19 ` [PATCH v1 iproute2-next 4/4] rdma: man page update for link add/delete Steve Wise

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='007901d4d294$62d5d280$28817780$@opengridcomputing.com' \
    --to=swise@opengridcomputing.com \
    --cc=dsahern@gmail.com \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.