From: Bart Van Assche <bvanassche@acm.org>
To: Jack Wang <jinpuwang@gmail.com>,
	linux-block@vger.kernel.org, linux-rdma@vger.kernel.org
Cc: axboe@kernel.dk, hch@infradead.org, sagi@grimberg.me,
	leon@kernel.org, dledford@redhat.com, jgg@ziepe.ca,
	danil.kipnis@cloud.ionos.com, jinpu.wang@cloud.ionos.com,
	rpenyaev@suse.de, pankaj.gupta@cloud.ionos.com
Subject: Re: [PATCH v9 06/25] RDMA/rtrs: client: main functionality
Date: Sat, 29 Feb 2020 17:33:41 -0800
Message-ID: <c3c8fbcc-0028-9b23-8eff-3b5f1f60e652@acm.org>
In-Reply-To: <20200221104721.350-7-jinpuwang@gmail.com>

On 2020-02-21 02:47, Jack Wang wrote:
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(sess, &clt->paths_list, s.entry)
> +		connected |= (READ_ONCE(sess->state) == RTRS_CLT_CONNECTED);
> +	rcu_read_unlock();

Are the parentheses around the comparison necessary?
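
(They shouldn't be: == binds more tightly than |=.) Without them:

	connected |= READ_ONCE(sess->state) == RTRS_CLT_CONNECTED;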

> +static struct rtrs_permit *
> +__rtrs_get_permit(struct rtrs_clt *clt, enum rtrs_clt_con_type con_type)
> +{
> +	size_t max_depth = clt->queue_depth;
> +	struct rtrs_permit *permit;
> +	int cpu, bit;
> +
> +	/* Combined with cq_vector, we pin the IO to the the cpu it comes */

This comment is confusing; please clarify it. All I see below is that
preemption is disabled around the bitmap search. I don't see any pinning
of I/O to the CPU of the caller.

> +	cpu = get_cpu();
> +	do {
> +		bit = find_first_zero_bit(clt->permits_map, max_depth);
> +		if (unlikely(bit >= max_depth)) {
> +			put_cpu();
> +			return NULL;
> +		}
> +
> +	} while (unlikely(test_and_set_bit_lock(bit, clt->permits_map)));
> +	put_cpu();
> +
> +	permit = GET_PERMIT(clt, bit);
> +	WARN_ON(permit->mem_id != bit);
> +	permit->cpu_id = cpu;
> +	permit->con_type = con_type;
> +
> +	return permit;
> +}

Please remove the blank line before "} while (...)".

> +void rtrs_clt_put_permit(struct rtrs_clt *clt, struct rtrs_permit *permit)
> +{
> +	if (WARN_ON(!test_bit(permit->mem_id, clt->permits_map)))
> +		return;
> +
> +	__rtrs_put_permit(clt, permit);
> +
> +	/*
> +	 * Putting a permit is a barrier, so we will observe
> +	 * new entry in the wait list, no worries.
> +	 */
> +	if (waitqueue_active(&clt->permits_wait))
> +		wake_up(&clt->permits_wait);
> +}
> +EXPORT_SYMBOL(rtrs_clt_put_permit);

The comment in the above function is not only confusing, it also fails
to explain why it is safe to guard wake_up() with a waitqueue_active()
check. How about changing that comment into the following:

rtrs_clt_get_permit() adds itself to the &clt->permits_wait list before
calling schedule(). So if rtrs_clt_get_permit() is sleeping it must have
added itself to &clt->permits_wait before __rtrs_put_permit() finished.
Hence it is safe to guard wake_up() with a waitqueue_active() test.

> +static int rtrs_post_send_rdma(struct rtrs_clt_con *con,
> +				struct rtrs_clt_io_req *req,
> +				struct rtrs_rbuf *rbuf, u32 off,
> +				u32 imm, struct ib_send_wr *wr)
> +{
> +	struct rtrs_clt_sess *sess = to_clt_sess(con->c.sess);
> +	enum ib_send_flags flags;
> +	struct ib_sge sge;
> +
> +	if (unlikely(!req->sg_size)) {
> +		rtrs_wrn(con->c.sess,
> +			 "Doing RDMA Write failed, no data supplied\n");
> +		return -EINVAL;
> +	}
> +
> +	/* user data and user message in the first list element */
> +	sge.addr   = req->iu->dma_addr;
> +	sge.length = req->sg_size;
> +	sge.lkey   = sess->s.dev->ib_pd->local_dma_lkey;
> +
> +	/*
> +	 * From time to time we have to post signalled sends,
> +	 * or send queue will fill up and only QP reset can help.
> +	 */
> +	flags = atomic_inc_return(&con->io_cnt) % sess->queue_depth ?
> +			0 : IB_SEND_SIGNALED;
> +
> +	ib_dma_sync_single_for_device(sess->s.dev->ib_dev, req->iu->dma_addr,
> +				      req->sg_size, DMA_TO_DEVICE);
> +
> +	return rtrs_iu_post_rdma_write_imm(&con->c, req->iu, &sge, 1,
> +					    rbuf->rkey, rbuf->addr + off,
> +					    imm, flags, wr);
> +}

I don't think that posting a signalled send from time to time is
sufficient to prevent send queue overflow. Please address Jason's
comment from January 7th: "Not quite. If the SQ depth is 16 and you post
16 things and then signal the last one, you *cannot* post new work until
you see the completion. More SQ space *ONLY* becomes available upon
receipt of a completion. This is why you can't have an unsignaled SQ."

See also https://lore.kernel.org/linux-rdma/20200107182528.GB26174@ziepe.ca/
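
To make the requirement concrete: SQ accounting usually needs both a
hard cap on in-flight WRs and a guarantee that a signalled WR has been
posted before that cap is reached. An untested sketch of the idea
follows; in_flight, posted, sq_depth and RTRS_SIGNAL_INTERVAL are
hypothetical names, not part of this patch:

	/*
	 * Every WR consumes one SQ slot; slots are only known to be
	 * free again once a *signalled* completion arrives, so the
	 * interval between signalled WRs must stay well below the SQ
	 * depth, or the queue can fill up with no completion pending.
	 */
	static bool rtrs_sq_get_slot(struct rtrs_clt_con *con, bool *signal)
	{
		if (atomic_inc_return(&con->in_flight) > con->sq_depth) {
			atomic_dec(&con->in_flight);
			return false;	/* SQ full: wait for a completion */
		}
		*signal = atomic_inc_return(&con->posted) %
			  RTRS_SIGNAL_INTERVAL == 0;
		return true;
	}

	/*
	 * In the send completion handler of a signalled WR: reclaim the
	 * slots of that WR and of the unsignalled WRs posted before it,
	 * since completions on one QP are delivered in order.
	 */
	atomic_sub(RTRS_SIGNAL_INTERVAL, &con->in_flight);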

> +static void rtrs_clt_init_req(struct rtrs_clt_io_req *req,
> +				     struct rtrs_clt_sess *sess,
> +				     rtrs_conf_fn *conf,
> +				     struct rtrs_permit *permit, void *priv,
> +				     const struct kvec *vec, size_t usr_len,
> +				     struct scatterlist *sg, size_t sg_cnt,
> +				     size_t data_len, int dir)
> +{
> +	struct iov_iter iter;
> +	size_t len;
> +
> +	req->permit = permit;
> +	req->in_use = true;
> +	req->usr_len = usr_len;
> +	req->data_len = data_len;
> +	req->sglist = sg;
> +	req->sg_cnt = sg_cnt;
> +	req->priv = priv;
> +	req->dir = dir;
> +	req->con = rtrs_permit_to_clt_con(sess, permit);
> +	req->conf = conf;
> +	req->need_inv = false;
> +	req->need_inv_comp = false;
> +	req->inv_errno = 0;
> +
> +	iov_iter_kvec(&iter, READ, vec, 1, usr_len);
> +	len = _copy_from_iter(req->iu->buf, usr_len, &iter);
> +	WARN_ON(len != usr_len);
> +
> +	reinit_completion(&req->inv_comp);
> +}

It is hard to verify whether the above function initializes all fields
of 'req' or not. Consider changing the 'req' member assignments into
something like *req = (struct rtrs_clt_io_req){ .permit = permit, ... };
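
For example (untested; note that members which are initialized once at
allocation time, e.g. .iu in this function and .sge/.mr in other hunks,
would have to be carried over explicitly, since a structure assignment
zeroes everything that is not named):

	*req = (struct rtrs_clt_io_req) {
		.permit		= permit,
		.in_use		= true,
		.usr_len	= usr_len,
		.data_len	= data_len,
		.sglist		= sg,
		.sg_cnt		= sg_cnt,
		.priv		= priv,
		.dir		= dir,
		.con		= rtrs_permit_to_clt_con(sess, permit),
		.conf		= conf,
		.iu		= req->iu,	/* preserve pre-set members */
	};

Members that are not named (need_inv, need_inv_comp, inv_errno) are
zero-initialized automatically. The embedded inv_comp would be zeroed
too, so init_completion() would have to replace reinit_completion().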

> +static int rtrs_post_rdma_write_sg(struct rtrs_clt_con *con,
> +				    struct rtrs_clt_io_req *req,
> +				    struct rtrs_rbuf *rbuf,
> +				    u32 size, u32 imm)
> +{
> +	struct rtrs_clt_sess *sess = to_clt_sess(con->c.sess);
> +	struct ib_sge *sge = req->sge;
> +	enum ib_send_flags flags;
> +	struct scatterlist *sg;
> +	size_t num_sge;
> +	int i;
> +
> +	for_each_sg(req->sglist, sg, req->sg_cnt, i) {
> +		sge[i].addr   = sg_dma_address(sg);
> +		sge[i].length = sg_dma_len(sg);
> +		sge[i].lkey   = sess->s.dev->ib_pd->local_dma_lkey;
> +	}
> +	sge[i].addr   = req->iu->dma_addr;
> +	sge[i].length = size;
> +	sge[i].lkey   = sess->s.dev->ib_pd->local_dma_lkey;
> +
> +	num_sge = 1 + req->sg_cnt;
> +
> +	/*
> +	 * From time to time we have to post signalled sends,
> +	 * or send queue will fill up and only QP reset can help.
> +	 */
> +	flags = atomic_inc_return(&con->io_cnt) % sess->queue_depth ?
> +			0 : IB_SEND_SIGNALED;
> +
> +	ib_dma_sync_single_for_device(sess->s.dev->ib_dev, req->iu->dma_addr,
> +				      size, DMA_TO_DEVICE);
> +
> +	return rtrs_iu_post_rdma_write_imm(&con->c, req->iu, sge, num_sge,
> +					    rbuf->rkey, rbuf->addr, imm,
> +					    flags, NULL);
> +}

Same comment here. Posting a signalled send from time to time is not
sufficient to prevent send queue overflow.

> +		memset(&rwr, 0, sizeof(rwr));
> +		rwr.wr.next = NULL;
> +		rwr.wr.opcode = IB_WR_REG_MR;
> +		rwr.wr.wr_cqe = &fast_reg_cqe;
> +		rwr.wr.num_sge = 0;
> +		rwr.mr = req->mr;
> +		rwr.key = req->mr->rkey;
> +		rwr.access = (IB_ACCESS_LOCAL_WRITE |
> +			      IB_ACCESS_REMOTE_WRITE);

How about changing the above code into rwr = (struct ib_reg_wr){.wr = {
... }, ... };?
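
That is, something like (values copied from the quoted hunk; members
that are not mentioned, e.g. .wr.next and .wr.num_sge, are implicitly
zeroed):

	rwr = (struct ib_reg_wr) {
		.wr.opcode	= IB_WR_REG_MR,
		.wr.wr_cqe	= &fast_reg_cqe,
		.mr		= req->mr,
		.key		= req->mr->rkey,
		.access		= IB_ACCESS_LOCAL_WRITE |
				  IB_ACCESS_REMOTE_WRITE,
	};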

> +static int rtrs_rdma_route_resolved(struct rtrs_clt_con *con)
> +{
> +	struct rtrs_clt_sess *sess = to_clt_sess(con->c.sess);
> +	struct rtrs_clt *clt = sess->clt;
> +	struct rtrs_msg_conn_req msg;
> +	struct rdma_conn_param param;
> +
> +	int err;
> +
> +	memset(&param, 0, sizeof(param));
> +	param.retry_count = 7;
> +	param.rnr_retry_count = 7;
> +	param.private_data = &msg;
> +	param.private_data_len = sizeof(msg);
> +
> +	/*
> +	 * Those two are the part of struct cma_hdr which is shared
> +	 * with private_data in case of AF_IB, so put zeroes to avoid
> +	 * wrong validation inside cma.c on receiver side.
> +	 */
> +	msg.__cma_version = 0;
> +	msg.__ip_version = 0;
> +	msg.magic = cpu_to_le16(RTRS_MAGIC);
> +	msg.version = cpu_to_le16(RTRS_PROTO_VER);
> +	msg.cid = cpu_to_le16(con->c.cid);
> +	msg.cid_num = cpu_to_le16(sess->s.con_num);
> +	msg.recon_cnt = cpu_to_le16(sess->s.recon_cnt);
> +	uuid_copy(&msg.sess_uuid, &sess->s.uuid);
> +	uuid_copy(&msg.paths_uuid, &clt->paths_uuid);
> +
> +	err = rdma_connect(con->c.cm_id, &param);
> +	if (err)
> +		rtrs_err(clt, "rdma_connect(): %d\n", err);
> +
> +	return err;
> +}

Please use structure assignment instead of memset() followed by a series
of member assignments.
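
That is:

	struct rdma_conn_param param = {
		.retry_count		= 7,
		.rnr_retry_count	= 7,
		.private_data		= &msg,
		.private_data_len	= sizeof(msg),
	};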

> +static inline bool xchg_sessions(struct rtrs_clt_sess __rcu **rcu_ppcpu_path,
> +				 struct rtrs_clt_sess *sess,
> +				 struct rtrs_clt_sess *next)
> +{
> +	struct rtrs_clt_sess **ppcpu_path;
> +
> +	/* Call cmpxchg() without sparse warnings */
> +	ppcpu_path = (typeof(ppcpu_path))rcu_ppcpu_path;
> +	return (sess == cmpxchg(ppcpu_path, sess, next));
> +}

Did checkpatch report for the above code that "return is not a function"?
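
If so, the fix is trivial:

	return sess == cmpxchg(ppcpu_path, sess, next);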

> +static void rtrs_clt_add_path_to_arr(struct rtrs_clt_sess *sess,
> +				      struct rtrs_addr *addr)
> +{
> +	struct rtrs_clt *clt = sess->clt;
> +
> +	mutex_lock(&clt->paths_mutex);
> +	clt->paths_num++;
> +
> +	/*
> +	 * Firstly increase paths_num, wait for GP and then
> +	 * add path to the list.  Why?  Since we add path with
> +	 * !CONNECTED state explanation is similar to what has
> +	 * been written in rtrs_clt_remove_path_from_arr().
> +	 */
> +	synchronize_rcu();
> +
> +	list_add_tail_rcu(&sess->s.entry, &clt->paths_list);
> +	mutex_unlock(&clt->paths_mutex);
> +}

At least in the Linux kernel, keeping track of the number of elements
in a list is considered bad practice. What prevents removal of the
'paths_num' counter?
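
If the count is only needed occasionally, it could be computed on
demand, e.g. (sketch; the traversal must run under rcu_read_lock() or
paths_mutex):

	size_t paths_num = 0;
	struct rtrs_clt_sess *sess;

	list_for_each_entry_rcu(sess, &clt->paths_list, s.entry)
		paths_num++;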

> +static void rtrs_clt_dev_release(struct device *dev)
> +{
> +	struct rtrs_clt *clt  = container_of(dev, struct rtrs_clt, dev);
> +
> +	kfree(clt);
> +}

Please surround the assignment operator with only a single space at each
side.

> +int rtrs_clt_remove_path_from_sysfs(struct rtrs_clt_sess *sess,
> +				     const struct attribute *sysfs_self)
> +{
> +	struct rtrs_clt *clt = sess->clt;
> +	enum rtrs_clt_state old_state;
> +	bool changed;
> +
> +	/*
> +	 * That can happen only when userspace tries to remove path
> +	 * very early, when rtrs_clt_open() is not yet finished.
> +	 */
> +	if (!clt->opened)
> +		return -EBUSY;
> +
> +	/*
> +	 * Continue stopping path till state was changed to DEAD or
> +	 * state was observed as DEAD:
> +	 * 1. State was changed to DEAD - we were fast and nobody
> +	 *    invoked rtrs_clt_reconnect(), which can again start
> +	 *    reconnecting.
> +	 * 2. State was observed as DEAD - we have someone in parallel
> +	 *    removing the path.
> +	 */
> +	do {
> +		rtrs_clt_close_conns(sess, true);
> +	} while (!(changed = rtrs_clt_change_state_get_old(sess,
> +							    RTRS_CLT_DEAD,
> +							    &old_state)) &&
> +		   old_state != RTRS_CLT_DEAD);

Did checkpatch ask not to use an assignment in the while-loop condition?
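
If so, one way to restructure the loop without the assignment in the
condition (untested):

	for (;;) {
		rtrs_clt_close_conns(sess, true);
		changed = rtrs_clt_change_state_get_old(sess, RTRS_CLT_DEAD,
							&old_state);
		if (changed || old_state == RTRS_CLT_DEAD)
			break;
	}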

Thanks,

Bart.
