linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@infradead.org>
To: Stephen Bates <stephen.bates@microsemi.com>
Cc: linux-kernel@vger.kernel.org, linux-nvdimm@ml01.01.org,
	linux-rdma@vger.kernel.org, linux-block@vger.kernel.org,
	linux-mm@kvack.org, dan.j.williams@intel.com,
	ross.zwisler@linux.intel.com, willy@linux.intel.com,
	jgunthorpe@obsidianresearch.com, haggaie@mellanox.com,
	hch@infradead.org, axboe@fb.com, corbet@lwn.net,
	jim.macdonald@everspin.com, sbates@raithin.com,
	logang@deltatee.com
Subject: Re: [PATCH 2/3] iopmem : Add a block device driver for PCIe attached IO memory.
Date: Thu, 27 Oct 2016 23:45:56 -0700	[thread overview]
Message-ID: <20161028064556.GA3231@infradead.org> (raw)
In-Reply-To: <1476826937-20665-3-git-send-email-sbates@raithlin.com>

> Signed-off-by: Stephen Bates <sbates@raithlin.com>

FYI, that address has bounced throught the whole thread for me,
replacing it with a known good one for now.


> + * This driver is heavily based on drivers/block/pmem.c.
> + * Copyright (c) 2014, Intel Corporation.
> + * Copyright (C) 2007 Nick Piggin
> + * Copyright (C) 2007 Novell Inc.

Is there anything left of it actually?  I didn't spot anything
obvious.  Nevermind that we don't have a file with that name anymore :)

> +  /*
> +   * We can only access the iopmem device with full 32-bit word
> +   * accesses which cannot be gaurantee'd by the regular memcpy
> +   */

Odd comment formatting. 

> +static void memcpy_from_iopmem(void *dst, const void *src, size_t sz)
> +{
> +	u64 *wdst = dst;
> +	const u64 *wsrc = src;
> +	u64 tmp;
> +
> +	while (sz >= sizeof(*wdst)) {
> +		*wdst++ = *wsrc++;
> +		sz -= sizeof(*wdst);
> +	}
> +
> +	if (!sz)
> +		return;
> +
> +	tmp = *wsrc;
> +	memcpy(wdst, &tmp, sz);
> +}

And then we dod a memcpy here anyway.  And no volatile whatsover, so
the compiler could do anything to it.  I defintively feel a bit uneasy
about having this in the driver as well.  Can we define the exact
semantics for this and define it by the system, possibly in an arch
specific way?

> +static void iopmem_do_bvec(struct iopmem_device *iopmem, struct page *page,
> +			   unsigned int len, unsigned int off, bool is_write,
> +			   sector_t sector)
> +{
> +	phys_addr_t iopmem_off = sector * 512;
> +	void *iopmem_addr = iopmem->virt_addr + iopmem_off;
> +
> +	if (!is_write) {
> +		read_iopmem(page, off, iopmem_addr, len);
> +		flush_dcache_page(page);
> +	} else {
> +		flush_dcache_page(page);
> +		write_iopmem(iopmem_addr, page, off, len);
> +	}

How about moving the  address and offset calculation as well as the
cache flushing into read_iopmem/write_iopmem and removing this function?

> +static blk_qc_t iopmem_make_request(struct request_queue *q, struct bio *bio)
> +{
> +	struct iopmem_device *iopmem = q->queuedata;
> +	struct bio_vec bvec;
> +	struct bvec_iter iter;
> +
> +	bio_for_each_segment(bvec, bio, iter) {
> +		iopmem_do_bvec(iopmem, bvec.bv_page, bvec.bv_len,
> +			    bvec.bv_offset, op_is_write(bio_op(bio)),
> +			    iter.bi_sector);

op_is_write just checks the data direction.  I'd feel much more
comfortable with a switch on the op, e.g.

	switch (bio_op(bio))) {
	case REQ_OP_READ:
		bio_for_each_segment(bvec, bio, iter)
			read_iopmem(iopmem, bvec, iter.bi_sector);
		break;
	case REQ_OP_READ:
		bio_for_each_segment(bvec, bio, iter)
			write_iopmem(iopmem, bvec, iter.bi_sector);
	defualt:
		WARN_ON_ONCE(1);
		bio->bi_error = -EIO;
		break;
	}
			

> +static long iopmem_direct_access(struct block_device *bdev, sector_t sector,
> +			       void **kaddr, pfn_t *pfn, long size)
> +{
> +	struct iopmem_device *iopmem = bdev->bd_queue->queuedata;
> +	resource_size_t offset = sector * 512;
> +
> +	if (!iopmem)
> +		return -ENODEV;

I don't think this can ever happen, can it?

> +static DEFINE_IDA(iopmem_instance_ida);
> +static DEFINE_SPINLOCK(ida_lock);
> +
> +static int iopmem_set_instance(struct iopmem_device *iopmem)
> +{
> +	int instance, error;
> +
> +	do {
> +		if (!ida_pre_get(&iopmem_instance_ida, GFP_KERNEL))
> +			return -ENODEV;
> +
> +		spin_lock(&ida_lock);
> +		error = ida_get_new(&iopmem_instance_ida, &instance);
> +		spin_unlock(&ida_lock);
> +
> +	} while (error == -EAGAIN);
> +
> +	if (error)
> +		return -ENODEV;
> +
> +	iopmem->instance = instance;
> +	return 0;
> +}
> +
> +static void iopmem_release_instance(struct iopmem_device *iopmem)
> +{
> +	spin_lock(&ida_lock);
> +	ida_remove(&iopmem_instance_ida, iopmem->instance);
> +	spin_unlock(&ida_lock);
> +}
> +

Just use ida_simple_get/ida_simple_remove instead to take care
of the locking and preloading, and get rid of these two functions.


> +static int iopmem_attach_disk(struct iopmem_device *iopmem)
> +{
> +	struct gendisk *disk;
> +	int nid = dev_to_node(iopmem->dev);
> +	struct request_queue *q = iopmem->queue;
> +
> +	blk_queue_write_cache(q, true, true);

You don't handle flush commands or the fua bit in make_request, so
this setting seems wrong.

> +	int err = 0;
> +	int nid = dev_to_node(&pdev->dev);
> +
> +	if (pci_enable_device_mem(pdev) < 0) {

propagate the actual error code, please.

  reply	other threads:[~2016-10-28  6:46 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-18 21:42 [PATCH 0/3] iopmem : A block device for PCIe memory Stephen Bates
2016-10-18 21:42 ` [PATCH 1/3] memremap.c : Add support for ZONE_DEVICE IO memory with struct pages Stephen Bates
2016-10-19 17:50   ` Dan Williams
2016-10-19 18:40     ` Stephen Bates
2016-10-19 20:01       ` Dan Williams
2016-10-25 11:54         ` Stephen Bates
2016-10-18 21:42 ` [PATCH 2/3] iopmem : Add a block device driver for PCIe attached IO memory Stephen Bates
2016-10-28  6:45   ` Christoph Hellwig [this message]
2016-10-28 19:22     ` Logan Gunthorpe
2016-10-18 21:42 ` [PATCH 3/3] iopmem : Add documentation for iopmem driver Stephen Bates
2016-10-28  6:46   ` Christoph Hellwig
2016-10-19  3:51 ` [PATCH 0/3] iopmem : A block device for PCIe memory Dan Williams
2016-10-19 18:48   ` Stephen Bates
2016-10-19 19:58     ` Dan Williams
2016-10-19 22:54       ` Stephen Bates
2016-10-20 23:22     ` Dave Chinner
2016-10-21  9:57       ` Christoph Hellwig
2016-10-21 11:12         ` Dave Chinner
2016-10-25 11:50           ` Stephen Bates
2016-10-25 21:19             ` Dave Chinner
2016-11-06 14:05               ` Stephen Bates
2016-10-27 10:22         ` Sagi Grimberg
2016-10-27 12:32           ` Christoph Hellwig
2016-10-26  8:24   ` Haggai Eran
2016-10-26 13:39     ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161028064556.GA3231@infradead.org \
    --to=hch@infradead.org \
    --cc=axboe@fb.com \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=haggaie@mellanox.com \
    --cc=jgunthorpe@obsidianresearch.com \
    --cc=jim.macdonald@everspin.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@ml01.01.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=sbates@raithin.com \
    --cc=stephen.bates@microsemi.com \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).