All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] weird semantics of SG_DXFER_TO_FROM_DEV in BLK_DEV_SKD (drivers/block/skd*)
@ 2016-04-04  3:38 Al Viro
  2016-04-04  6:52 ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2016-04-04  3:38 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-block

	SG_DXFER_TO_FROM_DEV is weird everywhere, but skd manages to get it
even stranger than usual: copying to/from userland happens in
skd_sg_io_copy_buffer(), which is called twice - once before the actual
talking to device and once after.  The first call either does nothing or
copies from userland, the second one either does nothing or copies to
userland.  So far, so good, but the logics in that sucker deciding whether
to bugger off or not is wrong:
        if (dxfer_dir != sksgio->sg.dxfer_direction) {
                if (dxfer_dir != SG_DXFER_TO_DEV ||
                    sksgio->sg.dxfer_direction != SG_DXFER_TO_FROM_DEV)
                        return 0;
        }

The first call is with dxfer_dir equal to SG_DXFER_TO_DEV, the second -
SG_DXFER_FROM_DEV.  So we get

SG_DXFER_NONE:		nothing		nothing
SG_DXFER_FROM_DEV:	nothing		out
SG_DXFER_TO_DEV:	in		nothing
SG_DXFER_TO_FROM_DEV:	in		nothing

I've no idea if anything is still using SG_DXFER_TO_FROM_DEV, but this
behaviour AFAICS doesn't match that of write() on /dev/sg* (both
in and out are done) or normal SG_IO (either both in and out, in case if it
hits bio_map_user_iov(), or only out if it hits bio_copy_user_iov()).  In
all cases the out part is done.  Here it is skipped.

Not sure who (if anybody) maintains it these days, but that behaviour looks
wrong...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] weird semantics of SG_DXFER_TO_FROM_DEV in BLK_DEV_SKD (drivers/block/skd*)
  2016-04-04  3:38 [RFC] weird semantics of SG_DXFER_TO_FROM_DEV in BLK_DEV_SKD (drivers/block/skd*) Al Viro
@ 2016-04-04  6:52 ` Christoph Hellwig
  2016-04-04 17:16   ` Al Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2016-04-04  6:52 UTC (permalink / raw)
  To: Al Viro; +Cc: Jens Axboe, linux-kernel, linux-block

On Mon, Apr 04, 2016 at 04:38:45AM +0100, Al Viro wrote:
> I've no idea if anything is still using SG_DXFER_TO_FROM_DEV, but this
> behaviour AFAICS doesn't match that of write() on /dev/sg* (both
> in and out are done) or normal SG_IO (either both in and out, in case if it
> hits bio_map_user_iov(), or only out if it hits bio_copy_user_iov()).  In
> all cases the out part is done.  Here it is skipped.
> 
> Not sure who (if anybody) maintains it these days, but that behaviour looks
> wrong...

The right fix is to kill the duplicate SG_IO implementation and use
the block layer one.  The driver actually is a pretty straight SCSI
implementation, so making it a block driver has been a mistake from the
start.  I'll see if I can maybe get hold of hardware for the driver - it
seems pretty much unmaintained unfortunately.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] weird semantics of SG_DXFER_TO_FROM_DEV in BLK_DEV_SKD (drivers/block/skd*)
  2016-04-04  6:52 ` Christoph Hellwig
@ 2016-04-04 17:16   ` Al Viro
  2016-04-04 18:47     ` Al Viro
  2016-04-07 15:53     ` Christoph Hellwig
  0 siblings, 2 replies; 9+ messages in thread
From: Al Viro @ 2016-04-04 17:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-kernel, linux-block

On Sun, Apr 03, 2016 at 11:52:20PM -0700, Christoph Hellwig wrote:
> On Mon, Apr 04, 2016 at 04:38:45AM +0100, Al Viro wrote:
> > I've no idea if anything is still using SG_DXFER_TO_FROM_DEV, but this
> > behaviour AFAICS doesn't match that of write() on /dev/sg* (both
> > in and out are done) or normal SG_IO (either both in and out, in case if it
> > hits bio_map_user_iov(), or only out if it hits bio_copy_user_iov()).  In
> > all cases the out part is done.  Here it is skipped.
> > 
> > Not sure who (if anybody) maintains it these days, but that behaviour looks
> > wrong...
> 
> The right fix is to kill the duplicate SG_IO implementation and use
> the block layer one.  The driver actually is a pretty straight SCSI
> implementation, so making it a block driver has been a mistake from the
> start.  I'll see if I can maybe get hold of hardware for the driver - it
> seems pretty much unmaintained unfortunately.

OK...  FWIW, that's the last remaining user of iov_shorten(); I have a patch
getting rid of it (preserving the behaviour of that driver), but behaviour
appears to be bogus...

Another fun question: should the normal sg_io() copy the buffer in on
SG_DXFER_TO_FROM_DEV?  Right now it doesn't; in !copy case (when it goes
through bio_map_user_iov()) the effect is achieved simply by doing the
read into the pages user has mapped in that area, but bio_copy_user_iov()
doesn't do it:
        /*
         * success
         */
        if (((iter->type & WRITE) && (!map_data || !map_data->null_mapped)) ||
            (map_data && map_data->from_user)) {
                ret = bio_copy_from_iter(bio, *iter);
                if (ret)
                        goto cleanup;
        }
will see NULL map_data; the ->from_user case is sg_start_req() stuff.  IOW,
SG_IO behaviour for /dev/sg* is different from the generic one...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] weird semantics of SG_DXFER_TO_FROM_DEV in BLK_DEV_SKD (drivers/block/skd*)
  2016-04-04 17:16   ` Al Viro
@ 2016-04-04 18:47     ` Al Viro
  2016-04-04 19:50       ` Al Viro
  2016-04-07 15:53     ` Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: Al Viro @ 2016-04-04 18:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-kernel, linux-block

On Mon, Apr 04, 2016 at 06:16:12PM +0100, Al Viro wrote:

> will see NULL map_data; the ->from_user case is sg_start_req() stuff.  IOW,
> SG_IO behaviour for /dev/sg* is different from the generic one...

While we are at it: in bio_map_user_iov() we have
        iov_for_each(iov, i, *iter) { 
                unsigned long uaddr = (unsigned long) iov.iov_base;
                unsigned long len = iov.iov_len;
                unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
                unsigned long start = uaddr >> PAGE_SHIFT;

                /*
                 * Overflow, abort
                 */
                if (end < start)
                        return ERR_PTR(-EINVAL);

                nr_pages += end - start;
                /*
                 * buffer must be aligned to at least hardsector size for now
                 */
                if (uaddr & queue_dma_alignment(q))
                        return ERR_PTR(-EINVAL);
        }

Do we only care about the iov_base alignment?  IOW, shouldn't we check for
iov_len being a multiple of queue_dma_alignment(q) as well?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] weird semantics of SG_DXFER_TO_FROM_DEV in BLK_DEV_SKD (drivers/block/skd*)
  2016-04-04 18:47     ` Al Viro
@ 2016-04-04 19:50       ` Al Viro
  2016-04-04 23:45         ` Al Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2016-04-04 19:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-kernel, linux-block

On Mon, Apr 04, 2016 at 07:47:36PM +0100, Al Viro wrote:
> On Mon, Apr 04, 2016 at 06:16:12PM +0100, Al Viro wrote:
> 
> > will see NULL map_data; the ->from_user case is sg_start_req() stuff.  IOW,
> > SG_IO behaviour for /dev/sg* is different from the generic one...
> 
> While we are at it: in bio_map_user_iov() we have
>         iov_for_each(iov, i, *iter) { 
>                 unsigned long uaddr = (unsigned long) iov.iov_base;
>                 unsigned long len = iov.iov_len;
>                 unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
>                 unsigned long start = uaddr >> PAGE_SHIFT;
> 
>                 /*
>                  * Overflow, abort
>                  */
>                 if (end < start)
>                         return ERR_PTR(-EINVAL);
> 
>                 nr_pages += end - start;
>                 /*
>                  * buffer must be aligned to at least hardsector size for now
>                  */
>                 if (uaddr & queue_dma_alignment(q))
>                         return ERR_PTR(-EINVAL);
>         }
> 
> Do we only care about the iov_base alignment?  IOW, shouldn't we check for
> iov_len being a multiple of queue_dma_alignment(q) as well?

What happens if somebody issues SG_IO with 256-segment vector, each segment
1 byte long and page-aligned?  Will the driver really be happy with the
resulting request, as long as it hasn't claimed non-zero queue_virt_boundary?
Because AFAICS we'll get a request with a pile of bvecs, each with
->bv_offset equal to 0 and ->bv_len equal to 1; can that really work?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] weird semantics of SG_DXFER_TO_FROM_DEV in BLK_DEV_SKD (drivers/block/skd*)
  2016-04-04 19:50       ` Al Viro
@ 2016-04-04 23:45         ` Al Viro
  2016-04-07 15:55           ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2016-04-04 23:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-kernel, linux-block

On Mon, Apr 04, 2016 at 08:50:42PM +0100, Al Viro wrote:

> What happens if somebody issues SG_IO with 256-segment vector, each segment
> 1 byte long and page-aligned?  Will the driver really be happy with the
> resulting request, as long as it hasn't claimed non-zero queue_virt_boundary?
> Because AFAICS we'll get a request with a pile of bvecs, each with
> ->bv_offset equal to 0 and ->bv_len equal to 1; can that really work?

OK, it really doesn't make sense.  What happened, AFAICS, is that when
blk_rq_map_user_iov() has grown a "misaligned, need to copy" code, the
check had been mishandled - rather than checking both the base and the
length of segments, as blk_rq_map_{user,kern} used to do (and as ..._kern
is still doing) it checked only the base.

Then in "block: use blk_rq_map_user_iov to implement blk_rq_map_user" you've
missed that problem, which got us the current situaiton.  Note that e.g.
PIO case of libata really wants copy in case of 500 bytes + 12 bytes
vector - it'll splat the last 12 bytes adjacent to the end of the first
segment, etc.

AFAICS, what we need there is simply
	nr_pages = iov_iter_npages(iter);
	alignment = iov_iter_alignment(iter);
	if (alignment & (queue_dma_alignment(q) | q->dma_pad_mask))
		copy = true;
and I really wonder if we care about special-casing the situation when the
ends are not aligned to queue_virt_boundary(q).  If we don't, we might as
well add queue_virt_boundary(q) to the mask we are checking.  If we do,
it's not hard to add a variant that would calculate both the alignment and
alignment for internal boundaries...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] weird semantics of SG_DXFER_TO_FROM_DEV in BLK_DEV_SKD (drivers/block/skd*)
  2016-04-04 17:16   ` Al Viro
  2016-04-04 18:47     ` Al Viro
@ 2016-04-07 15:53     ` Christoph Hellwig
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2016-04-07 15:53 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, Jens Axboe, linux-kernel, linux-block

On Mon, Apr 04, 2016 at 06:16:12PM +0100, Al Viro wrote:
> Another fun question: should the normal sg_io() copy the buffer in on
> SG_DXFER_TO_FROM_DEV?  Right now it doesn't; in !copy case (when it goes
> through bio_map_user_iov()) the effect is achieved simply by doing the
> read into the pages user has mapped in that area, but bio_copy_user_iov()
> doesn't do it:
>         /*
>          * success
>          */
>         if (((iter->type & WRITE) && (!map_data || !map_data->null_mapped)) ||
>             (map_data && map_data->from_user)) {
>                 ret = bio_copy_from_iter(bio, *iter);
>                 if (ret)
>                         goto cleanup;
>         }
> will see NULL map_data; the ->from_user case is sg_start_req() stuff.  IOW,
> SG_IO behaviour for /dev/sg* is different from the generic one...

meh.  I really wish /dev/sg was just using the generic page pool :(

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] weird semantics of SG_DXFER_TO_FROM_DEV in BLK_DEV_SKD (drivers/block/skd*)
  2016-04-04 23:45         ` Al Viro
@ 2016-04-07 15:55           ` Christoph Hellwig
  2016-04-08 19:19             ` Al Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2016-04-07 15:55 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, Jens Axboe, linux-kernel, linux-block

On Tue, Apr 05, 2016 at 12:45:08AM +0100, Al Viro wrote:
> AFAICS, what we need there is simply
> 	nr_pages = iov_iter_npages(iter);
> 	alignment = iov_iter_alignment(iter);
> 	if (alignment & (queue_dma_alignment(q) | q->dma_pad_mask))
> 		copy = true;
> and I really wonder if we care about special-casing the situation when the
> ends are not aligned to queue_virt_boundary(q).  If we don't, we might as
> well add queue_virt_boundary(q) to the mask we are checking.  If we do,
> it's not hard to add a variant that would calculate both the alignment and
> alignment for internal boundaries...

I suspect this is the right thing to do.  Care to send a patch to Jens?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] weird semantics of SG_DXFER_TO_FROM_DEV in BLK_DEV_SKD (drivers/block/skd*)
  2016-04-07 15:55           ` Christoph Hellwig
@ 2016-04-08 19:19             ` Al Viro
  0 siblings, 0 replies; 9+ messages in thread
From: Al Viro @ 2016-04-08 19:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-kernel, linux-block

On Thu, Apr 07, 2016 at 08:55:33AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 05, 2016 at 12:45:08AM +0100, Al Viro wrote:
> > AFAICS, what we need there is simply
> > 	nr_pages = iov_iter_npages(iter);
> > 	alignment = iov_iter_alignment(iter);
> > 	if (alignment & (queue_dma_alignment(q) | q->dma_pad_mask))
> > 		copy = true;
> > and I really wonder if we care about special-casing the situation when the
> > ends are not aligned to queue_virt_boundary(q).  If we don't, we might as
> > well add queue_virt_boundary(q) to the mask we are checking.  If we do,
> > it's not hard to add a variant that would calculate both the alignment and
> > alignment for internal boundaries...
> 
> I suspect this is the right thing to do.  Care to send a patch to Jens?

OK...  The interesting part is the calling conventions - we could either
return a pair (minimal alignment, minimal alignment of inner boundaries) or
do something like iov_iter_aligned(iter, all, inner).  The thing is,
iov_iter_alignment() has a couple of users that do not fit well into the
latter model, the worst one being in do_blockdev_direct_IO().

OTOH, the cost of walking the array of iovecs the second time is going to
be trivial - it's already in cache, so something like

unsigned long iov_iter_gap_alignment(struct iov_iter *i)
{
	unsigned long res = 0;
	size_t size = i->i_count;
	iterate_all_kinds(i, size, v,
		(res |= (!res ? 0 : (unsigned long)v.iov_base) |
			(size != v.iov_len ? size : 0), 0),
		(res |= (!res ? 0 : (unsigned long)v.bv_offset) |
			(size != v.bv_len ? size : 0)),
		(res |= (!res ? 0 : (unsigned long)v.iov_base) |
			(size != v.iov_len ? size : 0))
	);
	return res;
}

in addition to iov_iter_alignment() might be the least painful approach...

I still wonder what would be the legitimate cause for stronger alignment
requirements on the gaps than on the ends, though - could you explain why
virt_boundary_mask is there in the first place?  Do we really have drivers
that can take weakly aligned single segment, but can't take the same
alignment between the segments?

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-04-08 19:19 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-04  3:38 [RFC] weird semantics of SG_DXFER_TO_FROM_DEV in BLK_DEV_SKD (drivers/block/skd*) Al Viro
2016-04-04  6:52 ` Christoph Hellwig
2016-04-04 17:16   ` Al Viro
2016-04-04 18:47     ` Al Viro
2016-04-04 19:50       ` Al Viro
2016-04-04 23:45         ` Al Viro
2016-04-07 15:55           ` Christoph Hellwig
2016-04-08 19:19             ` Al Viro
2016-04-07 15:53     ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.