linux-kernel.vger.kernel.org archive mirror
* [GIT PULL] direct IO support for loop driver
@ 2013-11-18 19:03 Dave Kleikamp
  2013-11-18 19:07 ` Dave Kleikamp
  2013-11-20 21:19 ` Linus Torvalds
  0 siblings, 2 replies; 13+ messages in thread
From: Dave Kleikamp @ 2013-11-18 19:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, LKML, linux-fsdevel, Maxim V. Patlasov,
	linux-aio, Kent Overstreet, Jens Axboe

Linus,

Please pull the following patches. They add the infrastructure for
kernel-initiated direct-io, change the loop driver to perform direct-io,
and change swap-over-nfs to use the new infrastructure.

These patches have been under development for a long time and have been
in linux-next since August. I'd have sent the request sooner, but I was
on vacation last week.

There are merge conflicts with several files, so I will follow up with
a merge patch.

The following changes since commit 1e52db6908ca5aa8e035caeaf96e8de2567fd84d:

  Merge tag 'vfio-v3.12-rc5' of git://github.com/awilliam/linux-vfio (2013-10-14 18:11:28 -0700)

are available in the git repository at:


  git://github.com/kleikamp/linux-shaggy.git tags/aio-direct-for-3.13

for you to fetch changes up to bb6f7be483180a9674f3d00b8e97e9b29b39d55d:

  tmpfs: add support for read_iter and write_iter (2013-10-18 10:56:54 -0500)

----------------------------------------------------------------
Allow the loop driver to perform direct IO to the underlying file system

----------------------------------------------------------------
Asias He (1):
      block_dev: add support for read_iter, write_iter

Dave Kleikamp (22):
      iov_iter: iov_iter_copy_from_user() should use non-atomic copy
      iov_iter: add __iovec_copy_to_user()
      fuse: convert fuse to use iov_iter_copy_[to|from]_user
      iov_iter: ii_iovec_copy_to_user should pre-fault user pages
      dio: Convert direct_IO to use iov_iter
      dio: add bio_vec support to __blockdev_direct_IO()
      aio: add aio_kernel_() interface
      aio: add aio support for iov_iter arguments
      fs: create file_readable() and file_writable() functions
      fs: use read_iter and write_iter rather than aio_read and aio_write
      fs: add read_iter and write_iter to several file systems
      ocfs2: add support for read_iter and write_iter
      ext4: add support for read_iter and write_iter
      nfs: add support for read_iter, write_iter
      nfs: simplify swap
      btrfs: add support for read_iter and write_iter
      xfs: add support for read_iter and write_iter
      gfs2: Convert aio_read/write ops to read/write_iter
      udf: convert file ops from aio_read/write to read/write_iter
      afs: add support for read_iter and write_iter
      ecrpytfs: Convert aio_read/write ops to read/write_iter
      ubifs: convert file ops from aio_read/write to read/write_iter

Hugh Dickins (1):
      tmpfs: add support for read_iter and write_iter

Zach Brown (9):
      iov_iter: move into its own file
      iov_iter: add copy_to_user support
      iov_iter: hide iovec details behind ops function pointers
      iov_iter: add bvec support
      iov_iter: add a shorten call
      iov_iter: let callers extract iovecs and bio_vecs
      fs: pull iov_iter use higher up the stack
      bio: add bvec_length(), like iov_length()
      loop: use aio to perform io on the underlying file

 Documentation/filesystems/Locking   |   6 +-
 Documentation/filesystems/vfs.txt   |  12 +-
 drivers/block/loop.c                | 158 +++++++++----
 drivers/char/raw.c                  |   4 +-
 drivers/mtd/nand/nandsim.c          |   4 +-
 drivers/usb/gadget/storage_common.c |   4 +-
 fs/9p/vfs_addr.c                    |  12 +-
 fs/9p/vfs_file.c                    |   8 +-
 fs/Makefile                         |   2 +-
 fs/adfs/file.c                      |   4 +-
 fs/affs/file.c                      |   4 +-
 fs/afs/file.c                       |   4 +-
 fs/afs/internal.h                   |   3 +-
 fs/afs/write.c                      |   9 +-
 fs/aio.c                            | 136 ++++++++++-
 fs/bad_inode.c                      |  14 ++
 fs/bfs/file.c                       |   4 +-
 fs/block_dev.c                      |  27 ++-
 fs/btrfs/file.c                     |  42 ++--
 fs/btrfs/inode.c                    |  63 +++---
 fs/ceph/addr.c                      |   3 +-
 fs/cifs/file.c                      |   4 +-
 fs/direct-io.c                      | 223 +++++++++++++------
 fs/ecryptfs/file.c                  |  15 +-
 fs/exofs/file.c                     |   4 +-
 fs/ext2/file.c                      |   4 +-
 fs/ext2/inode.c                     |   8 +-
 fs/ext3/file.c                      |   4 +-
 fs/ext3/inode.c                     |  15 +-
 fs/ext4/ext4.h                      |   3 +-
 fs/ext4/file.c                      |  34 +--
 fs/ext4/indirect.c                  |  16 +-
 fs/ext4/inode.c                     |  23 +-
 fs/f2fs/data.c                      |   4 +-
 fs/f2fs/file.c                      |   4 +-
 fs/fat/file.c                       |   4 +-
 fs/fat/inode.c                      |  10 +-
 fs/fuse/cuse.c                      |  10 +-
 fs/fuse/file.c                      |  90 ++++----
 fs/fuse/fuse_i.h                    |   5 +-
 fs/gfs2/aops.c                      |   7 +-
 fs/gfs2/file.c                      |  21 +-
 fs/hfs/inode.c                      |  11 +-
 fs/hfsplus/inode.c                  |  10 +-
 fs/hostfs/hostfs_kern.c             |   4 +-
 fs/hpfs/file.c                      |   4 +-
 fs/internal.h                       |   4 +
 fs/iov-iter.c                       | 411 ++++++++++++++++++++++++++++++++++
 fs/jffs2/file.c                     |   8 +-
 fs/jfs/file.c                       |   4 +-
 fs/jfs/inode.c                      |   7 +-
 fs/logfs/file.c                     |   4 +-
 fs/minix/file.c                     |   4 +-
 fs/nfs/direct.c                     | 301 ++++++++++++++++---------
 fs/nfs/file.c                       |  33 ++-
 fs/nfs/internal.h                   |   4 +-
 fs/nfs/nfs4file.c                   |   4 +-
 fs/nilfs2/file.c                    |   4 +-
 fs/nilfs2/inode.c                   |   8 +-
 fs/ocfs2/aops.c                     |   8 +-
 fs/ocfs2/aops.h                     |   2 +-
 fs/ocfs2/file.c                     |  55 ++---
 fs/ocfs2/ocfs2_trace.h              |   6 +-
 fs/omfs/file.c                      |   4 +-
 fs/ramfs/file-mmu.c                 |   4 +-
 fs/ramfs/file-nommu.c               |   4 +-
 fs/read_write.c                     |  78 +++++--
 fs/reiserfs/file.c                  |   4 +-
 fs/reiserfs/inode.c                 |   7 +-
 fs/romfs/mmap-nommu.c               |   2 +-
 fs/sysv/file.c                      |   4 +-
 fs/ubifs/file.c                     |  12 +-
 fs/udf/file.c                       |  13 +-
 fs/udf/inode.c                      |  10 +-
 fs/ufs/file.c                       |   4 +-
 fs/xfs/xfs_aops.c                   |  13 +-
 fs/xfs/xfs_file.c                   |  51 ++---
 include/linux/aio.h                 |  25 ++-
 include/linux/bio.h                 |   8 +
 include/linux/blk_types.h           |   2 -
 include/linux/fs.h                  | 165 ++++++++++++--
 include/linux/nfs_fs.h              |  13 +-
 include/uapi/linux/loop.h           |   1 +
 mm/filemap.c                        | 433 ++++++++++++++----------------------
 mm/page_io.c                        |  15 +-
 mm/shmem.c                          |  61 ++---
 86 files changed, 1857 insertions(+), 1003 deletions(-)
 create mode 100644 fs/iov-iter.c

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] direct IO support for loop driver
  2013-11-18 19:03 [GIT PULL] direct IO support for loop driver Dave Kleikamp
@ 2013-11-18 19:07 ` Dave Kleikamp
  2013-11-20 21:19 ` Linus Torvalds
  1 sibling, 0 replies; 13+ messages in thread
From: Dave Kleikamp @ 2013-11-18 19:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, LKML, linux-fsdevel, Maxim V. Patlasov,
	linux-aio, Kent Overstreet, Jens Axboe

Linus,
Here is a merge patch for resolving the conflicts in my git tree.
Of course I could rebase, but I think you'd prefer I didn't do that.

Thanks,
Shaggy


    Conflicts:
    	drivers/mtd/nand/nandsim.c
    	fs/btrfs/inode.c
    	fs/cifs/file.c
    	fs/nfs/direct.c
    	fs/nfs/file.c
    	fs/read_write.c
    	include/linux/blk_types.h
    	mm/filemap.c

diff --cc fs/btrfs/inode.c
index da8d2f6,6feae86..1b83942
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@@ -7234,10 -7233,8 +7247,10 @@@ static ssize_t btrfs_direct_IO(int rw, 
  	 * call btrfs_wait_ordered_range to make absolutely sure that any
  	 * outstanding dirty pages are on disk.
  	 */
- 	count = iov_length(iov, nr_segs);
+ 	count = iov_iter_count(iter);
 -	btrfs_wait_ordered_range(inode, offset, count);
 +	ret = btrfs_wait_ordered_range(inode, offset, count);
 +	if (ret)
 +		return ret;
  
  	if (rw & WRITE) {
  		/*
diff --cc fs/cifs/file.c
index 5a5a872,cf6aedc..931158b
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@@ -3663,27 -3663,6 +3663,26 @@@ void cifs_oplock_break(struct work_stru
  	}
  }
  
 +/*
 + * The presence of cifs_direct_io() in the address space ops vector
 + * allows open() O_DIRECT flags which would have failed otherwise.
 + *
 + * In the non-cached mode (mount with cache=none), we shunt off direct read and write requests
 + * so this method should never be called.
 + *
 + * Direct IO is not yet supported in the cached mode. 
 + */
 +static ssize_t
- cifs_direct_io(int rw, struct kiocb *iocb, const struct iovec *iov,
-                loff_t pos, unsigned long nr_segs)
++cifs_direct_io(int rw, struct kiocb *iocb, struct iov_iter *iter, loff_t pos)
 +{
 +        /*
 +         * FIXME
 +         * Eventually need to support direct IO for non forcedirectio mounts
 +         */
 +        return -EINVAL;
 +}
 +
 +
  const struct address_space_operations cifs_addr_ops = {
  	.readpage = cifs_readpage,
  	.readpages = cifs_readpages,
diff --cc fs/nfs/direct.c
index d71d66c,239c2fe..87a6475
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@@ -117,26 -118,18 +118,17 @@@ static inline int put_dreq(struct nfs_d
   * @nr_segs: size of iovec array
   *
   * The presence of this routine in the address space ops vector means
-  * the NFS client supports direct I/O. However, for most direct IO, we
-  * shunt off direct read and write requests before the VFS gets them,
-  * so this method is only ever called for swap.
+  * the NFS client supports direct I/O. However, we shunt off direct
+  * read and write requests before the VFS gets them, so this method
+  * should never be called.
   */
- ssize_t nfs_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, loff_t pos, unsigned long nr_segs)
+ ssize_t nfs_direct_IO(int rw, struct kiocb *iocb, struct iov_iter *iter,
+ 		      loff_t pos)
  {
- #ifndef CONFIG_NFS_SWAP
 -	dprintk("NFS: nfs_direct_IO (%s) off/no(%Ld/%lu) EINVAL\n",
 -			iocb->ki_filp->f_path.dentry->d_name.name,
 -			(long long) pos, iter->nr_segs);
 +	dprintk("NFS: nfs_direct_IO (%pD) off/no(%Ld/%lu) EINVAL\n",
- 			iocb->ki_filp, (long long) pos, nr_segs);
++			iocb->ki_filp, (long long) pos, iter->nr_segs);
  
  	return -EINVAL;
- #else
- 	VM_BUG_ON(iocb->ki_nbytes != PAGE_SIZE);
- 
- 	if (rw == READ || rw == KERNEL_READ)
- 		return nfs_file_direct_read(iocb, iov, nr_segs, pos,
- 				rw == READ ? true : false);
- 	return nfs_file_direct_write(iocb, iov, nr_segs, pos,
- 				rw == WRITE ? true : false);
- #endif /* CONFIG_NFS_SWAP */
  }
  
  static void nfs_direct_release_pages(struct page **pages, unsigned int npages)
@@@ -905,11 -1010,13 +1009,11 @@@ ssize_t nfs_file_direct_read(struct kio
  	struct address_space *mapping = file->f_mapping;
  	size_t count;
  
- 	count = iov_length(iov, nr_segs);
+ 	count = iov_iter_count(iter);
  	nfs_add_stats(mapping->host, NFSIOS_DIRECTREADBYTES, count);
  
 -	dfprintk(FILE, "NFS: direct read(%s/%s, %zd@%Ld)\n",
 -		file->f_path.dentry->d_parent->d_name.name,
 -		file->f_path.dentry->d_name.name,
 -		count, (long long) pos);
 +	dfprintk(FILE, "NFS: direct read(%pD2, %zd@%Ld)\n",
 +		file, count, (long long) pos);
  
  	retval = 0;
  	if (!count)
@@@ -959,11 -1065,13 +1062,11 @@@ ssize_t nfs_file_direct_write(struct ki
  	struct address_space *mapping = file->f_mapping;
  	size_t count;
  
- 	count = iov_length(iov, nr_segs);
+ 	count = iov_iter_count(iter);
  	nfs_add_stats(mapping->host, NFSIOS_DIRECTWRITTENBYTES, count);
  
 -	dfprintk(FILE, "NFS: direct write(%s/%s, %zd@%Ld)\n",
 -		file->f_path.dentry->d_parent->d_name.name,
 -		file->f_path.dentry->d_name.name,
 -		count, (long long) pos);
 +	dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n",
 +		file, count, (long long) pos);
  
  	retval = generic_write_checks(file, &pos, &count, 0);
  	if (retval)
diff --cc fs/nfs/file.c
index e2fcacf,19ac4fd..e022fe9
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@@ -165,18 -174,18 +165,17 @@@ nfs_file_flush(struct file *file, fl_ow
  EXPORT_SYMBOL_GPL(nfs_file_flush);
  
  ssize_t
- nfs_file_read(struct kiocb *iocb, const struct iovec *iov,
- 		unsigned long nr_segs, loff_t pos)
+ nfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter, loff_t pos)
  {
 -	struct dentry * dentry = iocb->ki_filp->f_path.dentry;
 -	struct inode * inode = dentry->d_inode;
 +	struct inode *inode = file_inode(iocb->ki_filp);
  	ssize_t result;
  
  	if (iocb->ki_filp->f_flags & O_DIRECT)
- 		return nfs_file_direct_read(iocb, iov, nr_segs, pos, true);
+ 		return nfs_file_direct_read(iocb, iter, pos);
  
- 	dprintk("NFS: read(%pD2, %lu@%lu)\n",
 -	dprintk("NFS: read_iter(%s/%s, %lu@%lu)\n",
 -		dentry->d_parent->d_name.name, dentry->d_name.name,
++	dprintk("NFS: read_iter(%pD2, %lu@%lu)\n",
 +		iocb->ki_filp,
- 		(unsigned long) iov_length(iov, nr_segs), (unsigned long) pos);
+ 		(unsigned long) iov_iter_count(iter), (unsigned long) pos);
  
  	result = nfs_revalidate_mapping(inode, iocb->ki_filp->f_mapping);
  	if (!result) {
@@@ -634,24 -655,25 +633,24 @@@ static int nfs_need_sync_write(struct f
  	return 0;
  }
  
- ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov,
- 		       unsigned long nr_segs, loff_t pos)
+ ssize_t nfs_file_write_iter(struct kiocb *iocb, struct iov_iter *iter,
+ 			    loff_t pos)
  {
 -	struct dentry * dentry = iocb->ki_filp->f_path.dentry;
 -	struct inode * inode = dentry->d_inode;
 +	struct file *file = iocb->ki_filp;
 +	struct inode *inode = file_inode(file);
  	unsigned long written = 0;
  	ssize_t result;
- 	size_t count = iov_length(iov, nr_segs);
+ 	size_t count = iov_iter_count(iter);
  
 -	result = nfs_key_timeout_notify(iocb->ki_filp, inode);
 +	result = nfs_key_timeout_notify(file, inode);
  	if (result)
  		return result;
  
 -	if (iocb->ki_filp->f_flags & O_DIRECT)
 +	if (file->f_flags & O_DIRECT)
- 		return nfs_file_direct_write(iocb, iov, nr_segs, pos, true);
+ 		return nfs_file_direct_write(iocb, iter, pos);
  
- 	dprintk("NFS: write(%pD2, %lu@%Ld)\n",
 -	dprintk("NFS: write_iter(%s/%s, %lu@%lld)\n",
 -		dentry->d_parent->d_name.name, dentry->d_name.name,
 -		(unsigned long) count, (long long) pos);
++	dprintk("NFS: write_iter(%pD2, %lu@%Ld)\n",
 +		file, (unsigned long) count, (long long) pos);
  
  	result = -EBUSY;
  	if (IS_SWAPFILE(inode))
diff --cc include/linux/blk_types.h
index 238ef0e,1bea25f..2c1c8c9
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@@ -176,9 -176,7 +176,8 @@@ enum rq_flag_bits 
  	__REQ_FLUSH_SEQ,	/* request for flush sequence */
  	__REQ_IO_STAT,		/* account I/O stat */
  	__REQ_MIXED_MERGE,	/* merge of different types, fail separately */
- 	__REQ_KERNEL, 		/* direct IO to kernel pages */
  	__REQ_PM,		/* runtime pm request */
 +	__REQ_END,		/* last of chain of requests */
  	__REQ_NR_BITS,		/* stops here */
  };
  
@@@ -207,29 -205,27 +206,28 @@@
  #define REQ_NOMERGE_FLAGS \
  	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA)
  
 -#define REQ_RAHEAD		(1 << __REQ_RAHEAD)
 -#define REQ_THROTTLED		(1 << __REQ_THROTTLED)
 -
 -#define REQ_SORTED		(1 << __REQ_SORTED)
 -#define REQ_SOFTBARRIER		(1 << __REQ_SOFTBARRIER)
 -#define REQ_FUA			(1 << __REQ_FUA)
 -#define REQ_NOMERGE		(1 << __REQ_NOMERGE)
 -#define REQ_STARTED		(1 << __REQ_STARTED)
 -#define REQ_DONTPREP		(1 << __REQ_DONTPREP)
 -#define REQ_QUEUED		(1 << __REQ_QUEUED)
 -#define REQ_ELVPRIV		(1 << __REQ_ELVPRIV)
 -#define REQ_FAILED		(1 << __REQ_FAILED)
 -#define REQ_QUIET		(1 << __REQ_QUIET)
 -#define REQ_PREEMPT		(1 << __REQ_PREEMPT)
 -#define REQ_ALLOCED		(1 << __REQ_ALLOCED)
 -#define REQ_COPY_USER		(1 << __REQ_COPY_USER)
 -#define REQ_FLUSH		(1 << __REQ_FLUSH)
 -#define REQ_FLUSH_SEQ		(1 << __REQ_FLUSH_SEQ)
 -#define REQ_IO_STAT		(1 << __REQ_IO_STAT)
 -#define REQ_MIXED_MERGE		(1 << __REQ_MIXED_MERGE)
 -#define REQ_SECURE		(1 << __REQ_SECURE)
 -#define REQ_PM			(1 << __REQ_PM)
 +#define REQ_RAHEAD		(1ULL << __REQ_RAHEAD)
 +#define REQ_THROTTLED		(1ULL << __REQ_THROTTLED)
 +
 +#define REQ_SORTED		(1ULL << __REQ_SORTED)
 +#define REQ_SOFTBARRIER		(1ULL << __REQ_SOFTBARRIER)
 +#define REQ_FUA			(1ULL << __REQ_FUA)
 +#define REQ_NOMERGE		(1ULL << __REQ_NOMERGE)
 +#define REQ_STARTED		(1ULL << __REQ_STARTED)
 +#define REQ_DONTPREP		(1ULL << __REQ_DONTPREP)
 +#define REQ_QUEUED		(1ULL << __REQ_QUEUED)
 +#define REQ_ELVPRIV		(1ULL << __REQ_ELVPRIV)
 +#define REQ_FAILED		(1ULL << __REQ_FAILED)
 +#define REQ_QUIET		(1ULL << __REQ_QUIET)
 +#define REQ_PREEMPT		(1ULL << __REQ_PREEMPT)
 +#define REQ_ALLOCED		(1ULL << __REQ_ALLOCED)
 +#define REQ_COPY_USER		(1ULL << __REQ_COPY_USER)
 +#define REQ_FLUSH		(1ULL << __REQ_FLUSH)
 +#define REQ_FLUSH_SEQ		(1ULL << __REQ_FLUSH_SEQ)
 +#define REQ_IO_STAT		(1ULL << __REQ_IO_STAT)
 +#define REQ_MIXED_MERGE		(1ULL << __REQ_MIXED_MERGE)
 +#define REQ_SECURE		(1ULL << __REQ_SECURE)
- #define REQ_KERNEL		(1ULL << __REQ_KERNEL)
 +#define REQ_PM			(1ULL << __REQ_PM)
 +#define REQ_END			(1ULL << __REQ_END)
  
  #endif /* __LINUX_BLK_TYPES_H */
diff --cc mm/filemap.c
index b7749a9,9b0b852..fc78cf2
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@@ -1199,14 -1200,13 +1199,14 @@@ page_ok
  		 * Ok, we have the page, and it's up-to-date, so
  		 * now we can copy it to user space...
  		 *
- 		 * The file_read_actor routine returns how many bytes were
 -		 * The actor routine returns how many bytes were actually used..
++		 * The file_read_iter_actor routine returns how many bytes were
 +		 * actually used..
  		 * NOTE! This may not be the same as how much of a user buffer
  		 * we filled up (we may be padding etc), so we can only update
  		 * "pos" here (the actor routine has to update the user buffer
  		 * pointers and the remaining count).
  		 */
- 		ret = file_read_actor(desc, page, offset, nr);
 -		ret = actor(desc, page, offset, nr);
++		ret = file_read_iter_actor(desc, page, offset, nr);
  		offset += ret;
  		index += offset >> PAGE_CACHE_SHIFT;
  		offset &= ~PAGE_CACHE_MASK;
@@@ -1455,39 -1426,15 +1426,15 @@@ generic_file_read_iter(struct kiocb *io
  		}
  	}
  
- 	count = retval;
- 	for (seg = 0; seg < nr_segs; seg++) {
- 		read_descriptor_t desc;
- 		loff_t offset = 0;
- 
- 		/*
- 		 * If we did a short DIO read we need to skip the section of the
- 		 * iov that we've already read data into.
- 		 */
- 		if (count) {
- 			if (count > iov[seg].iov_len) {
- 				count -= iov[seg].iov_len;
- 				continue;
- 			}
- 			offset = count;
- 			count = 0;
- 		}
- 
- 		desc.written = 0;
- 		desc.arg.buf = iov[seg].iov_base + offset;
- 		desc.count = iov[seg].iov_len - offset;
- 		if (desc.count == 0)
- 			continue;
- 		desc.error = 0;
- 		do_generic_file_read(filp, ppos, &desc);
- 		retval += desc.written;
- 		if (desc.error) {
- 			retval = retval ?: desc.error;
- 			break;
- 		}
- 		if (desc.count > 0)
- 			break;
- 	}
+ 	desc.written = 0;
+ 	desc.arg.data = iter;
+ 	desc.count = count;
+ 	desc.error = 0;
 -	do_generic_file_read(filp, ppos, &desc, file_read_iter_actor);
++	do_generic_file_read(filp, ppos, &desc);
+ 	if (desc.written)
+ 		retval = desc.written;
+ 	else
+ 		retval = desc.error;
  out:
  	return retval;
  }


* Re: [GIT PULL] direct IO support for loop driver
  2013-11-18 19:03 [GIT PULL] direct IO support for loop driver Dave Kleikamp
  2013-11-18 19:07 ` Dave Kleikamp
@ 2013-11-20 21:19 ` Linus Torvalds
  2013-11-20 21:38   ` Linus Torvalds
  2013-11-21  9:58   ` Christoph Hellwig
  1 sibling, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2013-11-20 21:19 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Christoph Hellwig, LKML, linux-fsdevel, Maxim V. Patlasov,
	linux-aio, Kent Overstreet, Jens Axboe

On Mon, Nov 18, 2013 at 11:03 AM, Dave Kleikamp
<dave.kleikamp@oracle.com> wrote:
> Linus,
>
> Please pull the following patches. They add the infrastructure for
> kernel-initiated direct-io, change the loop driver to perform direct-io,
> and change swap-over-nfs to use the new infrastructure.

Quite frankly, I got maybe ten patches into this series, at which
point I just threw my hands up and said: "This is too ugly to live".

The naming in fs/iov-iter.c is disgusting. "ii_iov_xyz"? WTF?

Random "flag" value for marking things atomic? F*ck me, that's ugly.

A separate phase for checking addresses instead of just doing it in
the loop that loops over iovec's? Why? It sure as hell isn't because
it's more efficient, and it doubly sure as hell isn't because it's
prettier.

At that point, I just couldn't take it any more.

I really don't see the point of all this crap. All this for the loop
driver? If so, it had better at least be prettier than it is.

                      Linus


* Re: [GIT PULL] direct IO support for loop driver
  2013-11-20 21:19 ` Linus Torvalds
@ 2013-11-20 21:38   ` Linus Torvalds
  2013-11-20 21:50     ` Kent Overstreet
                       ` (2 more replies)
  2013-11-21  9:58   ` Christoph Hellwig
  1 sibling, 3 replies; 13+ messages in thread
From: Linus Torvalds @ 2013-11-20 21:38 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Christoph Hellwig, LKML, linux-fsdevel, Maxim V. Patlasov,
	linux-aio, Kent Overstreet, Jens Axboe

On Wed, Nov 20, 2013 at 1:19 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> At that point, I just couldn't take it any more.

Just to clarify, I think it might be fixable. But it does need fixing,
because I really feel dirty from reading it. And I may not care all
that deeply about what random drivers or low-level filesystems do, but
I *do* care about generic code in mm/ and fs/, so making those iovec
functions uglier makes me go all "Hulk angry! Hulk smash" on the code.

The whole "separate out checking from user copy" needs to go away.
There's no excuse for it.

The whole "if (atomic) do_atomic_ops() else do_regular_ops()" crap
needs to go away. You can do it either by just duplicating the
function, or by having it use an indirect function for the copy (and
that indirect function acts like copy_from/to_user() and checks the
address range - and you can obviously then also have it be a "copy
from kernel" thing too if you want/need it). And no, you don't then
make it do *both* the conditional *and* the function pointer like you
did in that discusting commit that mixes the two with the struct
iov_iter_ops.
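
The indirect-function shape being described can be caricatured in a few
lines of userspace C. Everything here is invented for illustration (in the
kernel the real primitives would be copy_to_user() and
__copy_to_user_inatomic()); the point is only that the atomic/non-atomic
decision is made once, when the iterator is set up, so the generic helper
carries no branch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative iterator: the copy primitive is picked once, up front.
 * Like copy_to_user(), the callback returns the number of bytes NOT
 * copied, so a short copy is visible to the caller. */
struct toy_iter {
	char   *base;
	size_t  len;
	size_t (*copy)(void *dst, const void *src, size_t n);
};

/* Stand-in for the sleeping path (copy_to_user() in the kernel). */
static size_t copy_may_fault(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/* Stand-in for the atomic path (__copy_to_user_inatomic()); here it
 * behaves identically, but in the kernel it must not sleep and may
 * legitimately return a short copy on an unfaulted page. */
static size_t copy_atomic(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/* One generic helper, with no "if (atomic)" branch anywhere. */
static size_t toy_iter_copy_in(struct toy_iter *it, const void *src, size_t n)
{
	size_t left, done;

	if (n > it->len)
		n = it->len;
	left = it->copy(it->base, src, n);
	done = n - left;
	it->base += done;
	it->len  -= done;
	return done;
}
```

The same slot can equally point at a copy-from-kernel routine, which is
how the "copy from kernel" variant falls out for free.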

The "__" versions that don't check the user address range need to die entirely.

The whole crazy "ii_iov_xyz" naming needs to go away. It doesn't even
make sense (one of the "i"s is for "iov").

That "unsigned long data" that contains an iovec *? WTF? How did that
ever start making sense?

IOW, there are many many details that just make me absolutely detest
this series. Enough that there's no way in hell I feel comfortable
pulling it. But they are likely fixable.

              Linus


* Re: [GIT PULL] direct IO support for loop driver
  2013-11-20 21:38   ` Linus Torvalds
@ 2013-11-20 21:50     ` Kent Overstreet
  2013-11-20 22:46     ` Dave Kleikamp
  2013-11-21  8:53     ` Jon Medhurst (Tixy)
  2 siblings, 0 replies; 13+ messages in thread
From: Kent Overstreet @ 2013-11-20 21:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Kleikamp, Christoph Hellwig, LKML, linux-fsdevel,
	Maxim V. Patlasov, linux-aio, Jens Axboe

On Wed, Nov 20, 2013 at 01:38:19PM -0800, Linus Torvalds wrote:
> On Wed, Nov 20, 2013 at 1:19 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > At that point, I just couldn't take it any more.
> 
> Just to clarify, I think it might be fixable. But it does need fixing,
> because I really feel dirty from reading it. And I may not care all
> that deeply about what random drivers or low-level filesystems do, but
> I *do* care about generic code in mm/ and fs/, so making those iovec
> functions uglier makes me go all "Hulk angry! Hulk smash" on the code.
> 
> The whole "separate out checking from user copy" needs to go away.
> There's no excuse for it.
> 
> The whole "if (atomic) do_atomic_ops() else do_regular_ops()" crap
> needs to go away. You can do it either by just duplicating the
> function, or by having it use an indirect function for the copy (and
> that indirect function acts like copy_from/to_user() and checks the
> address range - and you can obviously then also have it be a "copy
> from kernel" thing too if you want/need it). And no, you don't then
> make it do *both* the conditional *and* the function pointer like you
> did in that discusting commit that mixes the two with the struct
> iov_iter_ops.
> 
> The "__" versions that don't check the user address range need to die entirely.
> 
> The whole crazy "ii_iov_xyz" naming needs to go away. It doesn't even
> make sense (one of the "i"s is for "iov").
> 
> That "unsigned long data" that contains an iovec *? WTF? How did that
> ever start making sense?
> 
> IOW, there are many many details that just make me absolutely detest
> this series. Enough that there's no way in hell I feel comfortable
> pulling it. But they are likely fixable.

To be honest, I'm skeptical that this approach (adding methods and a bunch of
complexity to iov_iter for backing them with biovecs) is necessary for fixing
the loop driver. 

I've been working on an alternate approach that in the process cleans a bunch of
stuff up - the idea is basically, 1) refactor and massage a bunch of stuff in
the block layer so bios can be created and submitting without caring about the
constraints of the underlying device, and then 2) rewrite the dio code to start
out by pinning pages directly into bios, then query the filesystem to map those
bios wherever - splitting as needed.

What this gets us is a nice clean split where (with some more handwaving; Zach
had some ideas about how to handle some annoying details) we can just submit
bios to some sort of new DIO method - and then the loop driver could just use
that directly.
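
As a rough userspace model of that two-step plan (all names here are
invented; real splitting would operate on struct bio against the queue
limits of the underlying device), step 1 gathers segments with no
constraints and a later pass decides how many child bios a given device
limit forces:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of "build first, split later": segments are gathered with
 * no regard for device limits, and this pass counts how many child
 * bios a device with a given per-bio byte limit would need.
 * Assumes no single segment exceeds max_bytes. */
static size_t bios_needed(const size_t *seg_len, size_t nsegs,
			  size_t max_bytes)
{
	size_t bios = 0, cur = 0;

	for (size_t i = 0; i < nsegs; i++) {
		if (cur + seg_len[i] > max_bytes) {
			bios++;		/* close the current child bio */
			cur = 0;
		}
		cur += seg_len[i];
	}
	return bios + (cur ? 1 : 0);
}
```

The appeal of the split is that the submitter never sees the device
constraints at all; they are applied entirely after the fact.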

IMO this would be vastly cleaner; we'd have one data structure - the iov_iter -
for memory that isn't pinned, and another data structure - struct bio - for
iterating over pinned pages. Most of the aforementioned block layer massaging
I've been doing was creating a real iterator for bios (that doesn't modify the
biovecs).

I've also done the DIO rewrite - a while ago - and it's definitely not ready to
go upstream (various prereqs still aren't in), but it at least shows that the
approach is viable and my rewrite cuts fs/direct-io.c in _half_ while
significantly improving performance:

http://evilpiepirate.org/git/linux-bcache.git/commit/?h=block_stuff&id=caf18e8ec531daea29f61c9aa486443026a343c7


* Re: [GIT PULL] direct IO support for loop driver
  2013-11-20 21:38   ` Linus Torvalds
  2013-11-20 21:50     ` Kent Overstreet
@ 2013-11-20 22:46     ` Dave Kleikamp
  2013-11-21  4:24       ` Stephen Rothwell
  2013-11-21  8:53     ` Jon Medhurst (Tixy)
  2 siblings, 1 reply; 13+ messages in thread
From: Dave Kleikamp @ 2013-11-20 22:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, LKML, linux-fsdevel, Maxim V. Patlasov,
	linux-aio, Kent Overstreet, Jens Axboe

On 11/20/2013 03:38 PM, Linus Torvalds wrote:
> On Wed, Nov 20, 2013 at 1:19 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> At that point, I just couldn't take it any more.
> 
> Just to clarify, I think it might be fixable. But it does need fixing,
> because I really feel dirty from reading it. And I may not care all
> that deeply about what random drivers or low-level filesystems do, but
> I *do* care about generic code in mm/ and fs/, so making those iovec
> functions uglier makes me go all "Hulk angry! Hulk smash" on the code.

Okay, fair enough. This patchset started out as a bit of a hack, but it
ended up working well and I had multiple other developers wanting the
functionality in mainline. I didn't get too much critical feedback, but I
should have been more critical myself.

> The whole "separate out checking from user copy" needs to go away.
> There's no excuse for it.
> 
> The whole "if (atomic) do_atomic_ops() else do_regular_ops()" crap
> needs to go away. You can do it either by just duplicating the
> function, or by having it use an indirect function for the copy (and
> that indirect function acts like copy_from/to_user() and checks the
> address range - and you can obviously then also have it be a "copy
> from kernel" thing too if you want/need it). And no, you don't then
> make it do *both* the conditional *and* the function pointer like you
> did in that discusting commit that mixes the two with the struct
> iov_iter_ops.

I'll fix those.

> The "__" versions that don't check the user address range need to die entirely.
> 
> The whole crazy "ii_iov_xyz" naming needs to go away. It doesn't even
> make sense (one of the "i"s is for "iov").

okay

> That "unsigned long data" that contains an iovec *? WTF? How did that
> ever start making sense?

In a later patch, it could also be a biovec. It'd probably be better to
make it a union at that point, or at least add a comment.
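
A minimal sketch of what such a union might look like (types and names
are illustrative only, not the eventual kernel API, though mainline's
iov_iter did later take a similar tagged form):

```c
#include <assert.h>
#include <stddef.h>

/* One iterator that carries either user iovecs or kernel bio_vec-like
 * segments, with the interpretation of the pointer made explicit
 * instead of being hidden in an unsigned long. */
struct toy_iovec { void *iov_base; size_t iov_len; };
struct toy_bvec  { void *bv_page;  size_t bv_len; size_t bv_offset; };

enum toy_iter_type { TOY_ITER_IOVEC, TOY_ITER_BVEC };

struct toy_iter {
	enum toy_iter_type type;
	size_t nr_segs;
	union {
		const struct toy_iovec *iov;
		const struct toy_bvec  *bvec;
	};
};

/* Total byte count works the same way regardless of backing type. */
static size_t toy_iter_count(const struct toy_iter *i)
{
	size_t n = 0;

	for (size_t s = 0; s < i->nr_segs; s++)
		n += (i->type == TOY_ITER_IOVEC) ? i->iov[s].iov_len
						 : i->bvec[s].bv_len;
	return n;
}
```

Callers that only care about byte counts never need to know which
representation is behind the iterator.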

> IOW, there are many many details that just make me absolutely detest
> this series. Enough that there's no way in hell I feel comfortable
> pulling it. But they are likely fixable.

Yeah, I'll work on something cleaner. The vital piece is some container
that can either contain an iovec or an array of kernel pages to be
passed to the filesystems. The iov_iter structure was a reasonable
choice, but the implementation is a bit sloppy. Kent wants to pass the
bio structure itself, while this patchset used the biovec array. As long
as I'm reworking the patchset, I'll see if I can accommodate his
requirements.

Thanks,
Dave


* Re: [GIT PULL] direct IO support for loop driver
  2013-11-20 22:46     ` Dave Kleikamp
@ 2013-11-21  4:24       ` Stephen Rothwell
  0 siblings, 0 replies; 13+ messages in thread
From: Stephen Rothwell @ 2013-11-21  4:24 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Linus Torvalds, Christoph Hellwig, LKML, linux-fsdevel,
	Maxim V. Patlasov, linux-aio, Kent Overstreet, Jens Axboe


Hi all,

On Wed, 20 Nov 2013 16:46:38 -0600 Dave Kleikamp <dave.kleikamp@oracle.com> wrote:
>
> Yeah, I'll work on something cleaner. The vital piece is some container
> that can either contain an iovec or an array of kernel pages to be
> passed to the filesystems. The iov_iter structure was a reasonable
> choice, but the implementation is a bit sloppy. Kent wants to pass the
> bio structure itself, while this patchset used the biovec array. As long
> as I'm reworking the patchset, I'll see if I can accommodate his
> requirements.

So given this rework, I think I should drop the aio-direct tree from
linux-next for now.  We can add it back (hopefully with a better name)
when the rewrite is in a reasonable state.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au



* Re: [GIT PULL] direct IO support for loop driver
  2013-11-20 21:38   ` Linus Torvalds
  2013-11-20 21:50     ` Kent Overstreet
  2013-11-20 22:46     ` Dave Kleikamp
@ 2013-11-21  8:53     ` Jon Medhurst (Tixy)
  2 siblings, 0 replies; 13+ messages in thread
From: Jon Medhurst (Tixy) @ 2013-11-21  8:53 UTC (permalink / raw)
  To: LKML

On Wed, 2013-11-20 at 13:38 -0800, Linus Torvalds wrote:
[...]
> And no, you don't then
> make it do *both* the conditional *and* the function pointer like you
> did in that discusting commit that mixes the two with the struct

discusting, adjective
1. Something unpleasant that causes people to cuss.
2. Something unpleasant that provokes a discussion.

Sorry, couldn't resist :-)

-- 
Tixy




* Re: [GIT PULL] direct IO support for loop driver
  2013-11-20 21:19 ` Linus Torvalds
  2013-11-20 21:38   ` Linus Torvalds
@ 2013-11-21  9:58   ` Christoph Hellwig
  2013-11-21 10:06     ` Kent Overstreet
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2013-11-21  9:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Kleikamp, Christoph Hellwig, LKML, linux-fsdevel,
	Maxim V. Patlasov, linux-aio, Kent Overstreet, Jens Axboe

On Wed, Nov 20, 2013 at 01:19:33PM -0800, Linus Torvalds wrote:
> I really don't see the point of all this crap. All this for the loop
> driver? If so, it had better at least be prettier than it is.

It's for anyone trying to do direct I/O from kernel space.  I have
patches that speed up ecryptfs and allow sane operation for the file
backed scsi target based on this that have been collecting dust for a
while.



* Re: [GIT PULL] direct IO support for loop driver
  2013-11-21  9:58   ` Christoph Hellwig
@ 2013-11-21 10:06     ` Kent Overstreet
  2013-11-21 10:11       ` Christoph Hellwig
  2013-11-21 17:34       ` Christoph Hellwig
  0 siblings, 2 replies; 13+ messages in thread
From: Kent Overstreet @ 2013-11-21 10:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Dave Kleikamp, LKML, linux-fsdevel,
	Maxim V. Patlasov, linux-aio, Jens Axboe

On Thu, Nov 21, 2013 at 01:58:37AM -0800, Christoph Hellwig wrote:
> On Wed, Nov 20, 2013 at 01:19:33PM -0800, Linus Torvalds wrote:
> > I really don't see the point of all this crap. All this for the loop
> > driver? If so, it had better at least be prettier than it is.
> 
> It's for anyone trying to do direct I/O from kernel space.  I have
> patches that speed up ecryptfs and allow sane operation for the file
> backed scsi target based on this that have been collecting dust for a
> while.

Can you post those patches somewhere? If that's the end goal of this
patch series, we should be able to look at them too.


* Re: [GIT PULL] direct IO support for loop driver
  2013-11-21 10:06     ` Kent Overstreet
@ 2013-11-21 10:11       ` Christoph Hellwig
  2013-11-21 10:13         ` Kent Overstreet
  2013-11-21 17:34       ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2013-11-21 10:11 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, Linus Torvalds, Dave Kleikamp, LKML,
	linux-fsdevel, Maxim V. Patlasov, linux-aio, Jens Axboe

On Thu, Nov 21, 2013 at 02:06:09AM -0800, Kent Overstreet wrote:
> Can you post those patches somewhere? If that's the end goal of this
> patch series, we should be able to look at them too.

I should have the target side back working soon, as I started on it
again about a week ago.


* Re: [GIT PULL] direct IO support for loop driver
  2013-11-21 10:11       ` Christoph Hellwig
@ 2013-11-21 10:13         ` Kent Overstreet
  0 siblings, 0 replies; 13+ messages in thread
From: Kent Overstreet @ 2013-11-21 10:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Dave Kleikamp, LKML, linux-fsdevel,
	Maxim V. Patlasov, linux-aio, Jens Axboe

On Thu, Nov 21, 2013 at 02:11:38AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 21, 2013 at 02:06:09AM -0800, Kent Overstreet wrote:
> > Can you post those patches somewhere? If that's the end goal of this
> > patch series, we should be able to look at them too.
> 
> I should have the target side back working soon, as I started on it
> again about a week ago.

Broken code is fine, I just want to see the overall approach.


* Re: [GIT PULL] direct IO support for loop driver
  2013-11-21 10:06     ` Kent Overstreet
  2013-11-21 10:11       ` Christoph Hellwig
@ 2013-11-21 17:34       ` Christoph Hellwig
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2013-11-21 17:34 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Linus Torvalds, Dave Kleikamp, LKML, linux-fsdevel,
	Maxim V. Patlasov, linux-aio, Jens Axboe

Find the target one below.  I haven't found the ecryptfs work, but it
is a lot more involved, as it only wants to use direct I/O, not aio.

--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -31,6 +31,7 @@
 #include <linux/spinlock.h>
 #include <linux/module.h>
 #include <linux/falloc.h>
+#include <linux/blk_types.h>
 #include <scsi/scsi.h>
 #include <scsi/scsi_host.h>
 #include <asm/unaligned.h>
@@ -120,6 +121,8 @@ static int fd_configure_device(struct se_device *dev)
 	 * of pure timestamp updates.
 	 */
 	flags = O_RDWR | O_CREAT | O_LARGEFILE | O_DSYNC;
+	if (1)
+		flags |= O_DIRECT;
 
 	/*
 	 * Optionally allow fd_buffered_io=1 to be enabled for people
@@ -546,6 +549,61 @@ fd_execute_unmap(struct se_cmd *cmd)
 	return sbc_execute_unmap(cmd, fd_do_unmap, file);
 }
 
+static void fd_rw_aio_complete(u64 data, long res)
+{
+	struct se_cmd *cmd = (struct se_cmd *)(uintptr_t)data;
+
+	kfree(cmd->priv);
+
+	if (res < 0)
+		target_complete_cmd(cmd, SAM_STAT_CHECK_CONDITION);
+	else
+		target_complete_cmd(cmd, SAM_STAT_GOOD);
+}
+
+static sense_reason_t
+fd_rw_aio(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
+	      enum dma_data_direction data_direction)
+{
+	struct se_device *se_dev = cmd->se_dev;
+	struct fd_dev *dev = FD_DEV(se_dev);
+	struct file *file = dev->fd_file;
+	struct scatterlist *sg;
+	struct kiocb *iocb;
+	unsigned int op;
+	struct iov_iter iter;
+	struct bio_vec *bvec;
+	loff_t pos = (cmd->t_task_lba * se_dev->dev_attrib.block_size);
+	int i;
+
+	iocb = aio_kernel_alloc(GFP_NOIO);
+	if (!iocb)
+		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+	
+	if (data_direction == DMA_TO_DEVICE)
+		op = IOCB_CMD_WRITE_ITER;
+	else
+		op = IOCB_CMD_READ_ITER;
+
+	bvec = cmd->priv = kcalloc(sgl_nents, sizeof(struct bio_vec), GFP_NOIO);
+	if (!bvec) {
+		aio_kernel_free(iocb);
+		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+	}
+
+	for_each_sg(sgl, sg, sgl_nents, i) {
+		bvec[i].bv_page = sg_page(sg);
+		bvec[i].bv_len = sg->length;
+		bvec[i].bv_offset = sg->offset;
+	}
+
+	iov_iter_init_bvec(&iter, bvec, sgl_nents, bvec_length(bvec, sgl_nents), 0);
+	aio_kernel_init_rw(iocb, file, iov_iter_count(&iter), pos);
+	aio_kernel_init_callback(iocb, fd_rw_aio_complete, (u64)(uintptr_t)cmd);
+
+	return aio_kernel_submit(iocb, op, &iter);
+}
+
 static sense_reason_t
 fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	      enum dma_data_direction data_direction)
@@ -553,6 +611,9 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	struct se_device *dev = cmd->se_dev;
 	int ret = 0;
 
+	if (1)
+		return fd_rw_aio(cmd, sgl, sgl_nents, data_direction);
+
 	/*
 	 * Call vectorized fileio functions to map struct scatterlist
 	 * physical memory addresses to struct iovec virtual memory.

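The submit/complete wiring the patch relies on (the submitter allocates
the request and the vector, attaches a completion callback plus a
cookie, and the callback, not the submitter, frees the vector) can be
sketched in user space as below. Everything here is illustrative; the
patchset's real APIs are the aio_kernel_* calls above, not these:

```c
/* User-space sketch of the allocate/attach-callback/submit pattern.
   Illustrative names only. A real submit would be asynchronous; this
   sketch "completes" synchronously to show the ownership handoff. */
#include <stdlib.h>

typedef void (*complete_fn)(void *data, long res);

struct sketch_iocb {
    complete_fn  complete;
    void        *data;      /* cookie handed back on completion */
};

static struct sketch_iocb *iocb_alloc(void)
{
    return calloc(1, sizeof(struct sketch_iocb));
}

static void iocb_set_callback(struct sketch_iocb *io,
                              complete_fn fn, void *data)
{
    io->complete = fn;
    io->data = data;
}

/* "Submit": run the completion and release the request itself. */
static int iocb_submit(struct sketch_iocb *io, long result)
{
    io->complete(io->data, result);
    free(io);
    return 0;
}

/* example callback: frees the vector cookie and records the result,
   mirroring fd_rw_aio_complete freeing cmd->priv */
static long last_result;
static void example_complete(void *data, long res)
{
    free(data);
    last_result = res;
}
```

The key property, as in the patch, is that once submit is called the
vector belongs to the completion path, so the submitter must not touch
or free it afterwards.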

end of thread, other threads:[~2013-11-21 17:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-18 19:03 [GIT PULL] direct IO support for loop driver Dave Kleikamp
2013-11-18 19:07 ` Dave Kleikamp
2013-11-20 21:19 ` Linus Torvalds
2013-11-20 21:38   ` Linus Torvalds
2013-11-20 21:50     ` Kent Overstreet
2013-11-20 22:46     ` Dave Kleikamp
2013-11-21  4:24       ` Stephen Rothwell
2013-11-21  8:53     ` Jon Medhurst (Tixy)
2013-11-21  9:58   ` Christoph Hellwig
2013-11-21 10:06     ` Kent Overstreet
2013-11-21 10:11       ` Christoph Hellwig
2013-11-21 10:13         ` Kent Overstreet
2013-11-21 17:34       ` Christoph Hellwig
