* [git pull] vfs part 2
@ 2015-04-15 18:14 Al Viro
  2015-04-23 10:16 ` Andrey Ryabinin
  0 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2015-04-15 18:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

	Now that net-next went in...  Here's the next big chunk - killing
->aio_read() and ->aio_write().  There'll be one more pile today (direct_IO
changes and generic_write_checks() cleanups/fixes), but I'd prefer to keep
that one separate.  Please, pull from
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2

Shortlog:
Al Viro (37):
      cuse: switch to iov_iter
      fuse: switch fuse_direct_io_file_operations to ->{read,write}_iter()
      expand __fuse_direct_write() in both callers
      lustre: kill unused members of struct vvp_thread_info
      ioctx_alloc(): remove pointless check
      aio_run_iocb(): kill dead check
      acct: check FMODE_CAN_WRITE
      nommu: use __vfs_read()
      net/9p: switch the guts of p9_client_{read,write}() to iov_iter
      9p: switch p9_client_write() to passing it struct iov_iter *
      9p: switch ->writepage() to direct use of p9_client_write()
      9p: fold v9fs_file_write_internal() into the caller
      9p: get rid of v9fs_direct_file_write()
      9p: switch p9_client_read() to passing struct iov_iter *
      9p: get rid of v9fs_direct_file_read()
      9p: switch to ->read_iter/->write_iter
      9p: we are leaking glock.client_id in v9fs_file_getlock()
      p9_client_attach(): set fid->uid correctly
      net/9p: remove (now-)unused helpers
      ncpfs: switch to ->read_iter/->write_iter
      coda: switch to ->read_iter/->write_iter
      switch hugetlbfs to ->read_iter()
      new helper: __vfs_write()
      autofs: switch to __vfs_write()
      export __vfs_read()
      ashmem: use __vfs_read()
      serial2002: switch to __vfs_read/__vfs_write
      switch /dev/loop to vfs_iter_write()
      coredump: accept any write method
      make new_sync_{read,write}() static
      switch drivers/char/mem.c to ->read_iter/->write_iter
      fuse: switch to ->read_iter/->write_iter
      fuse: use iov_iter_get_pages() for non-splice path
      kill do_sync_read/do_sync_write
      infinibad: weird APIs switched to ->write_iter()
      pcm: another weird API abuse
      ->aio_read and ->aio_write removed

Diffstat:
 Documentation/filesystems/Locking                  |   2 -
 Documentation/filesystems/porting                  |  12 +
 Documentation/filesystems/vfs.txt                  |   6 -
 arch/s390/hypfs/inode.c                            |   2 -
 drivers/block/loop.c                               |  12 +-
 drivers/char/mem.c                                 |  20 +-
 drivers/char/raw.c                                 |   2 -
 drivers/infiniband/hw/ipath/ipath_file_ops.c       |  18 +-
 drivers/infiniband/hw/qib/qib_file_ops.c           |  20 +-
 drivers/net/macvtap.c                              |   2 -
 drivers/net/tun.c                                  |   2 -
 drivers/staging/android/ashmem.c                   |   2 +-
 drivers/staging/comedi/drivers/serial2002.c        |  18 +-
 drivers/staging/lustre/lustre/llite/file.c         |   6 -
 .../staging/lustre/lustre/llite/llite_internal.h   |   2 -
 drivers/usb/gadget/function/f_fs.c                 |   2 -
 drivers/usb/gadget/legacy/inode.c                  |   2 -
 fs/9p/v9fs_vfs.h                                   |   4 -
 fs/9p/vfs_addr.c                                   |  80 +++---
 fs/9p/vfs_dir.c                                    |  15 +-
 fs/9p/vfs_file.c                                   | 314 ++++-----------------
 fs/9p/xattr.c                                      |  80 ++----
 fs/adfs/file.c                                     |   2 -
 fs/affs/file.c                                     |   2 -
 fs/afs/file.c                                      |   2 -
 fs/aio.c                                           |  23 +-
 fs/autofs4/autofs_i.h                              |   2 +-
 fs/autofs4/waitq.c                                 |   2 +-
 fs/bfs/file.c                                      |   2 -
 fs/block_dev.c                                     |   2 -
 fs/btrfs/file.c                                    |   2 -
 fs/ceph/file.c                                     |   2 -
 fs/cifs/cifsfs.c                                   |  12 -
 fs/coda/file.c                                     |  38 +--
 fs/coredump.c                                      |   2 +-
 fs/ecryptfs/file.c                                 |   2 -
 fs/exofs/file.c                                    |   2 -
 fs/ext2/file.c                                     |   4 -
 fs/ext3/file.c                                     |   2 -
 fs/ext4/file.c                                     |   4 -
 fs/f2fs/file.c                                     |   2 -
 fs/fat/file.c                                      |   2 -
 fs/file_table.c                                    |   4 +-
 fs/fuse/cuse.c                                     |  25 +-
 fs/fuse/dev.c                                      |  63 ++---
 fs/fuse/file.c                                     |  69 ++---
 fs/gfs2/file.c                                     |   4 -
 fs/hfs/inode.c                                     |   2 -
 fs/hfsplus/inode.c                                 |   2 -
 fs/hostfs/hostfs_kern.c                            |   2 -
 fs/hpfs/file.c                                     |   2 -
 fs/hugetlbfs/inode.c                               |  91 +++---
 fs/jffs2/file.c                                    |   2 -
 fs/jfs/file.c                                      |   2 -
 fs/logfs/file.c                                    |   2 -
 fs/minix/file.c                                    |   2 -
 fs/ncpfs/file.c                                    |  90 ++----
 fs/ncpfs/ncplib_kernel.c                           |   6 +-
 fs/ncpfs/ncplib_kernel.h                           |   2 +-
 fs/nfs/file.c                                      |   2 -
 fs/nfs/nfs4file.c                                  |   2 -
 fs/nilfs2/file.c                                   |   2 -
 fs/ntfs/file.c                                     |   2 -
 fs/ocfs2/file.c                                    |   4 -
 fs/omfs/file.c                                     |   2 -
 fs/open.c                                          |   4 +-
 fs/pipe.c                                          |   2 -
 fs/ramfs/file-mmu.c                                |   2 -
 fs/ramfs/file-nommu.c                              |   2 -
 fs/read_write.c                                    | 109 ++-----
 fs/reiserfs/file.c                                 |   2 -
 fs/romfs/mmap-nommu.c                              |   1 -
 fs/sysv/file.c                                     |   2 -
 fs/ubifs/file.c                                    |   2 -
 fs/udf/file.c                                      |   2 -
 fs/ufs/file.c                                      |   2 -
 fs/xfs/xfs_file.c                                  |   2 -
 include/linux/fs.h                                 |   7 +-
 include/net/9p/client.h                            |   8 +-
 include/net/9p/transport.h                         |   2 +-
 kernel/acct.c                                      |   2 +-
 mm/nommu.c                                         |   4 +-
 mm/shmem.c                                         |   2 -
 net/9p/client.c                                    | 262 ++++++++---------
 net/9p/protocol.c                                  |  24 +-
 net/9p/trans_common.c                              |  42 +--
 net/9p/trans_common.h                              |   2 -
 net/9p/trans_virtio.c                              | 137 +++++----
 net/socket.c                                       |   2 -
 sound/core/pcm_native.c                            |  39 +--
 90 files changed, 598 insertions(+), 1185 deletions(-)


* Re: [git pull] vfs part 2
  2015-04-15 18:14 [git pull] vfs part 2 Al Viro
@ 2015-04-23 10:16 ` Andrey Ryabinin
  2015-05-25  8:30   ` Andrey Ryabinin
  2015-06-21 21:12     ` Al Viro
  0 siblings, 2 replies; 69+ messages in thread
From: Andrey Ryabinin @ 2015-04-23 10:16 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On 04/15/2015 09:14 PM, Al Viro wrote:
>       9p: switch p9_client_write() to passing it struct iov_iter *

Hi Al,

This change caused following:

[   91.637917] ==================================================================
[   91.639252] BUG: KASan: out of bounds on stack in iov_iter_advance+0x3e4/0x4b0 at addr ffff8800ba1efd20
[   91.640979] Read of size 8 by task trinity-c15/4746
[   91.641493] page:ffffea0002e87bc0 count:0 mapcount:0 mapping:          (null) index:0x0
[   91.641787] flags: 0x100000000000000()
[   91.641787] page dumped because: kasan: bad access detected
[   91.641787] CPU: 1 PID: 4746 Comm: trinity-c15 Not tainted 4.0.0+ #319
[   91.641787] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[   91.641787]  ffff8800ba1efd08 ffff8800ba1ef918 ffffffff81ab94ed 1ffffd40005d0f7f
[   91.641787]  ffff8800ba1ef9b0 ffff8800ba1ef998 ffffffff812bc0f4 ffffffff814901f0
[   91.641787]  ffff8800ba1efa40 0000000000000296 ffff8801f4993490 ffffffff81228fe0
[   91.641787] Call Trace:
[   91.641787] dump_stack (lib/dump_stack.c:52)
[   91.641787] kasan_report_error (mm/kasan/report.c:132 mm/kasan/report.c:193)
[   91.641787] ? idr_mark_full (lib/idr.c:551)
[   91.641787] ? clear_exceptional_entry (mm/truncate.c:561)
[   91.641787] __asan_report_load8_noabort (mm/kasan/report.c:251)
[   91.641787] ? iov_iter_advance (lib/iov_iter.c:511)
[   91.641787] iov_iter_advance (lib/iov_iter.c:511)
[   91.641787] p9_client_write (net/9p/client.c:1656)
[   91.641787] ? p9_client_readdir (net/9p/client.c:1614)
[   91.641787] ? kasan_kmalloc (mm/kasan/kasan.c:355)
[   91.641787] ? __kmalloc (mm/slub.c:3325)
[   91.641787] ? p9_client_readdir (net/9p/client.c:1614)
[   91.641787] ? v9fs_file_lock_dotl (fs/9p/vfs_file.c:405)
[   91.641787] v9fs_file_write_iter (fs/9p/vfs_file.c:421)
[   91.641787] ? __sb_end_write (fs/super.c:1192)
[   91.641787] ? v9fs_file_lock_dotl (fs/9p/vfs_file.c:405)
[   91.641787] ? do_readv_writev (fs/read_write.c:776)
[   91.641787] ? recalc_sigpending (kernel/signal.c:160)
[   91.641787] ? __set_task_blocked (kernel/signal.c:2514)
[   91.641787] __vfs_write (fs/read_write.c:479 fs/read_write.c:490)
[   91.641787] ? recalc_sigpending (kernel/signal.c:160)
[   91.641787] ? __vfs_read (fs/read_write.c:486)
[   91.641787] ? __sb_end_write (fs/super.c:1192)
[   91.641787] ? signal_setup_done (kernel/signal.c:2556)
[   91.641787] ? hrtimer_start (kernel/time/hrtimer.c:1043)
[   91.641787] ? do_setitimer (kernel/time/itimer.c:222)
[   91.641787] vfs_write (include/linux/fs.h:1984 include/linux/fs.h:2416 fs/read_write.c:543)
[   91.641787] SyS_write (fs/read_write.c:585 fs/read_write.c:576)
[   91.641787] ? SyS_read (fs/read_write.c:576)
[   91.641787] ? init_fpu (arch/x86/kernel/i387.c:231 arch/x86/kernel/i387.c:266)
[   91.641787] ? math_state_restore (arch/x86/kernel/traps.c:869)
[   91.641787] system_call_fastpath (arch/x86/kernel/entry_64.S:261)
[   91.641787] Memory state around the buggy address:
[   91.641787]  ffff8800ba1efc00: f2 00 f4 f4 f4 f2 f2 f2 f2 00 f4 f4 f4 f3 f3 f3
[   91.641787]  ffff8800ba1efc80: f3 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
[   91.641787] >ffff8800ba1efd00: f1 00 00 f4 f4 f2 f2 f2 f2 00 00 00 00 00 f4 f4
[   91.641787]                                ^
[   91.641787]  ffff8800ba1efd80: f4 f2 f2 f2 f2 00 00 00 00 00 f4 f4 f4 f3 f3 f3
[   91.641787]  ffff8800ba1efe00: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   91.641787] ==================================================================



This could happen when p9pdu_readf() changes 'count' to some value > iov_iter_count(from):

p9_client_write():
<...>
		int count = iov_iter_count(from);
<...>
		*err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
<...>
		iov_iter_advance(from, count);
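
In other words, iov_iter_advance() gets asked to advance by more data than
the iovec array on the caller's stack actually describes. A minimal sketch of
the kind of guard this implies (placement and wording are illustrative; the
rest of the thread works out where such a check belongs):

		/* illustrative: never trust the server-reported count
		   beyond what was actually submitted */
		if (count > iov_iter_count(from))
			count = iov_iter_count(from);
		iov_iter_advance(from, count);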





* Re: [git pull] vfs part 2
  2015-04-23 10:16 ` Andrey Ryabinin
@ 2015-05-25  8:30   ` Andrey Ryabinin
  2015-06-21 21:12     ` Al Viro
  1 sibling, 0 replies; 69+ messages in thread
From: Andrey Ryabinin @ 2015-05-25  8:30 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On 04/23/2015 01:16 PM, Andrey Ryabinin wrote:
> On 04/15/2015 09:14 PM, Al Viro wrote:
>>       9p: switch p9_client_write() to passing it struct iov_iter *
> 
> Hi Al,
> 
> This change caused following:
> 
> [   91.637917] ==================================================================
> [   91.639252] BUG: KASan: out of bounds on stack in iov_iter_advance+0x3e4/0x4b0 at addr ffff8800ba1efd20
> [   91.640979] Read of size 8 by task trinity-c15/4746
> [   91.641493] page:ffffea0002e87bc0 count:0 mapcount:0 mapping:          (null) index:0x0
> [   91.641787] flags: 0x100000000000000()
> [   91.641787] page dumped because: kasan: bad access detected
> [   91.641787] CPU: 1 PID: 4746 Comm: trinity-c15 Not tainted 4.0.0+ #319
> [   91.641787] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
> [   91.641787]  ffff8800ba1efd08 ffff8800ba1ef918 ffffffff81ab94ed 1ffffd40005d0f7f
> [   91.641787]  ffff8800ba1ef9b0 ffff8800ba1ef998 ffffffff812bc0f4 ffffffff814901f0
> [   91.641787]  ffff8800ba1efa40 0000000000000296 ffff8801f4993490 ffffffff81228fe0
> [   91.641787] Call Trace:
> [   91.641787] dump_stack (lib/dump_stack.c:52)
> [   91.641787] kasan_report_error (mm/kasan/report.c:132 mm/kasan/report.c:193)
> [   91.641787] ? idr_mark_full (lib/idr.c:551)
> [   91.641787] ? clear_exceptional_entry (mm/truncate.c:561)
> [   91.641787] __asan_report_load8_noabort (mm/kasan/report.c:251)
> [   91.641787] ? iov_iter_advance (lib/iov_iter.c:511)
> [   91.641787] iov_iter_advance (lib/iov_iter.c:511)
> [   91.641787] p9_client_write (net/9p/client.c:1656)
> [   91.641787] ? p9_client_readdir (net/9p/client.c:1614)
> [   91.641787] ? kasan_kmalloc (mm/kasan/kasan.c:355)
> [   91.641787] ? __kmalloc (mm/slub.c:3325)
> [   91.641787] ? p9_client_readdir (net/9p/client.c:1614)
> [   91.641787] ? v9fs_file_lock_dotl (fs/9p/vfs_file.c:405)
> [   91.641787] v9fs_file_write_iter (fs/9p/vfs_file.c:421)
> [   91.641787] ? __sb_end_write (fs/super.c:1192)
> [   91.641787] ? v9fs_file_lock_dotl (fs/9p/vfs_file.c:405)
> [   91.641787] ? do_readv_writev (fs/read_write.c:776)
> [   91.641787] ? recalc_sigpending (kernel/signal.c:160)
> [   91.641787] ? __set_task_blocked (kernel/signal.c:2514)
> [   91.641787] __vfs_write (fs/read_write.c:479 fs/read_write.c:490)
> [   91.641787] ? recalc_sigpending (kernel/signal.c:160)
> [   91.641787] ? __vfs_read (fs/read_write.c:486)
> [   91.641787] ? __sb_end_write (fs/super.c:1192)
> [   91.641787] ? signal_setup_done (kernel/signal.c:2556)
> [   91.641787] ? hrtimer_start (kernel/time/hrtimer.c:1043)
> [   91.641787] ? do_setitimer (kernel/time/itimer.c:222)
> [   91.641787] vfs_write (include/linux/fs.h:1984 include/linux/fs.h:2416 fs/read_write.c:543)
> [   91.641787] SyS_write (fs/read_write.c:585 fs/read_write.c:576)
> [   91.641787] ? SyS_read (fs/read_write.c:576)
> [   91.641787] ? init_fpu (arch/x86/kernel/i387.c:231 arch/x86/kernel/i387.c:266)
> [   91.641787] ? math_state_restore (arch/x86/kernel/traps.c:869)
> [   91.641787] system_call_fastpath (arch/x86/kernel/entry_64.S:261)
> [   91.641787] Memory state around the buggy address:
> [   91.641787]  ffff8800ba1efc00: f2 00 f4 f4 f4 f2 f2 f2 f2 00 f4 f4 f4 f3 f3 f3
> [   91.641787]  ffff8800ba1efc80: f3 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
> [   91.641787] >ffff8800ba1efd00: f1 00 00 f4 f4 f2 f2 f2 f2 00 00 00 00 00 f4 f4
> [   91.641787]                                ^
> [   91.641787]  ffff8800ba1efd80: f4 f2 f2 f2 f2 00 00 00 00 00 f4 f4 f4 f3 f3 f3
> [   91.641787]  ffff8800ba1efe00: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [   91.641787] ==================================================================
> 
> 
> 
> This could happen when p9pdu_readf() changes 'count' to some value > iov_iter_count(from):
> 
> p9_client_write():
> <...>
> 		int count = iov_iter_count(from);
> <...>
> 		*err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
> <...>
> 		iov_iter_advance(from, count);
> 
> 
> 

Ping.


* Re: [git pull] vfs part 2
  2015-04-23 10:16 ` Andrey Ryabinin
@ 2015-06-21 21:12     ` Al Viro
  2015-06-21 21:12     ` Al Viro
  1 sibling, 0 replies; 69+ messages in thread
From: Al Viro @ 2015-06-21 21:12 UTC (permalink / raw)
  To: Andrey Ryabinin; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Thu, Apr 23, 2015 at 01:16:15PM +0300, Andrey Ryabinin wrote:
> This change caused following:

> This could happen when p9pdu_readf() changes 'count' to some value > iov_iter_count(from):
> 
> p9_client_write():
> <...>
> 		int count = iov_iter_count(from);
> <...>
> 		*err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
> <...>
> 		iov_iter_advance(from, count);

*blink*

That's a bug, all right, but I would love to see how you trigger it.
It would require server to respond to "write that many bytes" with "OK,
<greater number> bytes written".  We certainly need to cope with that
(we can't trust the server to be sane), but if that's what is going on,
you've got a server bug as well.

Could you check if the patch below triggers WARN_ON() in it on your
reproducer?  p9_client_read() has a similar issue as well...

diff --git a/net/9p/client.c b/net/9p/client.c
index 6f4c4c8..f99bce7 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1588,6 +1588,10 @@ p9_client_read(struct p9_fid *fid, u64 offset, struct iov_iter *to, int *err)
 			p9_free_req(clnt, req);
 			break;
 		}
+		if (count > rsize) {
+			WARN_ON(1);
+			count = rsize;
+		}
 
 		if (non_zc) {
 			int n = copy_to_iter(dataptr, count, to);
@@ -1650,6 +1654,10 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 		}
 
 		p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
+		if (count > rsize) {
+			WARN_ON(1);
+			count = rsize;
+		}
 
 		p9_free_req(clnt, req);
 		iov_iter_advance(from, count);

* Re: [git pull] vfs part 2
  2015-06-21 21:12     ` Al Viro
@ 2015-06-21 21:16       ` Linus Torvalds
  -1 siblings, 0 replies; 69+ messages in thread
From: Linus Torvalds @ 2015-06-21 21:16 UTC (permalink / raw)
  To: Al Viro; +Cc: Andrey Ryabinin, Linux Kernel Mailing List, linux-fsdevel

On Sun, Jun 21, 2015 at 2:12 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> +               if (count > rsize) {
> +                       WARN_ON(1);
> +                       count = rsize;
> +               }

So if we'd actually want to merge it with the warning, I'd prefer writing it as

        if (WARN_ON_ONCE(count > rsize))
              count = rsize;

because it's smaller and cannot spam your logs. WARN_ON_ONCE() will
only _warn_ once, but it always returns the conditional for the
warning, so the above does the right thing.

            Linus

* Re: [git pull] vfs part 2
  2015-06-21 21:16       ` Linus Torvalds
@ 2015-06-21 21:35         ` Al Viro
  -1 siblings, 0 replies; 69+ messages in thread
From: Al Viro @ 2015-06-21 21:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrey Ryabinin, Linux Kernel Mailing List, linux-fsdevel

On Sun, Jun 21, 2015 at 02:16:15PM -0700, Linus Torvalds wrote:
> On Sun, Jun 21, 2015 at 2:12 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > +               if (count > rsize) {
> > +                       WARN_ON(1);
> > +                       count = rsize;
> > +               }
> 
> So if we'd actually want to merge it with the warning, I'd prefer writing it as
> 
>         if (WARN_ON_ONCE(count > rsize))
>               count = rsize;
> 
> because it's smaller and cannot spam your logs. WARN_ON_ONCE() will
> only _warn_ once, but it always returns the conditional for the
> warning, so the above does the right thing.

Sure, but I would really like to verify that this _is_ what's going on
there.  This is just a "please try that on your reproducer to make sure
that it's not something entirely different", thus the lack of S-o-b,
etc.  For the final variant (and we definitely should cope with BS from
server) we probably want to use p9_debug() instead of WARN_ON/WARN_ON_ONCE.
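
A rough sketch of that final variant in p9_client_write() (debug level and
message text are illustrative only):

		if (count > rsize) {
			p9_debug(P9_DEBUG_ERROR,
				 "bogus RWRITE count %d (asked for %d)\n",
				 count, rsize);
			count = rsize;
		}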

* Re: [git pull] vfs part 2
  2015-06-21 21:12     ` Al Viro
@ 2015-06-22 12:02       ` Andrey Ryabinin
  -1 siblings, 0 replies; 69+ messages in thread
From: Andrey Ryabinin @ 2015-06-22 12:02 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On 06/22/2015 12:12 AM, Al Viro wrote:
> On Thu, Apr 23, 2015 at 01:16:15PM +0300, Andrey Ryabinin wrote:
>> This change caused following:
> 
>> This could happen when p9pdu_readf() changes 'count' to some value > iov_iter_count(from):
>>
>> p9_client_write():
>> <...>
>> 		int count = iov_iter_count(from);
>> <...>
>> 		*err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
>> <...>
>> 		iov_iter_advance(from, count);
> 
> *blink*
> 
> That's a bug, all right, but I would love to see how you trigger it.
> It would require server to respond to "write that many bytes" with "OK,
> <greater number> bytes written".  We certainly need to cope with that
> (we can't trust the server to be sane), but if that's what is going on,
> you've got a server bug as well.
> 
> Could you check if the patch below triggers WARN_ON() in it on your
> reproducer?  p9_client_read() has a similar issue as well...
> 

I've tried something like your patch before to check the read side
and I haven't seen anything before and don't see it right now.
Though, this doesn't mean that there is no problem with read.
I mean that trinity hits this on write and may just not hit this on read.



* Re: [git pull] vfs part 2
  2015-06-22 12:02       ` Andrey Ryabinin
  (?)
@ 2015-07-01  6:27       ` Al Viro
  2015-07-01  7:50         ` Andrey Ryabinin
  -1 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2015-07-01  6:27 UTC (permalink / raw)
  To: Andrey Ryabinin; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Mon, Jun 22, 2015 at 03:02:11PM +0300, Andrey Ryabinin wrote:
> On 06/22/2015 12:12 AM, Al Viro wrote:
> > On Thu, Apr 23, 2015 at 01:16:15PM +0300, Andrey Ryabinin wrote:
> >> This change caused following:
> > 
> >> This could happen when p9pdu_readf() changes 'count' to some value > iov_iter_count(from):
> >>
> >> p9_client_write():
> >> <...>
> >> 		int count = iov_iter_count(from);
> >> <...>
> >> 		*err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
> >> <...>
> >> 		iov_iter_advance(from, count);
> > 
> > *blink*
> > 
> > That's a bug, all right, but I would love to see how you trigger it.
> > It would require server to respond to "write that many bytes" with "OK,
> > <greater number> bytes written".  We certainly need to cope with that
> > (we can't trust the server to be sane), but if that's what is going on,
> > you've got a server bug as well.
> > 
> > Could you check if the patch below triggers WARN_ON() in it on your
> > reproducer?  p9_client_read() has a similar issue as well...
> > 
> 
> I've tried something like your patch before to check the read side
> and I haven't seen anything before and don't see it right now.
> Though, this doesn't mean that there is no problem with read.
> I mean that trinity hits this on write and may just not hit this on read.

"This" being the WARN_ON() in that patch?  Could you please run the same
test with the following delta and post its printks?  It's one thing if
you are hitting a buggy server, it gets confused and tells you it has
written more bytes than you told it to write.  Quite a different story
in case if we are miscalculating the size we are putting into RWRITE
packet and/or advancing the iterator when we shouldn't...

What server are you using, BTW?  And which transport (virtio or network -
IOW, is it zero-copy path or not)?

diff --git a/net/9p/client.c b/net/9p/client.c
index 6f4c4c8..80e45a5 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1638,6 +1638,9 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 			req = p9_client_rpc(clnt, P9_TWRITE, "dqV", fid->fid,
 						    offset, rsize, from);
 		}
+		if (iov_iter_count(from) != count)
+			printk(KERN_ERR "fucked: iterator got advanced [%d -> %zd]\n",
+					count, iov_iter_count(from));
 		if (IS_ERR(req)) {
 			*err = PTR_ERR(req);
 			break;
@@ -1649,6 +1652,10 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 			p9_free_req(clnt, req);
 		}
 
+		if (count > rsize)
+			printk(KERN_ERR "fucked: sent %d, server says it got %d (err = %d)\n",
+				rsize, count, *err);
+
 		p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
 
 		p9_free_req(clnt, req);


* Re: [git pull] vfs part 2
  2015-07-01  6:27       ` Al Viro
@ 2015-07-01  7:50         ` Andrey Ryabinin
  2015-07-01  8:27           ` Al Viro
  0 siblings, 1 reply; 69+ messages in thread
From: Andrey Ryabinin @ 2015-07-01  7:50 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On 07/01/2015 09:27 AM, Al Viro wrote:
> On Mon, Jun 22, 2015 at 03:02:11PM +0300, Andrey Ryabinin wrote:
>> On 06/22/2015 12:12 AM, Al Viro wrote:
>>> On Thu, Apr 23, 2015 at 01:16:15PM +0300, Andrey Ryabinin wrote:
>>>> This change caused following:
>>>
>>>> This could happen when p9pdu_readf() changes 'count' to some value > iov_iter_count(from):
>>>>
>>>> p9_client_write():
>>>> <...>
>>>> 		int count = iov_iter_count(from);
>>>> <...>
>>>> 		*err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
>>>> <...>
>>>> 		iov_iter_advance(from, count);
>>>
>>> *blink*
>>>
>>> That's a bug, all right, but I would love to see how you trigger it.
>>> It would require server to respond to "write that many bytes" with "OK,
>>> <greater number> bytes written".  We certainly need to cope with that
>>> (we can't trust the server to be sane), but if that's what is going on,
>>> you've got a server bug as well.
>>>
>>> Could you check if the patch below triggers WARN_ON() in it on your
>>> reproducer?  p9_client_read() has a similar issue as well...
>>>
>>
>> I've tried something like your patch before to check the read side
>> and I haven't seen anything before and don't see it right now.
>> Though, this doesn't mean that there is no problem with read.
>> I mean that trinity hits this on write and may just not hit this on read.
> 
> "This" being the WARN_ON() in that patch? 

Yes.


> Could you please run the same
> test with the following delta and post its printks? 

# dmesg | grep fucked

[  114.732166] fucked: sent 2037, server says it got 2047 (err = 0)
[  124.937105] fucked: sent 27, server says it got 4096 (err = 0)
[  154.075400] fucked: sent 19, server says it got 4096 (err = 0)

> It's one thing if
> you are hitting a buggy server, it gets confused and tells you it has
> written more bytes than you told it to write.  Quite a different story
> in case if we are miscalculating the size we are putting into RWRITE
> packet and/or advancing the iterator when we shouldn't...
> 
> What server are you using, BTW?  And which transport (virtio or network -
> IOW, is it zero-copy path or not)?

qemu v2.2.1, virtio transport.



* Re: [git pull] vfs part 2
  2015-07-01  7:50         ` Andrey Ryabinin
@ 2015-07-01  8:27           ` Al Viro
  2015-07-01  8:41             ` Andrey Ryabinin
  0 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2015-07-01  8:27 UTC (permalink / raw)
  To: Andrey Ryabinin; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Wed, Jul 01, 2015 at 10:50:59AM +0300, Andrey Ryabinin wrote:

> # dmesg | grep fucked
> 
> [  114.732166] fucked: sent 2037, server says it got 2047 (err = 0)
> [  124.937105] fucked: sent 27, server says it got 4096 (err = 0)
> [  154.075400] fucked: sent 19, server says it got 4096 (err = 0)
> 
> > It's one thing if
> > you are hitting a buggy server, it gets confused and tells you it has
> > written more bytes than you told it to write.  Quite a different story
> > in case if we are miscalculating the size we are putting into RWRITE
								  TWRITE,
sorry about the braino.

> > packet and/or advancing the iterator when we shouldn't...
> > 
> > What server are you using, BTW?  And which transport (virtio or network -
> > IOW, is it zero-copy path or not)?
> 
> qemu v2.2.1, virtio transport.

Hmm...  The first one should've hit the zero-copy path then, the last two
are too short for that...  *grumble*

I wonder if these short writes are getting matched with response from wrong
request - that would explain full-page responses...

Could you check if 3.19 was getting anything similar?   I.e. in
p9_client_write() there add
	if (count > rsize)
		printk(KERN_ERR "bogus RWRITE: %d -> %d\n", rsize, count);
just before
	p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
and see if that triggers...


* Re: [git pull] vfs part 2
  2015-07-01  8:27           ` Al Viro
@ 2015-07-01  8:41             ` Andrey Ryabinin
  2015-07-01  8:55               ` Al Viro
  0 siblings, 1 reply; 69+ messages in thread
From: Andrey Ryabinin @ 2015-07-01  8:41 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On 07/01/2015 11:27 AM, Al Viro wrote:
> 
> Could you check if 3.19 was getting anything similar?   I.e. in
> p9_client_write() there add
> 	if (count > rsize)
> 		printk(KERN_ERR "bogus RWRITE: %d -> %d\n", rsize, count);
> just before
> 	p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
> and see if that triggers...
> 

Yeah, the same thing:
	[  125.962374] bogus RWRITE: 27 -> 4096
	[  207.587632] bogus RWRITE: 27 -> 4096
	[  215.055627] bogus RWRITE: 27 -> 4096
	[  235.583138] bogus RWRITE: 27 -> 4096
	[  245.749174] bogus RWRITE: 27 -> 4096
	[  246.759270] bogus RWRITE: 27 -> 4096
	[  248.020787] bogus RWRITE: 27 -> 4096


* Re: [git pull] vfs part 2
  2015-07-01  8:41             ` Andrey Ryabinin
@ 2015-07-01  8:55               ` Al Viro
  2015-07-01 11:25                 ` Andrey Ryabinin
  0 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2015-07-01  8:55 UTC (permalink / raw)
  To: Andrey Ryabinin; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Wed, Jul 01, 2015 at 11:41:04AM +0300, Andrey Ryabinin wrote:
> On 07/01/2015 11:27 AM, Al Viro wrote:
> > 
> > Could you check if 3.19 was getting anything similar?   I.e. in
> > p9_client_write() there add
> > 	if (count > rsize)
> > 		printk(KERN_ERR "bogus RWRITE: %d -> %d\n", rsize, count);
> > just before
> > 	p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
> > and see if that triggers...
> > 
> 
> Yeah, the same thing:
> 	[  125.962374] bogus RWRITE: 27 -> 4096
> 	[  207.587632] bogus RWRITE: 27 -> 4096
> 	[  215.055627] bogus RWRITE: 27 -> 4096
> 	[  235.583138] bogus RWRITE: 27 -> 4096
> 	[  245.749174] bogus RWRITE: 27 -> 4096
> 	[  246.759270] bogus RWRITE: 27 -> 4096
> 	[  248.020787] bogus RWRITE: 27 -> 4096

Hrm...  Could you add (int)req->rc->id, (int)req->rc->tag and (int)req->tc->tag
to that printk (on either kernel, the problem's apparently not new)?
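
For reference, the extended printk could look something like this (exact
wording of the message is illustrative):

	if (count > rsize)
		printk(KERN_ERR "bogus RWRITE: %d -> %d (rc id %d, rc tag %d, tc tag %d)\n",
			rsize, count, (int)req->rc->id,
			(int)req->rc->tag, (int)req->tc->tag);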

The question is whether we are mismatching replies, sending bogus requests or
if it's really the server sending bogus replies.  Which qemu version are
you using, BTW?


* Re: [git pull] vfs part 2
  2015-07-01  8:55               ` Al Viro
@ 2015-07-01 11:25                 ` Andrey Ryabinin
  2015-07-01 18:44                   ` Al Viro
  0 siblings, 1 reply; 69+ messages in thread
From: Andrey Ryabinin @ 2015-07-01 11:25 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On 07/01/2015 11:55 AM, Al Viro wrote:
> On Wed, Jul 01, 2015 at 11:41:04AM +0300, Andrey Ryabinin wrote:
>> On 07/01/2015 11:27 AM, Al Viro wrote:
>>>
>>> Could you check if 3.19 was getting anything similar?   I.e. in
>>> p9_client_write() there add
>>> 	if (count > rsize)
>>> 		printk(KERN_ERR "bogus RWRITE: %d -> %d\n", rsize, count);
>>> just before
>>> 	p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
>>> and see if that triggers...
>>>
>>
>> Yeah, the same thing:
>> 	[  125.962374] bogus RWRITE: 27 -> 4096
>> 	[  207.587632] bogus RWRITE: 27 -> 4096
>> 	[  215.055627] bogus RWRITE: 27 -> 4096
>> 	[  235.583138] bogus RWRITE: 27 -> 4096
>> 	[  245.749174] bogus RWRITE: 27 -> 4096
>> 	[  246.759270] bogus RWRITE: 27 -> 4096
>> 	[  248.020787] bogus RWRITE: 27 -> 4096
> 
> Hrm...  Could you add (int)req->rc->id, (int)req->rc->tag and (int)req->tc->tag
> to that printk (on either kernel, the problem's apparently not new)?
> 

I've attached gdb instead.
So, after message "bogus RWRITE: 93 -> 4096"
I've got this:

(gdb) p *req->rc
$11 = {size = 11, id = 119 'w', tag = 3, offset = 11, capacity = 8192, sdata = 0xffff8802347b8020 "\v"}
(gdb) p *req->tc
$12 = {size = 116, id = 118 'v', tag = 3, offset = 0, capacity = 8192, sdata = 0xffff88023479c020 "t"}


> The question is whether we are mismatching replies, sending bogus requests or
> if it's really the server sending bogus replies.  Which qemu version are
> you using, BTW?
> 

As I said before qemu's version is 2.2.1.

So, I've decided to try kvmtool. It took a bit longer to trigger, but still:
	[  466.552432] bogus RWRITE: 57 -> 8168
	[  969.317058] bogus RWRITE: 27 -> 8168


* Re: [git pull] vfs part 2
  2015-07-01 11:25                 ` Andrey Ryabinin
@ 2015-07-01 18:44                   ` Al Viro
  2015-07-02  3:20                     ` Al Viro
  0 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2015-07-01 18:44 UTC (permalink / raw)
  To: Andrey Ryabinin; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Wed, Jul 01, 2015 at 02:25:43PM +0300, Andrey Ryabinin wrote:

> I've attached gdb instead.
> So, after message "bogus RWRITE: 93 -> 4096"
> I've got this:
> 
> (gdb) p *req->rc
> $11 = {size = 11, id = 119 'w', tag = 3, offset = 11, capacity = 8192, sdata = 0xffff8802347b8020 "\v"}
> (gdb) p *req->tc
> $12 = {size = 116, id = 118 'v', tag = 3, offset = 0, capacity = 8192, sdata = 0xffff88023479c020 "t"}

*grumble*

Request: id = P9_TWRITE, tag = 3.  Response: id = P9_RWRITE, tag = 3.
Size of request is reasonable: it should be 32bit request size + 8bit id
(TWRITE) + 16bit tag + 32bit fid + 64bit offset + 32bit count + payload,
i.e. 23 + payload size.  23 + 93 == 116, which is what we have there.
Success response is 32bit request size + 8bit id (RWRITE) + 16bit tag + 32bit
count, i.e. 11 bytes and that seems to be what we are getting here.
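
Spelled out, the wire layout that arithmetic rests on (field widths per the
9P2000 protocol, matching the gdb dump above):

	TWRITE:  size[4] id[1]=118 ('v') tag[2] fid[4] offset[8] count[4] data[count]
	         header = 4 + 1 + 2 + 4 + 8 + 4 = 23 bytes; 23 + 93 = 116 = req->tc->size
	RWRITE:  size[4] id[1]=119 ('w') tag[2] count[4]
	         total  = 4 + 1 + 2 + 4 = 11 bytes = req->rc->size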

That would appear to exclude the possibility of bogus request - even if we had
somehow ended up with count == 4096 in TWRITE arguments, server wouldn't have
anywhere to get that much data from and either the things are *really* badly
fucked on server, or it should've replied with RERROR.

To exclude it completely we could check 4 bytes at req->tc->sdata + 19
(that's where count is), but I don't believe that this is where the problem
is.
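
Such a check could be written along these lines (offset 19 being
size[4] + id[1] + tag[2] + fid[4] + offset[8]; purely illustrative,
needs asm/unaligned.h):

	{
		u32 wire_count = get_unaligned_le32(req->tc->sdata + 19);

		if (wire_count != rsize)
			printk(KERN_ERR "TWRITE carried count %u, expected %d\n",
				wire_count, rsize);
	}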

The thing is, looking through qemu hw/9pfs/virtio-9p.c:v9fs_write() and the
stuff it calls, I don't see any way for that kind of crap to happen there...
Just in case - after the do-while loop in qemu v9fs_write(), just prior
to
    offset = 7;
    err = pdu_marshal(pdu, offset, "d", total);
    if (err < 0) {
        goto out;
    }

could you slap something like
	if (total > count)
		*(char *)0 = 0;

and see if it dumps core on your test?  Or use some more humane form of
debugging - it's been a while since I last ran qemu under gdb and I'd
need more coffee to bring that memory up...

Mismatched reply could also be a possibility, but only if we end up with
sending more than one request with the same tag without waiting for response
for the first one.

The reason why I don't want to let it go just with the "cap count with rsize
in a couple of places in net/9p/client.c" (which we'll definitely need - we
can't assume that server is sane, not compromised, etc.) is that it smells
like a symptom of something very fishy and I'd prefer to get to the root of
that thing...

Oh, lovely - I do see one bug in there that could've led to bogosities around
the request lifetimes, but it's more recent than 3.19, so we have something
else going on.  In any case, the patch below is needed to fix a fuckup
introduced in 4.0 - getting truncated RWRITE packet from server (as in
"badly malformed response", not "short write") ends up with double-free.
Almost certainly not what we are hitting here, though.

diff --git a/net/9p/client.c b/net/9p/client.c
index 6f4c4c8..ca3b342 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1647,6 +1647,7 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 		if (*err) {
 			trace_9p_protocol_dump(clnt, req->rc);
 			p9_free_req(clnt, req);
+			break;
 		}
 
 		p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);


* Re: [git pull] vfs part 2
  2015-07-01 18:44                   ` Al Viro
@ 2015-07-02  3:20                     ` Al Viro
  2015-07-02  4:10                       ` running out of tags in 9P (was Re: [git pull] vfs part 2) Al Viro
  2015-07-02 12:00                       ` [git pull] vfs part 2 Jeff Layton
  0 siblings, 2 replies; 69+ messages in thread
From: Al Viro @ 2015-07-02  3:20 UTC (permalink / raw)
  To: Andrey Ryabinin; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Wed, Jul 01, 2015 at 07:44:08PM +0100, Al Viro wrote:
> Mismatched reply could also be a possibility, but only if we end up with
> sending more than one request with the same tag without waiting for response
> for the first one.

... and I think I see what's going on.  Tags are 16bit.  Suppose the
server stalls for some reason *and* we keep piling the requests up.
New tags keep being grabbed by this:

        tag = P9_NOTAG;
        if (type != P9_TVERSION) {
                tag = p9_idpool_get(c->tagpool);
                if (tag < 0)
                        return ERR_PTR(-ENOMEM);
        }
tag is int here.  Then we pass tag to
        req = p9_tag_alloc(c, tag, req_size);
and that's what sets req->tc->tag.  OK, but... The argument of p9_tag_alloc()
is u16, so after 2^16 pending requests we'll wrap around.  p9_idpool_get()
will happily return values greater than 65535 - it's using idr and it's
used (with different pools) for 16bit tags and 32bit FIDs.

Now, p9_tag_alloc(c, 65539, max_size) will return the same req we'd got from
p9_tag_alloc(c, 3, max_size).  And we are fucked - as far as the server is
concerned, we'd just sent another request with tag 3.  And on the client
there are two threads waiting for responses on the same p9_req_t.  Both
happen to be TWRITE.  Response to the first request arrives and we happen
to let the second thread go at it first.  Voila - the first request had
been for page-sized write() and got successfully handled.  The _second_ one
had been short and is very surprised to see confirmation of 4Kb worth of
data having been written.

It should be easy to confirm - in p9_client_prepare_req() add
		if (WARN_ON_ONCE(tag != (u16)tag)) {
			p9_idpool_put(tag, c->tagpool);
			return ERR_PTR(-ENOMEM);
		}
right after
                tag = p9_idpool_get(c->tagpool);
                if (tag < 0)
                        return ERR_PTR(-ENOMEM);

and see if it triggers.  I'm not sure if failing with ENOMEM is the
right response (another variant is to sleep there until the pile
gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
not for the real work, but it will do for confirming that this is what
we are hitting.


* running out of tags in 9P (was Re: [git pull] vfs part 2)
  2015-07-02  3:20                     ` Al Viro
@ 2015-07-02  4:10                       ` Al Viro
  2015-07-02  7:50                           ` Andrey Ryabinin
       [not found]                         ` <5594E5EB.4030808@samsung.com>
  2015-07-02 12:00                       ` [git pull] vfs part 2 Jeff Layton
  1 sibling, 2 replies; 69+ messages in thread
From: Al Viro @ 2015-07-02  4:10 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Linus Torvalds, linux-kernel, linux-fsdevel, Aneesh Kumar K.V,
	Eric Van Hensbergen, linux-nfs

[9p and sunrpc folks added to Cc]

On Thu, Jul 02, 2015 at 04:20:42AM +0100, Al Viro wrote:
> On Wed, Jul 01, 2015 at 07:44:08PM +0100, Al Viro wrote:
> > Mismatched reply could also be a possibility, but only if we end up with
> > sending more than one request with the same tag without waiting for response
> > for the first one.
> 
> ... and I think I see what's going on.  Tags are 16bit.  Suppose the
> server stalls for some reason *and* we keep piling the requests up.
> New tags keep being grabbed by this:
> 
>         tag = P9_NOTAG;
>         if (type != P9_TVERSION) {
>                 tag = p9_idpool_get(c->tagpool);
>                 if (tag < 0)
>                         return ERR_PTR(-ENOMEM);
>         }
> tag is int here.  Then we pass tag to
>         req = p9_tag_alloc(c, tag, req_size);
> and that's what sets req->tc->tag.  OK, but... The argument of p9_tag_alloc()
> is u16, so after 2^16 pending requests we'll wrap around.  p9_idpool_get()
> will happily return values greater than 65535 - it's using idr and it's
> used (with different pools) for 16bit tags and 32bit FIDs.
> 
> Now, p9_tag_alloc(c, 65539, max_size) will return the same req we'd got from
> p9_tag_alloc(c, 3, max_size).  And we are fucked - as far as the server is
> concerned, we'd just sent another request with tag 3.  And on the client
> there are two threads waiting for responses on the same p9_req_t.  Both
> happen to be TWRITE.  Response to the first request arrives and we happen
> to let the second thread go at it first.  Voila - the first request had
> been for page-sized write() and got successfully handled.  The _second_ one
> had been short and is very surprised to see confirmation of 4Kb worth of
> data having been written.
> 
> It should be easy to confirm - in p9_client_prepare_req() add
> 		if (WARN_ON_ONCE(tag != (u16)tag)) {
> 			p9_idpool_put(tag, c->tagpool);
> 			return ERR_PTR(-ENOMEM);
> 		}
> right after
>                 tag = p9_idpool_get(c->tagpool);
>                 if (tag < 0)
>                         return ERR_PTR(-ENOMEM);
> 
> and see if it triggers.  I'm not sure if failing with ENOMEM is the
> right response (another variant is to sleep there until the pile
> gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
> not for the real work, but it will do for confirming that this is what
> we are hitting.

FWIW, we probably would be better off with throttling rather than ENOMEM
in such situations.  I'm not familiar with sunrpc enough to be sure how
to do that right way (note that RPC equivalent of 9P tags is 32bit, so
the throttling there is based on memory shortage rather than running out
of XID space), but the interesting issues should be similar - potential
deadlocks in near-OOM situations.  Suggestions?


* Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
@ 2015-07-02  7:50                           ` Andrey Ryabinin
  0 siblings, 0 replies; 69+ messages in thread
From: Andrey Ryabinin @ 2015-07-02  7:50 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrey Ryabinin, Linus Torvalds, LKML, linux-fsdevel,
	Aneesh Kumar K.V, Eric Van Hensbergen, linux-nfs

[repeating, since my previous email didn't reach mailing lists]

2015-07-02 7:10 GMT+03:00 Al Viro <viro@zeniv.linux.org.uk>:
>> It should be easy to confirm - in p9_client_prepare_req() add
>>               if (WARN_ON_ONCE(tag != (u16)tag)) {
>>                       p9_idpool_put(tag, c->tagpool);
>>                       return ERR_PTR(-ENOMEM);
>>               }
>> right after
>>                 tag = p9_idpool_get(c->tagpool);
>>                 if (tag < 0)
>>                         return ERR_PTR(-ENOMEM);
>>
>> and see if it triggers.  I'm not sure if failing with ENOMEM is the
>> right response (another variant is to sleep there until the pile
>> gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
>> not for the real work, but it will do for confirming that this is what
>> we are hitting.
>

Apparently, I'm seeing something else. That WARN_ON_ONCE didn't trigger.


* Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
       [not found]                         ` <5594E5EB.4030808@samsung.com>
@ 2015-07-02  7:50                           ` Al Viro
  0 siblings, 0 replies; 69+ messages in thread
From: Al Viro @ 2015-07-02  7:50 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Linus Torvalds, linux-kernel, linux-fsdevel, Aneesh Kumar K.V,
	Eric Van Hensbergen, linux-nfs

On Thu, Jul 02, 2015 at 10:19:07AM +0300, Andrey Ryabinin wrote:
> On 07/02/2015 07:10 AM, Al Viro wrote:
> >>
> >> It should be easy to confirm - in p9_client_prepare_req() add
> >> 		if (WARN_ON_ONCE(tag != (u16)tag)) {
> >> 			p9_idpool_put(tag, c->tagpool);
> >> 			return ERR_PTR(-ENOMEM);
> >> 		}
> >> right after
> >>                 tag = p9_idpool_get(c->tagpool);
> >>                 if (tag < 0)
> >>                         return ERR_PTR(-ENOMEM);
> >>
> >> and see if it triggers.  I'm not sure if failing with ENOMEM is the
> >> right response (another variant is to sleep there until the pile
> >> gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
> >> not for the real work, but it will do for confirming that this is what
> >> we are hitting.
> > 
> 
> Apparently, I'm seeing something else. That WARN_ON_ONCE didn't trigger.

While the one in p9_client_write() (on rsize < count) did?


* Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
@ 2015-07-02  7:59                             ` Al Viro
  0 siblings, 0 replies; 69+ messages in thread
From: Al Viro @ 2015-07-02  7:59 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Andrey Ryabinin, Linus Torvalds, LKML, linux-fsdevel,
	Aneesh Kumar K.V, Eric Van Hensbergen, linux-nfs

On Thu, Jul 02, 2015 at 10:50:05AM +0300, Andrey Ryabinin wrote:

> >> and see if it triggers.  I'm not sure if failing with ENOMEM is the
> >> right response (another variant is to sleep there until the pile
> >> gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
> >> not for the real work, but it will do for confirming that this is what
> >> we are hitting.
> >
> 
> Apparently, I'm seeing something else. That WARN_ON_ONCE didn't trigger.

Summary for those who'd missed the beginning of the thread: what we are
seeing is p9_client_write() issuing TWRITE and getting RWRITE in reply
(tags match, packets look plausible) with count in RWRITE way more than
that in TWRITE.

IOW, we are telling the server to write e.g. 93 bytes and are getting told
that yes, the write had been successful - all 4096 bytes of it.

qemu virtio-9p for server; from my reading of qemu side of things, it can't
be sending reply with count greater than that in request.

The bug I suspected to be the cause of that is in tag allocation in
net/9p/client.c - we could end up wrapping around 2^16 with enough pending
requests and that would have triggered that kind of mess.  However, Andrey
doesn't see that test (tag wraparound in p9_client_prepare_req()) trigger.
BTW, was that on the run where debugging printk in p9_client_write() *did*
trigger?


* Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
  2015-07-02  7:59                             ` Al Viro
  (?)
@ 2015-07-02  8:19                             ` Andrey Ryabinin
  2015-07-02  8:25                               ` Al Viro
  -1 siblings, 1 reply; 69+ messages in thread
From: Andrey Ryabinin @ 2015-07-02  8:19 UTC (permalink / raw)
  To: Al Viro, Andrey Ryabinin
  Cc: Linus Torvalds, LKML, linux-fsdevel, Aneesh Kumar K.V,
	Eric Van Hensbergen, linux-nfs

On 07/02/2015 10:59 AM, Al Viro wrote:
> On Thu, Jul 02, 2015 at 10:50:05AM +0300, Andrey Ryabinin wrote:
>
>>>> and see if it triggers.  I'm not sure if failing with ENOMEM is the
>>>> right response (another variant is to sleep there until the pile
>>>> gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
>>>> not for the real work, but it will do for confirming that this is what
>>>> we are hitting.
>>>
>>
>> Apparently, I'm seeing something else. That WARN_ON_ONCE didn't trigger.
>
> Summary for those who'd missed the beginning of the thread: what we are
> seeing is p9_client_write() issuing TWRITE and getting RWRITE in reply
> (tags match, packets look plausible) with count in RWRITE way more than
> that in TWRITE.
>
> IOW, we are telling the server to write e.g. 93 bytes and are getting told
> that yes, the write had been successful - all 4096 bytes of it.
>
> qemu virtio-9p for server; from my reading of qemu side of things, it can't
> be sending reply with count greater than that in request.

Besides qemu, I've also tried kvmtool with the same result. IOW I'm seeing
this under kvmtool as well. It just takes a bit longer to reproduce
this in kvmtool.

> The bug I suspected to be the cause of that is in tag allocation in
> net/9p/client.c - we could end up wrapping around 2^16 with enough pending
> requests and that would have triggered that kind of mess.  However, Andrey
> doesn't see that test (tag wraparound in p9_client_prepare_req()) trigger.
> BTW, was that on the run where debugging printk in p9_client_write() *did*
> trigger?

Yes, WARN_ON_ONCE() in p9_client_prepare_req() didn't trigger,
but debug printk in p9_client_write() *did* trigger.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
  2015-07-02  8:19                             ` Andrey Ryabinin
@ 2015-07-02  8:25                               ` Al Viro
  2015-07-02  8:42                                   ` Al Viro
  0 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2015-07-02  8:25 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Linus Torvalds, LKML, linux-fsdevel, Aneesh Kumar K.V,
	Eric Van Hensbergen, linux-nfs

On Thu, Jul 02, 2015 at 11:19:03AM +0300, Andrey Ryabinin wrote:
> Besides qemu, I've also tried kvmtool with the same result. IOW I'm seeing
> this under kvmtool as well. It just takes a bit longer to reproduce
> this in kvmtool.
> 
> > The bug I suspected to be the cause of that is in tag allocation in
> > net/9p/client.c - we could end up wrapping around 2^16 with enough pending
> > requests and that would have triggered that kind of mess.  However, Andrey
> > doesn't see that test (tag wraparound in p9_client_prepare_req()) trigger.
> > BTW, was that on the run where debugging printk in p9_client_write() *did*
> > trigger?
> 
> Yes, WARN_ON_ONCE() in p9_client_prepare_req() didn't trigger,
> but debug printk in p9_client_write() *did* trigger.

Bloody wonderful...  Could you check if v9fs_write() in qemu
hw/9pfs/virtio-9p.c ever gets to
    offset = 7;
    err = pdu_marshal(pdu, offset, "d", total);
with total > count on your testcase?

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
@ 2015-07-02  8:42                                   ` Al Viro
  0 siblings, 0 replies; 69+ messages in thread
From: Al Viro @ 2015-07-02  8:42 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Linus Torvalds, LKML, linux-fsdevel, Aneesh Kumar K.V,
	Eric Van Hensbergen, linux-nfs

On Thu, Jul 02, 2015 at 09:25:30AM +0100, Al Viro wrote:
> On Thu, Jul 02, 2015 at 11:19:03AM +0300, Andrey Ryabinin wrote:
> > Besides qemu, I've also tried kvmtool with the same result. IOW I'm seeing
> > this under kvmtool as well. It just takes a bit longer to reproduce
> > this in kvmtool.
> > 
> > > The bug I suspected to be the cause of that is in tag allocation in
> > > net/9p/client.c - we could end up wrapping around 2^16 with enough pending
> > > requests and that would have triggered that kind of mess.  However, Andrey
> > > doesn't see that test (tag wraparound in p9_client_prepare_req()) trigger.
> > > BTW, was that on the run where debugging printk in p9_client_write() *did*
> > > trigger?
> > 
> > Yes, WARN_ON_ONCE() in p9_client_prepare_req() didn't trigger,
> > but debug printk in p9_client_write() *did* trigger.
> 
> Bloody wonderful...  Could you check if v9fs_write() in qemu
> hw/9pfs/virtio-9p.c ever gets to
>     offset = 7;
>     err = pdu_marshal(pdu, offset, "d", total);
> with total > count on your testcase?

Another thing that might be worth checking: in p9_tag_alloc() (net/9p/client.c)
before
        req->status = REQ_STATUS_ALLOC;
check that req->status == REQ_STATUS_IDLE and yell if it isn't.

BTW, the loop in there (
                /* check again since original check was outside of lock */
                while (tag >= c->max_tag) {
) looks fishy.  If we get more than P9_ROW_MAXTAG allocations at once,
we'll have trouble, but I doubt that this is what we are hitting.  In any
case, adding WARN_ON(c->reqs[row]); right after
                        row = (tag / P9_ROW_MAXTAG);
wouldn't hurt.  I would be very surprised if that one triggered, though.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2015-07-02  3:20                     ` Al Viro
  2015-07-02  4:10                       ` running out of tags in 9P (was Re: [git pull] vfs part 2) Al Viro
@ 2015-07-02 12:00                       ` Jeff Layton
  2015-07-02 12:07                         ` Jeff Layton
  1 sibling, 1 reply; 69+ messages in thread
From: Jeff Layton @ 2015-07-02 12:00 UTC (permalink / raw)
  To: Al Viro; +Cc: Andrey Ryabinin, Linus Torvalds, linux-kernel, linux-fsdevel

On Thu, 2 Jul 2015 04:20:42 +0100
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Wed, Jul 01, 2015 at 07:44:08PM +0100, Al Viro wrote:
> > Mismatched reply could also be a possibility, but only if we end up with
> > sending more than one request with the same tag without waiting for response
> > for the first one.
> 
> ... and I think I see what's going on.  Tags are 16bit.  Suppose the
> server stalls for some reason *and* we keep piling the requests up.
> New tags keep being grabbed by this:
> 
>         tag = P9_NOTAG;
>         if (type != P9_TVERSION) {
>                 tag = p9_idpool_get(c->tagpool);
>                 if (tag < 0)
>                         return ERR_PTR(-ENOMEM);
>         }
> tag is int here.  Then we pass tag to
>         req = p9_tag_alloc(c, tag, req_size);
> and that's what sets req->tc->tag.  OK, but... The argument of p9_tag_alloc()
> in u16, so after 2^16 pending requests we'll wrap around.  p9_idpool_get()
> will happily return values greater than 65535 - it's using idr and it's
> used (with different pools) for 16bit tags and 32bit FIDs.
> 
> Now, p9_tag_alloc(c, 65539, max_size) will return the same req we'd got from
> p9_tag_alloc(c, 3, max_size).  And we are fucked - as far as the server is
> concerned, we'd just sent another request with tag 3.  And on the client
> there are two threads waiting for responses on the same p9_req_t.  Both
> happen to be TWRITE.  Response to the first request arrives and we happen
> to let the second thread go at it first.  Voila - the first request had
> been for page-sized write() and got successfully handled.  The _second_ one
> had been short and is very surprised to see confirmation of 4Kb worth of
> data having been written.
> 
> It should be easy to confirm - in p9_client_prepare_req() add
> 		if (WARN_ON_ONCE(tag != (u16)tag)) {
> 			p9_idpool_put(tag, c->tagpool);
> 			return ERR_PTR(-ENOMEM);
> 		}
> right after
>                 tag = p9_idpool_get(c->tagpool);
>                 if (tag < 0)
>                         return ERR_PTR(-ENOMEM);
> 
> and see if it triggers.  I'm not sure if failing with ENOMEM is the
> right response (another variant is to sleep there until the pile
> gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
> not for the real work, but it will do for confirming that this is what
> we are hitting.

ISTM that p9_idpool_get ought to be using idr_alloc_cyclic instead.
That should ensure that it's only allocating values from within the
given range.
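
Something like this, roughly (untested sketch; assumes the idpool keeps its
idr in ->pool and that the tag pool can simply pass P9_NOTAG as the
exclusive upper bound):

	/* cyclic allocation bounded to the 16-bit tag space */
	i = idr_alloc_cyclic(&p->pool, p, 0, P9_NOTAG, GFP_NOWAIT);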

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2015-07-02 12:00                       ` [git pull] vfs part 2 Jeff Layton
@ 2015-07-02 12:07                         ` Jeff Layton
  2015-07-02 16:45                           ` Al Viro
  0 siblings, 1 reply; 69+ messages in thread
From: Jeff Layton @ 2015-07-02 12:07 UTC (permalink / raw)
  To: Al Viro; +Cc: Andrey Ryabinin, Linus Torvalds, linux-kernel, linux-fsdevel

On Thu, 2 Jul 2015 08:00:26 -0400
Jeff Layton <jlayton@poochiereds.net> wrote:

> On Thu, 2 Jul 2015 04:20:42 +0100
> Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> > On Wed, Jul 01, 2015 at 07:44:08PM +0100, Al Viro wrote:
> > > Mismatched reply could also be a possibility, but only if we end up with
> > > sending more than one request with the same tag without waiting for response
> > > for the first one.
> > 
> > ... and I think I see what's going on.  Tags are 16bit.  Suppose the
> > server stalls for some reason *and* we keep piling the requests up.
> > New tags keep being grabbed by this:
> > 
> >         tag = P9_NOTAG;
> >         if (type != P9_TVERSION) {
> >                 tag = p9_idpool_get(c->tagpool);
> >                 if (tag < 0)
> >                         return ERR_PTR(-ENOMEM);
> >         }
> > tag is int here.  Then we pass tag to
> >         req = p9_tag_alloc(c, tag, req_size);
> > and that's what sets req->tc->tag.  OK, but... The argument of p9_tag_alloc()
> > is u16, so after 2^16 pending requests we'll wrap around.  p9_idpool_get()
> > will happily return values greater than 65535 - it's using idr and it's
> > used (with different pools) for 16bit tags and 32bit FIDs.
> > 
> > Now, p9_tag_alloc(c, 65539, max_size) will return the same req we'd got from
> > p9_tag_alloc(c, 3, max_size).  And we are fucked - as far as the server is
> > concerned, we'd just sent another request with tag 3.  And on the client
> > there are two threads waiting for responses on the same p9_req_t.  Both
> > happen to be TWRITE.  Response to the first request arrives and we happen
> > to let the second thread go at it first.  Voila - the first request had
> > been for page-sized write() and got successfully handled.  The _second_ one
> > had been short and is very surprised to see confirmation of 4Kb worth of
> > data having been written.
> > 
> > It should be easy to confirm - in p9_client_prepare_req() add
> > 		if (WARN_ON_ONCE(tag != (u16)tag)) {
> > 			p9_idpool_put(tag, c->tagpool);
> > 			return ERR_PTR(-ENOMEM);
> > 		}
> > right after
> >                 tag = p9_idpool_get(c->tagpool);
> >                 if (tag < 0)
> >                         return ERR_PTR(-ENOMEM);
> > 
> > and see if it triggers.  I'm not sure if failing with ENOMEM is the
> > right response (another variant is to sleep there until the pile
> > gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely
> > not for the real work, but it will do for confirming that this is what
> > we are hitting.
> 
> ISTM that p9_idpool_get ought to be using idr_alloc_cyclic instead.
> That should ensure that it's only allocating values from within the
> given range.
> 

Erm...and why is it passing in '0' to idr_alloc for the end value if it
can't deal with more than 16 bits? That seems like a plain old bug...

The other stuff you've noted should also be fixed of course, but the
IDR usage here could use a little rework.

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
@ 2015-07-02 12:19                                     ` Andrey Ryabinin
  0 siblings, 0 replies; 69+ messages in thread
From: Andrey Ryabinin @ 2015-07-02 12:19 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, LKML, linux-fsdevel, Aneesh Kumar K.V,
	Eric Van Hensbergen, linux-nfs

On 07/02/2015 11:42 AM, Al Viro wrote:
> On Thu, Jul 02, 2015 at 09:25:30AM +0100, Al Viro wrote:
>> On Thu, Jul 02, 2015 at 11:19:03AM +0300, Andrey Ryabinin wrote:
>>> Besides qemu, I've also tried kvmtool with the same result. IOW I'm seeing
>>> this under kvmtool as well. It just takes a bit longer to reproduce
>>> this in kvmtool.
>>>
>>>> The bug I suspected to be the cause of that is in tag allocation in
>>>> net/9p/client.c - we could end up wrapping around 2^16 with enough pending
>>>> requests and that would have triggered that kind of mess.  However, Andrey
>>>> doesn't see that test (tag wraparound in p9_client_prepare_req()) trigger.
>>>> BTW, was that on the run where debugging printk in p9_client_write() *did*
>>>> trigger?
>>>
>>> Yes, WARN_ON_ONCE() in p9_client_prepare_req() didn't trigger,
>>> but debug printk in p9_client_write() *did* trigger.
>>
>> Bloody wonderful...  Could you check if v9fs_write() in qemu
>> hw/9pfs/virtio-9p.c ever gets to
>>     offset = 7;
>>     err = pdu_marshal(pdu, offset, "d", total);
>> with total > count on your testcase?

Added:
+    if (total > count)
+           *(char *)0 = 0;

and never hit this condition.

> 
> Another thing that might be worth checking: in p9_tag_alloc() (net/9p/client.c)
> before
>         req->status = REQ_STATUS_ALLOC;
> check that req->status == REQ_STATUS_IDLE and yell if it isn't.
> 

diff --git a/net/9p/client.c b/net/9p/client.c
index 6f4c4c8..16a17a0 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -286,6 +286,8 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
        p9pdu_reset(req->rc);

        req->tc->tag = tag-1;
+       if (WARN_ON(req->status != REQ_STATUS_IDLE))
+               pr_err("req->status: %d\n", req->status);
        req->status = REQ_STATUS_ALLOC;

        return req;

[  150.155020] ------------[ cut here ]------------
[  150.156700] WARNING: CPU: 2 PID: 2304 at ../net/9p/client.c:289 p9_client_prepare_req+0x3b0/0x550()
[  150.158404] Modules linked in:
[  150.160177] CPU: 2 PID: 2304 Comm: trinity-c84 Not tainted 4.1.0-rc8+ #409
[  150.161794] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[  150.165540]  0000000000000009 ffff8801ed13f7a8 ffffffff8161434b 0000000000000000
[  150.170939]  0000000000000000 ffff8801ed13f7f8 ffffffff8107cf99 ffff8801f451d5f0
[  150.175942]  ffffffff815f6760 0000000000000003 ffff8800bbac00e0 ffff8800bbac00f0
[  150.178393] Call Trace:
[  150.178883]  [<ffffffff8161434b>] dump_stack+0x45/0x57
[  150.179914]  [<ffffffff8107cf99>] warn_slowpath_common+0x99/0xe0
[  150.181375]  [<ffffffff815f6760>] ? p9_client_prepare_req+0x3b0/0x550
[  150.182597]  [<ffffffff8107d145>] warn_slowpath_null+0x15/0x20
[  150.184067]  [<ffffffff815f6760>] p9_client_prepare_req+0x3b0/0x550
[  150.185043]  [<ffffffff815fb1e2>] p9_client_zc_rpc.constprop.5+0xe2/0x730
[  150.186229]  [<ffffffff8161ba46>] ? _raw_spin_unlock+0x16/0x70
[  150.187049]  [<ffffffff815fb100>] ? p9_client_xattrwalk+0x1b0/0x1b0
[  150.188477]  [<ffffffff812b1b5e>] ? idr_remove+0x2ce/0x420
[  150.189443]  [<ffffffff815f5af0>] ? v9fs_unregister_trans+0x70/0x70
[  150.190456]  [<ffffffff812b1890>] ? idr_mark_full+0x80/0x80
[  150.191489]  [<ffffffff815f5af0>] ? v9fs_unregister_trans+0x70/0x70
[  150.193911]  [<ffffffff8161bae7>] ? _raw_spin_unlock_irqrestore+0x47/0xb0
[  150.195462]  [<ffffffff815fcae4>] ? p9_idpool_put+0x54/0x60
[  150.196729]  [<ffffffff812c9906>] ? iov_iter_advance+0xb6/0x240
[  150.199766]  [<ffffffff815fbea3>] p9_client_write+0x333/0x3d0
[  150.201073]  [<ffffffff811a882e>] ? kasan_kmalloc+0x5e/0x70
[  150.202512]  [<ffffffff815fbb70>] ? p9_client_readdir+0x340/0x340
[  150.204115]  [<ffffffff811c80dd>] ? rw_copy_check_uvector+0xed/0x170
[  150.204960]  [<ffffffff812d82d6>] ? __percpu_counter_add+0x26/0xb0
[  150.206517]  [<ffffffff8113574a>] ? generic_write_checks+0xfa/0x1e0
[  150.208092]  [<ffffffff8125c054>] v9fs_file_write_iter+0xc4/0x200
[  150.209642]  [<ffffffff811c9840>] ? __sb_end_write+0x80/0x80
[  150.211305]  [<ffffffff8125bf90>] ? v9fs_file_lock_dotl+0x3d0/0x3d0
[  150.216908]  [<ffffffff81128d14>] ? ctx_sched_in.isra.57+0xe4/0x2f0
[  150.221069]  [<ffffffff811c6d84>] ? rw_verify_area+0x54/0x150
[  150.222570]  [<ffffffff811c7363>] do_readv_writev+0x223/0x450
[  150.229044]  [<ffffffff811290b7>] ? perf_event_context_sched_in.isra.61+0x127/0x180
[  150.231020]  [<ffffffff8125bf90>] ? v9fs_file_lock_dotl+0x3d0/0x3d0
[  150.231862]  [<ffffffff811c7140>] ? vfs_write+0x1e0/0x1e0
[  150.232583]  [<ffffffff81129b7a>] ? __perf_event_task_sched_in+0x5a/0xa0
[  150.233471]  [<ffffffff810aee08>] ? finish_task_switch+0xa8/0x1b0
[  150.234282]  [<ffffffff8161588b>] ? __schedule+0x3db/0xc90
[  150.235020]  [<ffffffff81616197>] ? schedule+0x57/0xd0
[  150.235709]  [<ffffffff8161ba46>] ? _raw_spin_unlock+0x16/0x70
[  150.236493]  [<ffffffff81618a0f>] ? __mutex_lock_slowpath+0x1df/0x200
[  150.239059]  [<ffffffff811c9840>] ? __sb_end_write+0x80/0x80
[  150.242120]  [<ffffffff81618830>] ? __ww_mutex_lock_interruptible+0xe0/0xe0
[  150.245274]  [<ffffffff810ecfd3>] ? hrtimer_start+0x13/0x20
[  150.247527]  [<ffffffff810ee54d>] ? do_setitimer+0x30d/0x400
[  150.249351]  [<ffffffff811c764e>] vfs_writev+0x4e/0x70
[  150.250378]  [<ffffffff811c8348>] SyS_writev+0xa8/0x140
[  150.251545]  [<ffffffff811c82a0>] ? SyS_readv+0x140/0x140
[  150.253208]  [<ffffffff8161c2ae>] system_call_fastpath+0x12/0x71
[  150.256990] ---[ end trace 4f640ea141ed3d61 ]---
[  150.259076] 9pnet: req->status: 4

> BTW, the loop in there (
>                 /* check again since original check was outside of lock */
>                 while (tag >= c->max_tag) {
> ) looks fishy.  If we get more than P9_ROW_MAXTAG allocations at once,
> we'll have trouble, but I doubt that this is what we are hitting.  In any
> case, adding WARN_ON(c->reqs[row]); right after

I didn't get this. c->reqs[row] is always non-NULL as it should be, so this warning
will trigger all the time.



>                         row = (tag / P9_ROW_MAXTAG);
> wouldn't hurt.  I would be very surprised if that one triggered, though.
> 


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
@ 2015-07-02 16:43                                       ` Al Viro
  0 siblings, 0 replies; 69+ messages in thread
From: Al Viro @ 2015-07-02 16:43 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Linus Torvalds, LKML, linux-fsdevel, Aneesh Kumar K.V,
	Eric Van Hensbergen, linux-nfs

On Thu, Jul 02, 2015 at 03:19:57PM +0300, Andrey Ryabinin wrote:

> Added:
> +    if (total > count)
> +           *(char *)0 = 0;
> 
> and never hit this condition.
> 

OK, so it's definitely a mismatched response.

>         req->tc->tag = tag-1;
> +       if (WARN_ON(req->status != REQ_STATUS_IDLE))
> +               pr_err("req->status: %d\n", req->status);
>         req->status = REQ_STATUS_ALLOC;
> 
>         return req;

> [  150.259076] 9pnet: req->status: 4

IOW, REQ_STATUS_RCVD.  Hmm...  Stray tag seen by req_done() after we'd already
freed the tag in question?  That, or it really would have had to wrap
around...  Note that req_done() does *not* check anything about the req -
not even that p9_tag_lookup() hasn't returned NULL, so a server sending you
any response tagged with number well above anything you'd ever sent will
reliably oops you.

Frankly, the whole thing needs fuzzing from the server side - start throwing
crap at the client and see how badly does it get fucked...  Folks, it's
a network protocol, with userland servers, no less.  You *can't* assume
them competent and non-malicious...

How much traffic does it take to reproduce that fun, BTW?  IOW, is attempting
to log the sequence of tag {allocation,freeing}/tag of packet being {sent,
received} something completely suicidal, or is it more or less feasible?

> I didn't get this. c->reqs[row] is always non-NULL as it should be, so this warning
> will trigger all the time.

????
                        row = (tag / P9_ROW_MAXTAG);
                        c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
                                        sizeof(struct p9_req_t), GFP_ATOMIC);

and you are seeing c->reqs[row] != NULL *BEFORE* that kcalloc()?  All the time,
no less?  Just to make sure we are on the same page - the delta against
mainline I would like tested is this:

diff --git a/net/9p/client.c b/net/9p/client.c
index 6f4c4c8..fa88c9a 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -248,6 +248,9 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
 		/* check again since original check was outside of lock */
 		while (tag >= c->max_tag) {
 			row = (tag / P9_ROW_MAXTAG);
+
+			WARN_ON_ONCE(c->reqs[row]);	// are we about to leak?
+
 			c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
 					sizeof(struct p9_req_t), GFP_ATOMIC);
 
@@ -286,6 +289,8 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
 	p9pdu_reset(req->rc);
 
 	req->tc->tag = tag-1;
+	if (req->status != REQ_STATUS_IDLE)
+		pr_err("using tag %d with odd status (%d)", tag, req->status);
 	req->status = REQ_STATUS_ALLOC;
 
 	return req;
@@ -425,6 +430,8 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
 	 * the other thread wakes up will indeed be seen by the waiting side.
 	 */
 	smp_wmb();
+	if (req->status == REQ_STATUS_IDLE)
+		pr_err("late delivery, tag %d already freed", req->tc->tag);
 	req->status = status;
 
 	wake_up(req->wq);
@@ -693,6 +700,10 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 		tag = p9_idpool_get(c->tagpool);
 		if (tag < 0)
 			return ERR_PTR(-ENOMEM);
+		if (WARN_ON_ONCE(tag != (u16)tag)) {	// wrapped around?
+			p9_idpool_put(tag, c->tagpool);
+			return ERR_PTR(-ENOMEM);
+		}
 	}
 
 	req = p9_tag_alloc(c, tag, req_size);
@@ -1647,7 +1658,10 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 		if (*err) {
 			trace_9p_protocol_dump(clnt, req->rc);
 			p9_free_req(clnt, req);
+			break;
 		}
+		if (rsize < count)
+			pr_err("mismatched reply [tag = %d]\n", req->tc->tag);
 
 		p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
 

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2015-07-02 12:07                         ` Jeff Layton
@ 2015-07-02 16:45                           ` Al Viro
  2015-07-02 17:01                             ` Jeff Layton
  0 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2015-07-02 16:45 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Andrey Ryabinin, Linus Torvalds, linux-kernel, linux-fsdevel

On Thu, Jul 02, 2015 at 08:07:38AM -0400, Jeff Layton wrote:

> Erm...and why is it passing in '0' to idr_alloc for the end value if it
> can't deal with more than 16 bits? That seems like a plain old bug...

Because they are using the same function (with different pool, obviously)
for FID allocation, and those are 32bit...

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
@ 2015-07-02 16:49                                         ` Al Viro
  0 siblings, 0 replies; 69+ messages in thread
From: Al Viro @ 2015-07-02 16:49 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Linus Torvalds, LKML, linux-fsdevel, Aneesh Kumar K.V,
	Eric Van Hensbergen, linux-nfs

On Thu, Jul 02, 2015 at 05:43:32PM +0100, Al Viro wrote:
>  	req->tc->tag = tag-1;
> +	if (req->status != REQ_STATUS_IDLE)
> +		pr_err("using tag %d with odd status (%d)", tag, req->status);

Should be tag - 1 here, actually.

diff --git a/net/9p/client.c b/net/9p/client.c
index 6f4c4c8..9719886 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -248,6 +248,9 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
 		/* check again since original check was outside of lock */
 		while (tag >= c->max_tag) {
 			row = (tag / P9_ROW_MAXTAG);
+
+			WARN_ON_ONCE(c->reqs[row]);	// are we about to leak?
+
 			c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
 					sizeof(struct p9_req_t), GFP_ATOMIC);
 
@@ -286,6 +289,8 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
 	p9pdu_reset(req->rc);
 
 	req->tc->tag = tag-1;
+	if (req->status != REQ_STATUS_IDLE)
+		pr_err("using tag %d with odd status (%d)", tag - 1, req->status);
 	req->status = REQ_STATUS_ALLOC;
 
 	return req;
@@ -425,6 +430,8 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
 	 * the other thread wakes up will indeed be seen by the waiting side.
 	 */
 	smp_wmb();
+	if (req->status == REQ_STATUS_IDLE)
+		pr_err("late delivery, tag %d already freed", req->tc->tag);
 	req->status = status;
 
 	wake_up(req->wq);
@@ -693,6 +700,10 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 		tag = p9_idpool_get(c->tagpool);
 		if (tag < 0)
 			return ERR_PTR(-ENOMEM);
+		if (WARN_ON_ONCE(tag != (u16)tag)) {	// wrapped around?
+			p9_idpool_put(tag, c->tagpool);
+			return ERR_PTR(-ENOMEM);
+		}
 	}
 
 	req = p9_tag_alloc(c, tag, req_size);
@@ -1647,7 +1658,10 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 		if (*err) {
 			trace_9p_protocol_dump(clnt, req->rc);
 			p9_free_req(clnt, req);
+			break;
 		}
+		if (rsize < count)
+			pr_err("mismatched reply [tag = %d]\n", req->tc->tag);
 
 		p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
 

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2015-07-02 16:45                           ` Al Viro
@ 2015-07-02 17:01                             ` Jeff Layton
  2015-07-02 17:56                               ` Dominique Martinet
  2015-07-02 18:40                               ` Al Viro
  0 siblings, 2 replies; 69+ messages in thread
From: Jeff Layton @ 2015-07-02 17:01 UTC (permalink / raw)
  To: Al Viro; +Cc: Andrey Ryabinin, Linus Torvalds, linux-kernel, linux-fsdevel

On Thu, 2 Jul 2015 17:45:35 +0100
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Thu, Jul 02, 2015 at 08:07:38AM -0400, Jeff Layton wrote:
> 
> > Erm...and why is it passing in '0' to idr_alloc for the end value if it
> > can't deal with more than 16 bits? That seems like a plain old bug...
> 
> Because they are using the same function (with different pool, obviously)
> for FID allocation, and those are 32bit...

Ahh, right...

So p9_idpool_create should take an argument for the "end" value, and
then store that in a new field in p9_idpool. Then they can pass that in
as the "end" parm in idr_alloc. Or, they could give up using the same
function there and use a different one for tags and FIDs.

In any case...allowing this thing to allocate tag values that can
collide seems fundamentally wrong. Using idr_alloc_cyclic might also
not hurt either, particularly given that these tag values are supposed
to function something like an XID and you probably don't want to be
reusing them too quickly.
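
Rough shape of the first option (sketch from memory, not actual code -- the
new "end" field and the p9_idpool_create() parameter are the made-up part):

struct p9_idpool {
	spinlock_t lock;
	struct idr pool;
	int end;		/* exclusive bound handed to idr_alloc*() */
};

struct p9_idpool *p9_idpool_create(int end);	/* P9_NOTAG for tags, 0 for FIDs */

...and then the allocation in p9_idpool_get() becomes

	i = idr_alloc_cyclic(&p->pool, p, 0, p->end, GFP_NOWAIT);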

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2015-07-02 17:01                             ` Jeff Layton
@ 2015-07-02 17:56                               ` Dominique Martinet
  2015-07-02 18:43                                 ` Al Viro
                                                   ` (2 more replies)
  2015-07-02 18:40                               ` Al Viro
  1 sibling, 3 replies; 69+ messages in thread
From: Dominique Martinet @ 2015-07-02 17:56 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, Andrey Ryabinin, Linus Torvalds, linux-kernel, linux-fsdevel

Jeff Layton wrote on Thu, Jul 02, 2015:
> So p9_idpool_create should take an argument for the "end" value, and
> then store that in a new field in p9_idpool. Then they can pass that in
> as the "end" parm in idr_alloc. Or, they could give up using the same
> function there and use a different one for tags and FIDs.
> 
> In any case...allowing this thing to allocate tag values that can
> collide seems fundamentally wrong. Using idr_alloc_cyclic might also
> not hurt either, particularly given that these tag values are supposed
> to function something like an XID and you probably don't want to be
> reusing them too quickly.

Using cache=none here so behaviour is likely different with cache, but
basically you can't get more than one tag per user thread accessing the
9P mount...
And in RDMA there's a credit so I can't get past whatever sq option was
given (defaults to 32) -- tbh even with other transports I doubt it's
going to get much higher.

Still definitely needs fixing, but I think the issue is somewhere
else... If Andrey could share the workload he uses I can try with other
servers, would be nice if we can rule a qemu bug out completely :)

-- 
Dominique

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2015-07-02 17:01                             ` Jeff Layton
  2015-07-02 17:56                               ` Dominique Martinet
@ 2015-07-02 18:40                               ` Al Viro
  2015-07-02 19:16                                 ` Linus Torvalds
  1 sibling, 1 reply; 69+ messages in thread
From: Al Viro @ 2015-07-02 18:40 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Andrey Ryabinin, Linus Torvalds, linux-kernel, linux-fsdevel

On Thu, Jul 02, 2015 at 01:01:39PM -0400, Jeff Layton wrote:

> So p9_idpool_create should take an argument for the "end" value, and
> then store that in a new field in p9_idpool. Then they can pass that in
> as the "end" parm in idr_alloc. Or, they could give up using the same
> function there and use a different one for tags and FIDs.
> 
> In any case...allowing this thing to allocate tag values that can
> collide seems fundamentally wrong. Using idr_alloc_cyclic might also
> not hurt either, particularly given that these tag values are supposed
> to function something like an XID and you probably don't want to be
> reusing them too quickly.

All they are used for is matching response to request.  Basically, you
can have up to 65535 pending requests.  Reusing it right after getting
the response is fine.

Keep in mind that it's not supposed to be used on top of something like
UDP - *all* retransmits, etc., are responsibility of transport.  It's
connection-oriented, reliable and order-preserving, with a shitload of state
tied to connection, starting with FIDs - unlike FHANDLE, FID meaning depends
upon connection history.  Tags are even more transient.

Basically, the rules are
	* request bears a 16bit tag.
	* server can process pending requests in any order (with one exception)
and it must put the same tag into responses.
	* client can ask to abort a pending request by issuing Tflush[old_tag];
	* server must handle Tflush immediately.  It must drop any pending
request matching old_tag and send Rflush confirming that.  No response to
any request matching old_tag sent before Tflush should be issued after issuing
Rflush.
	* if client has not issued Tflush, it must not reuse the tag until
getting a response bearing that tag.
	* if client *has* issued Tflush, it must not reuse the tag until
getting Rflush, even if it does get response to the request it has tried to
abort. 

BTW, failure to send Tflush means that we should leave the tag in use,
period.  As it is, p9_client_rpc()/p9_client_zc_rpc() ignore such
situations - failure from p9_client_flush() is simply not noticed.
I seriously doubt that this is what we are hitting here, but it's a bug
all the same.

We also must _not_ let p9_client_cb() do anything unless req is non-NULL
and req->status is REQ_STATUS_SENT - stray tags from server shouldn't
be acted upon.  As it is, the whole thing is trivial to oops - just have
server send _any_ R-message with something like 0xfff0 for tag.  End of
story, p9_tag_lookup() returns NULL, p9_client_cb() oopses.
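
Concretely, something along these lines in each transport's receive path
(sketch only; "client" and "tag" stand for whatever the transport has in
hand at that point):

	req = p9_tag_lookup(client, tag);
	if (!req || req->status != REQ_STATUS_SENT) {
		pr_warn("9pnet: stray reply with tag %d, ignoring\n", tag);
		continue;	/* never act on a tag we did not send */
	}
	p9_client_cb(client, req, REQ_STATUS_RCVD);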


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2015-07-02 17:56                               ` Dominique Martinet
@ 2015-07-02 18:43                                 ` Al Viro
  2015-07-02 21:00                                   ` Dominique Martinet
  2015-07-02 18:59                                 ` Jeff Layton
  2015-07-02 20:36                                 ` Andrey Ryabinin
  2 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2015-07-02 18:43 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Jeff Layton, Andrey Ryabinin, Linus Torvalds, linux-kernel,
	linux-fsdevel

On Thu, Jul 02, 2015 at 07:56:29PM +0200, Dominique Martinet wrote:

> Using cache=none here so behavious is likely different with cache, but
> basically you can't get more than one tag per user thread accessing the
> 9P mount...

Yes, and...?  You can get a lot more than one user thread...  Andrey is
using trinity(1) on client, and that's *definitely* not single-threaded -
the whole point is stressing the damn thing.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2015-07-02 17:56                               ` Dominique Martinet
  2015-07-02 18:43                                 ` Al Viro
@ 2015-07-02 18:59                                 ` Jeff Layton
  2015-07-02 20:36                                 ` Andrey Ryabinin
  2 siblings, 0 replies; 69+ messages in thread
From: Jeff Layton @ 2015-07-02 18:59 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Al Viro, Andrey Ryabinin, Linus Torvalds, linux-kernel, linux-fsdevel

On Thu, 2 Jul 2015 19:56:29 +0200
Dominique Martinet <dominique.martinet@cea.fr> wrote:

> Jeff Layton wrote on Thu, Jul 02, 2015:
> > So p9_idpool_create should take an argument for the "end" value, and
> > then store that in a new field in p9_idpool. Then they can pass that in
> > as the "end" parm in idr_alloc. Or, they could give up using the same
> > function there and use a different one for tags and FIDs.
> > 
> > In any case...allowing this thing to allocate tag values that can
> > collide seems fundamentally wrong. Using idr_alloc_cyclic might also
> > not hurt either, particularly given that these tag values are supposed
> > to function something like an XID and you probably don't want to be
> > reusing them too quickly.
> 
> Using cache=none here so behaviour is likely different with cache, but
> basically you can't get more than one tag per user thread accessing the
> 9P mount...
> And in RDMA there's a credit so I can't get past whatever sq option was
> given (defaults to 32) -- tbh even with other transports I doubt it's
> going to get much higher.
> 
> Still definitely needs fixing, but I think the issue is somewhere
> else... If Andrey could share the workload he uses I can try with other
> servers, would be nice if we can rule a qemu bug out completely :)
> 

Fair enough...

If you're in there and decide to fix this up, then consider moving this
over to IDA instead of IDR. The pointers stored are not terribly
interesting (always the same as the p9_idpool), so by doing that you'll
save quite a bit of memory as well.
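
Roughly (sketch; the pool is per-client in reality rather than a global, and
an IDA on its own is still first-fit, so the cyclic-allocation point above
would need separate handling):

#include <linux/idr.h>

static DEFINE_IDA(p9_tag_ida);		/* illustration only: one global pool */

static int p9_tag_get(void)
{
	/* [0, P9_NOTAG) keeps tags inside the 16-bit space */
	return ida_simple_get(&p9_tag_ida, 0, P9_NOTAG, GFP_NOFS);
}

static void p9_tag_put(int tag)
{
	ida_simple_remove(&p9_tag_ida, tag);
}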

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2015-07-02 18:40                               ` Al Viro
@ 2015-07-02 19:16                                 ` Linus Torvalds
  2015-07-02 20:44                                   ` Al Viro
  0 siblings, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2015-07-02 19:16 UTC (permalink / raw)
  To: Al Viro
  Cc: Jeff Layton, Andrey Ryabinin, Linux Kernel Mailing List, linux-fsdevel

On Thu, Jul 2, 2015 at 11:40 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> All they are used for is matching response to request.  Basically, you
> can have up to 65535 pending requests.  Reusing it right after getting
> the response is fine.

Reusing a tag right after getting the completion may be fine in
theory, but it still sounds like a bad idea. Sure, it's used to match
the command with the reply, but using those kinds of things for
matching re-sends and to index into various "current data structures"
is also very common (not having looked at p9 I don't know how much it
does), and basically reusing tags "soon" tends to make those kinds of
things fragile.

Which can easily turn a "this _should_ work" into "it doesn't
_actually_ work" just because it ends up making things like race
conditions and re-ordering of replies trigger worse behavior. For
example, things like "done with previous command X" and "now starting
new command X" - if the tag is the same and those *independent*
messages get re-ordered, the tag just failed in what it was supposed
to do.

So circular allocators are likely a good idea even if there are other
layers that should handle retransmits etc.

So it does sound like it would be better to use a circular tag
allocator rather than a "lowest tag first" allocator.
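
For illustration, the difference boils down to this (both calls bound the
range; only the second avoids handing a just-freed tag straight back out):

	/* lowest-free: a tag released a moment ago is immediately reused */
	tag = idr_alloc(&pool, ptr, 0, 0xffff, GFP_NOFS);

	/* cyclic: the cursor keeps moving forward and only wraps at the end,
	   so a freed tag sits idle until the id space comes back around
	   (0xffff stays out of the pool - it's the 9P "no tag" value) */
	tag = idr_alloc_cyclic(&pool, ptr, 0, 0xffff, GFP_NOFS);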

            Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
  2015-07-02 16:43                                       ` Al Viro
  (?)
  (?)
@ 2015-07-02 20:26                                       ` Andrey Ryabinin
  -1 siblings, 0 replies; 69+ messages in thread
From: Andrey Ryabinin @ 2015-07-02 20:26 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrey Ryabinin, Linus Torvalds, LKML, linux-fsdevel,
	Aneesh Kumar K.V, Eric Van Hensbergen, linux-nfs

2015-07-02 19:43 GMT+03:00 Al Viro <viro@zeniv.linux.org.uk>:
> On Thu, Jul 02, 2015 at 03:19:57PM +0300, Andrey Ryabinin wrote:
>
>> Added:
>> +    if (total > count)
>> +           *(char *)0 = 0;
>>
>> and never hit this condition.
>>
>
> OK, so it's definitely a mismatched response.
>
>>         req->tc->tag = tag-1;
>> +       if (WARN_ON(req->status != REQ_STATUS_IDLE))
>> +               pr_err("req->status: %d\n", req->status);
>>         req->status = REQ_STATUS_ALLOC;
>>
>>         return req;
>
>> [  150.259076] 9pnet: req->status: 4
>
> IOW, REQ_STATUS_RCVD.  Hmm...  Stray tag seen by req_done() after we'd already
> freed the tag in question?  That, or it really would have to had wrapped
> around...  Note that req_done() does *not* check anything about the req -
> not even that p9_tag_lookup() hasn't returned NULL, so a server sending you
> any response tagged with number well above anything you'd ever sent will
> reliably oops you.
>
> Frankly, the whole thing needs fuzzing from the server side - start throwing
> crap at the client and see how badly it gets fucked...  Folks, it's
> a network protocol, with userland servers, no less.  You *can't* assume
> them competent and non-malicious...
>
> How much traffic does it take to reproduce that fun, BTW?  IOW, is attempting
> to log the sequence of tag {allocation,freeing}/tag of packet being {sent,
> received} something completely suicidal, or is it more or less feasible?
>

No idea. Usually it takes 1-2 minutes after trinity (100 threads) starts.

>> I didn't get this. c->reqs[row] is always non-NULL as it should be, so this warning
>> will trigger all the time.
>
> ????
>                         row = (tag / P9_ROW_MAXTAG);
>                         c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
>                                         sizeof(struct p9_req_t), GFP_ATOMIC);
>
> and you are seeing c->reqs[row] != NULL *BEFORE* that kcalloc()?  All the time,
> no less?  Just to make sure we are on the same page - the delta against
> mainline I would like tested is this:
>

Ah, I was looking at the second ' row = tag / P9_ROW_MAXTAG;' line
which is after kcalloc().
I'll check tomorrow then.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2015-07-02 17:56                               ` Dominique Martinet
  2015-07-02 18:43                                 ` Al Viro
  2015-07-02 18:59                                 ` Jeff Layton
@ 2015-07-02 20:36                                 ` Andrey Ryabinin
  2 siblings, 0 replies; 69+ messages in thread
From: Andrey Ryabinin @ 2015-07-02 20:36 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Jeff Layton, Al Viro, Andrey Ryabinin, Linus Torvalds, LKML,
	linux-fsdevel

2015-07-02 20:56 GMT+03:00 Dominique Martinet <dominique.martinet@cea.fr>:
>
> Still definitely needs fixing, but I think the issue is somewhere
> else... If Andrey could share the workload he uses I can try with other
> servers, would be nice if we can rule a qemu bug out completely :)
>

I simply run trinity from 9p rootfs:
         ./trinity  -qqq --dangerous -xinit_module -C100

qemu guest, virtio transport.

BTW, I've discovered that all these bogus writes come from the trinity
logger (log.c), which just opens one file per thread (trinity-childX.log)
and writes to it.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2015-07-02 19:16                                 ` Linus Torvalds
@ 2015-07-02 20:44                                   ` Al Viro
  0 siblings, 0 replies; 69+ messages in thread
From: Al Viro @ 2015-07-02 20:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Layton, Andrey Ryabinin, Linux Kernel Mailing List, linux-fsdevel

On Thu, Jul 02, 2015 at 12:16:14PM -0700, Linus Torvalds wrote:
> On Thu, Jul 2, 2015 at 11:40 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > All they are used for is matching response to request.  Basically, you
> > can have up to 65535 pending requests.  Reusing it right after getting
> > the response is fine.
> 
> Reusing a tag right after getting the completion may be fine in
> theory, but it still sounds like a bad idea. Sure, it's used to match
> the command with the reply, but using those kinds of things for
> matching re-sends and to index into various "current data structures"
> is also very common (not having looked at p9 I don't know how much it
> does), and basically reusing tags "soon" tends to make those kinds of
> things fragile.

_All_ retransmits are done in the transport layer there.  It's not NFS - it
really expects a reliable, ordered connection for transport.  No retransmits,
no duplicates, etc.  I'm not dead against circular allocation, but I would
really like to figure out what's going on first.

I still wonder if we are seeing wraparound (should've posted a diff instead
of verbal description - mea culpa).  If we are not, it smells like response
to request having arrived while the tag had been not in use from the client
POV, _or_ buggered barriers of some kind.  Maybe buggered ordering of
replies somewhere, but that's only if Tflush had been involved (as in
-> Twrite tag = 3
-> Tflush tag = 42 old_tag = 3		<- Rwrite tag = 3
<- Rflush tag = 42
mark tag 3 free to be reused
reuse tag 3
... somehow get to seeing Rwrite only now)

But I don't see where such ordering violation could've happened at the moment.
The way it's supposed to work is that the sequence
-> Twhatever tag = N
-> Tflush old_tag = N
must either end up with no response to the former arriving at all, or
arriving before the response to the latter.  Transport itself does preserve
ordering (TCP certainly would, but virtio queue also does, AFAICS) and
we really need to have p9_client_cb() called in order of arrival.

Hmm...  This is a stab in the dark, but... we have vring_interrupt() calling
req_done(), which does
        while (1) {
                spin_lock_irqsave(&chan->lock, flags);
                rc = virtqueue_get_buf(chan->vq, &len);
                if (rc == NULL) {
                        spin_unlock_irqrestore(&chan->lock, flags);
                        break;
                }
                chan->ring_bufs_avail = 1;
                spin_unlock_irqrestore(&chan->lock, flags);
                /* Wakeup if anyone waiting for VirtIO ring space. */
                wake_up(chan->vc_wq);
                p9_debug(P9_DEBUG_TRANS, ": rc %p\n", rc);
                p9_debug(P9_DEBUG_TRANS, ": lookup tag %d\n", rc->tag);
                req = p9_tag_lookup(chan->client, rc->tag);
                p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
        }

What's to prevent *another* vring_interrupt() (called from some kind of
IRQ handler) hitting on another CPU and competing with this one for the
queue?  While we are at it, both p9_tag_lookup() and p9_client_cb()
should be fine with being called under spin_lock_irqsave, so why not
take it once, outside the loop?
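
IOW, an untested sketch of that same function with the lock held across
the whole drain, plus the missing sanity check on the lookup (names as in
the code quoted above):

static void req_done(struct virtqueue *vq)
{
	struct virtio_chan *chan = vq->vdev->priv;
	struct p9_fcall *rc;
	struct p9_req_t *req;
	unsigned int len;
	unsigned long flags;

	/* hold chan->lock for the whole drain, so two instances of
	 * req_done() can't interleave on the same queue */
	spin_lock_irqsave(&chan->lock, flags);
	while ((rc = virtqueue_get_buf(chan->vq, &len)) != NULL) {
		chan->ring_bufs_avail = 1;
		req = p9_tag_lookup(chan->client, rc->tag);
		if (!req) {
			/* don't trust the server: a bogus tag must not oops us */
			pr_err("9pnet: bogus tag %d in reply\n", rc->tag);
			continue;
		}
		p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
	}
	spin_unlock_irqrestore(&chan->lock, flags);
	/* wake ring-space waiters once, after dropping the lock */
	wake_up(chan->vc_wq);
}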

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2015-07-02 18:43                                 ` Al Viro
@ 2015-07-02 21:00                                   ` Dominique Martinet
  0 siblings, 0 replies; 69+ messages in thread
From: Dominique Martinet @ 2015-07-02 21:00 UTC (permalink / raw)
  To: Al Viro
  Cc: Jeff Layton, Andrey Ryabinin, Linus Torvalds, linux-kernel,
	linux-fsdevel

Al Viro wrote on Thu, Jul 02, 2015:
> On Thu, Jul 02, 2015 at 07:56:29PM +0200, Dominique Martinet wrote:
> > Using cache=none here so behaviour is likely different with cache, but
> > basically you can't get more than one tag per user thread accessing the
> > 9P mount...
> 
> Yes, and...?  You can get a lot more than one user thread...  Andrey is
> using trinity(1) on client, and that's *definitely* not single-threaded -
> the whole point is stressing the damn thing.

I have run trinity quite a bit and it doesn't fork bomb as far as I can
recall; with him running it with -C100 we're not quite at 2^16 yet?

I do agree it's a problem, just don't think it's the one we're
hitting -- I'll try again on a recent kernel to see if anything changed
with rdma/tcp as well, but I'm starting to doubt I'll have any luck with
anything other than virtio, which doesn't really help since it's not the
same order of latencies.

FWIW I don't *think* trinity can issue TFlush either without user
interaction, that's a really special call. It can only happen in rpc()
or zc_rpc() if it's interrupted by ERESTARTSYS which I understand as ^C?



(I'll look into making the pools use IDA unless someone else steps up,
sure. Thanks Jeff)

-- 
Dominique

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
@ 2015-07-03  8:19                                           ` Andrey Ryabinin
  0 siblings, 0 replies; 69+ messages in thread
From: Andrey Ryabinin @ 2015-07-03  8:19 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, LKML, linux-fsdevel, Aneesh Kumar K.V,
	Eric Van Hensbergen, linux-nfs

On 07/02/2015 07:49 PM, Al Viro wrote:
> On Thu, Jul 02, 2015 at 05:43:32PM +0100, Al Viro wrote:
>>  	req->tc->tag = tag-1;
>> +	if (req->status != REQ_STATUS_IDLE)
>> +		pr_err("using tag %d with odd status (%d)", tag, req->status);
> 
> Should be tag - 1 here, actually.

So, with this change:

[   81.654695] 9pnet: late delivery, tag 2 already freed
[   81.655186] 9pnet: mismatched reply [tag = 2]
[   85.014795] 9pnet: late delivery, tag 4 already freed
[   85.015753] 9pnet: late delivery, tag 4 already freed
[   85.016473] 9pnet: late delivery, tag 4 already freed
[   87.275842] 9pnet: mismatched reply [tag = 5]
[   88.943223] 9pnet: late delivery, tag 4 already freed
[   88.943790] 9pnet: late delivery, tag 4 already freed
[   88.944320] 9pnet: late delivery, tag 4 already freed
[   88.945050] 9pnet: mismatched reply [tag = 4]
[  105.551220] 9pnet: late delivery, tag 3 already freed
[  105.552065] 9pnet: late delivery, tag 3 already freed
[  114.528467] 9pnet: late delivery, tag 4 already freed
[  164.791295] 9pnet: late delivery, tag 2 already freed
[  164.792156] 9pnet: using tag 2 with odd status (4)
[  164.864398] 9pnet: mismatched reply [tag = 4]
[  166.419993] 9pnet: mismatched reply [tag = 3]
[  174.062625] 9pnet: late delivery, tag 3 already freed
[  174.063121] 9pnet: late delivery, tag 3 already freed
[  174.063637] 9pnet: late delivery, tag 3 already freed
[  174.064018] 9pnet: late delivery, tag 3 already freed
[  174.064547] 9pnet: mismatched reply [tag = 3]
[  175.418729] 9pnet: late delivery, tag 3 already freed
[  177.911727] 9pnet: mismatched reply [tag = 1]
[  235.748520] 9pnet: late delivery, tag 1 already freed
[  235.749595] 9pnet: using tag 1 with odd status (4)
[  243.496782] 9pnet: late delivery, tag 6 already freed
[  243.497697] 9pnet: late delivery, tag 6 already freed
[  243.499079] 9pnet: mismatched reply [tag = 6]
[  243.736388] 9pnet: late delivery, tag 1 already freed
[  243.740019] 9pnet: late delivery, tag 1 already freed
[  248.337712] 9pnet: late delivery, tag 2 already freed
[  249.872950] 9pnet: mismatched reply [tag = 9]
[  252.448858] 9pnet: mismatched reply [tag = 8]
[  252.668878] 9pnet: mismatched reply [tag = 11]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
  2015-07-03  8:19                                           ` Andrey Ryabinin
  (?)
@ 2015-07-03  9:42                                           ` Al Viro
  2015-07-03 15:00                                               ` Al Viro
  -1 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2015-07-03  9:42 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Linus Torvalds, LKML, linux-fsdevel, Aneesh Kumar K.V,
	Eric Van Hensbergen, linux-nfs

On Fri, Jul 03, 2015 at 11:19:31AM +0300, Andrey Ryabinin wrote:
> On 07/02/2015 07:49 PM, Al Viro wrote:
> > On Thu, Jul 02, 2015 at 05:43:32PM +0100, Al Viro wrote:
> >>  	req->tc->tag = tag-1;
> >> +	if (req->status != REQ_STATUS_IDLE)
> >> +		pr_err("using tag %d with odd status (%d)", tag, req->status);
> > 
> > Should be tag - 1 here, actually.
> 
> So, with this change:
> 
> [   81.654695] 9pnet: late delivery, tag 2 already freed
> [   81.655186] 9pnet: mismatched reply [tag = 2]
> [   85.014795] 9pnet: late delivery, tag 4 already freed
> [   85.015753] 9pnet: late delivery, tag 4 already freed
> [   85.016473] 9pnet: late delivery, tag 4 already freed
> [   87.275842] 9pnet: mismatched reply [tag = 5]
> [   88.943223] 9pnet: late delivery, tag 4 already freed
> [   88.943790] 9pnet: late delivery, tag 4 already freed
> [   88.944320] 9pnet: late delivery, tag 4 already freed
> [   88.945050] 9pnet: mismatched reply [tag = 4]
> [  105.551220] 9pnet: late delivery, tag 3 already freed
> [  105.552065] 9pnet: late delivery, tag 3 already freed
> [  114.528467] 9pnet: late delivery, tag 4 already freed
> [  164.791295] 9pnet: late delivery, tag 2 already freed
> [  164.792156] 9pnet: using tag 2 with odd status (4)
> [  164.864398] 9pnet: mismatched reply [tag = 4]
> [  166.419993] 9pnet: mismatched reply [tag = 3]
> [  174.062625] 9pnet: late delivery, tag 3 already freed
> [  174.063121] 9pnet: late delivery, tag 3 already freed
> [  174.063637] 9pnet: late delivery, tag 3 already freed
> [  174.064018] 9pnet: late delivery, tag 3 already freed
> [  174.064547] 9pnet: mismatched reply [tag = 3]
> [  175.418729] 9pnet: late delivery, tag 3 already freed
> [  177.911727] 9pnet: mismatched reply [tag = 1]
> [  235.748520] 9pnet: late delivery, tag 1 already freed
> [  235.749595] 9pnet: using tag 1 with odd status (4)
> [  243.496782] 9pnet: late delivery, tag 6 already freed
> [  243.497697] 9pnet: late delivery, tag 6 already freed
> [  243.499079] 9pnet: mismatched reply [tag = 6]
> [  243.736388] 9pnet: late delivery, tag 1 already freed
> [  243.740019] 9pnet: late delivery, tag 1 already freed
> [  248.337712] 9pnet: late delivery, tag 2 already freed
> [  249.872950] 9pnet: mismatched reply [tag = 9]
> [  252.448858] 9pnet: mismatched reply [tag = 8]
> [  252.668878] 9pnet: mismatched reply [tag = 11]

FWIW, I'd reproduced it here as well.  With the addition of a check in
p9_client_cb() whether the tag has already been freed in the IDR, a report
of ->status when it's not "sent", and logging of Tflush, I've got this:

[ 2754.713015] 9pnet: flushing 1
[ 2755.516123] 9pnet: flush 1 [3]done
[ 2758.443265] 9pnet: flushing 16
[ 2768.655768] 9pnet: flush 16 [8]done
[ 2783.939538] 9pnet: flushing 30
[ 2786.067856] 9pnet: flush 30 [3]done
[ 2809.784119] 9pnet: [ffff880113213000] delivery in 0, tag 1
[ 2809.948681] 9pnet: [ffff880113213000] using tag 1 with odd status (4)

Here we have a delivery while that sucker is in p9_free_req(), but hasn't
been removed from the IDR yet.

[ 2809.948681] 9pnet: [ffff880113213000] delivery in 0, tag 1
[ 2816.767861] 9pnet: [ffff880113213000] using tag 1 with odd status (4)

Ditto.

[ 2816.767861] 9pnet: flushing 4
[ 2816.769484] 9pnet: flush 4 [5]done
[ 2846.327753] 9pnet: flushing 3
[ 2854.876131] 9pnet: flush 3 [4]done
[ 2856.492801] 9pnet: flushing 6
[ 2857.675642] 9pnet: flush 6 [1]done
[ 2860.051701] 9pnet: [ffff880113213000] delivery in 0, tag 1
[ 2860.052941] 9pnet: [ffff880113213000] late delivery, tag 1 already freed in IDR
[ 2860.680181] 9pnet: [ffff880113213000] using tag 1 with odd status (4)

This has hit between p9_free_req() and reallocating that request.

[ 2909.911815] 9pnet: [ffff880113213000] delivery in 4, tag 1
[ 2909.913145] 9pnet: [ffff880113213000] late delivery, tag 1 already freed in IDR

... while _this_ one has raced with p9_free_req() in a different way.

[ 2910.852202] 9pnet: flushing 3
[ 2917.985941] 9pnet: flush 3 [4]done
[ 2937.600652] 9pnet: flushing 2
[ 2939.775354] 9pnet: flush 2 [4]done
[ 2961.521600] 9pnet: [ffff880113213000] delivery in 0, tag 1
[ 2962.320383] 9pnet: [ffff880113213000] using tag 1 with odd status (4)

Same as the very first one.

[ 2962.320383] 9pnet: [ffff880113213000] delivery in 4, tag 8
[ 3001.578372] 9pnet: [ffff880113213000] delivery in 4, tag 6

Those had hit before p9_free_req().

AFAICS, we get occasional stray responses from somewhere.  And no, it doesn't
seem to be related to flushes or to dropping chan->lock in req_done() (this
run had been with chan->lock taken on the outside of the loop).

What I really don't understand is WTF is it playing with p9_tag_lookup() -
it's stashing req->tc via virtqueue_add_sgs() opaque data argument, fetches
it back in req_done(), then picks ->tag from it and uses p9_tag_lookup() to
find req.  Why not simply pass req instead?  I had been wrong about that
p9_tag_lookup() being able to return NULL, but why bother with it at all?
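
I.e. something like this sketch (a fragment of p9_virtio_request() and
req_done(), not a tested patch; argument names as in the current
trans_virtio code, if memory serves):

	/* in p9_virtio_request(): stash the request itself as the cookie,
	 * rather than req->tc */
	err = virtqueue_add_sgs(chan->vq, sgs, out_sgs, in_sgs, req, GFP_ATOMIC);

	/* ... and in req_done() it comes straight back, no p9_tag_lookup()
	 * round-trip through the tag at all */
	while ((req = virtqueue_get_buf(chan->vq, &len)) != NULL)
		p9_client_cb(chan->client, req, REQ_STATUS_RCVD);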

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH] forgetting to cancel request in interrupted zero-copy 9P RPC (was Re: [git pull] vfs part 2)
@ 2015-07-03 15:00                                               ` Al Viro
  0 siblings, 0 replies; 69+ messages in thread
From: Al Viro @ 2015-07-03 15:00 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Linus Torvalds, LKML, linux-fsdevel, Aneesh Kumar K.V,
	Eric Van Hensbergen, linux-nfs

On Fri, Jul 03, 2015 at 10:42:10AM +0100, Al Viro wrote:

> AFAICS, we get occasional stray responses from somewhere.  And no, it doesn't
> seem to be related to flushes or to dropping chan->lock in req_done() (this
> run had been with chan->lock taken on the outside of the loop).
> 
> What I really don't understand is WTF is it playing with p9_tag_lookup() -
> it's stashing req->tc via virtqueue_add_sgs() opaque data argument, fetches
> it back in req_done(), then picks ->tag from it and uses p9_tag_lookup() to
> find req.  Why not simply pass req instead?  I had been wrong about that
> p9_tag_lookup() being able to return NULL, but why bother with it at all?


Got it.  What happens is that on the zero-copy path a signal hitting at the
end of p9_virtio_zc_request() is treated as "it hadn't been sent, got
an error, fuck off and mark the tag ready for reuse".  No Tflush issued,
etc.  As a result, when the reply finally *does* arrive (we had actually
sent the request), it plays hell on the entire thing - the tag might very
well have been reused by then and an unrelated request sent with the
same tag.  Depending on the timing, results can get rather ugly.
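
In timeline form, roughly (the Topen is just an example of whatever
unrelated request happens to pick up the freed tag):

-> Twrite tag = 3	(zero-copy; request is already on the wire)
   signal arrives, p9_virtio_zc_request() returns -ERESTARTSYS
   tag 3 freed, no Tflush sent
   tag 3 reused for an unrelated request
-> Topen tag = 3
<- Rwrite tag = 3	(reply to the *old* write)
   client matches it against the Topen -> mismatched reply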

There are still other bogosities found in this thread, and at the very
least we need to cope with genuinely corrupted responses from the server,
but the patch below fixes the problem with stray responses here and stops
the "what do you mean, you'd written 4K?  I've only sent 30 bytes!"
problems.  10 minutes of trinity running without triggering it, while
without that patch it triggers in 2-3 minutes.

Could you verify that the patch below deals with your setup as well?
If it does, I'm going to put it into tonight's pull request, after I get
some sleep...  Right now I'm about to crawl in direction of bed - 25 hours
of uptime is a bit too much... ;-/

diff --git a/net/9p/client.c b/net/9p/client.c
index 6f4c4c8..8c4941d 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -843,7 +843,8 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 	if (err < 0) {
 		if (err == -EIO)
 			c->status = Disconnected;
-		goto reterr;
+		if (err != -ERESTARTSYS)
+			goto reterr;
 	}
 	if (req->status == REQ_STATUS_ERROR) {
 		p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
@@ -1647,7 +1648,10 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 		if (*err) {
 			trace_9p_protocol_dump(clnt, req->rc);
 			p9_free_req(clnt, req);
+			break;
 		}
+		if (rsize < count)
+			pr_err("mismatched reply [tag = %d]\n", req->tc->tag);
 
 		p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
 

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH] forgetting to cancel request in interrupted zero-copy 9P RPC (was Re: [git pull] vfs part 2)
@ 2015-07-03 19:56                                                 ` Andrey Ryabinin
  0 siblings, 0 replies; 69+ messages in thread
From: Andrey Ryabinin @ 2015-07-03 19:56 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrey Ryabinin, Linus Torvalds, LKML, linux-fsdevel,
	Aneesh Kumar K.V, Eric Van Hensbergen, linux-nfs

2015-07-03 18:00 GMT+03:00 Al Viro <viro@zeniv.linux.org.uk>:
> Could you verify that the patch below deals with your setup as well?
> If it does, I'm going to put it into tonight's pull request, after I get
> some sleep...  Right now I'm about to crawl in direction of bed - 25 hours
> of uptime is a bit too much... ;-/
>

Works for me.
Tested-by: Andrey Ryabinin <a.ryabinin@samsung.com>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2012-03-31 20:08             ` Al Viro
@ 2012-03-31 21:37               ` Linus Torvalds
  0 siblings, 0 replies; 69+ messages in thread
From: Linus Torvalds @ 2012-03-31 21:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, linux-fsdevel

On Sat, Mar 31, 2012 at 1:08 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Anyway, commit messages prettied up a bit, fix folded and the whole thing
> is pushed to the same place - i.e.

Ok, looks good, works for me.

Pulled and pushed out,

                   Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2012-03-31 19:48           ` Linus Torvalds
@ 2012-03-31 20:08             ` Al Viro
  2012-03-31 21:37               ` Linus Torvalds
  0 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2012-03-31 20:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

On Sat, Mar 31, 2012 at 12:48:22PM -0700, Linus Torvalds wrote:
> On Sat, Mar 31, 2012 at 12:39 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Variants:
> >        1) drop that commit, fold obvious fix into it, leave it for for-next
> > once -rc1 is there; ask you to pull the branch with that sucker excluded
> >        2) fold the fix into commit, ask you to pull the resulting branch
> >        3) add fix to the end of existing branch, ask to pull
> 
> Let's go for (2). I'll take a new look after the pull request, but no
> guarantees that I'll then pull it.
> 
> Btw, I liked the do_lookup cleanups, but nicer commit logs would have
> been good. Especially the first lines. The whole
> 
>      untangling do_lookup(), part 1
>      untangling do_lookup(), part 2
>      untangling do_lookup(), part 3
>      untangling do_lookup(), part 4
>      untangling do_lookup(), part 5
>      untangling do_lookup(), part 6
>      untangling do_lookup(), part 7
>      untangling do_lookup(), part 8
>      untangling do_lookup(), part 9
> 
> in shortlogs (and gitk!) does turn me off. So since you have to redo
> the series anyway, can I ask you to perhaps try to make it a bit more
> descriptive?

OK...  Keep in mind that this is a splitup of what used to be a monolithic
patch from Miklos, so this series is basically "how do I convince myself
that his patch is correct (or find a hole in it)".  No holes found and
I like the resulting cleanup, so in it went...

Anyway, commit messages prettied up a bit, fix folded and the whole thing
is pushed to the same place - i.e.

git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

Shortlog:
Al Viro (18):
      selinuxfs: merge dentry allocation into sel_make_dir()
      aio: merge aio_cancel_all() with wait_for_all_aios()
      aio: take final put_ioctx() into callers of io_destroy()
      pstore: trim pstore_get_inode()
      mtdchar: kill persistently held vfsmount
      get rid of pointless includes of ext2_fs.h
      new helper: ext2_image_size()
      migrate ext2_fs.h guts to fs/ext2/ext2.h
      ext3: move headers to fs/ext3/
      untangling do_lookup() - isolate !dentry stuff from the rest of it.
      untangling do_lookup() - expand the area under ->i_mutex
      untangling do_lookup() - eliminate a loop.
      untangling do_lookup() - get rid of need_reval in !dentry case
      untangling do_lookup() - massage !dentry case towards __lookup_hash()
      untangling do_lookup() - merge failure exits in !dentry case
      untangling do_lookup() - merge d_alloc_and_lookup() callers
      untangling do_lookup() - switch to calling __lookup_hash()
      untangling do_lookup() - take __lookup_hash()-calling case out of line.

J. Bruce Fields (1):
      vfs: fix out-of-date dentry_unhash() comment

Miklos Szeredi (4):
      vfs: fix d_need_lookup/d_revalidate order in do_lookup
      vfs: don't revalidate just looked up dentry
      vfs: move MAY_EXEC check from __lookup_hash()
      vfs: split __lookup_hash

Thierry Reding (1):
      ext2: No longer export ext2_fs.h to user space

Diffstat:
 arch/blackfin/kernel/setup.c              |    7 +-
 drivers/mtd/mtdchar.c                     |   53 +--
 fs/aio.c                                  |   32 +-
 fs/ext2/ext2.h                            |  631 +++++++++++++++++++++++++++++
 fs/ext2/xattr_security.c                  |    5 +-
 fs/ext2/xattr_trusted.c                   |    5 +-
 fs/ext2/xip.c                             |    2 -
 fs/ext3/acl.c                             |    8 +-
 fs/ext3/balloc.c                          |   10 +-
 fs/ext3/bitmap.c                          |    4 +-
 fs/ext3/dir.c                             |    7 +-
 include/linux/ext3_fs.h => fs/ext3/ext3.h |  488 +++++++++++++++++++----
 fs/ext3/ext3_jbd.c                        |    2 +-
 fs/ext3/file.c                            |    6 +-
 fs/ext3/fsync.c                           |    8 +-
 fs/ext3/hash.c                            |    4 +-
 fs/ext3/ialloc.c                          |   13 +-
 fs/ext3/inode.c                           |   12 +-
 fs/ext3/ioctl.c                           |    7 +-
 fs/ext3/namei.c                           |   14 +-
 fs/ext3/resize.c                          |    5 +-
 fs/ext3/super.c                           |   18 +-
 fs/ext3/symlink.c                         |    4 +-
 fs/ext3/xattr.c                           |    7 +-
 fs/ext3/xattr_security.c                  |    6 +-
 fs/ext3/xattr_trusted.c                   |    6 +-
 fs/ext3/xattr_user.c                      |    5 +-
 fs/gfs2/file.c                            |    1 -
 fs/namei.c                                |  195 ++++------
 fs/ocfs2/ioctl.c                          |    2 -
 fs/pstore/inode.c                         |   26 +-
 include/linux/Kbuild                      |    1 -
 include/linux/ext2_fs.h                   |  569 +-------------------------
 include/linux/ext2_fs_sb.h                |  126 ------
 include/linux/ext3_fs_i.h                 |  151 -------
 include/linux/ext3_fs_sb.h                |   91 -----
 include/linux/ext3_jbd.h                  |  229 -----------
 init/do_mounts_initrd.c                   |    1 -
 init/do_mounts_rd.c                       |    9 +-
 security/selinux/hooks.c                  |    9 +-
 security/selinux/selinuxfs.c              |  110 ++---
 41 files changed, 1249 insertions(+), 1640 deletions(-)
 rename include/linux/ext3_fs.h => fs/ext3/ext3.h (67%)
 delete mode 100644 include/linux/ext2_fs_sb.h
 delete mode 100644 include/linux/ext3_fs_i.h
 delete mode 100644 include/linux/ext3_fs_sb.h
 delete mode 100644 include/linux/ext3_jbd.h

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2012-03-31 19:39         ` Al Viro
  2012-03-31 19:42           ` Al Viro
@ 2012-03-31 19:48           ` Linus Torvalds
  2012-03-31 20:08             ` Al Viro
  1 sibling, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2012-03-31 19:48 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, linux-fsdevel

On Sat, Mar 31, 2012 at 12:39 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Variants:
>        1) drop that commit, fold obvious fix into it, leave it for for-next
> once -rc1 is there; ask you to pull the branch with that sucker excluded
>        2) fold the fix into commit, ask you to pull the resulting branch
>        3) add fix to the end of existing branch, ask to pull

Let's go for (2). I'll take a new look after the pull request, but no
guarantees that I'll then pull it.

Btw, I liked the do_lookup cleanups, but nicer commit logs would have
been good. Especially the first lines. The whole

     untangling do_lookup(), part 1
     untangling do_lookup(), part 2
     untangling do_lookup(), part 3
     untangling do_lookup(), part 4
     untangling do_lookup(), part 5
     untangling do_lookup(), part 6
     untangling do_lookup(), part 7
     untangling do_lookup(), part 8
     untangling do_lookup(), part 9

in shortlogs (and gitk!) does turn me off. So since you have to redo
the series anyway, can I ask you to perhaps try to make it a bit more
descriptive?

                       Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2012-03-31 19:39         ` Al Viro
@ 2012-03-31 19:42           ` Al Viro
  2012-03-31 19:48           ` Linus Torvalds
  1 sibling, 0 replies; 69+ messages in thread
From: Al Viro @ 2012-03-31 19:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

On Sat, Mar 31, 2012 at 08:39:13PM +0100, Al Viro wrote:
> On Sat, Mar 31, 2012 at 12:29:41PM -0700, Linus Torvalds wrote:
> > On Sat, Mar 31, 2012 at 11:57 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > Works here...  OK, that makes selinuxfs one of the most likely suspects; let
> > > me check if I can reproduce any crap with that...  Do you see anything
> > > selinux-related in the call chain, BTW?
> > 
> > Yup. The oops scrolled off my screen, but the call chain was:
> > 
> >  init_sel_fs -> kernel_mount_data -> vfs_kern_mount -> mount_fs ->
> > sel_mount -> mount_single -> (scrolled off)
> > 
> > and then the actual oops happens in d_alloc+0x17, which is
> > "parent->d_sb", so parent was NULL or something (the register state
> > also scrolled off, and I didn't check the fault address, so it could
> > have been some other invalid pointer too).
> 
> Sigh...  sel_make_dir(), last line.  Should be return dentry, is return 0...
> Fixes the problem here (I've managed to reproduce it).

IOW, the incremental is

diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 364b784..d7018bf 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -1804,7 +1804,7 @@ static struct dentry *sel_make_dir(struct dentry *dir, const char *name,
 	/* bump link count on parent directory, too */
 	inc_nlink(dir->d_inode);
 
-	return 0;
+	return dentry;
 }
 
 static int sel_fill_super(struct super_block *sb, void *data, int silent)

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2012-03-31 19:29         ` Linus Torvalds
  (?)
@ 2012-03-31 19:39         ` Al Viro
  2012-03-31 19:42           ` Al Viro
  2012-03-31 19:48           ` Linus Torvalds
  -1 siblings, 2 replies; 69+ messages in thread
From: Al Viro @ 2012-03-31 19:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

On Sat, Mar 31, 2012 at 12:29:41PM -0700, Linus Torvalds wrote:
> On Sat, Mar 31, 2012 at 11:57 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Works here...  OK, that makes selinuxfs one of the most likely suspects; let
> > me check if I can reproduce any crap with that...  Do you see anything
> > selinux-related in the call chain, BTW?
> 
> Yup. The oops scrolled off my screen, but the call chain was:
> 
>  init_sel_fs -> kernel_mount_data -> vfs_kern_mount -> mount_fs ->
> sel_mount -> mount_single -> (scrolled off)
> 
> and then the actual oops happens in d_alloc+0x17, which is
> "parent->d_sb", so parent was NULL or something (the register state
> also scrolled off, and I didn't check the fault address, so it could
> have been some other invalid pointer too).

Sigh...  sel_make_dir(), last line.  Should be return dentry, is return 0...
Fixes the problem here (I've managed to reproduce it).

The question is, what to do with it?  The fix is obvious and the damn thing
seems to work with it just fine.  OTOH, the testing it got had been obviously
worthless (should've set default to selinux).

Variants:
	1) drop that commit, fold obvious fix into it, leave it for for-next
once -rc1 is there; ask you to pull the branch with that sucker excluded
	2) fold the fix into commit, ask you to pull the resulting branch
	3) add fix to the end of existing branch, ask to pull

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2012-03-31 18:57     ` Al Viro
@ 2012-03-31 19:29         ` Linus Torvalds
  0 siblings, 0 replies; 69+ messages in thread
From: Linus Torvalds @ 2012-03-31 19:29 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, linux-fsdevel

On Sat, Mar 31, 2012 at 11:57 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Works here...  OK, that makes selinuxfs one of the most likely suspects; let
> me check if I can reproduce any crap with that...  Do you see anything
> selinux-related in the call chain, BTW?

Yup. The oops scrolled off my screen, but the call chain was:

 init_sel_fs -> kernel_mount_data -> vfs_kern_mount -> mount_fs ->
sel_mount -> mount_single -> (scrolled off)

and then the actual oops happens in d_alloc+0x17, which is
"parent->d_sb", so parent was NULL or something (the register state
also scrolled off, and I didn't check the fault address, so it could
have been some other invalid pointer too).

                   Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2012-03-31 18:31   ` Linus Torvalds
@ 2012-03-31 18:57     ` Al Viro
  2012-03-31 19:29         ` Linus Torvalds
  0 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2012-03-31 18:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

On Sat, Mar 31, 2012 at 11:31:58AM -0700, Linus Torvalds wrote:
> On Sat, Mar 31, 2012 at 11:28 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > It may be some other change than your pull... I will explore more, but
> > I thought I'd let you know that it's the prime suspect right now.
> 
> Verified. My tree just before the pull works fine (current tip plus a
> couple of my pending selinux avc optimizations). So it's your pull.

Works here...  OK, that makes selinuxfs one of the most likely suspects; let
me check if I can reproduce any crap with that...  Do you see anything
selinux-related in the call chain, BTW?

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2012-03-31 18:28   ` Linus Torvalds
  (?)
@ 2012-03-31 18:31   ` Linus Torvalds
  2012-03-31 18:57     ` Al Viro
  -1 siblings, 1 reply; 69+ messages in thread
From: Linus Torvalds @ 2012-03-31 18:31 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, linux-fsdevel

On Sat, Mar 31, 2012 at 11:28 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> It may be some other change than your pull... I will explore more, but
> I thought I'd let you know that it's the prime suspect right now.

Verified. My tree just before the pull works fine (current tip plus a
couple of my pending selinux avc optimizations). So it's your pull.

                      Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2012-03-31  5:19 Al Viro
@ 2012-03-31 18:28   ` Linus Torvalds
  0 siblings, 0 replies; 69+ messages in thread
From: Linus Torvalds @ 2012-03-31 18:28 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, linux-fsdevel

On Fri, Mar 30, 2012 at 10:19 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>        Miklos' first series (with do_lookup() rewrite split into edible
> chunks) + assorted bits and pieces.

Hmm. This oopses at boot in d_alloc() (or similar) for me. It's in
some initcall, very early, before user space has even started.

It may be some other change than your pull... I will explore more, but
I thought I'd let you know that it's the prime suspect right now.

                            Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [git pull] vfs part 2
@ 2012-03-31  5:19 Al Viro
  2012-03-31 18:28   ` Linus Torvalds
  0 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2012-03-31  5:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

	Miklos' first series (with do_lookup() rewrite split into edible
chunks) + assorted bits and pieces.
Please, pull from
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

Shortlog:
Al Viro (18):
      selinuxfs: merge dentry allocation into sel_make_dir()
      aio: merge aio_cancel_all() with wait_for_all_aios()
      aio: take final put_ioctx() into callers of io_destroy()
      pstore: trim pstore_get_inode()
      mtdchar: kill persistently held vfsmount
      get rid of pointless includes of ext2_fs.h
      new helper: ext2_image_size()
      migrate ext2_fs.h guts to fs/ext2/ext2.h
      ext3: move headers to fs/ext3/
      untangling do_lookup(), part 1
      untangling do_lookup(), part 2
      untangling do_lookup(), part 3
      untangling do_lookup(), part 4
      untangling do_lookup(), part 5
      untangling do_lookup(), part 6
      untangling do_lookup(), part 7
      untangling do_lookup(), part 8
      untangling do_lookup(), part 9

J. Bruce Fields (1):
      vfs: fix out-of-date dentry_unhash() comment

Miklos Szeredi (4):
      vfs: fix d_need_lookup/d_revalidate order in do_lookup
      vfs: don't revalidate just looked up dentry
      vfs: move MAY_EXEC check from __lookup_hash()
      vfs: split __lookup_hash

Thierry Reding (1):
      ext2: No longer export ext2_fs.h to user space

Diffstat:
 arch/blackfin/kernel/setup.c              |    7 +-
 drivers/mtd/mtdchar.c                     |   53 +--
 fs/aio.c                                  |   32 +-
 fs/ext2/ext2.h                            |  631 +++++++++++++++++++++++++++++
 fs/ext2/xattr_security.c                  |    5 +-
 fs/ext2/xattr_trusted.c                   |    5 +-
 fs/ext2/xip.c                             |    2 -
 fs/ext3/acl.c                             |    8 +-
 fs/ext3/balloc.c                          |   10 +-
 fs/ext3/bitmap.c                          |    4 +-
 fs/ext3/dir.c                             |    7 +-
 include/linux/ext3_fs.h => fs/ext3/ext3.h |  488 +++++++++++++++++++----
 fs/ext3/ext3_jbd.c                        |    2 +-
 fs/ext3/file.c                            |    6 +-
 fs/ext3/fsync.c                           |    8 +-
 fs/ext3/hash.c                            |    4 +-
 fs/ext3/ialloc.c                          |   13 +-
 fs/ext3/inode.c                           |   12 +-
 fs/ext3/ioctl.c                           |    7 +-
 fs/ext3/namei.c                           |   14 +-
 fs/ext3/resize.c                          |    5 +-
 fs/ext3/super.c                           |   18 +-
 fs/ext3/symlink.c                         |    4 +-
 fs/ext3/xattr.c                           |    7 +-
 fs/ext3/xattr_security.c                  |    6 +-
 fs/ext3/xattr_trusted.c                   |    6 +-
 fs/ext3/xattr_user.c                      |    5 +-
 fs/gfs2/file.c                            |    1 -
 fs/namei.c                                |  195 ++++------
 fs/ocfs2/ioctl.c                          |    2 -
 fs/pstore/inode.c                         |   26 +-
 include/linux/Kbuild                      |    1 -
 include/linux/ext2_fs.h                   |  569 +-------------------------
 include/linux/ext2_fs_sb.h                |  126 ------
 include/linux/ext3_fs_i.h                 |  151 -------
 include/linux/ext3_fs_sb.h                |   91 -----
 include/linux/ext3_jbd.h                  |  229 -----------
 init/do_mounts_initrd.c                   |    1 -
 init/do_mounts_rd.c                       |    9 +-
 security/selinux/hooks.c                  |    9 +-
 security/selinux/selinuxfs.c              |  108 ++---
 41 files changed, 1248 insertions(+), 1639 deletions(-)
 rename include/linux/ext3_fs.h => fs/ext3/ext3.h (67%)
 delete mode 100644 include/linux/ext2_fs_sb.h
 delete mode 100644 include/linux/ext3_fs_i.h
 delete mode 100644 include/linux/ext3_fs_sb.h
 delete mode 100644 include/linux/ext3_jbd.h

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [git pull] vfs part 2
  2010-03-05 16:29 Al Viro
@ 2010-03-05 19:53 ` Linus Torvalds
  0 siblings, 0 replies; 69+ messages in thread
From: Linus Torvalds @ 2010-03-05 19:53 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel



On Fri, 5 Mar 2010, Al Viro wrote:
>
> Al Viro (18):
>       bail out with ELOOP earlier in do_link loop

    "Cry me a river if that hurts you.  Please, do.
     And post a video of that, while you are at it."

I'm waiting for the youtube links ;)

		Linus

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [git pull] vfs part 2
@ 2010-03-05 16:29 Al Viro
  2010-03-05 19:53 ` Linus Torvalds
  0 siblings, 1 reply; 69+ messages in thread
From: Al Viro @ 2010-03-05 16:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

	Next part of VFS stuff: getting do_filp_open() to relative
sanity.  There will be more piles (the next one is dealing with
write_inode mess, in an hour or so).  Please, pull from 
git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ for-linus

Shortlog:
Al Viro (18):
      beginning to untangle do_filp_open()
      gut do_filp_open() a bit more (do_last separation)
      Shift releasing nd->root from do_last() to its caller
      unroll do_last: loop in do_filp_open()
      postpone __putname() until after do_last()
      pull the common predecessors into do_last()
      bail out with ELOOP earlier in do_link loop
      pull more into do_last()
      Don't pass mangled open_flag to finish_open()
      Get rid of passing mangled flag to do_last()
      Leave mangled flag only for setting nd.intent.open.flag
      Pull handling of LAST_BIND into do_last(), clean up ok: part in do_filp_open()
      Kill is_link argument of do_last()
      Unify exits in O_CREAT handling
      Turn do_link spaghetty into a normal loop
      Finish pulling of -ESTALE handling to upper level in do_filp_open()
      Get rid of symlink body copying
      Switch !O_CREAT case to use of do_last()

Diffstat:
 fs/namei.c |  523 ++++++++++++++++++++++++++++++------------------------------
 1 files changed, 263 insertions(+), 260 deletions(-)

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2015-07-03 19:56 UTC | newest]

Thread overview: 69+ messages
2015-04-15 18:14 [git pull] vfs part 2 Al Viro
2015-04-23 10:16 ` Andrey Ryabinin
2015-05-25  8:30   ` Andrey Ryabinin
2015-06-21 21:12   ` Al Viro
2015-06-21 21:12     ` Al Viro
2015-06-21 21:16     ` Linus Torvalds
2015-06-21 21:16       ` Linus Torvalds
2015-06-21 21:35       ` Al Viro
2015-06-21 21:35         ` Al Viro
2015-06-22 12:02     ` Andrey Ryabinin
2015-06-22 12:02       ` Andrey Ryabinin
2015-07-01  6:27       ` Al Viro
2015-07-01  7:50         ` Andrey Ryabinin
2015-07-01  8:27           ` Al Viro
2015-07-01  8:41             ` Andrey Ryabinin
2015-07-01  8:55               ` Al Viro
2015-07-01 11:25                 ` Andrey Ryabinin
2015-07-01 18:44                   ` Al Viro
2015-07-02  3:20                     ` Al Viro
2015-07-02  4:10                       ` running out of tags in 9P (was Re: [git pull] vfs part 2) Al Viro
2015-07-02  7:50                         ` Andrey Ryabinin
2015-07-02  7:50                           ` Andrey Ryabinin
2015-07-02  7:59                           ` Al Viro
2015-07-02  7:59                             ` Al Viro
2015-07-02  8:19                             ` Andrey Ryabinin
2015-07-02  8:25                               ` Al Viro
2015-07-02  8:42                                 ` Al Viro
2015-07-02  8:42                                   ` Al Viro
2015-07-02 12:19                                   ` Andrey Ryabinin
2015-07-02 12:19                                     ` Andrey Ryabinin
2015-07-02 16:43                                     ` Al Viro
2015-07-02 16:43                                       ` Al Viro
2015-07-02 16:49                                       ` Al Viro
2015-07-02 16:49                                         ` Al Viro
2015-07-03  8:19                                         ` Andrey Ryabinin
2015-07-03  8:19                                           ` Andrey Ryabinin
2015-07-03  9:42                                           ` Al Viro
2015-07-03 15:00                                             ` [PATCH] forgetting to cancel request in interrupted zero-copy 9P RPC " Al Viro
2015-07-03 15:00                                               ` Al Viro
2015-07-03 19:56                                               ` Andrey Ryabinin
2015-07-03 19:56                                                 ` Andrey Ryabinin
2015-07-02 20:26                                       ` running out of tags in 9P " Andrey Ryabinin
     [not found]                         ` <5594E5EB.4030808@samsung.com>
2015-07-02  7:50                           ` Al Viro
2015-07-02 12:00                       ` [git pull] vfs part 2 Jeff Layton
2015-07-02 12:07                         ` Jeff Layton
2015-07-02 16:45                           ` Al Viro
2015-07-02 17:01                             ` Jeff Layton
2015-07-02 17:56                               ` Dominique Martinet
2015-07-02 18:43                                 ` Al Viro
2015-07-02 21:00                                   ` Dominique Martinet
2015-07-02 18:59                                 ` Jeff Layton
2015-07-02 20:36                                 ` Andrey Ryabinin
2015-07-02 18:40                               ` Al Viro
2015-07-02 19:16                                 ` Linus Torvalds
2015-07-02 20:44                                   ` Al Viro
  -- strict thread matches above, loose matches on Subject: below --
2012-03-31  5:19 Al Viro
2012-03-31 18:28 ` Linus Torvalds
2012-03-31 18:28   ` Linus Torvalds
2012-03-31 18:31   ` Linus Torvalds
2012-03-31 18:57     ` Al Viro
2012-03-31 19:29       ` Linus Torvalds
2012-03-31 19:29         ` Linus Torvalds
2012-03-31 19:39         ` Al Viro
2012-03-31 19:42           ` Al Viro
2012-03-31 19:48           ` Linus Torvalds
2012-03-31 20:08             ` Al Viro
2012-03-31 21:37               ` Linus Torvalds
2010-03-05 16:29 Al Viro
2010-03-05 19:53 ` Linus Torvalds
