All of lore.kernel.org
 help / color / mirror / Atom feed
* Regression in 4.6.0-git - bisected to commit dd254f5a382c
@ 2016-05-23 21:30 Larry Finger
  2016-05-24  0:18 ` Al Viro
  0 siblings, 1 reply; 21+ messages in thread
From: Larry Finger @ 2016-05-23 21:30 UTC (permalink / raw)
  To: LKML, Al Viro

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

The mainline kernels past 4.6.0 fail hang when logging in. There are no error 
messages, and the machine seems to be waiting for some event that never happens.

The problem has been bisected to commit dd254f5a382c ("fold checks into 
iterate_and_advance()"). The bisection has been verified.

The problem is the call from iov_iter_advance(). When I reinstated the old macro 
with a new name and used it in that routine, the system works. Obviously, the 
call that seems to be incorrect has some benefits. My quich-and-dirty patch is 
attached.

I will be willing to test any patch you prepare.

Thanks,

Larry

-- 
If I was stranded on an island and the only way to get off
the island was to make a pretty UI, I’d die there.

Linus Torvalds

[-- Attachment #2: fix_regression.patch --]
[-- Type: text/x-patch, Size: 1552 bytes --]

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 28cb431..614911c 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -139,6 +139,43 @@
 	}							\
 }
 
+#define iterate_and_advance_nocheck(i, n, v, I, B, K) {		\
+	size_t skip = i->iov_offset;				\
+	if (unlikely(i->type & ITER_BVEC)) {			\
+		const struct bio_vec *bvec;			\
+		struct bio_vec v;				\
+		iterate_bvec(i, n, v, bvec, skip, (B))		\
+		if (skip == bvec->bv_len) {			\
+			bvec++;					\
+			skip = 0;				\
+		}						\
+		i->nr_segs -= bvec - i->bvec;			\
+		i->bvec = bvec;					\
+	} else if (unlikely(i->type & ITER_KVEC)) {		\
+		const struct kvec *kvec;			\
+		struct kvec v;					\
+		iterate_kvec(i, n, v, kvec, skip, (K))		\
+		if (skip == kvec->iov_len) {			\
+			kvec++;					\
+			skip = 0;				\
+		}						\
+		i->nr_segs -= kvec - i->kvec;			\
+		i->kvec = kvec;					\
+	} else {						\
+		const struct iovec *iov;			\
+		struct iovec v;					\
+		iterate_iovec(i, n, v, iov, skip, (I))		\
+		if (skip == iov->iov_len) {			\
+			iov++;					\
+			skip = 0;				\
+		}						\
+		i->nr_segs -= iov - i->iov;			\
+		i->iov = iov;					\
+	}							\
+	i->count -= n;						\
+	i->iov_offset = skip;					\
+}
+ 
 static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i)
 {
@@ -488,7 +525,7 @@ EXPORT_SYMBOL(iov_iter_copy_from_user_atomic);
 
 void iov_iter_advance(struct iov_iter *i, size_t size)
 {
-	iterate_and_advance(i, size, v, 0, 0, 0)
+	iterate_and_advance_nocheck(i, size, v, 0, 0, 0)
 }
 EXPORT_SYMBOL(iov_iter_advance);
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-23 21:30 Regression in 4.6.0-git - bisected to commit dd254f5a382c Larry Finger
@ 2016-05-24  0:18 ` Al Viro
  2016-05-24  2:55   ` Larry Finger
  2016-05-24 16:10   ` Larry Finger
  0 siblings, 2 replies; 21+ messages in thread
From: Al Viro @ 2016-05-24  0:18 UTC (permalink / raw)
  To: Larry Finger; +Cc: LKML

On Mon, May 23, 2016 at 04:30:43PM -0500, Larry Finger wrote:
> The mainline kernels past 4.6.0 fail hang when logging in. There are no
> error messages, and the machine seems to be waiting for some event that
> never happens.
> 
> The problem has been bisected to commit dd254f5a382c ("fold checks into
> iterate_and_advance()"). The bisection has been verified.
> 
> The problem is the call from iov_iter_advance(). When I reinstated the old
> macro with a new name and used it in that routine, the system works.
> Obviously, the call that seems to be incorrect has some benefits. My
> quich-and-dirty patch is attached.
> 
> I will be willing to test any patch you prepare.

Hangs where and how?  A reproducer, please...  This is really weird - the
only change there is in the cases when
	* iov_iter_advance(i, n) is called with n greater than the remaining
amount.  It's a bug, plain and simple - old variant would've been left in
seriously buggered state and at the very least we want to catch any such
places for the sake of backports
	* iov_iter_advance(i, 0) - both old and new code leave *i unchanged,
but the old one dereferences i->iov[0], which be pointing beyond the end of
array by that point.  The value read from there was not used by the old code,
at that.

	Could you slap WARN_ON(size > i->count) in the very beginning of
iov_iter_advance() (the mainline variant) and see what triggers on your
reproducer?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-24  0:18 ` Al Viro
@ 2016-05-24  2:55   ` Larry Finger
  2016-05-24 16:10   ` Larry Finger
  1 sibling, 0 replies; 21+ messages in thread
From: Larry Finger @ 2016-05-24  2:55 UTC (permalink / raw)
  To: Al Viro; +Cc: LKML

On 05/23/2016 07:18 PM, Al Viro wrote:
> On Mon, May 23, 2016 at 04:30:43PM -0500, Larry Finger wrote:
>> The mainline kernels past 4.6.0 fail hang when logging in. There are no
>> error messages, and the machine seems to be waiting for some event that
>> never happens.
>>
>> The problem has been bisected to commit dd254f5a382c ("fold checks into
>> iterate_and_advance()"). The bisection has been verified.
>>
>> The problem is the call from iov_iter_advance(). When I reinstated the old
>> macro with a new name and used it in that routine, the system works.
>> Obviously, the call that seems to be incorrect has some benefits. My
>> quich-and-dirty patch is attached.
>>
>> I will be willing to test any patch you prepare.
>
> Hangs where and how?  A reproducer, please...  This is really weird - the
> only change there is in the cases when
> 	* iov_iter_advance(i, n) is called with n greater than the remaining
> amount.  It's a bug, plain and simple - old variant would've been left in
> seriously buggered state and at the very least we want to catch any such
> places for the sake of backports
> 	* iov_iter_advance(i, 0) - both old and new code leave *i unchanged,
> but the old one dereferences i->iov[0], which be pointing beyond the end of
> array by that point.  The value read from there was not used by the old code,
> at that.
>
> 	Could you slap WARN_ON(size > i->count) in the very beginning of
> iov_iter_advance() (the mainline variant) and see what triggers on your
> reproducer?

The hang is when you try to log in. It asks for a password and the system never 
returns, and nothing is logged. The system will switch between the various 
CTRL-ALT-Fn screens, but that is about the most it will do.

Adding WARN_ON(size > i->count) showed nothing. I got the same result for 
WARN_ON(!i->count). A WARN_ON(!size) does trigger the following traceback:

[   15.030907] ------------[ cut here ]------------
[   15.030913] WARNING: CPU: 0 PID: 353 at lib/iov_iter.c:529 
iov_iter_advance+0xf6/0x240
[   15.030914] Modules linked in: af_packet nfs fscache arc4 rtsx_pci_sdmmc 
mmc_core rtsx_pci_ms memstick x86_pkg_temp_thermal kvm_intel iwlmvm kvm mac80211 
snd_hda_c
odec_generic irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel 
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper iwlwifi 
ablk_helper cryptd snd_h
da_intel snd_hda_codec e1000e snd_hwdep snd_hda_core snd_pcm cfg80211 pcspkr 
serio_raw rtsx_pci snd_timer xhci_pci snd lpc_ich ptp mfd_core pps_core xhci_hcd 
soundcor
e thermal toshiba_acpi toshiba_bluetooth sparse_keymap wmi rfkill battery 
acpi_cpufreq ac processor dm_mod i915 i2c_algo_bit drm_kms_helper syscopyarea 
sysfillrect sy
simgblt fb_sys_fops drm sr_mod cdrom video button sg autofs4
[   15.030965] CPU: 0 PID: 353 Comm: systemd-journal Not tainted 
4.6.0-09084-g75b5796-dirty #89
[   15.030966] Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20 
   04/17/2014
[   15.030968]  0000000000000000 ffff88021fc07d40 ffffffff813e4d1e 0000000000000000
[   15.030972]  0000000000000000 ffff88021fc07d80 ffffffff810702b1 00000211c4105ac0
[   15.030975]  ffff88021fc07e08 0000000000000000 ffff88021fc07f08 ffffffff814bacc0
[   15.030978] Call Trace:
[   15.030981]  [<ffffffff813e4d1e>] dump_stack+0x67/0x99
[   15.030985]  [<ffffffff810702b1>] __warn+0xd1/0xf0
[   15.030989]  [<ffffffff814bacc0>] ? tty_compat_ioctl+0xe0/0xe0
[   15.030991]  [<ffffffff8107039d>] warn_slowpath_null+0x1d/0x20
[   15.030994]  [<ffffffff813f7716>] iov_iter_advance+0xf6/0x240
[   15.030997]  [<ffffffff81223161>] do_loop_readv_writev+0x51/0xc0
[   15.030999]  [<ffffffff814bacc0>] ? tty_compat_ioctl+0xe0/0xe0
[   15.031002]  [<ffffffff812245ff>] do_readv_writev+0x1ef/0x210
[   15.031006]  [<ffffffff81238c86>] ? do_vfs_ioctl+0x96/0x6a0
[   15.031008]  [<ffffffff8122484f>] vfs_writev+0x3f/0x50
[   15.031010]  [<ffffffff812248b5>] do_writev+0x55/0xd0
[   15.031013]  [<ffffffff812259a0>] SyS_writev+0x10/0x20
[   15.031016]  [<ffffffff81794b65>] entry_SYSCALL_64_fastpath+0x18/0xa8
[   15.031019] ---[ end trace 8c776b094504066d ]---

Two of these are logged for each boot.

If I make iov_iter_advance() look as follows, my system will boot:

void iov_iter_advance(struct iov_iter *i, size_t size)
{
         WARN_ON(!size);
         if (size)
                 iterate_and_advance(i, size, v, 0, 0, 0)
         else
                 iterate_and_advance_nocheck(i, size, v, 0, 0, 0)
}
EXPORT_SYMBOL(iov_iter_advance);

Larry

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-24  0:18 ` Al Viro
  2016-05-24  2:55   ` Larry Finger
@ 2016-05-24 16:10   ` Larry Finger
  2016-05-24 16:28     ` Al Viro
  2016-05-24 19:13     ` Matthew McClintock
  1 sibling, 2 replies; 21+ messages in thread
From: Larry Finger @ 2016-05-24 16:10 UTC (permalink / raw)
  To: Al Viro; +Cc: LKML

On 05/23/2016 07:18 PM, Al Viro wrote:
> On Mon, May 23, 2016 at 04:30:43PM -0500, Larry Finger wrote:
>> The mainline kernels past 4.6.0 fail hang when logging in. There are no
>> error messages, and the machine seems to be waiting for some event that
>> never happens.
>>
>> The problem has been bisected to commit dd254f5a382c ("fold checks into
>> iterate_and_advance()"). The bisection has been verified.
>>
>> The problem is the call from iov_iter_advance(). When I reinstated the old
>> macro with a new name and used it in that routine, the system works.
>> Obviously, the call that seems to be incorrect has some benefits. My
>> quich-and-dirty patch is attached.
>>
>> I will be willing to test any patch you prepare.
>
> Hangs where and how?  A reproducer, please...  This is really weird - the
> only change there is in the cases when
> 	* iov_iter_advance(i, n) is called with n greater than the remaining
> amount.  It's a bug, plain and simple - old variant would've been left in
> seriously buggered state and at the very least we want to catch any such
> places for the sake of backports
> 	* iov_iter_advance(i, 0) - both old and new code leave *i unchanged,
> but the old one dereferences i->iov[0], which be pointing beyond the end of
> array by that point.  The value read from there was not used by the old code,
> at that.
>
> 	Could you slap WARN_ON(size > i->count) in the very beginning of
> iov_iter_advance() (the mainline variant) and see what triggers on your
> reproducer?

As I wrote earlier, i->count was greater than zero, but size was zero, which 
caused the bulk of iterate_and_advance() to be skipped.

For now, the following one-line hack allows my system to boot:

diff --git a/fs/read_write.c b/fs/read_write.c
index 933b53a..d5d64d9 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -721,6 +721,7 @@ static ssize_t do_loop_readv_writev(struct file *filp, 
struct iov_iter *iter,
                 ret += nr;
                 if (nr != iovec.iov_len)
                         break;
+               nr = max_t(ssize_t, nr, 1);
                 iov_iter_advance(iter, nr);
         }

I have no idea what subtle bug in do_loop_readv_writev() is causing nr to be 
zero, but it seems to have been exposed by commit dd254f5a382c.

Larry

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-24 16:10   ` Larry Finger
@ 2016-05-24 16:28     ` Al Viro
  2016-05-24 18:39       ` Larry Finger
  2016-05-24 19:13     ` Matthew McClintock
  1 sibling, 1 reply; 21+ messages in thread
From: Al Viro @ 2016-05-24 16:28 UTC (permalink / raw)
  To: Larry Finger; +Cc: LKML

On Tue, May 24, 2016 at 11:10:09AM -0500, Larry Finger wrote:

> For now, the following one-line hack allows my system to boot:
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 933b53a..d5d64d9 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -721,6 +721,7 @@ static ssize_t do_loop_readv_writev(struct file *filp,
> struct iov_iter *iter,
>                 ret += nr;
>                 if (nr != iovec.iov_len)
>                         break;
> +               nr = max_t(ssize_t, nr, 1);
>                 iov_iter_advance(iter, nr);
>         }
> 
> I have no idea what subtle bug in do_loop_readv_writev() is causing nr to be
> zero, but it seems to have been exposed by commit dd254f5a382c.

This is definitely broken.  What happens is that something calls readv()
or writev() with one of the iovecs in the middle of the vector having
zero ->iov_len.  The call of iov_iter_advance(iter, 0) used to step to
the next iovec in such situation; the change in this commit leaves that to the
next primitive called, and most of the time that works just fine.
do_loop_readv_writev(), though, is picking the size to be passed to
->write()/->read() from the current iovec, and if it had been zero from
the very beginning... guess what's going to happen.

I'm somewhat curious about the source of that syscall - it's a good testcase
to add to ltp/xfstests, but it smells very odd for normal userland code.
It certainly needs to be fixed kernel-side, but the code issuing that is
probably worth looking into.  Oddities are like mushrooms - found one, look
around for more and don't assume that the rest will be of the same species...

I'm looking into iterate_and_advance right now; hopefully will have a sane
replacement shortly...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-24 16:28     ` Al Viro
@ 2016-05-24 18:39       ` Larry Finger
  0 siblings, 0 replies; 21+ messages in thread
From: Larry Finger @ 2016-05-24 18:39 UTC (permalink / raw)
  To: Al Viro; +Cc: LKML

On 05/24/2016 11:28 AM, Al Viro wrote:
> On Tue, May 24, 2016 at 11:10:09AM -0500, Larry Finger wrote:
>
>> For now, the following one-line hack allows my system to boot:
>>
>> diff --git a/fs/read_write.c b/fs/read_write.c
>> index 933b53a..d5d64d9 100644
>> --- a/fs/read_write.c
>> +++ b/fs/read_write.c
>> @@ -721,6 +721,7 @@ static ssize_t do_loop_readv_writev(struct file *filp,
>> struct iov_iter *iter,
>>                  ret += nr;
>>                  if (nr != iovec.iov_len)
>>                          break;
>> +               nr = max_t(ssize_t, nr, 1);
>>                  iov_iter_advance(iter, nr);
>>          }
>>
>> I have no idea what subtle bug in do_loop_readv_writev() is causing nr to be
>> zero, but it seems to have been exposed by commit dd254f5a382c.
>
> This is definitely broken.  What happens is that something calls readv()
> or writev() with one of the iovecs in the middle of the vector having
> zero ->iov_len.  The call of iov_iter_advance(iter, 0) used to step to
> the next iovec in such situation; the change in this commit leaves that to the
> next primitive called, and most of the time that works just fine.
> do_loop_readv_writev(), though, is picking the size to be passed to
> ->write()/->read() from the current iovec, and if it had been zero from
> the very beginning... guess what's going to happen.
>
> I'm somewhat curious about the source of that syscall - it's a good testcase
> to add to ltp/xfstests, but it smells very odd for normal userland code.
> It certainly needs to be fixed kernel-side, but the code issuing that is
> probably worth looking into.  Oddities are like mushrooms - found one, look
> around for more and don't assume that the rest will be of the same species...
>
> I'm looking into iterate_and_advance right now; hopefully will have a sane
> replacement shortly...

I did a pull from mainline this morning (commit 84787c572d402) and the problem 
has somehow been fixed. I did not see any likely candidates in the commit 
messages, and I'm bisecting again to see which change did it.

Larry

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-24 16:10   ` Larry Finger
  2016-05-24 16:28     ` Al Viro
@ 2016-05-24 19:13     ` Matthew McClintock
  2016-05-24 19:16       ` Larry Finger
  1 sibling, 1 reply; 21+ messages in thread
From: Matthew McClintock @ 2016-05-24 19:13 UTC (permalink / raw)
  To: Larry Finger; +Cc: Al Viro, LKML

I’m seeing this too, same commit if you want another person to test/reproduce.

-M

> On May 24, 2016, at 11:10 AM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
> 
> On 05/23/2016 07:18 PM, Al Viro wrote:
>> On Mon, May 23, 2016 at 04:30:43PM -0500, Larry Finger wrote:
>>> The mainline kernels past 4.6.0 fail hang when logging in. There are no
>>> error messages, and the machine seems to be waiting for some event that
>>> never happens.
>>> 
>>> The problem has been bisected to commit dd254f5a382c ("fold checks into
>>> iterate_and_advance()"). The bisection has been verified.
>>> 
>>> The problem is the call from iov_iter_advance(). When I reinstated the old
>>> macro with a new name and used it in that routine, the system works.
>>> Obviously, the call that seems to be incorrect has some benefits. My
>>> quich-and-dirty patch is attached.
>>> 
>>> I will be willing to test any patch you prepare.
>> 
>> Hangs where and how?  A reproducer, please...  This is really weird - the
>> only change there is in the cases when
>> 	* iov_iter_advance(i, n) is called with n greater than the remaining
>> amount.  It's a bug, plain and simple - old variant would've been left in
>> seriously buggered state and at the very least we want to catch any such
>> places for the sake of backports
>> 	* iov_iter_advance(i, 0) - both old and new code leave *i unchanged,
>> but the old one dereferences i->iov[0], which be pointing beyond the end of
>> array by that point.  The value read from there was not used by the old code,
>> at that.
>> 
>> 	Could you slap WARN_ON(size > i->count) in the very beginning of
>> iov_iter_advance() (the mainline variant) and see what triggers on your
>> reproducer?
> 
> As I wrote earlier, i->count was greater than zero, but size was zero, which caused the bulk of iterate_and_advance() to be skipped.
> 
> For now, the following one-line hack allows my system to boot:
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 933b53a..d5d64d9 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -721,6 +721,7 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
>                ret += nr;
>                if (nr != iovec.iov_len)
>                        break;
> +               nr = max_t(ssize_t, nr, 1);
>                iov_iter_advance(iter, nr);
>        }
> 
> I have no idea what subtle bug in do_loop_readv_writev() is causing nr to be zero, but it seems to have been exposed by commit dd254f5a382c.
> 
> Larry
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-24 19:13     ` Matthew McClintock
@ 2016-05-24 19:16       ` Larry Finger
  2016-05-24 19:25         ` Matthew McClintock
  0 siblings, 1 reply; 21+ messages in thread
From: Larry Finger @ 2016-05-24 19:16 UTC (permalink / raw)
  To: Matthew McClintock; +Cc: Al Viro, LKML

On 05/24/2016 02:13 PM, Matthew McClintock wrote:
> I’m seeing this too, same commit if you want another person to test/reproduce.

If you do a pull today, does that fix your problem?

Larry

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-24 19:16       ` Larry Finger
@ 2016-05-24 19:25         ` Matthew McClintock
  2016-05-24 19:36           ` Larry Finger
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew McClintock @ 2016-05-24 19:25 UTC (permalink / raw)
  To: Larry Finger; +Cc: Al Viro, LKML

On May 24, 2016, at 2:16 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
> 
> On 05/24/2016 02:13 PM, Matthew McClintock wrote:
>> I’m seeing this too, same commit if you want another person to test/reproduce.
> 
> If you do a pull today, does that fix your problem?

Hmm, no. Which commit am I looking for? I’m on a56f489502e28caac56c8a0735549740f0ae0711

-M

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-24 19:25         ` Matthew McClintock
@ 2016-05-24 19:36           ` Larry Finger
  2016-05-24 22:31             ` Matthew McClintock
  0 siblings, 1 reply; 21+ messages in thread
From: Larry Finger @ 2016-05-24 19:36 UTC (permalink / raw)
  To: Matthew McClintock; +Cc: Al Viro, LKML

On 05/24/2016 02:25 PM, Matthew McClintock wrote:
> On May 24, 2016, at 2:16 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>
>> On 05/24/2016 02:13 PM, Matthew McClintock wrote:
>>> I’m seeing this too, same commit if you want another person to test/reproduce.
>>
>> If you do a pull today, does that fix your problem?
>
> Hmm, no. Which commit am I looking for? I’m on a56f489502e28caac56c8a0735549740f0ae0711

Commit 84787c572d402644dca4874aba73324d9f8e3948 is working for me. I have a 
fixup in lib/iov_iter.c with a dump_stack() call if the fixup was needed. That 
dump is not triggered. I do not seem to have a56f489502e yet.

Larry

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-24 19:36           ` Larry Finger
@ 2016-05-24 22:31             ` Matthew McClintock
  2016-05-24 23:41               ` Al Viro
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew McClintock @ 2016-05-24 22:31 UTC (permalink / raw)
  To: Larry Finger; +Cc: Al Viro, LKML


> On May 24, 2016, at 2:36 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
> 
> On 05/24/2016 02:25 PM, Matthew McClintock wrote:
>> On May 24, 2016, at 2:16 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>> 
>>> On 05/24/2016 02:13 PM, Matthew McClintock wrote:
>>>> I’m seeing this too, same commit if you want another person to test/reproduce.
>>> 
>>> If you do a pull today, does that fix your problem?
>> 
>> Hmm, no. Which commit am I looking for? I’m on a56f489502e28caac56c8a0735549740f0ae0711
> 
> Commit 84787c572d402644dca4874aba73324d9f8e3948 is working for me. I have a fixup in lib/iov_iter.c with a dump_stack() call if the fixup was needed. That dump is not triggered. I do not seem to have a56f489502e yet.

Still seeing the issue on top of tree and the above commit. Re-ran bisection just to be sure.

-M

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-24 22:31             ` Matthew McClintock
@ 2016-05-24 23:41               ` Al Viro
  2016-05-25  0:58                 ` Matthew McClintock
  2016-05-25  6:24                 ` Al Viro
  0 siblings, 2 replies; 21+ messages in thread
From: Al Viro @ 2016-05-24 23:41 UTC (permalink / raw)
  To: Matthew McClintock; +Cc: Larry Finger, LKML

On Tue, May 24, 2016 at 05:31:51PM -0500, Matthew McClintock wrote:
> 
> > On May 24, 2016, at 2:36 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
> > 
> > On 05/24/2016 02:25 PM, Matthew McClintock wrote:
> >> On May 24, 2016, at 2:16 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
> >>> 
> >>> On 05/24/2016 02:13 PM, Matthew McClintock wrote:
> >>>> I’m seeing this too, same commit if you want another person to test/reproduce.
> >>> 
> >>> If you do a pull today, does that fix your problem?
> >> 
> >> Hmm, no. Which commit am I looking for? I’m on a56f489502e28caac56c8a0735549740f0ae0711
> > 
> > Commit 84787c572d402644dca4874aba73324d9f8e3948 is working for me. I have a fixup in lib/iov_iter.c with a dump_stack() call if the fixup was needed. That dump is not triggered. I do not seem to have a56f489502e yet.
> 
> Still seeing the issue on top of tree and the above commit. Re-ran bisection just to be sure.

Guys, the bug is real and definitely still there.
	char c;
	struct iovec v[2] = {{&c, 0}, {&c, 1}};
	readv(0, v, 2);
will trigger it just fine with stdin on e.g. tty.  It needs fixing and I'll
post a fix as soon as it gets through the local testing.  In the meanwhile,
I would like to know what in userland is doing that kind of call - kernel
certainly shouldn't end up in an infinite loop on that, but it's bloody odd
and I wonder what's going on in userland code to result in that call.

Again, I understand what's going on kernel-side; the only tricky part is how
to fix it without bringing the nasal daemons back.  I think I have a solution
and I'm going to post it tonight if it survives the local beating.  In any
case, the testcase above deserves being added to LTP - it's a real regression.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-24 23:41               ` Al Viro
@ 2016-05-25  0:58                 ` Matthew McClintock
  2016-05-25  1:10                   ` Al Viro
  2016-05-25  6:24                 ` Al Viro
  1 sibling, 1 reply; 21+ messages in thread
From: Matthew McClintock @ 2016-05-25  0:58 UTC (permalink / raw)
  To: Al Viro; +Cc: Larry Finger, LKML


> On May 24, 2016, at 6:41 PM, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> Again, I understand what's going on kernel-side; the only tricky part is how
> to fix it without bringing the nasal daemons back.  I think I have a solution
> and I'm going to post it tonight if it survives the local beating.  In any
> case, the testcase above deserves being added to LTP - it's a real regression.

I’m running a simple busybox rootfs with the following init script:

https://gist.github.com/7b12fdb5d7def9a835291a79c060fa07

And inittab:

https://gist.github.com/5a840b7eaa48f321836125f15147d0e9

Happy to turn on ftrace, dyndbg, and provide more logs.

-M

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-25  0:58                 ` Matthew McClintock
@ 2016-05-25  1:10                   ` Al Viro
  2016-05-25  1:20                     ` Matthew McClintock
  0 siblings, 1 reply; 21+ messages in thread
From: Al Viro @ 2016-05-25  1:10 UTC (permalink / raw)
  To: Matthew McClintock; +Cc: Larry Finger, LKML

On Tue, May 24, 2016 at 07:58:13PM -0500, Matthew McClintock wrote:
> 
> > On May 24, 2016, at 6:41 PM, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > 
> > Again, I understand what's going on kernel-side; the only tricky part is how
> > to fix it without bringing the nasal daemons back.  I think I have a solution
> > and I'm going to post it tonight if it survives the local beating.  In any
> > case, the testcase above deserves being added to LTP - it's a real regression.
> 
> I’m running a simple busybox rootfs with the following init script:
> 
> https://gist.github.com/7b12fdb5d7def9a835291a79c060fa07
> 
> And inittab:
> 
> https://gist.github.com/5a840b7eaa48f321836125f15147d0e9
> 
> Happy to turn on ftrace, dyndbg, and provide more logs.

Slap the WARN_ON(!size); in the very beginning of iov_iter_advance(), see
where it's triggered...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-25  1:10                   ` Al Viro
@ 2016-05-25  1:20                     ` Matthew McClintock
  2016-05-25  1:28                       ` Al Viro
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew McClintock @ 2016-05-25  1:20 UTC (permalink / raw)
  To: Al Viro; +Cc: Larry Finger, LKML


> On May 24, 2016, at 8:10 PM, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> Slap the WARN_ON(!size); in the very beginning of iov_iter_advance(), see
> where it's triggered...

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 28cb431..d89e154 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -488,6 +488,7 @@ EXPORT_SYMBOL(iov_iter_copy_from_user_atomic);

 void iov_iter_advance(struct iov_iter *i, size_t size)
 {
+       WARN_ON(!size);
        iterate_and_advance(i, size, v, 0, 0, 0)
 }
 EXPORT_SYMBOL(iov_iter_advance);

[    1.359869] This architecture does not have kernel memory protection.
init started: BusyBox v1.24.1 ()
starting pid 78, tty '': '/etc/init.d/rcS'
[    1.435863] random: udevadm urandom read with 0 bits of entropy available
[    1.448116] ------------[ cut here ]------------
[    1.448193] WARNING: CPU: 1 PID: 88 at lib/iov_iter.c:491 iov_iter_advance+0xf0/0x1b8
[    1.451973] Modules linked in:
[    1.462753] CPU: 1 PID: 88 Comm: udevd Not tainted 4.6.0 #195
[    1.462793] Hardware name: Qualcomm (Flattened Device Tree)
[    1.468346] [<c021a978>] (unwind_backtrace) from [<c02158a8>] (show_stack+0x20/0x24)
[    1.473713] [<c02158a8>] (show_stack) from [<c044f310>] (dump_stack+0x90/0xa4)
[    1.481701] [<c044f310>] (dump_stack) from [<c0228130>] (__warn+0xf8/0x110)
[    1.488727] [<c0228130>] (__warn) from [<c0228218>] (warn_slowpath_null+0x30/0x38)
[    1.495588] [<c0228218>] (warn_slowpath_null) from [<c04607e8>] (iov_iter_advance+0xf0/0x1b8)
[    1.503244] [<c04607e8>] (iov_iter_advance) from [<c034114c>] (do_readv_writev+0x2d0/0x370)
[    1.511827] [<c034114c>] (do_readv_writev) from [<c034123c>] (vfs_readv+0x50/0x68)
[    1.519983] [<c034123c>] (vfs_readv) from [<c03412b0>] (do_readv+0x5c/0xb8)
[    1.527621] [<c03412b0>] (do_readv) from [<c0341f18>] (SyS_readv+0x1c/0x20)
[    1.534485] [<c0341f18>] (SyS_readv) from [<c0210f80>] (ret_fast_syscall+0x0/0x3c)
[    1.541556] ---[ end trace eef892a602dbe329 ]---

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-25  1:20                     ` Matthew McClintock
@ 2016-05-25  1:28                       ` Al Viro
  2016-05-25  2:06                         ` Matthew McClintock
  0 siblings, 1 reply; 21+ messages in thread
From: Al Viro @ 2016-05-25  1:28 UTC (permalink / raw)
  To: Matthew McClintock; +Cc: Larry Finger, LKML

On Tue, May 24, 2016 at 08:20:46PM -0500, Matthew McClintock wrote:
> 
> > On May 24, 2016, at 8:10 PM, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > 
> > Slap the WARN_ON(!size); in the very beginning of iov_iter_advance(), see
> > where it's triggered...
> 
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 28cb431..d89e154 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -488,6 +488,7 @@ EXPORT_SYMBOL(iov_iter_copy_from_user_atomic);
> 
>  void iov_iter_advance(struct iov_iter *i, size_t size)
>  {
> +       WARN_ON(!size);
>         iterate_and_advance(i, size, v, 0, 0, 0)
>  }
>  EXPORT_SYMBOL(iov_iter_advance);
> 
> [    1.359869] This architecture does not have kernel memory protection.
> init started: BusyBox v1.24.1 ()
> starting pid 78, tty '': '/etc/init.d/rcS'
> [    1.435863] random: udevadm urandom read with 0 bits of entropy available
> [    1.448116] ------------[ cut here ]------------
> [    1.448193] WARNING: CPU: 1 PID: 88 at lib/iov_iter.c:491 iov_iter_advance+0xf0/0x1b8

The next obvious question is which binary it is and what's the return
address to userland; make that
	if (!size)
		printk(KERN_ERR "crap in %s[%x]",
			current->comm,
			current_pt_regs()->rip);
(in the same place)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-25  1:28                       ` Al Viro
@ 2016-05-25  2:06                         ` Matthew McClintock
  2016-05-25  3:21                           ` Al Viro
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew McClintock @ 2016-05-25  2:06 UTC (permalink / raw)
  To: Al Viro; +Cc: Larry Finger, LKML


> On May 24, 2016, at 8:28 PM, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> The next obvious question is which binary it is and what's the return
> address to userland; make that
> 	if (!size)
> 		printk(KERN_ERR "crap in %s[%x]",
> 			current->comm,
> 			current_pt_regs()->rip);
> (in the same place)

This is an ARM board, no rip. Quick log without, can re-run again later tonight or maybe tomorrow:

[    8.008054] crap in udevd

-M

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-25  2:06                         ` Matthew McClintock
@ 2016-05-25  3:21                           ` Al Viro
  0 siblings, 0 replies; 21+ messages in thread
From: Al Viro @ 2016-05-25  3:21 UTC (permalink / raw)
  To: Matthew McClintock; +Cc: Larry Finger, LKML

On Tue, May 24, 2016 at 09:06:03PM -0500, Matthew McClintock wrote:
> 
> > On May 24, 2016, at 8:28 PM, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > 
> > The next obvious question is which binary it is and what's the return
> > address to userland; make that
> > 	if (!size)
> > 		printk(KERN_ERR "crap in %s[%x]",
> > 			current->comm,
> > 			current_pt_regs()->rip);
> > (in the same place)
> 
> This is an ARM board, no rip. Quick log without, can re-run again later tonight or maybe tomorrow:

probably ->ARM_pc, then...

> [    8.008054] crap in udevd

Indeed.  Which version of that Fine Piece Of Software is it?  Again, the
kernel-side bug is real and needs fixing, but "udev does something bogus"
is not surprising in the slightest...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-24 23:41               ` Al Viro
  2016-05-25  0:58                 ` Matthew McClintock
@ 2016-05-25  6:24                 ` Al Viro
  2016-05-25 14:28                   ` Larry Finger
  2016-05-25 15:27                   ` Matthew McClintock
  1 sibling, 2 replies; 21+ messages in thread
From: Al Viro @ 2016-05-25  6:24 UTC (permalink / raw)
  To: Matthew McClintock; +Cc: Larry Finger, LKML

On Wed, May 25, 2016 at 12:41:33AM +0100, Al Viro wrote:
> On Tue, May 24, 2016 at 05:31:51PM -0500, Matthew McClintock wrote:
> > 
> > > On May 24, 2016, at 2:36 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
> > > 
> > > On 05/24/2016 02:25 PM, Matthew McClintock wrote:
> > >> On May 24, 2016, at 2:16 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
> > >>> 
> > >>> On 05/24/2016 02:13 PM, Matthew McClintock wrote:
> > >>>> I’m seeing this too, same commit if you want another person to test/reproduce.
> > >>> 
> > >>> If you do a pull today, does that fix your problem?
> > >> 
> > >> Hmm, no. Which commit am I looking for? I’m on a56f489502e28caac56c8a0735549740f0ae0711
> > > 
> > > Commit 84787c572d402644dca4874aba73324d9f8e3948 is working for me. I have a fixup in lib/iov_iter.c with a dump_stack() call if the fixup was needed. That dump is not triggered. I do not seem to have a56f489502e yet.
> > 
> > Still seeing the issue on top of tree and the above commit. Re-ran bisection just to be sure.
> 
> Guys, the bug is real and definitely still there.
> 	char c;
> 	struct iovec v[2] = {{&c, 0}, {&c, 1}};
> 	readv(0, v, 2);
> will trigger it just fine with stdin on e.g. tty.  It needs fixing and I'll
> post a fix as soon as it gets through the local testing.  In the meanwhile,
> I would like to know what in userland is doing that kind of call - kernel
> certainly shouldn't end up in an infinite loop on that, but it's bloody odd
> and I wonder what's going on in userland code to result in that call.
> 
> Again, I understand what's going on kernel-side; the only tricky part is how
> to fix it without bringing the nasal daemons back.  I think I have a solution
> and I'm going to post it tonight if it survives the local beating.  In any
> case, the testcase above deserves being added to LTP - it's a real regression.

FWIW, the reproducer is
#include <sys/uio.>
main()
{
	char c;
	struct iovec v[2] = {{&c,0},{&c,1}};
	readv(0, v, 2);
}
ran with stdin from tty.  Fix for that is simply

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 28cb431..0cd5227 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -101,7 +101,7 @@
 #define iterate_and_advance(i, n, v, I, B, K) {			\
 	if (unlikely(i->count < n))				\
 		n = i->count;					\
-	if (n) {						\
+	if (i->count) {						\
 		size_t skip = i->iov_offset;			\
 		if (unlikely(i->type & ITER_BVEC)) {		\
 			const struct bio_vec *bvec;		\

Could you see if your reproducer is fixed by that?

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-25  6:24                 ` Al Viro
@ 2016-05-25 14:28                   ` Larry Finger
  2016-05-25 15:27                   ` Matthew McClintock
  1 sibling, 0 replies; 21+ messages in thread
From: Larry Finger @ 2016-05-25 14:28 UTC (permalink / raw)
  To: Al Viro, Matthew McClintock; +Cc: LKML

On 05/25/2016 01:24 AM, Al Viro wrote:
> On Wed, May 25, 2016 at 12:41:33AM +0100, Al Viro wrote:
>> On Tue, May 24, 2016 at 05:31:51PM -0500, Matthew McClintock wrote:
>>>
>>>> On May 24, 2016, at 2:36 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>>>
>>>> On 05/24/2016 02:25 PM, Matthew McClintock wrote:
>>>>> On May 24, 2016, at 2:16 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>>>>>
>>>>>> On 05/24/2016 02:13 PM, Matthew McClintock wrote:
>>>>>>> I’m seeing this too, same commit if you want another person to test/reproduce.
>>>>>>
>>>>>> If you do a pull today, does that fix your problem?
>>>>>
>>>>> Hmm, no. Which commit am I looking for? I’m on a56f489502e28caac56c8a0735549740f0ae0711
>>>>
>>>> Commit 84787c572d402644dca4874aba73324d9f8e3948 is working for me. I have a fixup in lib/iov_iter.c with a dump_stack() call if the fixup was needed. That dump is not triggered. I do not seem to have a56f489502e yet.
>>>
>>> Still seeing the issue on top of tree and the above commit. Re-ran bisection just to be sure.
>>
>> Guys, the bug is real and definitely still there.
>> 	char c;
>> 	struct iovec v[2] = {{&c, 0}, {&c, 1}};
>> 	readv(0, v, 2);
>> will trigger it just fine with stdin on e.g. tty.  It needs fixing and I'll
>> post a fix as soon as it gets through the local testing.  In the meanwhile,
>> I would like to know what in userland is doing that kind of call - kernel
>> certainly shouldn't end up in an infinite loop on that, but it's bloody odd
>> and I wonder what's going on in userland code to result in that call.
>>
>> Again, I understand what's going on kernel-side; the only tricky part is how
>> to fix it without bringing the nasal daemons back.  I think I have a solution
>> and I'm going to post it tonight if it survives the local beating.  In any
>> case, the testcase above deserves being added to LTP - it's a real regression.
>
> FWIW, the reproducer is
> #include <sys/uio.>
> main()
> {
> 	char c;
> 	struct iovec v[2] = {{&c,0},{&c,1}};
> 	readv(0, v, 2);
> }
> ran with stdin from tty.  Fix for that is simply
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 28cb431..0cd5227 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -101,7 +101,7 @@
>   #define iterate_and_advance(i, n, v, I, B, K) {			\
>   	if (unlikely(i->count < n))				\
>   		n = i->count;					\
> -	if (n) {						\
> +	if (i->count) {						\
>   		size_t skip = i->iov_offset;			\
>   		if (unlikely(i->type & ITER_BVEC)) {		\
>   			const struct bio_vec *bvec;		\
>
> Could you see if your reproducer is fixed by that?

Yes, that change fixes my reproducer. If this is the final fix, you may add a 
Tested-by: to the commit.

Larry

>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c
  2016-05-25  6:24                 ` Al Viro
  2016-05-25 14:28                   ` Larry Finger
@ 2016-05-25 15:27                   ` Matthew McClintock
  1 sibling, 0 replies; 21+ messages in thread
From: Matthew McClintock @ 2016-05-25 15:27 UTC (permalink / raw)
  To: Al Viro; +Cc: Larry Finger, LKML

On May 25, 2016, at 1:24 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 28cb431..0cd5227 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -101,7 +101,7 @@
> #define iterate_and_advance(i, n, v, I, B, K) {			\
> 	if (unlikely(i->count < n))				\
> 		n = i->count;					\
> -	if (n) {						\
> +	if (i->count) {						\
> 		size_t skip = i->iov_offset;			\
> 		if (unlikely(i->type & ITER_BVEC)) {		\
> 			const struct bio_vec *bvec;		\
> 
> Could you see if your reproducer is fixed by that?

Yes, this fixes my issue.

-M

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-05-25 15:27 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-23 21:30 Regression in 4.6.0-git - bisected to commit dd254f5a382c Larry Finger
2016-05-24  0:18 ` Al Viro
2016-05-24  2:55   ` Larry Finger
2016-05-24 16:10   ` Larry Finger
2016-05-24 16:28     ` Al Viro
2016-05-24 18:39       ` Larry Finger
2016-05-24 19:13     ` Matthew McClintock
2016-05-24 19:16       ` Larry Finger
2016-05-24 19:25         ` Matthew McClintock
2016-05-24 19:36           ` Larry Finger
2016-05-24 22:31             ` Matthew McClintock
2016-05-24 23:41               ` Al Viro
2016-05-25  0:58                 ` Matthew McClintock
2016-05-25  1:10                   ` Al Viro
2016-05-25  1:20                     ` Matthew McClintock
2016-05-25  1:28                       ` Al Viro
2016-05-25  2:06                         ` Matthew McClintock
2016-05-25  3:21                           ` Al Viro
2016-05-25  6:24                 ` Al Viro
2016-05-25 14:28                   ` Larry Finger
2016-05-25 15:27                   ` Matthew McClintock

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.