All of lore.kernel.org
 help / color / mirror / Atom feed
* [3.16.1 REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
@ 2014-08-31 23:07 Nix
  2014-09-01 11:09 ` Oliver Neukum
  0 siblings, 1 reply; 35+ messages in thread
From: Nix @ 2014-08-31 23:07 UTC (permalink / raw)
  To: oliver; +Cc: linux-kernel

So I upgraded to 3.16.1 and found that the Simtec Entropy Key (a cdc-acm
device) was no longer operational:

fold:~# ekeydctl stats 1
BytesRead=0
BytesWritten=0
ConnectionNonces=0
ConnectionPackets=0
ConnectionRekeys=0
ConnectionResets=0
ConnectionTime=65
EntropyRate=0
FipsFrameRate=0
FrameByteLast=0
FramesOk=0
FramingErrors=0
KeyDbsdShannonPerByteL=0
KeyDbsdShannonPerByteR=0
KeyEnglishBadness=No failure
KeyRawBadness=0
KeyRawShannonPerByteL=0
KeyRawShannonPerByteR=0
KeyRawShannonPerByteX=0
KeyShortBadness=efm_ok
KeyTemperatureC=-273.15
KeyTemperatureF=-459.67
KeyTemperatureK=0
KeyVoltage=0
PacketErrors=0
PacketOK=0
ReadRate=0
TotalEntropy=0
WriteRate=0

This device streams data continuously at at rate of several KiB/s, so
normally we would never expect to see a report of zero bytes read or
written if the key were functional (nor, indeed, a key temperature of
absolute zero!)

It appears that cdc-acm has broken such that no data is received from
this device any more (though it's still being detected, according to the
dmesg log). Something goes very askew with the entropy key -- even after
rebooting back to an earlier kernel, a physical disconnection and
reconnection of the entropy key is needed to make it work again. Whether
this is some sort of cdc-acm-level protocol problem, or a key-level
problem caused by interrupted communication, I have no clue.

3.15.8 works fine.

I'll do a bisection of the cdc-acm changes since 3.15 tomorrow night and
see if I can find the commit at fault.

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-08-31 23:07 [3.16.1 REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16 Nix
@ 2014-09-01 11:09 ` Oliver Neukum
  2014-09-04 23:40   ` Nix
  0 siblings, 1 reply; 35+ messages in thread
From: Oliver Neukum @ 2014-09-01 11:09 UTC (permalink / raw)
  To: Nix; +Cc: linux-kernel


> I'll do a bisection of the cdc-acm changes since 3.15 tomorrow night and
> see if I can find the commit at fault.

Thank you for the report. Please let me know the results of your
bisection.

	Regards
		Oliver




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-09-01 11:09 ` Oliver Neukum
@ 2014-09-04 23:40   ` Nix
  2014-09-05  7:59     ` Oliver Neukum
  2014-10-11 19:05     ` [3.16.1 BISECTED " Nix
  0 siblings, 2 replies; 35+ messages in thread
From: Nix @ 2014-09-04 23:40 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: linux-kernel

On 1 Sep 2014, Oliver Neukum stated:

>
>> I'll do a bisection of the cdc-acm changes since 3.15 tomorrow night and
>> see if I can find the commit at fault.
>
> Thank you for the report. Please let me know the results of your
> bisection.

Bisection underway (fifth attempt -- I *may* have characterized it well
enough after a few hours of thrashing at it to bisect accurately this
time).

Some more random info.

btw, when the Entropy Key has ended up in a messed up state due to this
bug, we sometimes see

[    2.330158] usb 2-1: new full-speed USB device number 2 using ohci-pci
[    2.552465] usb 2-1: device descriptor read/64, error -62
[    2.870142] usb 2-1: device descriptor read/64, error -62
[    3.190150] usb 2-1: new full-speed USB device number 3 using ohci-pci
[    3.410137] usb 2-1: device descriptor read/64, error -62
[    3.740142] usb 2-1: device descriptor read/64, error -62
[    4.060146] usb 2-1: new full-speed USB device number 4 using ohci-pci
[    4.520133] usb 2-1: device not accepting address 4, error -62
[    4.730139] usb 2-1: new full-speed USB device number 5 using ohci-pci
[    5.180117] usb 2-1: device not accepting address 5, error -62
[    5.215194] hub 2-0:1.0: unable to enumerate USB device on port 1

when starting up a working kernel (the key then doesn't work until
physically disconnected and reconnected again).

More generally, the problem may be at *shutdown* -- something goes wrong
during link suspension or something, such that the link never comes up
again until physically reconnected. So a straight bisect is misleading
-- the error may have been in the *last* kernel tested -- and even then,
some kernels (e.g. the 3.15.0 merge base) appear capable of making it
work fine. But even this is not consistent: sometimes a kernel that
works fine if you repeatedly reboot it (such as 3.15) malfunctions when
you reboot into 3.16 -- but sometimes a newly plugged USB key on a 3.16
kernel malfunctions upon reboot, even if you reboot into a working
kernel such as 3.15 (and it then proceeds to work indefinitely if you
unplug and replug it and stick with 3.15.x, but upon rebooting into
3.16.x it goes wrong again).

So sometimes a faulty kernel makes the key go wrong when you restart
into another kernel (faulty or not), and sometimes it makes a key go
wrong when it is restarted into. There doesn't seem to be any
consistency to this that I've spotted, at least not yet.

Upon physical reconnection, the USB key works again, even on afflicted
kernels.

I'm working around this confusing morass by rebooting into each test
kernel, unplugging and replugging the entropy key if it was fubared,
then rebooting into the same kernel again and seeing if it was still
fubared. But this is not terribly fast, particularly not on a headless
compact-flash-based Geode box which doesn't even complete booting
without the entropy source which this bug cuts off :) so it'll be
sometime tomorrow before I can get this bisection done, I'm afraid.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-09-04 23:40   ` Nix
@ 2014-09-05  7:59     ` Oliver Neukum
  2014-09-05 15:17       ` Nix
  2014-10-11 19:05     ` [3.16.1 BISECTED " Nix
  1 sibling, 1 reply; 35+ messages in thread
From: Oliver Neukum @ 2014-09-05  7:59 UTC (permalink / raw)
  To: Nix; +Cc: linux-kernel

On Fri, 2014-09-05 at 00:40 +0100, Nix wrote:
> I'm working around this confusing morass by rebooting into each test
> kernel, unplugging and replugging the entropy key if it was fubared,
> then rebooting into the same kernel again and seeing if it was still
> fubared. But this is not terribly fast, particularly not on a headless
> compact-flash-based Geode box which doesn't even complete booting
> without the entropy source which this bug cuts off :) so it'll be
> sometime tomorrow before I can get this bisection done, I'm afraid.

Ugh. My sympathies. I cannot suggest a better method, I am afraid.

	Regards
		Oliver



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-09-05  7:59     ` Oliver Neukum
@ 2014-09-05 15:17       ` Nix
  2014-09-08  7:21         ` Oliver Neukum
  0 siblings, 1 reply; 35+ messages in thread
From: Nix @ 2014-09-05 15:17 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: linux-kernel

On 5 Sep 2014, Oliver Neukum verbalised:

> On Fri, 2014-09-05 at 00:40 +0100, Nix wrote:
>> I'm working around this confusing morass by rebooting into each test
>> kernel, unplugging and replugging the entropy key if it was fubared,
>> then rebooting into the same kernel again and seeing if it was still
>> fubared. But this is not terribly fast, particularly not on a headless
>> compact-flash-based Geode box which doesn't even complete booting
>> without the entropy source which this bug cuts off :) so it'll be
>> sometime tomorrow before I can get this bisection done, I'm afraid.
>
> Ugh. My sympathies. I cannot suggest a better method, I am afraid.

Well, that method doesn't work. I've found pairs of kernels (e.g.
59a3d4c3631e553357b7305dc09db1990aa6757c and
b05d59dfceaea72565b1648af929b037b0f96d7f) where each kernel works on its
own (rebooting from that kernel into the same kernel keeps a working
key, so I would normally assume that each kernel is OK) but rebooting
from the first into the second yields a broken one if it was working
before (so one of them must, in fact, be broken, but I have no clue
which one).

So I can't figure out how to bisect this.

Any suggestions as to what failure-test I might use, or what other
methods I might use to figure out what's going wrong? Not knowing
anything about USB doesn't help here. I don't know for sure that this is
a cdc-acm problem -- bisecting just the cdc-acm driver was fruitless --
so it might be something more generally USBish.

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-09-05 15:17       ` Nix
@ 2014-09-08  7:21         ` Oliver Neukum
  2014-09-08  7:58           ` Nix
  0 siblings, 1 reply; 35+ messages in thread
From: Oliver Neukum @ 2014-09-08  7:21 UTC (permalink / raw)
  To: Nix; +Cc: linux-kernel

On Fri, 2014-09-05 at 16:17 +0100, Nix wrote:
> On 5 Sep 2014, Oliver Neukum verbalised:
> 
> > On Fri, 2014-09-05 at 00:40 +0100, Nix wrote:
> >> I'm working around this confusing morass by rebooting into each test
> >> kernel, unplugging and replugging the entropy key if it was fubared,
> >> then rebooting into the same kernel again and seeing if it was still
> >> fubared. But this is not terribly fast, particularly not on a headless
> >> compact-flash-based Geode box which doesn't even complete booting
> >> without the entropy source which this bug cuts off :) so it'll be
> >> sometime tomorrow before I can get this bisection done, I'm afraid.
> >
> > Ugh. My sympathies. I cannot suggest a better method, I am afraid.
> 
> Well, that method doesn't work. I've found pairs of kernels (e.g.
> 59a3d4c3631e553357b7305dc09db1990aa6757c and
> b05d59dfceaea72565b1648af929b037b0f96d7f) where each kernel works on its
> own (rebooting from that kernel into the same kernel keeps a working
> key, so I would normally assume that each kernel is OK) but rebooting
> from the first into the second yields a broken one if it was working
> before (so one of them must, in fact, be broken, but I have no clue
> which one).
> 
> So I can't figure out how to bisect this.
> 
> Any suggestions as to what failure-test I might use, or what other
> methods I might use to figure out what's going wrong? Not knowing
> anything about USB doesn't help here. I don't know for sure that this is
> a cdc-acm problem -- bisecting just the cdc-acm driver was fruitless --
> so it might be something more generally USBish.
> 

Do your kernels work if you start with a known good kernel e.g.
3.15 and then reboot?

	Regards
		Oliver



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-09-08  7:21         ` Oliver Neukum
@ 2014-09-08  7:58           ` Nix
  0 siblings, 0 replies; 35+ messages in thread
From: Nix @ 2014-09-08  7:58 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: linux-kernel

On 8 Sep 2014, Oliver Neukum said:

> On Fri, 2014-09-05 at 16:17 +0100, Nix wrote:
>> On 5 Sep 2014, Oliver Neukum verbalised:
>> 
>> > On Fri, 2014-09-05 at 00:40 +0100, Nix wrote:
>> >> I'm working around this confusing morass by rebooting into each test
>> >> kernel, unplugging and replugging the entropy key if it was fubared,
>> >> then rebooting into the same kernel again and seeing if it was still
>> >> fubared. But this is not terribly fast, particularly not on a headless
>> >> compact-flash-based Geode box which doesn't even complete booting
>> >> without the entropy source which this bug cuts off :) so it'll be
>> >> sometime tomorrow before I can get this bisection done, I'm afraid.
>> >
>> > Ugh. My sympathies. I cannot suggest a better method, I am afraid.
>> 
>> Well, that method doesn't work. I've found pairs of kernels (e.g.
>> 59a3d4c3631e553357b7305dc09db1990aa6757c and
>> b05d59dfceaea72565b1648af929b037b0f96d7f) where each kernel works on its
>> own (rebooting from that kernel into the same kernel keeps a working
>> key, so I would normally assume that each kernel is OK) but rebooting
>> from the first into the second yields a broken one if it was working
>> before (so one of them must, in fact, be broken, but I have no clue
>> which one).
>> 
>> So I can't figure out how to bisect this.
>> 
>> Any suggestions as to what failure-test I might use, or what other
>> methods I might use to figure out what's going wrong? Not knowing
>> anything about USB doesn't help here. I don't know for sure that this is
>> a cdc-acm problem -- bisecting just the cdc-acm driver was fruitless --
>> so it might be something more generally USBish.
>
> Do your kernels work if you start with a known good kernel e.g.
> 3.15 and then reboot?

That case works -- aha, so I could orchestrate it by going

3.15 -> test -> test -> reset to 3.15 -> test -> test ...

i.e. a triple reboot cycle. Should have thought of that.

God, what a pain :)

I'll give it a try tonight.

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-09-04 23:40   ` Nix
  2014-09-05  7:59     ` Oliver Neukum
@ 2014-10-11 19:05     ` Nix
  2014-10-11 19:51       ` Paul Martin
  1 sibling, 1 reply; 35+ messages in thread
From: Nix @ 2014-10-11 19:05 UTC (permalink / raw)
  To: Oliver Neukum, Johan Hovold, Greg Kroah-Hartman, Paul Martin; +Cc: linux-kernel

[Cc:ed someone who knows the people behind the Entropy Key: they're not
 being manufactured at the moment, but he might want to know anyway]

On 5 Sep 2014, nix@esperi.org.uk stated:

> On 1 Sep 2014, Oliver Neukum stated:
>
>>> I'll do a bisection of the cdc-acm changes since 3.15 tomorrow night and
>>> see if I can find the commit at fault.
>>
>> Thank you for the report. Please let me know the results of your
>> bisection.
>
> Bisection underway (fifth attempt -- I *may* have characterized it well
> enough after a few hours of thrashing at it to bisect accurately this
> time).
[...]
> More generally, the problem may be at *shutdown* -- something goes wrong
> during link suspension or something, such that the link never comes up
> again until physically reconnected. So a straight bisect is misleading
> -- the error may have been in the *last* kernel tested -- and even then,
> some kernels (e.g. the 3.15.0 merge base) appear capable of making it
> work fine. But even this is not consistent: sometimes a kernel that
> works fine if you repeatedly reboot it (such as 3.15) malfunctions when
> you reboot into 3.16 -- but sometimes a newly plugged USB key on a 3.16
> kernel malfunctions upon reboot, even if you reboot into a working
> kernel such as 3.15 (and it then proceeds to work indefinitely if you
> unplug and replug it and stick with 3.15.x, but upon rebooting into
> 3.16.x it goes wrong again).

*Finally* bisected, not helped by the fact that I sometimes needed up to
five reboots (!) to see a failure. The guilty commit is this one:

commit 0943d8ead30e9474034cc5e92225ab0fd29fd0d4
Author: Johan Hovold <jhovold@gmail.com>
Date:   Mon May 26 19:23:51 2014 +0200

    USB: cdc-acm: use tty-port dtr_rts

    Add dtr_rts tty-port operation which implements proper DTR/RTS handling
    (e.g. only lower DTR/RTS during shutdown if HUPCL is set).

    Note that modem-control locking still needs to be added throughout the
    driver.

    Signed-off-by: Johan Hovold <jhovold@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

To re-describe this failure for the people who weren't in the thread: in
3.16.x I often see this output when asking the ekey daemon for the state
of my Simtec Entropy Key (a cdc-acm-based random number generator) after
rebooting my ohci-based Soekris net5501:

fold:~# ekeydctl stats 1
BytesRead=0
BytesWritten=0
ConnectionNonces=0
ConnectionPackets=0
ConnectionRekeys=0
ConnectionResets=0
ConnectionTime=65
EntropyRate=0
FipsFrameRate=0
FrameByteLast=0
FramesOk=0
FramingErrors=0
KeyDbsdShannonPerByteL=0
KeyDbsdShannonPerByteR=0
KeyEnglishBadness=No failure
KeyRawBadness=0
KeyRawShannonPerByteL=0
KeyRawShannonPerByteR=0
KeyRawShannonPerByteX=0
KeyShortBadness=efm_ok
KeyTemperatureC=-273.15
KeyTemperatureF=-459.67
KeyTemperatureK=0
KeyVoltage=0
PacketErrors=0
PacketOK=0
ReadRate=0
TotalEntropy=0
WriteRate=0

This device streams data continuously at at rate of several KiB/s, so
normally we would never expect to see a report of zero bytes read or
written if the key were functional (nor, indeed, a key temperature of
absolute zero!). This failure never occurred in 3.15.x nor any earlier
kernel. (Note: the 'no failure' message above is sent *from the key* to
indicate that the random numbers can be trusted: it is a bit unfortunate
that the code for 'No failure' is 0, which is also the default value
before anything is received from the key. In this case, we're just
seeing the daemon's initialization-time default. As BytesRead indicates,
the key is not talking to us.)

The symptoms are such that it is the kernel you reboot *from* that
causes the failure, not the one you reboot into: once the key fails it
never recovers without physical removal and reinsertion (or, one
presumes, a poweroff of the whole machine, but I haven't tried that)

This is not a consistent failure: sometimes it can take up to four
reboots for the key to fail. As a result, the bisection took forever (I
had to wait until I had a spare weekend day to devote to it). Despite
the errative nature, I'm fairly confident this commit is at fault: with
it reverted, I have restarted a couple of dozen times without failure
symptoms.

(I speculate that the device's firmware may be terminally confused by
having something try to hang it up, since it's not a modem nor anything
like one, as the boot messages correctly proclaim. The firmware isn't
open, so I can't check.)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-11 19:05     ` [3.16.1 BISECTED " Nix
@ 2014-10-11 19:51       ` Paul Martin
  2014-10-11 22:24         ` Nix
  0 siblings, 1 reply; 35+ messages in thread
From: Paul Martin @ 2014-10-11 19:51 UTC (permalink / raw)
  To: Nix; +Cc: Oliver Neukum, Johan Hovold, Greg Kroah-Hartman, linux-kernel

Having been privy to the firmware of the eKey, it is very simplisting,
with no implementation whatsoever of any flow control.

-- 
Paul Martin <pm@debian.org>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-11 19:51       ` Paul Martin
@ 2014-10-11 22:24         ` Nix
  2014-10-12 11:14           ` Paul Martin
  2014-10-12 18:58           ` Johan Hovold
  0 siblings, 2 replies; 35+ messages in thread
From: Nix @ 2014-10-11 22:24 UTC (permalink / raw)
  To: Paul Martin; +Cc: Oliver Neukum, Johan Hovold, Greg Kroah-Hartman, linux-kernel

On 11 Oct 2014, Paul Martin spake thusly:

> Having been privy to the firmware of the eKey, it is very simplisting,
> with no implementation whatsoever of any flow control.

That's what I thought. (Why would something that just provides data at a
constant rate way below that of even the slowest USB bus *need* flow
control?)

One presumes therefore that the kernel suddenly trying to do flow
control on shutdown would fubar the firmware's internal state, leading
to the symptoms I see.

So, the question becomes, is there a way to spot this general 'no flow
control on this device' thing from the kernel side, or do we need a
blacklist? Or, perhaps, if this is commonplace for cdc-acm devices, a
whitelist? I can't imagine it's *that* commonplace or someone would have
spotted this already in the months and months it took me to do the
bisection.

Maybe all non-modem cdc-acm devices should eschew flow control, or
something? (This is a genuine guess and is almost certainly wrong.)

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-11 22:24         ` Nix
@ 2014-10-12 11:14           ` Paul Martin
  2014-10-12 18:58           ` Johan Hovold
  1 sibling, 0 replies; 35+ messages in thread
From: Paul Martin @ 2014-10-12 11:14 UTC (permalink / raw)
  To: Nix
  Cc: Oliver Neukum, Johan Hovold, Greg Kroah-Hartman, linux-kernel,
	Daniel Silverstone

On Sat, Oct 11, 2014 at 11:24:59PM +0100, Nix wrote:
> On 11 Oct 2014, Paul Martin spake thusly:
> 
> > Having been privy to the firmware of the eKey, it is very simplistic,
> > with no implementation whatsoever of any flow control.
> 
> That's what I thought. (Why would something that just provides data at a
> constant rate way below that of even the slowest USB bus *need* flow
> control?)
> 
> One presumes therefore that the kernel suddenly trying to do flow
> control on shutdown would fubar the firmware's internal state, leading
> to the symptoms I see.
> 
> So, the question becomes, is there a way to spot this general 'no flow
> control on this device' thing from the kernel side, or do we need a
> blacklist? Or, perhaps, if this is commonplace for cdc-acm devices, a
> whitelist? I can't imagine it's *that* commonplace or someone would have
> spotted this already in the months and months it took me to do the
> bisection.
> 
> Maybe all non-modem cdc-acm devices should eschew flow control, or
> something? (This is a genuine guess and is almost certainly wrong.)
> 

I'm going to pass this on to Daniel Silverstone, who did the actual
coding of the firmware.

-- 
Paul Martin <pm@debian.org>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-11 22:24         ` Nix
  2014-10-12 11:14           ` Paul Martin
@ 2014-10-12 18:58           ` Johan Hovold
  2014-10-12 21:36             ` Nix
  1 sibling, 1 reply; 35+ messages in thread
From: Johan Hovold @ 2014-10-12 18:58 UTC (permalink / raw)
  To: Nix
  Cc: Paul Martin, Oliver Neukum, Johan Hovold, Greg Kroah-Hartman,
	linux-kernel

On Sat, Oct 11, 2014 at 11:24:59PM +0100, Nix wrote:
> On 11 Oct 2014, Paul Martin spake thusly:
> 
> > Having been privy to the firmware of the eKey, it is very simplisting,
> > with no implementation whatsoever of any flow control.
> 
> That's what I thought. (Why would something that just provides data at a
> constant rate way below that of even the slowest USB bus *need* flow
> control?)
> 
> One presumes therefore that the kernel suddenly trying to do flow
> control on shutdown would fubar the firmware's internal state, leading
> to the symptoms I see.

The cdc-acm driver was dropping DTR/RTS on shutdown (close) also before
the commit you refer to. One thing it did change however is that this is
now only done if HUPCL is set. Might setting that flag be enough to
prevent the device firmware from crashing?

Johan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-12 18:58           ` Johan Hovold
@ 2014-10-12 21:36             ` Nix
  2014-10-14  8:34               ` Johan Hovold
  0 siblings, 1 reply; 35+ messages in thread
From: Nix @ 2014-10-12 21:36 UTC (permalink / raw)
  To: Johan Hovold; +Cc: Paul Martin, Oliver Neukum, Greg Kroah-Hartman, linux-kernel

On 12 Oct 2014, Johan Hovold verbalised:

> On Sat, Oct 11, 2014 at 11:24:59PM +0100, Nix wrote:
>> On 11 Oct 2014, Paul Martin spake thusly:
>> 
>> > Having been privy to the firmware of the eKey, it is very simplisting,
>> > with no implementation whatsoever of any flow control.
>> 
>> That's what I thought. (Why would something that just provides data at a
>> constant rate way below that of even the slowest USB bus *need* flow
>> control?)
>> 
>> One presumes therefore that the kernel suddenly trying to do flow
>> control on shutdown would fubar the firmware's internal state, leading
>> to the symptoms I see.
>
> The cdc-acm driver was dropping DTR/RTS on shutdown (close) also before
> the commit you refer to. One thing it did change however is that this is
> now only done if HUPCL is set. Might setting that flag be enough to
> prevent the device firmware from crashing?

If I read the ekeyd 1.1.5 source code correctly, this is already
happening:

,----[ host/stream.c:estream_open() ]
|     } else if (S_ISCHR(sbuf.st_mode)) {
|         /* Open the file as a character device/tty */
|         fd = open(uri, O_RDWR | O_NOCTTY);
|         if ((fd != -1) && (isatty(fd))) {
|             if (tcgetattr(fd, &settings) == 0 ) {
|                 settings.c_cflag &= ~(CSIZE | CSTOPB | PARENB | CLOCAL |
|                                       CREAD | PARODD | CRTSCTS);
|                 settings.c_iflag &= ~(BRKINT | IGNPAR | PARMRK | INPCK |
|                                       ISTRIP | INLCR | IGNCR | ICRNL | IXON |
|                                       IXOFF  | IXANY | IMAXBEL);
|                 settings.c_iflag |= IGNBRK;
|                 settings.c_oflag &= ~(OPOST | OCRNL | ONOCR | ONLRET);
|                 settings.c_lflag &= ~(ISIG | ICANON | IEXTEN | ECHO |
|                                       ECHOE | ECHOK | ECHONL | NOFLSH |
|                                       TOSTOP | ECHOPRT | ECHOCTL | ECHOKE);
|                 settings.c_cflag |= CS8 | HUPCL | CREAD | CLOCAL;
| #ifdef EKEY_FULL_TERMIOS
|                 settings.c_cflag &= ~(CBAUD);
| 		  settings.c_iflag &= ~(IUTF8 | IUCLC);
|                 settings.c_oflag &= ~(OFILL | OFDEL | NLDLY | CRDLY | TABDLY |
|                                       BSDLY | VTDLY | FFDLY | OLCUC );
|                 settings.c_oflag |= NL0 | CR0 | TAB0 | BS0 | VT0 | FF0;
|                 settings.c_lflag &= ~(XCASE);
| #endif
|                 settings.c_cflag |= B115200;
|                 if (tcsetattr(fd, TCSANOW, &settings) < 0) {
`----

Note the HUPCL in there.

I have checked: this code is being executed against a symlink that
points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
succeeding on the kernel I'm running now, but of course that's 3.16.5
with this commit reverted...)

Of course the daemon is stopped before the reboot, closing the device.
(But even if it wasn't, one would assume that the fact the system was
rebooting would be considered tantamount to a close!)

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-12 21:36             ` Nix
@ 2014-10-14  8:34               ` Johan Hovold
  2014-10-14 14:44                 ` Nix
                                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Johan Hovold @ 2014-10-14  8:34 UTC (permalink / raw)
  To: Nix
  Cc: Johan Hovold, Paul Martin, Oliver Neukum, Greg Kroah-Hartman,
	linux-kernel

On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
> On 12 Oct 2014, Johan Hovold verbalised:
> 
> > On Sat, Oct 11, 2014 at 11:24:59PM +0100, Nix wrote:
> >> On 11 Oct 2014, Paul Martin spake thusly:
> >> 
> >> > Having been privy to the firmware of the eKey, it is very simplisting,
> >> > with no implementation whatsoever of any flow control.
> >> 
> >> That's what I thought. (Why would something that just provides data at a
> >> constant rate way below that of even the slowest USB bus *need* flow
> >> control?)
> >> 
> >> One presumes therefore that the kernel suddenly trying to do flow
> >> control on shutdown would fubar the firmware's internal state, leading
> >> to the symptoms I see.
> >
> > The cdc-acm driver was dropping DTR/RTS on shutdown (close) also before
> > the commit you refer to. One thing it did change however is that this is
> > now only done if HUPCL is set. Might setting that flag be enough to
> > prevent the device firmware from crashing?
> 
> If I read the ekeyd 1.1.5 source code correctly, this is already
> happening:
> 
> ,----[ host/stream.c:estream_open() ]
> |     } else if (S_ISCHR(sbuf.st_mode)) {
> |         /* Open the file as a character device/tty */
> |         fd = open(uri, O_RDWR | O_NOCTTY);
> |         if ((fd != -1) && (isatty(fd))) {
> |             if (tcgetattr(fd, &settings) == 0 ) {
> |                 settings.c_cflag &= ~(CSIZE | CSTOPB | PARENB | CLOCAL |
> |                                       CREAD | PARODD | CRTSCTS);
> |                 settings.c_iflag &= ~(BRKINT | IGNPAR | PARMRK | INPCK |
> |                                       ISTRIP | INLCR | IGNCR | ICRNL | IXON |
> |                                       IXOFF  | IXANY | IMAXBEL);
> |                 settings.c_iflag |= IGNBRK;
> |                 settings.c_oflag &= ~(OPOST | OCRNL | ONOCR | ONLRET);
> |                 settings.c_lflag &= ~(ISIG | ICANON | IEXTEN | ECHO |
> |                                       ECHOE | ECHOK | ECHONL | NOFLSH |
> |                                       TOSTOP | ECHOPRT | ECHOCTL | ECHOKE);
> |                 settings.c_cflag |= CS8 | HUPCL | CREAD | CLOCAL;
> | #ifdef EKEY_FULL_TERMIOS
> |                 settings.c_cflag &= ~(CBAUD);
> | 		  settings.c_iflag &= ~(IUTF8 | IUCLC);
> |                 settings.c_oflag &= ~(OFILL | OFDEL | NLDLY | CRDLY | TABDLY |
> |                                       BSDLY | VTDLY | FFDLY | OLCUC );
> |                 settings.c_oflag |= NL0 | CR0 | TAB0 | BS0 | VT0 | FF0;
> |                 settings.c_lflag &= ~(XCASE);
> | #endif
> |                 settings.c_cflag |= B115200;
> |                 if (tcsetattr(fd, TCSANOW, &settings) < 0) {
> `----
> 
> Note the HUPCL in there.
> 
> I have checked: this code is being executed against a symlink that
> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
> succeeding on the kernel I'm running now, but of course that's 3.16.5
> with this commit reverted...)

You could verify that by enabling debugging in the cdc-acm driver and
making sure that the corresponding control messages are indeed sent on
close.

But you haven't seen any fw crashes since you reverted the commit in
question?

Another thing you could try is to add back the 

	acm_set_control(acm, 0);

just after the dev_info message in probe.

Johan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-14  8:34               ` Johan Hovold
@ 2014-10-14 14:44                 ` Nix
  2014-10-17 13:21                 ` Nix
  2014-10-22  9:31                 ` Nix
  2 siblings, 0 replies; 35+ messages in thread
From: Nix @ 2014-10-14 14:44 UTC (permalink / raw)
  To: Johan Hovold; +Cc: Paul Martin, Oliver Neukum, Greg Kroah-Hartman, linux-kernel

On 14 Oct 2014, Johan Hovold spake thusly:

> On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> On 12 Oct 2014, Johan Hovold verbalised:
>> 
>> > On Sat, Oct 11, 2014 at 11:24:59PM +0100, Nix wrote:
>> >> On 11 Oct 2014, Paul Martin spake thusly:
>> >> 
>> >> > Having been privy to the firmware of the eKey, it is very simplisting,
>> >> > with no implementation whatsoever of any flow control.
>> >> 
>> >> That's what I thought. (Why would something that just provides data at a
>> >> constant rate way below that of even the slowest USB bus *need* flow
>> >> control?)
>> >> 
>> >> One presumes therefore that the kernel suddenly trying to do flow
>> >> control on shutdown would fubar the firmware's internal state, leading
>> >> to the symptoms I see.
>> >
>> > The cdc-acm driver was dropping DTR/RTS on shutdown (close) also before
>> > the commit you refer to. One thing it did change however is that this is
>> > now only done if HUPCL is set. Might setting that flag be enough to
>> > prevent the device firmware from crashing?
>> 
>> If I read the ekeyd 1.1.5 source code correctly, this is already
>> happening:
>> 
>> ,----[ host/stream.c:estream_open() ]
>> |     } else if (S_ISCHR(sbuf.st_mode)) {
>> |         /* Open the file as a character device/tty */
>> |         fd = open(uri, O_RDWR | O_NOCTTY);
>> |         if ((fd != -1) && (isatty(fd))) {
>> |             if (tcgetattr(fd, &settings) == 0 ) {
>> |                 settings.c_cflag &= ~(CSIZE | CSTOPB | PARENB | CLOCAL |
>> |                                       CREAD | PARODD | CRTSCTS);
>> |                 settings.c_iflag &= ~(BRKINT | IGNPAR | PARMRK | INPCK |
>> |                                       ISTRIP | INLCR | IGNCR | ICRNL | IXON |
>> |                                       IXOFF  | IXANY | IMAXBEL);
>> |                 settings.c_iflag |= IGNBRK;
>> |                 settings.c_oflag &= ~(OPOST | OCRNL | ONOCR | ONLRET);
>> |                 settings.c_lflag &= ~(ISIG | ICANON | IEXTEN | ECHO |
>> |                                       ECHOE | ECHOK | ECHONL | NOFLSH |
>> |                                       TOSTOP | ECHOPRT | ECHOCTL | ECHOKE);
>> |                 settings.c_cflag |= CS8 | HUPCL | CREAD | CLOCAL;
>> | #ifdef EKEY_FULL_TERMIOS
>> |                 settings.c_cflag &= ~(CBAUD);
>> | 		  settings.c_iflag &= ~(IUTF8 | IUCLC);
>> |                 settings.c_oflag &= ~(OFILL | OFDEL | NLDLY | CRDLY | TABDLY |
>> |                                       BSDLY | VTDLY | FFDLY | OLCUC );
>> |                 settings.c_oflag |= NL0 | CR0 | TAB0 | BS0 | VT0 | FF0;
>> |                 settings.c_lflag &= ~(XCASE);
>> | #endif
>> |                 settings.c_cflag |= B115200;
>> |                 if (tcsetattr(fd, TCSANOW, &settings) < 0) {
>> `----
>> 
>> Note the HUPCL in there.
>> 
>> I have checked: this code is being executed against a symlink that
>> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> with this commit reverted...)
>
> You could verify that by enabling debugging in the cdc-acm driver and
> making sure that the corresponding control messages are indeed sent on
> close.

Yeah, good idea -- in the specific context of the system rebooting, in
particular (though I'll also see if just killing the daemon causes this
problem, something I haven't tested). I was assuming it would be hard to
see it before the reboot process cleared the screen -- but of course
this machine has a serial console so that's not important.

> But you haven't seen any fw crashes since you reverted the commit in
> question?

Not one.

> Another thing you could try is to add back the 
>
> 	acm_set_control(acm, 0);
>
> just after the dev_info message in probe.

I'll try that tonight.

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-14  8:34               ` Johan Hovold
  2014-10-14 14:44                 ` Nix
@ 2014-10-17 13:21                 ` Nix
  2014-10-19 13:45                   ` Johan Hovold
  2014-10-22  9:31                 ` Nix
  2 siblings, 1 reply; 35+ messages in thread
From: Nix @ 2014-10-17 13:21 UTC (permalink / raw)
  To: Johan Hovold; +Cc: Paul Martin, Oliver Neukum, Greg Kroah-Hartman, linux-kernel

On 14 Oct 2014, Johan Hovold outgrape:
> Another thing you could try is to add back the 
>
> 	acm_set_control(acm, 0);
>
> just after the dev_info message in probe.

"Add back" suggests that this line existed before this change. It
didn't, as far as I can see. Probing has

acm_set_control(acm, acm->ctrlout);

*shutdown* has

acm_set_control(acm, acm->ctrlout = 0);

Did you mean that I should try adding back the acm_set_control() during
shutdown, or the one during probe?

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-17 13:21                 ` Nix
@ 2014-10-19 13:45                   ` Johan Hovold
  0 siblings, 0 replies; 35+ messages in thread
From: Johan Hovold @ 2014-10-19 13:45 UTC (permalink / raw)
  To: Nix
  Cc: Johan Hovold, Paul Martin, Oliver Neukum, Greg Kroah-Hartman,
	linux-kernel

On Fri, Oct 17, 2014 at 02:21:35PM +0100, Nix wrote:
> On 14 Oct 2014, Johan Hovold outgrape:
> > Another thing you could try is to add back the 
> >
> > 	acm_set_control(acm, 0);
> >
> > just after the dev_info message in probe.
> 
> "Add back" suggests that this line existed before this change. It
> didn't, as far as I can see. Probing has
> 
> acm_set_control(acm, acm->ctrlout);
> 
> *shutdown* has
> 
> acm_set_control(acm, acm->ctrlout = 0);
> 
> Did you mean that I should try adding back the acm_set_control() during
> shutdown, or the one during probe?

After the dev_info in probe as I wrote. acm->ctrlout will always be 0 at
that point, could have been more clear about that.

Johan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-14  8:34               ` Johan Hovold
  2014-10-14 14:44                 ` Nix
  2014-10-17 13:21                 ` Nix
@ 2014-10-22  9:31                 ` Nix
  2014-10-22 10:14                   ` Johan Hovold
  2 siblings, 1 reply; 35+ messages in thread
From: Nix @ 2014-10-22  9:31 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel

On 14 Oct 2014, Johan Hovold verbalised:

> On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> I have checked: this code is being executed against a symlink that
>> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> with this commit reverted...)
>
> You could verify that by enabling debugging in the cdc-acm driver and
> making sure that the corresponding control messages are indeed sent on
> close.

I have a debugging dump at
<http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
voluminous because the ekeyd is constantly doing USB reads, but the end
says

Oct 22 10:19:13 fold kern debug: : [   88.423970] cdc_acm 2-1:1.0: acm_tty_close
Oct 22 10:19:13 fold kern debug: : [   88.424012] cdc_acm 2-1:1.0: acm_port_shutdown
Oct 22 10:19:13 fold kern debug: : [   88.440038] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result 0
Oct 22 10:19:13 fold kern debug: : [   88.440038] cdc_acm 2-1:1.0: acm_ctrl_irq - urb shutting down with status: -2
Oct 22 10:19:13 fold kern debug: : [   88.440038] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 0, len 0
Oct 22 10:19:13 fold kern debug: : [   88.440038] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.447588] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 1, len 0
Oct 22 10:19:13 fold kern debug: : [   88.447613] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.448575] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 2, len 0
Oct 22 10:19:13 fold kern debug: : [   88.448599] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.449576] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 3, len 0
Oct 22 10:19:13 fold kern debug: : [   88.449599] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.450578] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 4, len 0
Oct 22 10:19:13 fold kern debug: : [   88.450602] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.451573] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 5, len 0
Oct 22 10:19:13 fold kern debug: : [   88.451596] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.452574] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 6, len 0
Oct 22 10:19:13 fold kern debug: : [   88.452597] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.453567] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 7, len 0
Oct 22 10:19:13 fold kern debug: : [   88.453588] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.454570] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 8, len 0
Oct 22 10:19:13 fold kern debug: : [   88.454592] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.462591] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 9, len 0
Oct 22 10:19:13 fold kern debug: : [   88.462619] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.463568] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 10, len 0
Oct 22 10:19:13 fold kern debug: : [   88.463590] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.464564] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 11, len 0
Oct 22 10:19:13 fold kern debug: : [   88.464585] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.465578] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 12, len 0
Oct 22 10:19:13 fold kern debug: : [   88.465602] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 22 10:19:13 fold kern debug: : [   88.466566] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 13, len 0
Oct 22 10:19:13 fold kern debug: : [   88.466587] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2

which looks, hm, a bit suspicious to me.

> But you haven't seen any fw crashes since you reverted the commit in
> question?

Not a one.

> Another thing you could try is to add back the 
>
> 	acm_set_control(acm, 0);
>
> just after the dev_info message in probe.

Tried that (with, obviously, the commit not reverted) -- rebooted, and

BytesRead=0
BytesWritten=0
ConnectionNonces=0
ConnectionPackets=0
ConnectionRekeys=0
ConnectionResets=0
ConnectionTime=46
EntropyRate=0
FipsFrameRate=0
FrameByteLast=0
FramesOk=0
FramingErrors=0
KeyDbsdShannonPerByteL=0
KeyDbsdShannonPerByteR=0
KeyEnglishBadness=No failure
KeyRawBadness=0
KeyRawShannonPerByteL=0
KeyRawShannonPerByteR=0
KeyRawShannonPerByteX=0
KeyShortBadness=efm_ok
KeyTemperatureC=-273.15
KeyTemperatureF=-459.67
KeyTemperatureK=0
KeyVoltage=0
PacketErrors=0
PacketOK=0
ReadRate=0
TotalEntropy=0
WriteRate=0

So that doesn't help.

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-22  9:31                 ` Nix
@ 2014-10-22 10:14                   ` Johan Hovold
  2014-10-22 14:00                     ` Nix
  2014-10-22 15:36                     ` Nix
  0 siblings, 2 replies; 35+ messages in thread
From: Johan Hovold @ 2014-10-22 10:14 UTC (permalink / raw)
  To: Nix
  Cc: Johan Hovold, Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel

On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
> On 14 Oct 2014, Johan Hovold verbalised:
> 
> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
> >> I have checked: this code is being executed against a symlink that
> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
> >> with this commit reverted...)
> >
> > You could verify that by enabling debugging in the cdc-acm driver and
> > making sure that the corresponding control messages are indeed sent on
> > close.
> 
> I have a debugging dump at
> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly

What kernel were you using here? The log seems to suggest that it was
generated with the commit in question reverted.

> voluminous because the ekeyd is constantly doing USB reads, but the end
> says

You can enable debugging selectively using the debugfs control file of
dynamic debug, see:

	Documentation/dynamic-debug-howto.txt

> Oct 22 10:19:13 fold kern debug: : [   88.423970] cdc_acm 2-1:1.0: acm_tty_close
> Oct 22 10:19:13 fold kern debug: : [   88.424012] cdc_acm 2-1:1.0: acm_port_shutdown
> Oct 22 10:19:13 fold kern debug: : [   88.440038] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result 0

So DTR/RTS is indeed lowered (i.e. HUPCL is set).

> Oct 22 10:19:13 fold kern debug: : [   88.440038] cdc_acm 2-1:1.0: acm_ctrl_irq - urb shutting down with status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.440038] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 0, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.440038] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.447588] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 1, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.447613] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.448575] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 2, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.448599] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.449576] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 3, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.449599] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.450578] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 4, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.450602] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.451573] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 5, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.451596] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.452574] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 6, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.452597] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.453567] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 7, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.453588] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.454570] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 8, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.454592] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.462591] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 9, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.462619] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.463568] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 10, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.463590] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.464564] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 11, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.464585] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.465578] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 12, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.465602] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 22 10:19:13 fold kern debug: : [   88.466566] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 13, len 0
> Oct 22 10:19:13 fold kern debug: : [   88.466587] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> 
> which looks, hm, a bit suspicious to me.

That's normal, it's just the urbs being killed (stopped) at close.

> > But you haven't seen any fw crashes since you reverted the commit in
> > question?
> 
> Not a one.
> 
> > Another thing you could try is to add back the 
> >
> > 	acm_set_control(acm, 0);
> >
> > just after the dev_info message in probe.
> 
> Tried that (with, obviously, the commit not reverted) -- rebooted, and
> 
> BytesRead=0
> BytesWritten=0
> ConnectionNonces=0
> ConnectionPackets=0
> ConnectionRekeys=0
> ConnectionResets=0
> ConnectionTime=46
> EntropyRate=0
> FipsFrameRate=0
> FrameByteLast=0
> FramesOk=0
> FramingErrors=0
> KeyDbsdShannonPerByteL=0
> KeyDbsdShannonPerByteR=0
> KeyEnglishBadness=No failure
> KeyRawBadness=0
> KeyRawShannonPerByteL=0
> KeyRawShannonPerByteR=0
> KeyRawShannonPerByteX=0
> KeyShortBadness=efm_ok
> KeyTemperatureC=-273.15
> KeyTemperatureF=-459.67
> KeyTemperatureK=0
> KeyVoltage=0
> PacketErrors=0
> PacketOK=0
> ReadRate=0
> TotalEntropy=0
> WriteRate=0
> 
> So that doesn't help.

Good to know. It was a bit of a long shot.

It seems the only thing that have changed then is that DTR/RTS is now
raised after the read urbs have been submitted during open (rather than
after the ctrlurb was submitted).

Perhaps this slight change in timing causes the device to misbehave
occasionally.

What kernel version are you using? And do you have autosuspend enabled?

Johan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-22 10:14                   ` Johan Hovold
@ 2014-10-22 14:00                     ` Nix
  2014-10-22 15:36                     ` Nix
  1 sibling, 0 replies; 35+ messages in thread
From: Nix @ 2014-10-22 14:00 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel

On 22 Oct 2014, Johan Hovold uttered the following:

> On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
>> On 14 Oct 2014, Johan Hovold verbalised:
>> 
>> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> >> I have checked: this code is being executed against a symlink that
>> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> >> with this commit reverted...)
>> >
>> > You could verify that by enabling debugging in the cdc-acm driver and
>> > making sure that the corresponding control messages are indeed sent on
>> > close.
>> 
>> I have a debugging dump at
>> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
>
> What kernel were you using here? The log seems to suggest that it was
> generated with the commit in question reverted.

Wurgle. I think I was probably using the wrong one, as you suggest.
Hell. That's no use at all is it. Still, at least you know what it looks
like when it works. :)

I'll retry with the right one.

> What kernel version are you using? And do you have autosuspend enabled?

3.16.6 (but it's happened for 3.16.* obviously). No autosuspend (no
*way* can this box suspend, but it only draws a couple of watts so I
don't care).

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-22 10:14                   ` Johan Hovold
  2014-10-22 14:00                     ` Nix
@ 2014-10-22 15:36                     ` Nix
  2014-10-24 11:14                       ` Johan Hovold
  1 sibling, 1 reply; 35+ messages in thread
From: Nix @ 2014-10-22 15:36 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel

On 22 Oct 2014, Johan Hovold outgrape:

> On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
>> On 14 Oct 2014, Johan Hovold verbalised:
>> 
>> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> >> I have checked: this code is being executed against a symlink that
>> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> >> with this commit reverted...)
>> >
>> > You could verify that by enabling debugging in the cdc-acm driver and
>> > making sure that the corresponding control messages are indeed sent on
>> > close.
>> 
>> I have a debugging dump at
>> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
>
> What kernel were you using here? The log seems to suggest that it was
> generated with the commit in question reverted.

Look now :) (the previous log is still there in cdc-acm-reverted.log.)

This contains two boots -- one on which the USB key worked, and the next
(with an identical kernel) with which the key was broken. (I'm not sure
whether this problem happens at startup or shutdown time, so it seemed
best to provide both.)

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-22 15:36                     ` Nix
@ 2014-10-24 11:14                       ` Johan Hovold
  2014-10-24 15:08                         ` Nix
  2014-10-31 16:44                         ` Nix
  0 siblings, 2 replies; 35+ messages in thread
From: Johan Hovold @ 2014-10-24 11:14 UTC (permalink / raw)
  To: Nix
  Cc: Johan Hovold, Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel, linux-usb

[ +CC: linux-usb ]

On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
> On 22 Oct 2014, Johan Hovold outgrape:
> 
> > On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
> >> On 14 Oct 2014, Johan Hovold verbalised:
> >> 
> >> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
> >> >> I have checked: this code is being executed against a symlink that
> >> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
> >> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
> >> >> with this commit reverted...)
> >> >
> >> > You could verify that by enabling debugging in the cdc-acm driver and
> >> > making sure that the corresponding control messages are indeed sent on
> >> > close.
> >> 
> >> I have a debugging dump at
> >> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
> >
> > What kernel were you using here? The log seems to suggest that it was
> > generated with the commit in question reverted.
> 
> Look now :) (the previous log is still there in cdc-acm-reverted.log.)

Unfortunately, it seems the logs are incomplete. There are lots of
entries missing (e.g. "acm_tty_install" when opening, but also some
"acm_submit_read_urb"), some of which were there in your first log.

Also you can disable the "acm_tty_write - count" output, which isn't
really useful here.

> This contains two boots -- one on which the USB key worked, and the next
> (with an identical kernel) with which the key was broken. (I'm not sure
> whether this problem happens at startup or shutdown time, so it seemed
> best to provide both.)

That's a good idea.

Is it only after reboot you've seen the device fail? What if you
physically disconnect and reconnect the device instead, or simply
repeatedly open and close it?

Johan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-24 11:14                       ` Johan Hovold
@ 2014-10-24 15:08                         ` Nix
  2014-10-31 16:44                         ` Nix
  1 sibling, 0 replies; 35+ messages in thread
From: Nix @ 2014-10-24 15:08 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel, linux-usb

On 24 Oct 2014, Johan Hovold told this:

> [ +CC: linux-usb ]
> On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
>> On 22 Oct 2014, Johan Hovold outgrape:
>> 
>> > On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
>> >> On 14 Oct 2014, Johan Hovold verbalised:
>> >> 
>> >> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> >> >> I have checked: this code is being executed against a symlink that
>> >> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> >> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> >> >> with this commit reverted...)
>> >> >
>> >> > You could verify that by enabling debugging in the cdc-acm driver and
>> >> > making sure that the corresponding control messages are indeed sent on
>> >> > close.
>> >> 
>> >> I have a debugging dump at
>> >> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
>> >
>> > What kernel were you using here? The log seems to suggest that it was
>> > generated with the commit in question reverted.
>> 
>> Look now :) (the previous log is still there in cdc-acm-reverted.log.)
>
> Unfortunately, it seems the logs are incomplete. There are lots of
> entries missing (e.g. "acm_tty_install" when opening, but also some
> "acm_submit_read_urb"), some of which were there in your first log.

Curious. It's a straight cut-and-paste from the syslog. Maybe the kmsg
buffer overflowed, but I start the ekey daemon *after* I start syslogd,
so that seems unlikely.

I'll have another look.

> Is it only after reboot you've seen the device fail?

Yes (though sometimes after reboot of an unaffected kernel into an
affected one! i.e. sometimes the first boot into an affected kernel is
broken).

>                                                      What if you
> physically disconnect and reconnect the device instead,

That makes it work.

>                                                         or simply
> repeatedly open and close it?

IIRC -- but I'll have to check this tomorrow when I look at the log
problem, so don't take it as gospel -- that doesn't affect it: I can
stop and restart the daemon repeatedly and, if it wasn't working before,
it's not working afterwards: if it was working before, it'll be working
afterwards.

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-24 11:14                       ` Johan Hovold
  2014-10-24 15:08                         ` Nix
@ 2014-10-31 16:44                         ` Nix
  2014-11-05 11:56                           ` Johan Hovold
  1 sibling, 1 reply; 35+ messages in thread
From: Nix @ 2014-10-31 16:44 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel, linux-usb

Sorry for the delay: illness and work-related release time flurries.

On 24 Oct 2014, Johan Hovold told this:

> [ +CC: linux-usb ]
>
> On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
>> On 22 Oct 2014, Johan Hovold outgrape:
>> 
>> > On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
>> >> On 14 Oct 2014, Johan Hovold verbalised:
>> >> 
>> >> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> >> >> I have checked: this code is being executed against a symlink that
>> >> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> >> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> >> >> with this commit reverted...)
>> >> >
>> >> > You could verify that by enabling debugging in the cdc-acm driver and
>> >> > making sure that the corresponding control messages are indeed sent on
>> >> > close.
>> >> 
>> >> I have a debugging dump at
>> >> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
>> >
>> > What kernel were you using here? The log seems to suggest that it was
>> > generated with the commit in question reverted.
>> 
>> Look now :) (the previous log is still there in cdc-acm-reverted.log.)
>
> Unfortunately, it seems the logs are incomplete. There are lots of
> entries missing (e.g. "acm_tty_install" when opening, but also some
> "acm_submit_read_urb"), some of which were there in your first log.

OK. What we have now in
<http://www.esperi.org.uk/~nix/temporary/cdc-acm.log> is a log from the
pristine upstream 3.16.6 kernel with cdc-acm debugging turned on and the
acm_tty_write - count stuff in acm_tty_write() disabled: I've increased
the dmesg buffer size so the top isn't being cut off any more. It took a
lot of boots for it to fail this time: about a dozen. The log contains
the boot that failed and the one before.

(To my uneducated eye, the initial traffic to/from the key looks *very*
different in the second boot. Something is clearly wrong by this point,
but that's not much of a surprise!)

>> This contains two boots -- one on which the USB key worked, and the next
>> (with an identical kernel) with which the key was broken. (I'm not sure
>> whether this problem happens at startup or shutdown time, so it seemed
>> best to provide both.)
>
> That's a good idea.
>
> Is it only after reboot you've seen the device fail?

So far.

>                                                      What if you
> physically disconnect and reconnect the device instead, or simply

That fixes it, in fact it's the only way to fix it once it's broken by
this bug.

> repeatedly open and close it?

Hm. Good idea.

... no, that doesn't help. Additional log from that shows a lot of what
looks like error returns:

Oct 31 16:38:03 fold kern debug: : [  168.135213] cdc_acm 2-1:1.0: acm_tty_close
Oct 31 16:38:08 fold kern debug: : [  173.130531] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result -110
Oct 31 16:38:08 fold kern debug: : [  173.130557] cdc_acm 2-1:1.0: acm_port_shutdown
Oct 31 16:38:08 fold kern debug: : [  173.131475] cdc_acm 2-1:1.0: acm_ctrl_irq - urb shutting down with status: -2
Oct 31 16:38:08 fold kern debug: : [  173.132474] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.132519] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.133473] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.133510] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.134471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.134507] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.135471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.135509] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.136472] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.136507] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.137471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.137517] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.138471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.138520] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.139469] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.139515] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.140470] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 0, len 0
Oct 31 16:38:08 fold kern debug: : [  173.140491] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.141469] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 1, len 0
Oct 31 16:38:08 fold kern debug: : [  173.141489] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.142469] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 2, len 0
Oct 31 16:38:08 fold kern debug: : [  173.142490] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.143468] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 3, len 0
Oct 31 16:38:08 fold kern debug: : [  173.143488] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.144468] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 4, len 0
Oct 31 16:38:08 fold kern debug: : [  173.144488] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.145467] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 5, len 0
Oct 31 16:38:08 fold kern debug: : [  173.145488] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.146467] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 6, len 0
Oct 31 16:38:08 fold kern debug: : [  173.146487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.147477] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 7, len 0
Oct 31 16:38:08 fold kern debug: : [  173.147498] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.153496] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 8, len 0
Oct 31 16:38:08 fold kern debug: : [  173.153524] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.154468] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 9, len 0
Oct 31 16:38:08 fold kern debug: : [  173.154489] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.155466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 10, len 0
Oct 31 16:38:08 fold kern debug: : [  173.155487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.156466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 11, len 0
Oct 31 16:38:08 fold kern debug: : [  173.156487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.157466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 12, len 0
Oct 31 16:38:08 fold kern debug: : [  173.157487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.158466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 13, len 0
Oct 31 16:38:08 fold kern debug: : [  173.158486] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.159465] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 14, len 0
Oct 31 16:38:08 fold kern debug: : [  173.159486] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.160475] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 15, len 0
Oct 31 16:38:08 fold kern debug: : [  173.160496] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.160567] tty ttyACM0: acm_tty_open
Oct 31 16:38:08 fold kern debug: : [  173.160588] cdc_acm 2-1:1.0: acm_port_activate
Oct 31 16:38:08 fold kern debug: : [  173.160626] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 0
Oct 31 16:38:08 fold kern debug: : [  173.160651] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 1
Oct 31 16:38:08 fold kern debug: : [  173.160674] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 2
Oct 31 16:38:08 fold kern debug: : [  173.160796] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 3
Oct 31 16:38:08 fold kern debug: : [  173.160820] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 4
Oct 31 16:38:08 fold kern debug: : [  173.160842] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 5
Oct 31 16:38:08 fold kern debug: : [  173.160863] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 6
Oct 31 16:38:08 fold kern debug: : [  173.160884] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 7
Oct 31 16:38:08 fold kern debug: : [  173.160906] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 8
Oct 31 16:38:08 fold kern debug: : [  173.160927] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 9
Oct 31 16:38:08 fold kern debug: : [  173.160948] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 10
Oct 31 16:38:08 fold kern debug: : [  173.160969] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 11
Oct 31 16:38:08 fold kern debug: : [  173.160991] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 12
Oct 31 16:38:08 fold kern debug: : [  173.161012] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 13
Oct 31 16:38:08 fold kern debug: : [  173.161033] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 14
Oct 31 16:38:08 fold kern debug: : [  173.161054] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 15
Oct 31 16:38:08 fold kern debug: : [  173.161489] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x3, len 0x0, result -62

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-10-31 16:44                         ` Nix
@ 2014-11-05 11:56                           ` Johan Hovold
  2014-11-05 15:14                             ` Nix
  0 siblings, 1 reply; 35+ messages in thread
From: Johan Hovold @ 2014-11-05 11:56 UTC (permalink / raw)
  To: Nix
  Cc: Johan Hovold, Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel, linux-usb

On Fri, Oct 31, 2014 at 04:44:46PM +0000, Nix wrote:
> Sorry for the delay: illness and work-related release time flurries.
> 
> On 24 Oct 2014, Johan Hovold told this:
> 
> > [ +CC: linux-usb ]
> >
> > On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
> >> On 22 Oct 2014, Johan Hovold outgrape:
> >> 
> >> > On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
> >> >> On 14 Oct 2014, Johan Hovold verbalised:
> >> >> 
> >> >> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
> >> >> >> I have checked: this code is being executed against a symlink that
> >> >> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
> >> >> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
> >> >> >> with this commit reverted...)
> >> >> >
> >> >> > You could verify that by enabling debugging in the cdc-acm driver and
> >> >> > making sure that the corresponding control messages are indeed sent on
> >> >> > close.
> >> >> 
> >> >> I have a debugging dump at
> >> >> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
> >> >
> >> > What kernel were you using here? The log seems to suggest that it was
> >> > generated with the commit in question reverted.
> >> 
> >> Look now :) (the previous log is still there in cdc-acm-reverted.log.)
> >
> > Unfortunately, it seems the logs are incomplete. There are lots of
> > entries missing (e.g. "acm_tty_install" when opening, but also some
> > "acm_submit_read_urb"), some of which were there in your first log.
> 
> OK. What we have now in
> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log> is a log from the
> pristine upstream 3.16.6 kernel with cdc-acm debugging turned on and the
> acm_tty_write - count stuff in acm_tty_write() disabled: I've increased
> the dmesg buffer size so the top isn't being cut off any more. It took a
> lot of boots for it to fail this time: about a dozen. The log contains
> the boot that failed and the one before.
> 
> (To my uneducated eye, the initial traffic to/from the key looks *very*
> different in the second boot. Something is clearly wrong by this point,
> but that's not much of a surprise!)

The log appears incomplete again, this time it seems the last part is
completely missing (the device is never closed for example). The device
still seems to be responding.

> >> This contains two boots -- one on which the USB key worked, and the next
> >> (with an identical kernel) with which the key was broken. (I'm not sure
> >> whether this problem happens at startup or shutdown time, so it seemed
> >> best to provide both.)
> >
> > That's a good idea.
> >
> > Is it only after reboot you've seen the device fail?
> 
> So far.
> 
> >                                                      What if you
> > physically disconnect and reconnect the device instead, or simply
> 
> That fixes it, in fact it's the only way to fix it once it's broken by
> this bug.

I didn't mean whether it would get the device working again, but rather
whether you could kill it this way.

> > repeatedly open and close it?
> 
> Hm. Good idea.

Same here, but the below test was also informative.
 
> ... no, that doesn't help. Additional log from that shows a lot of what
> looks like error returns:
> 
> Oct 31 16:38:03 fold kern debug: : [  168.135213] cdc_acm 2-1:1.0: acm_tty_close
> Oct 31 16:38:08 fold kern debug: : [  173.130531] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result -110

Yeah, it seems your device firmware has crashed. It stops responding to
control requests.

> Oct 31 16:38:08 fold kern debug: : [  173.130557] cdc_acm 2-1:1.0: acm_port_shutdown
> Oct 31 16:38:08 fold kern debug: : [  173.131475] cdc_acm 2-1:1.0: acm_ctrl_irq - urb shutting down with status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.132474] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [  173.132519] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [  173.133473] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [  173.133510] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [  173.134471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [  173.134507] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [  173.135471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [  173.135509] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [  173.136472] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [  173.136507] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [  173.137471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [  173.137517] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [  173.138471] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [  173.138520] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [  173.139469] cdc_acm 2-1:1.1: acm_write_bulk - len 0/1, status -2
> Oct 31 16:38:08 fold kern debug: : [  173.139515] cdc_acm 2-1:1.1: acm_softint
> Oct 31 16:38:08 fold kern debug: : [  173.140470] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 0, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.140491] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.141469] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 1, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.141489] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.142469] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 2, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.142490] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.143468] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 3, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.143488] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.144468] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 4, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.144488] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.145467] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 5, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.145488] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.146467] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 6, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.146487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.147477] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 7, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.147498] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.153496] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 8, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.153524] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.154468] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 9, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.154489] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.155466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 10, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.155487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.156466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 11, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.156487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.157466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 12, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.157487] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.158466] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 13, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.158486] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.159465] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 14, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.159486] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.160475] cdc_acm 2-1:1.1: acm_read_bulk_callback - urb 15, len 0
> Oct 31 16:38:08 fold kern debug: : [  173.160496] cdc_acm 2-1:1.1: acm_read_bulk_callback - non-zero urb status: -2
> Oct 31 16:38:08 fold kern debug: : [  173.160567] tty ttyACM0: acm_tty_open
> Oct 31 16:38:08 fold kern debug: : [  173.160588] cdc_acm 2-1:1.0: acm_port_activate
> Oct 31 16:38:08 fold kern debug: : [  173.160626] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 0
> Oct 31 16:38:08 fold kern debug: : [  173.160651] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 1
> Oct 31 16:38:08 fold kern debug: : [  173.160674] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 2
> Oct 31 16:38:08 fold kern debug: : [  173.160796] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 3
> Oct 31 16:38:08 fold kern debug: : [  173.160820] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 4
> Oct 31 16:38:08 fold kern debug: : [  173.160842] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 5
> Oct 31 16:38:08 fold kern debug: : [  173.160863] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 6
> Oct 31 16:38:08 fold kern debug: : [  173.160884] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 7
> Oct 31 16:38:08 fold kern debug: : [  173.160906] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 8
> Oct 31 16:38:08 fold kern debug: : [  173.160927] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 9
> Oct 31 16:38:08 fold kern debug: : [  173.160948] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 10
> Oct 31 16:38:08 fold kern debug: : [  173.160969] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 11
> Oct 31 16:38:08 fold kern debug: : [  173.160991] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 12
> Oct 31 16:38:08 fold kern debug: : [  173.161012] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 13
> Oct 31 16:38:08 fold kern debug: : [  173.161033] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 14
> Oct 31 16:38:08 fold kern debug: : [  173.161054] cdc_acm 2-1:1.1: acm_submit_read_urb - urb 15

The above is all normal, but

> Oct 31 16:38:08 fold kern debug: : [  173.161489] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x3, len 0x0, result -62

here's another timeout. It's dead.

Did you get anywhere with trying to look at the device firmware?

Johan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-11-05 11:56                           ` Johan Hovold
@ 2014-11-05 15:14                             ` Nix
  2014-11-05 15:46                               ` Daniel Silverstone
  2014-11-05 18:14                               ` Johan Hovold
  0 siblings, 2 replies; 35+ messages in thread
From: Nix @ 2014-11-05 15:14 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel, linux-usb

On 5 Nov 2014, Johan Hovold stated:

> On Fri, Oct 31, 2014 at 04:44:46PM +0000, Nix wrote:
>> Sorry for the delay: illness and work-related release time flurries.
>> 
>> On 24 Oct 2014, Johan Hovold told this:
>> 
>> > [ +CC: linux-usb ]
>> >
>> > On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
>> >> On 22 Oct 2014, Johan Hovold outgrape:
>> >> 
>> >> > On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
>> >> >> On 14 Oct 2014, Johan Hovold verbalised:
>> >> >> 
>> >> >> > On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
>> >> >> >> I have checked: this code is being executed against a symlink that
>> >> >> >> points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
>> >> >> >> succeeding on the kernel I'm running now, but of course that's 3.16.5
>> >> >> >> with this commit reverted...)
>> >> >> >
>> >> >> > You could verify that by enabling debugging in the cdc-acm driver and
>> >> >> > making sure that the corresponding control messages are indeed sent on
>> >> >> > close.
>> >> >> 
>> >> >> I have a debugging dump at
>> >> >> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log>; it's fairly
>> >> >
>> >> > What kernel were you using here? The log seems to suggest that it was
>> >> > generated with the commit in question reverted.
>> >> 
>> >> Look now :) (the previous log is still there in cdc-acm-reverted.log.)
>> >
>> > Unfortunately, it seems the logs are incomplete. There are lots of
>> > entries missing (e.g. "acm_tty_install" when opening, but also some
>> > "acm_submit_read_urb"), some of which were there in your first log.
>> 
>> OK. What we have now in
>> <http://www.esperi.org.uk/~nix/temporary/cdc-acm.log> is a log from the
>> pristine upstream 3.16.6 kernel with cdc-acm debugging turned on and the
>> acm_tty_write - count stuff in acm_tty_write() disabled: I've increased
>> the dmesg buffer size so the top isn't being cut off any more. It took a
>> lot of boots for it to fail this time: about a dozen. The log contains
>> the boot that failed and the one before.
>> 
>> (To my uneducated eye, the initial traffic to/from the key looks *very*
>> different in the second boot. Something is clearly wrong by this point,
>> but that's not much of a surprise!)
>
> The log appears incomplete again, this time it seems the last part is
> completely missing (the device is never closed for example). The device
> still seems to be responding.

Nope -- by the time I clipped the log, the device was definitely
nonresponsive.

I've appended the remaining log, just in case. This is the same as the
snapshot I added to the email itself last time: a close-and-open as I
tried restarting the daemon, and another close as part of system
shutdown.

>> >                                                      What if you
>> > physically disconnect and reconnect the device instead, or simply
>> 
>> That fixes it, in fact it's the only way to fix it once it's broken by
>> this bug.
>
> I didn't mean whether it would get the device working again, but rather
> whether you could kill it this way.

Never seen it fail after a physical disconnection.

>> ... no, that doesn't help. Additional log from that shows a lot of what
>> looks like error returns:
>> 
>> Oct 31 16:38:03 fold kern debug: : [  168.135213] cdc_acm 2-1:1.0: acm_tty_close
>> Oct 31 16:38:08 fold kern debug: : [  173.130531] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result -110
>
> Yeah, it seems your device firmware has crashed. It stops responding to
> control requests.

Not surprising: I was fairly sure we were provoking a key firmware crash
or something like that. This is a device with no support for flow
control at all, after all, so I'm not terribly surprised that trying to
do flow control confuses it.

> The above is all normal, but
>
>> Oct 31 16:38:08 fold kern debug: : [  173.161489] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x3, len 0x0, result -62
>
> here's another timeout. It's dead.

Again, not surprising.

> Did you get anywhere with trying to look at the device firmware?

Look at it? Only Daniel Silverstone (Cc:ed) can do that. The only copy
of the firmware I have is baked into the sealed key. :)

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-11-05 15:14                             ` Nix
@ 2014-11-05 15:46                               ` Daniel Silverstone
  2014-11-05 18:14                               ` Johan Hovold
  1 sibling, 0 replies; 35+ messages in thread
From: Daniel Silverstone @ 2014-11-05 15:46 UTC (permalink / raw)
  To: Nix
  Cc: Johan Hovold, Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel, linux-usb

On Wed, Nov 05, 2014 at 15:14:49 +0000, Nix wrote:
> > Did you get anywhere with trying to look at the device firmware?
> Look at it? Only Daniel Silverstone (Cc:ed) can do that. The only copy
> of the firmware I have is baked into the sealed key. :)

Sadly I can't look at the exact firmware of the device, but I can tell you it
was based on ST Electronics' virtual COM port example in their *old* STM32
development kits.

Fortunately I do have a record of that example code here:
https://git.gitano.org.uk/stm32-virtcomport.git/

I'm not a USB guru though.

D.

-- 
Daniel Silverstone                         http://www.digital-scurf.org/
PGP mail accepted and encouraged.            Key Id: 3CCE BABE 206C 3B69

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-11-05 15:14                             ` Nix
  2014-11-05 15:46                               ` Daniel Silverstone
@ 2014-11-05 18:14                               ` Johan Hovold
  2014-11-06 13:49                                 ` Nix
  1 sibling, 1 reply; 35+ messages in thread
From: Johan Hovold @ 2014-11-05 18:14 UTC (permalink / raw)
  To: Nix
  Cc: Johan Hovold, Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel, linux-usb

On Wed, Nov 05, 2014 at 03:14:49PM +0000, Nix wrote:
> On 5 Nov 2014, Johan Hovold stated:
> 
> > On Fri, Oct 31, 2014 at 04:44:46PM +0000, Nix wrote:
> >> Sorry for the delay: illness and work-related release time flurries.
> >> 
> >> On 24 Oct 2014, Johan Hovold told this:
> >> 

> > The log appears incomplete again, this time it seems the last part is
> > completely missing (the device is never closed for example). The device
> > still seems to be responding.
> 
> Nope -- by the time I clipped the log, the device was definitely
> nonresponsive.
>
> I've appended the remaining log, just in case. This is the same as the
> snapshot I added to the email itself last time: a close-and-open as I
> tried restarting the daemon, and another close as part of system
> shutdown.

Ok, yeah, there's the crash.

> >> >                                                      What if you
> >> > physically disconnect and reconnect the device instead, or simply
> >> 
> >> That fixes it, in fact it's the only way to fix it once it's broken by
> >> this bug.
> >
> > I didn't mean whether it would get the device working again, but rather
> > whether you could kill it this way.
> 
> Never seen it fail after a physical disconnection.

Ok.

And it has to include an enumeration, since just opening and closing it
(restarting the daemon) repeatedly seems to work?

> >> ... no, that doesn't help. Additional log from that shows a lot of what
> >> looks like error returns:
> >> 
> >> Oct 31 16:38:03 fold kern debug: : [  168.135213] cdc_acm 2-1:1.0: acm_tty_close
> >> Oct 31 16:38:08 fold kern debug: : [  173.130531] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result -110
> >
> > Yeah, it seems your device firmware has crashed. It stops responding to
> > control requests.
> 
> Not surprising: I was fairly sure we were provoking a key firmware crash
> or something like that. This is a device with no support for flow
> control at all, after all, so I'm not terribly surprised that trying to
> do flow control confuses it.

Only weird thing is that it has been coping with those calls for a long
time. Only the slightly changed timings seems to have triggered this
issue.

> > The above is all normal, but
> >
> >> Oct 31 16:38:08 fold kern debug: : [  173.161489] cdc_acm 2-1:1.0: acm_ctrl_msg - rq 0x22, val 0x3, len 0x0, result -62
> >
> > here's another timeout. It's dead.
> 
> Again, not surprising.
> 
> > Did you get anywhere with trying to look at the device firmware?
> 
> Look at it? Only Daniel Silverstone (Cc:ed) can do that. The only copy
> of the firmware I have is baked into the sealed key. :)

Ah, ok. I guess we need a new quirk then. There's already a quirk in the
driver to suppress error from when trying to set the control lines.

But that doesn't help this device, which happily accepts the requests
and then dies at random times.

Could you try two more things (too make sure line control is really the
culprit):

1. If you clear HUPCL in ekeyd so that the lines are never lowered, does
that fix the stability issue?

2. Can you verify that the patch below works? Did I use the correct
VID/PID? Could you provide "lsusb -v" output (for the capability flags)
as well?

Note that the patch is against the usb-linus branch of the usb tree,
which also contains another fix which could affect this device
(set_termios will now be called an extra time on every open). You also
need this one, which have not yet been applied:

	http://marc.info/?l=linux-usb&m=141520959813505&w=2

Thanks,
Johan


>From 76abc8a7eda9ea1978ee3527c773210c28332317 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan@kernel.org>
Date: Wed, 5 Nov 2014 18:51:54 +0100
Subject: [PATCH] USB: cdc-acm: add quirk for control-line state requests

Add new quirk for devices that cannot handle control-line state
requests.

Note that we currently send these requests to all devices, regardless of
whether they claim to support it, but that errors are only logged if
support is claimed.

Since commit 0943d8ead30e ("USB: cdc-acm: use tty-port dtr_rts"), which
only changed the timings for these requests slightly, this has been
reported to cause occasional firmware crashes on Simtec Electronics
Entropy Key devices after re-enumeration. Enable the quirk for this
device.

Reported-by: Nix <nix@esperi.org.uk>
Cc: stable <stable@vger.kernel.org>	# v3.16
Signed-off-by: Johan Hovold <johan@kernel.org>
---
 drivers/usb/class/cdc-acm.c | 14 ++++++++++++--
 drivers/usb/class/cdc-acm.h |  2 ++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 9d6495424b06..077d58ac3dcb 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -148,8 +148,15 @@ static int acm_ctrl_msg(struct acm *acm, int request, int value,
 /* devices aren't required to support these requests.
  * the cdc acm descriptor tells whether they do...
  */
-#define acm_set_control(acm, control) \
-	acm_ctrl_msg(acm, USB_CDC_REQ_SET_CONTROL_LINE_STATE, control, NULL, 0)
+static inline int acm_set_control(struct acm *acm, int control)
+{
+	if (acm->quirks & QUIRK_CONTROL_LINE_STATE)
+		return -EOPNOTSUPP;
+
+	return acm_ctrl_msg(acm, USB_CDC_REQ_SET_CONTROL_LINE_STATE,
+			control, NULL, 0);
+}
+
 #define acm_set_line(acm, line) \
 	acm_ctrl_msg(acm, USB_CDC_REQ_SET_LINE_CODING, 0, line, sizeof *(line))
 #define acm_send_break(acm, ms) \
@@ -1320,6 +1327,7 @@ made_compressed_probe:
 	tty_port_init(&acm->port);
 	acm->port.ops = &acm_port_ops;
 	init_usb_anchor(&acm->delayed);
+	acm->quirks = quirks;
 
 	buf = usb_alloc_coherent(usb_dev, ctrlsize, GFP_KERNEL, &acm->ctrl_dma);
 	if (!buf) {
@@ -1687,6 +1695,8 @@ static const struct usb_device_id acm_ids[] = {
 	{ USB_DEVICE(0x0572, 0x1328), /* Shiro / Aztech USB MODEM UM-3100 */
 	.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
 	},
+	{ USB_DEVICE(0x20df, 0x0001), /* Simtec Electronics Entropy Key */
+	.driver_info = QUIRK_CONTROL_LINE_STATE, },
 	{ USB_DEVICE(0x2184, 0x001c) },	/* GW Instek AFG-2225 */
 	{ USB_DEVICE(0x22b8, 0x6425), /* Motorola MOTOMAGX phones */
 	},
diff --git a/drivers/usb/class/cdc-acm.h b/drivers/usb/class/cdc-acm.h
index fc75651afe1c..d3251ebd09e2 100644
--- a/drivers/usb/class/cdc-acm.h
+++ b/drivers/usb/class/cdc-acm.h
@@ -121,6 +121,7 @@ struct acm {
 	unsigned int throttle_req:1;			/* throttle requested */
 	u8 bInterval;
 	struct usb_anchor delayed;			/* writes queued for a device about to be woken */
+	unsigned long quirks;
 };
 
 #define CDC_DATA_INTERFACE_TYPE	0x0a
@@ -132,3 +133,4 @@ struct acm {
 #define NOT_A_MODEM			BIT(3)
 #define NO_DATA_INTERFACE		BIT(4)
 #define IGNORE_DEVICE			BIT(5)
+#define QUIRK_CONTROL_LINE_STATE	BIT(6)
-- 
2.0.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-11-05 18:14                               ` Johan Hovold
@ 2014-11-06 13:49                                 ` Nix
  2014-11-06 17:04                                   ` Johan Hovold
  0 siblings, 1 reply; 35+ messages in thread
From: Nix @ 2014-11-06 13:49 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel, linux-usb

On 5 Nov 2014, Johan Hovold told this:

> On Wed, Nov 05, 2014 at 03:14:49PM +0000, Nix wrote:
>> >> >                                                      What if you
>> >> > physically disconnect and reconnect the device instead, or simply
>> >> 
>> >> That fixes it, in fact it's the only way to fix it once it's broken by
>> >> this bug.
>> >
>> > I didn't mean whether it would get the device working again, but rather
>> > whether you could kill it this way.
>> 
>> Never seen it fail after a physical disconnection.
>
> Ok.
>
> And it has to include an enumeration, since just opening and closing it
> (restarting the daemon) repeatedly seems to work?

Well, it's more accurate to say that restarting the daemon doesn't make
it fail, but won't make it start working if it's already gone nuts
either. It normally copes fine with the daemon stopping and starting,
yes (and the daemon copes fine with keys being connected and
disconnected).

>> > Yeah, it seems your device firmware has crashed. It stops responding to
>> > control requests.
>> 
>> Not surprising: I was fairly sure we were provoking a key firmware crash
>> or something like that. This is a device with no support for flow
>> control at all, after all, so I'm not terribly surprised that trying to
>> do flow control confuses it.
>
> Only weird thing is that it has been coping with those calls for a long
> time. Only the slightly changed timings seems to have triggered this
> issue.

Yeah. I get the strong impression from Daniel that the USB side of this
was done by taking something that worked on the kernel of the day,
adding the key-specific stuff to it, and making sure that it still
worked. i.e. this was not a from-spec reimplementation, it was using an
existing (old) stack. If that stack makes iffy assumptions, so will the
entropy key.

>> Look at it? Only Daniel Silverstone (Cc:ed) can do that. The only copy
>> of the firmware I have is baked into the sealed key. :)

As his email noted, no he can't :) but you do now have a link to the
thing it was based on.

> Ah, ok. I guess we need a new quirk then. There's already a quirk in the
> driver to suppress error from when trying to set the control lines.
>
> But that doesn't help this device, which happily accepts the requests
> and then dies at random times.

Yeah. Strange. (I thought it was it's 'right after', but you seem to
have disproved that. It's only 'sometime later'.)

> Could you try two more things (too make sure line control is really the
> culprit):
>
> 1. If you clear HUPCL in ekeyd so that the lines are never lowered, does
> that fix the stability issue?

Definitely not. I got a hang after the first reboot out of an afflicted
kernel, when using a HUPCLless ekeyd. Weird. (I guess they're lowered on
reboot anyway?)

> 2. Can you verify that the patch below works? Did I use the correct
> VID/PID? Could you provide "lsusb -v" output (for the capability flags)
> as well?

The VID/PID are right.

The patch seems to work. I tested against the usb testing tree directly,
since that was easier than adjusting it to apply against 3.16. Sixteen
reboots, no failures, looks to be fixed.

lsusb output:

Bus 002 Device 003: ID 20df:0001 Simtec Electronics Entropy Key [UDEKEY01]
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            2 Communications
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x20df Simtec Electronics
  idProduct          0x0001 Entropy Key [UDEKEY01]
  bcdDevice            2.00
  iManufacturer           1 Simtec Electronics
  iProduct                2 Entropy Key
  iSerial                 3 M/9tBjBLNzE2RSFD
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           67
    bNumInterfaces          2
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xc0
      Self Powered
    MaxPower               76mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         2 Communications
      bInterfaceSubClass      2 Abstract (modem)
      bInterfaceProtocol      1 AT-commands (v.25ter)
      iInterface              0
      CDC Header:
        bcdCDC               1.10
      CDC Call Management:
        bmCapabilities       0x00
        bDataInterface          1
      CDC ACM:
        bmCapabilities       0x00
      CDC Union:
        bMasterInterface        0
        bSlaveInterface         1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval             255
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass        10 CDC Data
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x03  EP 3 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               0
Device Status:     0x0000
  (Bus Powered)

> Note that the patch is against the usb-linus branch of the usb tree,

OK. (I presume this is gregkh's tree, not yours?)

> which also contains another fix which could affect this device
> (set_termios will now be called an extra time on every open). You also
> need this one, which have not yet been applied:
>
> 	http://marc.info/?l=linux-usb&m=141520959813505&w=2

It had been applied by the time I tested this :)

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-11-06 13:49                                 ` Nix
@ 2014-11-06 17:04                                   ` Johan Hovold
  2014-11-06 17:08                                     ` [PATCH] USB: cdc-acm: add quirk for control-line state requests Johan Hovold
  2014-11-06 17:14                                     ` [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16 Nix
  0 siblings, 2 replies; 35+ messages in thread
From: Johan Hovold @ 2014-11-06 17:04 UTC (permalink / raw)
  To: Nix
  Cc: Johan Hovold, Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel, linux-usb

On Thu, Nov 06, 2014 at 01:49:13PM +0000, Nix wrote:
> On 5 Nov 2014, Johan Hovold told this:
> > On Wed, Nov 05, 2014 at 03:14:49PM +0000, Nix wrote:

> > Could you try two more things (too make sure line control is really the
> > culprit):
> >
> > 1. If you clear HUPCL in ekeyd so that the lines are never lowered, does
> > that fix the stability issue?
> 
> Definitely not. I got a hang after the first reboot out of an afflicted
> kernel, when using a HUPCLless ekeyd. Weird. (I guess they're lowered on
> reboot anyway?)

It's actually only the timings related to the control-lines being raised
on open that has changed, so this would seem consistent with that.

> > 2. Can you verify that the patch below works? Did I use the correct
> > VID/PID? Could you provide "lsusb -v" output (for the capability flags)
> > as well?
> 
> The VID/PID are right.
> 
> The patch seems to work. I tested against the usb testing tree directly,
> since that was easier than adjusting it to apply against 3.16. Sixteen
> reboots, no failures, looks to be fixed.

Great, thanks for testing.

> > Note that the patch is against the usb-linus branch of the usb tree,
> 
> OK. (I presume this is gregkh's tree, not yours?)

Yes, Greg's tree, could have been more clear about that.

> > which also contains another fix which could affect this device
> > (set_termios will now be called an extra time on every open). You also
> > need this one, which have not yet been applied:
> >
> > 	http://marc.info/?l=linux-usb&m=141520959813505&w=2
> 
> It had been applied by the time I tested this :)

Greg is fast. :)

Thanks for tracking this down. That bisect cannot have been fun given
the low failure rate (sometimes only one in ten reboots?).

Johan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] USB: cdc-acm: add quirk for control-line state requests
  2014-11-06 17:04                                   ` Johan Hovold
@ 2014-11-06 17:08                                     ` Johan Hovold
  2014-11-07  9:05                                       ` Oliver Neukum
  2014-11-06 17:14                                     ` [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16 Nix
  1 sibling, 1 reply; 35+ messages in thread
From: Johan Hovold @ 2014-11-06 17:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-usb, Nix, Paul Martin, Daniel Silverstone, Oliver Neukum,
	linux-kernel, Johan Hovold, stable

Add new quirk for devices that cannot handle control-line state
requests.

Note that we currently send these requests to all devices, regardless of
whether they claim to support it, but that errors are only logged if
support is claimed.

Since commit 0943d8ead30e ("USB: cdc-acm: use tty-port dtr_rts"), which
only changed the timings for these requests slightly, this has been
reported to cause occasional firmware crashes on Simtec Electronics
Entropy Key devices after re-enumeration. Enable the quirk for this
device.

Reported-by: Nix <nix@esperi.org.uk>
Tested-by: Nix <nix@esperi.org.uk>
Cc: stable <stable@vger.kernel.org>	# v3.16
Signed-off-by: Johan Hovold <johan@kernel.org>
---

Greg, 

I believe this should into v3.18 as it fixes a reported regression with
these devices.

Johan


 drivers/usb/class/cdc-acm.c | 14 ++++++++++++--
 drivers/usb/class/cdc-acm.h |  2 ++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 9d6495424b06..077d58ac3dcb 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -148,8 +148,15 @@ static int acm_ctrl_msg(struct acm *acm, int request, int value,
 /* devices aren't required to support these requests.
  * the cdc acm descriptor tells whether they do...
  */
-#define acm_set_control(acm, control) \
-	acm_ctrl_msg(acm, USB_CDC_REQ_SET_CONTROL_LINE_STATE, control, NULL, 0)
+static inline int acm_set_control(struct acm *acm, int control)
+{
+	if (acm->quirks & QUIRK_CONTROL_LINE_STATE)
+		return -EOPNOTSUPP;
+
+	return acm_ctrl_msg(acm, USB_CDC_REQ_SET_CONTROL_LINE_STATE,
+			control, NULL, 0);
+}
+
 #define acm_set_line(acm, line) \
 	acm_ctrl_msg(acm, USB_CDC_REQ_SET_LINE_CODING, 0, line, sizeof *(line))
 #define acm_send_break(acm, ms) \
@@ -1320,6 +1327,7 @@ made_compressed_probe:
 	tty_port_init(&acm->port);
 	acm->port.ops = &acm_port_ops;
 	init_usb_anchor(&acm->delayed);
+	acm->quirks = quirks;
 
 	buf = usb_alloc_coherent(usb_dev, ctrlsize, GFP_KERNEL, &acm->ctrl_dma);
 	if (!buf) {
@@ -1687,6 +1695,8 @@ static const struct usb_device_id acm_ids[] = {
 	{ USB_DEVICE(0x0572, 0x1328), /* Shiro / Aztech USB MODEM UM-3100 */
 	.driver_info = NO_UNION_NORMAL, /* has no union descriptor */
 	},
+	{ USB_DEVICE(0x20df, 0x0001), /* Simtec Electronics Entropy Key */
+	.driver_info = QUIRK_CONTROL_LINE_STATE, },
 	{ USB_DEVICE(0x2184, 0x001c) },	/* GW Instek AFG-2225 */
 	{ USB_DEVICE(0x22b8, 0x6425), /* Motorola MOTOMAGX phones */
 	},
diff --git a/drivers/usb/class/cdc-acm.h b/drivers/usb/class/cdc-acm.h
index fc75651afe1c..d3251ebd09e2 100644
--- a/drivers/usb/class/cdc-acm.h
+++ b/drivers/usb/class/cdc-acm.h
@@ -121,6 +121,7 @@ struct acm {
 	unsigned int throttle_req:1;			/* throttle requested */
 	u8 bInterval;
 	struct usb_anchor delayed;			/* writes queued for a device about to be woken */
+	unsigned long quirks;
 };
 
 #define CDC_DATA_INTERFACE_TYPE	0x0a
@@ -132,3 +133,4 @@ struct acm {
 #define NOT_A_MODEM			BIT(3)
 #define NO_DATA_INTERFACE		BIT(4)
 #define IGNORE_DEVICE			BIT(5)
+#define QUIRK_CONTROL_LINE_STATE	BIT(6)
-- 
2.0.4


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16
  2014-11-06 17:04                                   ` Johan Hovold
  2014-11-06 17:08                                     ` [PATCH] USB: cdc-acm: add quirk for control-line state requests Johan Hovold
@ 2014-11-06 17:14                                     ` Nix
  1 sibling, 0 replies; 35+ messages in thread
From: Nix @ 2014-11-06 17:14 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Paul Martin, Daniel Silverstone, Oliver Neukum,
	Greg Kroah-Hartman, linux-kernel, linux-usb

On 6 Nov 2014, Johan Hovold said:

> On Thu, Nov 06, 2014 at 01:49:13PM +0000, Nix wrote:
>> On 5 Nov 2014, Johan Hovold told this:
>> > On Wed, Nov 05, 2014 at 03:14:49PM +0000, Nix wrote:
>
>> > Could you try two more things (too make sure line control is really the
>> > culprit):
>> >
>> > 1. If you clear HUPCL in ekeyd so that the lines are never lowered, does
>> > that fix the stability issue?
>> 
>> Definitely not. I got a hang after the first reboot out of an afflicted
>> kernel, when using a HUPCLless ekeyd. Weird. (I guess they're lowered on
>> reboot anyway?)
>
> It's actually only the timings related to the control-lines being raised
> on open that has changed, so this would seem consistent with that.

Urgh. No wonder it was intermittent.

> Thanks for tracking this down. That bisect cannot have been fun given
> the low failure rate (sometimes only one in ten reboots?).

It often failed quite fast, but yes, the negative case was hard to
prove: I had to rewind twice. I counted reboots because I'm a flaming
aspie pedant. 173 reboots that took... thank goodness it replicated on
the machine of mine that's fastest to reboot!

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] USB: cdc-acm: add quirk for control-line state requests
  2014-11-06 17:08                                     ` [PATCH] USB: cdc-acm: add quirk for control-line state requests Johan Hovold
@ 2014-11-07  9:05                                       ` Oliver Neukum
  2014-11-07  9:16                                         ` Johan Hovold
  0 siblings, 1 reply; 35+ messages in thread
From: Oliver Neukum @ 2014-11-07  9:05 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Greg Kroah-Hartman, linux-usb, Nix, Paul Martin,
	Daniel Silverstone, linux-kernel, stable

On Thu, 2014-11-06 at 18:08 +0100, Johan Hovold wrote:
> Add new quirk for devices that cannot handle control-line state
> requests.
> 
> Note that we currently send these requests to all devices, regardless
> of
> whether they claim to support it, but that errors are only logged if
> support is claimed.

That makes me wonder whether we should do this. What do you think?

	Regards
		Oliver



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] USB: cdc-acm: add quirk for control-line state requests
  2014-11-07  9:05                                       ` Oliver Neukum
@ 2014-11-07  9:16                                         ` Johan Hovold
  2014-11-07 10:23                                           ` Oliver Neukum
  0 siblings, 1 reply; 35+ messages in thread
From: Johan Hovold @ 2014-11-07  9:16 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Johan Hovold, Greg Kroah-Hartman, linux-usb, Nix, Paul Martin,
	Daniel Silverstone, linux-kernel, stable

On Fri, Nov 07, 2014 at 10:05:12AM +0100, Oliver Neukum wrote:
> On Thu, 2014-11-06 at 18:08 +0100, Johan Hovold wrote:
> > Add new quirk for devices that cannot handle control-line state
> > requests.
> > 
> > Note that we currently send these requests to all devices, regardless
> > of
> > whether they claim to support it, but that errors are only logged if
> > support is claimed.
> 
> That makes me wonder whether we should do this. What do you think?

My interpretation was that it's done this way as there may be devices
with broken CDC headers which fail to set the corresponding capability
bits, but still support the request (c.f. our recent not-a-modem
discussion).

In that case, always attempting the request, but only reporting errors
if support was claimed, makes sense.

As changing this behaviour now would risk breaking such devices, I
think black-listing (i.e. this patch) is preferred moving forward.

Johan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] USB: cdc-acm: add quirk for control-line state requests
  2014-11-07  9:16                                         ` Johan Hovold
@ 2014-11-07 10:23                                           ` Oliver Neukum
  0 siblings, 0 replies; 35+ messages in thread
From: Oliver Neukum @ 2014-11-07 10:23 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Greg Kroah-Hartman, linux-usb, Nix, Paul Martin,
	Daniel Silverstone, linux-kernel, stable

On Fri, 2014-11-07 at 10:16 +0100, Johan Hovold wrote:
> On Fri, Nov 07, 2014 at 10:05:12AM +0100, Oliver Neukum wrote:
> > On Thu, 2014-11-06 at 18:08 +0100, Johan Hovold wrote:
> > > Add new quirk for devices that cannot handle control-line state
> > > requests.
> > > 
> > > Note that we currently send these requests to all devices, regardless
> > > of
> > > whether they claim to support it, but that errors are only logged if
> > > support is claimed.
> > 
> > That makes me wonder whether we should do this. What do you think?
> 
> My interpretation was that it's done this way as there may be devices
> with broken CDC headers which fail to set the corresponding capability
> bits, but still support the request (c.f. our recent not-a-modem
> discussion).

Oh well, yes I don't like it, but we can't risk the change.

> In that case, always attempting the request, but only reporting errors
> if support was claimed, makes sense.
> 
> As changing this behaviour now would risk breaking such devices, I
> think black-listing (i.e. this patch) is preferred moving forward.

Unfortunately, yes.

	Regards
		Oliver



^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2014-11-07 10:23 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-31 23:07 [3.16.1 REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16 Nix
2014-09-01 11:09 ` Oliver Neukum
2014-09-04 23:40   ` Nix
2014-09-05  7:59     ` Oliver Neukum
2014-09-05 15:17       ` Nix
2014-09-08  7:21         ` Oliver Neukum
2014-09-08  7:58           ` Nix
2014-10-11 19:05     ` [3.16.1 BISECTED " Nix
2014-10-11 19:51       ` Paul Martin
2014-10-11 22:24         ` Nix
2014-10-12 11:14           ` Paul Martin
2014-10-12 18:58           ` Johan Hovold
2014-10-12 21:36             ` Nix
2014-10-14  8:34               ` Johan Hovold
2014-10-14 14:44                 ` Nix
2014-10-17 13:21                 ` Nix
2014-10-19 13:45                   ` Johan Hovold
2014-10-22  9:31                 ` Nix
2014-10-22 10:14                   ` Johan Hovold
2014-10-22 14:00                     ` Nix
2014-10-22 15:36                     ` Nix
2014-10-24 11:14                       ` Johan Hovold
2014-10-24 15:08                         ` Nix
2014-10-31 16:44                         ` Nix
2014-11-05 11:56                           ` Johan Hovold
2014-11-05 15:14                             ` Nix
2014-11-05 15:46                               ` Daniel Silverstone
2014-11-05 18:14                               ` Johan Hovold
2014-11-06 13:49                                 ` Nix
2014-11-06 17:04                                   ` Johan Hovold
2014-11-06 17:08                                     ` [PATCH] USB: cdc-acm: add quirk for control-line state requests Johan Hovold
2014-11-07  9:05                                       ` Oliver Neukum
2014-11-07  9:16                                         ` Johan Hovold
2014-11-07 10:23                                           ` Oliver Neukum
2014-11-06 17:14                                     ` [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16 Nix

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.