* Re: [Bug 3.14.17] inconsistent lock state
       [not found] <53F9CC77.70009@t-online.de>
@ 2014-08-24 17:50 ` Linus Torvalds
  2014-08-24 18:13   ` Arkadiusz Miskiewicz
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Linus Torvalds @ 2014-08-24 17:50 UTC (permalink / raw)
  To: Knut Petersen, Lan Tianyu, Rafael J. Wysocki, Thomas Renninger
  Cc: Linux Kernel Mailing List

On Sun, Aug 24, 2014 at 4:28 AM, Knut Petersen
<Knut_Petersen@t-online.de> wrote:
>
> For months the postmaster has wantonly blocked all mail traffic from the
> biggest German ISP, t-online.de, to all vger.kernel.org mailing lists;
> therefore I could not cc lkml.

Hmm.

> Please forward the following bug report to lkml and to whomever it might
> be of interest:

Added the guilty parties to the cc. The problem seems to be that first
/proc/acpi/event was removed in commit 1696d9dc57e0 ("ACPI: Remove the
old /proc/acpi/event interface") and then because that caused
problems, a horribly broken netlink interface was added instead in
commit 0bf6368ee8f2 ("ACPI / button: Add ACPI Button event via netlink
routine").

And that commit really seems to be horribly horribly broken.

It calls the netlink routines from interrupt context, which doesn't
work. Thus lockdep warns about "netlink_poll()" using bh-safe locking:

        spin_lock_bh(&sk->sk_receive_queue.lock);

but then __netlink_sendskb() is using that same queue lock from
interrupt context. Not some "subtly wrong" locking caught by lockdep,
but a major bug.
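
The conflict described above can be modelled in user space. The following is only a toy sketch, not lockdep itself: the names `toy_lock`, `toy_lock_acquire` and the `ctx` values are invented for illustration. It records whether a lock has ever been taken from hard-interrupt context and whether it has ever been held with hard interrupts enabled (which is the case under spin_lock_bh(), since that only disables bottom halves). Once both are true, an interrupt arriving on the CPU that holds the lock would spin on it forever.

```c
#include <stdbool.h>
#include <stdio.h>

/* Contexts in which a lock can be taken (toy model, names are hypothetical). */
enum ctx { CTX_PROCESS_IRQS_ON, CTX_PROCESS_IRQS_OFF, CTX_HARDIRQ };

/* Per-lock usage history, loosely analogous to the state lockdep tracks. */
struct toy_lock {
    const char *name;
    bool used_in_hardirq;        /* ever acquired from an interrupt handler */
    bool held_with_hardirqs_on;  /* ever held while hard IRQs were enabled  */
};

/*
 * Record one acquisition. Returns false once the history shows the lock is
 * both taken in hardirq context and held with hard IRQs enabled.
 */
bool toy_lock_acquire(struct toy_lock *l, enum ctx c)
{
    if (c == CTX_HARDIRQ)
        l->used_in_hardirq = true;
    else if (c == CTX_PROCESS_IRQS_ON)
        l->held_with_hardirqs_on = true;

    if (l->used_in_hardirq && l->held_with_hardirqs_on) {
        fprintf(stderr, "toy: inconsistent lock state on %s\n", l->name);
        return false;
    }
    return true;
}
```

In this model, netlink_poll() corresponds to an acquisition with `CTX_PROCESS_IRQS_ON` and the button interrupt path to `CTX_HARDIRQ`; whichever comes second makes `toy_lock_acquire()` return false.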

This seems to be going back to 3.14-rc7, which surprises me a bit.
It's been around for a while now, but I don't find a lot of reports.
And I don't see any subtle fixes for this anywhere, so it seems to be
still true today.

Rafael? Lan Tianyu? This is not some minor locking bug. This is a
*major* mistake unless I misread something.

                Linus


* Re: [Bug 3.14.17] inconsistent lock state
  2014-08-24 17:50 ` [Bug 3.14.17] inconsistent lock state Linus Torvalds
@ 2014-08-24 18:13   ` Arkadiusz Miskiewicz
  2014-08-24 18:49   ` Linus Torvalds
  2014-08-25  2:53   ` Lan Tianyu
  2 siblings, 0 replies; 9+ messages in thread
From: Arkadiusz Miskiewicz @ 2014-08-24 18:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Knut Petersen, Lan Tianyu, Rafael J. Wysocki,
	Thomas Renninger

On Sunday 24 of August 2014, Linus Torvalds wrote:
> On Sun, Aug 24, 2014 at 4:28 AM, Knut Petersen
> 
> <Knut_Petersen@t-online.de> wrote:
> > For months the postmaster has wantonly blocked all mail traffic from the
> > biggest German ISP, t-online.de, to all vger.kernel.org mailing lists;
> > therefore I could not cc lkml.
> 
> Hmm.

What is worse, the postmaster's attitude is like this:

"We are under no obligation to explain why you were banned nor to remove
the ban.

If you don't like this, you can run your own list server and on it determine
your own set of policies."

PS: that was a quotation from vger postmaster David Miller.

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl


* Re: [Bug 3.14.17] inconsistent lock state
  2014-08-24 17:50 ` [Bug 3.14.17] inconsistent lock state Linus Torvalds
  2014-08-24 18:13   ` Arkadiusz Miskiewicz
@ 2014-08-24 18:49   ` Linus Torvalds
  2014-08-24 19:04     ` David Miller
  2014-08-25  2:53   ` Lan Tianyu
  2 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2014-08-24 18:49 UTC (permalink / raw)
  To: Knut Petersen, Lan Tianyu, Rafael J. Wysocki, Thomas Renninger,
	David Miller
  Cc: Linux Kernel Mailing List

On Sun, Aug 24, 2014 at 10:50 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Aug 24, 2014 at 4:28 AM, Knut Petersen
> <Knut_Petersen@t-online.de> wrote:
>>
>> For months the postmaster has wantonly blocked all mail traffic from the
>> biggest German ISP, t-online.de, to all vger.kernel.org mailing lists;
>> therefore I could not cc lkml.
>
> Hmm.

Looks like t-online used to have serious spam problems, with lots of
bad users etc.

Judging by googling, that _seems_ to have been fixed, and maybe vger
should unblock it again. Davem?

            Linus


* Re: [Bug 3.14.17] inconsistent lock state
  2014-08-24 18:49   ` Linus Torvalds
@ 2014-08-24 19:04     ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2014-08-24 19:04 UTC (permalink / raw)
  To: torvalds; +Cc: Knut_Petersen, tianyu.lan, rafael.j.wysocki, trenn, linux-kernel

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 24 Aug 2014 11:49:31 -0700

> On Sun, Aug 24, 2014 at 10:50 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Sun, Aug 24, 2014 at 4:28 AM, Knut Petersen
>> <Knut_Petersen@t-online.de> wrote:
>>>
>>> For months the postmaster has wantonly blocked all mail traffic from the
>>> biggest German ISP, t-online.de, to all vger.kernel.org mailing lists;
>>> therefore I could not cc lkml.
>>
>> Hmm.
> 
> Looks like t-online used to have serious spam problems, with lots of
> bad users etc.
> 
> Judging by googling, that _seems_ to have been fixed, and maybe vger
> should unblock it again. Davem?

It's not that they are a source of SPAM, but rather their problem is
their content filter: about 30% of lkml postings get bounced back to
postmaster.

I really don't want t-online accounts subscribing to the mailing
lists, there are many alternative email services people could use
instead.


* Re: [Bug 3.14.17] inconsistent lock state
  2014-08-24 17:50 ` [Bug 3.14.17] inconsistent lock state Linus Torvalds
  2014-08-24 18:13   ` Arkadiusz Miskiewicz
  2014-08-24 18:49   ` Linus Torvalds
@ 2014-08-25  2:53   ` Lan Tianyu
  2014-08-25  3:13     ` Linus Torvalds
  2 siblings, 1 reply; 9+ messages in thread
From: Lan Tianyu @ 2014-08-25  2:53 UTC (permalink / raw)
  To: Linus Torvalds, Knut Petersen, Rafael J. Wysocki, Thomas Renninger
  Cc: Linux Kernel Mailing List

On 2014-08-25 01:50, Linus Torvalds wrote:
> Rafael? Lan Tianyu? This is not some minor locking bug. This is a
> *major* mistake unless I misread something.
> 

Hi Linus:

Sorry about this. We are resolving the issue in the other bug
report (https://lkml.org/lkml/2014/8/21/606) and I have proposed a fix
patch (http://marc.info/?l=linux-acpi&m=140869309231199&w=2).

It's my fault. The ACPI button notify callback is called in interrupt
context when the button device is enumerated from the ACPI FADT table
(a so-called fixed button device). An ACPI button device can also be
enumerated from the ACPI namespace, in which case its callback runs in
process context just like other ACPI devices' notify callbacks. These
two kinds of button devices use the same callback. Originally, I assumed
that all ACPI notify callbacks run in process context and didn't check
whether the netlink routines can be used in interrupt context.
Sorry again.

>                 Linus
> 


-- 
Best regards
Tianyu Lan


* Re: [Bug 3.14.17] inconsistent lock state
  2014-08-25  2:53   ` Lan Tianyu
@ 2014-08-25  3:13     ` Linus Torvalds
  2014-08-25  3:43       ` Lan Tianyu
       [not found]       ` <53FAE383.6050308@t-online.de>
  0 siblings, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2014-08-25  3:13 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: Knut Petersen, Rafael J. Wysocki, Thomas Renninger,
	Linux Kernel Mailing List

On Sun, Aug 24, 2014 at 7:53 PM, Lan Tianyu <tianyu.lan@intel.com> wrote:
>
> Sorry about this. We are resolving the issue in the other bug
> report(https://lkml.org/lkml/2014/8/21/606) and I have proposed a fix
> patch(http://marc.info/?l=linux-acpi&m=140869309231199&w=2).

Ahh. Good. That patch looks fine to me, and while it makes me worry a
bit that some codepath expects the power/sleep button to be handled
immediately in interrupt context, I guess the actual callbacks have
never actually done anything but schedule other things to happen (ie
add events to some queue), and making the context be the same as the
other notify callbacks would seem to be a good thing regardless of
this particular bug.
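
The linked patch is not quoted in this thread, so the following does not reproduce it. It is only a single-threaded user-space sketch of the general pattern described above: the interrupt side does nothing but queue the event, and the notify callback runs later from ordinary context. All names (`toy_queue`, `toy_defer`, `toy_run_deferred`, `toy_button_notify`) are hypothetical, and a real implementation would need proper synchronization around the queue.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_LEN 8

typedef void (*toy_notify_fn)(unsigned int event);

struct toy_deferred {
    toy_notify_fn fn;
    unsigned int event;
};

/* Fixed-size ring buffer of pending notifications. */
struct toy_queue {
    struct toy_deferred items[TOY_QUEUE_LEN];
    size_t head, count;
};

/* Example callback: just remembers what it was told. */
unsigned int toy_last_event;
size_t toy_calls;
void toy_button_notify(unsigned int event)
{
    toy_last_event = event;
    toy_calls++;
}

/* "Interrupt side": only record the event, never call fn from here. */
bool toy_defer(struct toy_queue *q, toy_notify_fn fn, unsigned int event)
{
    if (q->count == TOY_QUEUE_LEN)
        return false;               /* queue full, event is dropped */
    size_t tail = (q->head + q->count) % TOY_QUEUE_LEN;
    q->items[tail].fn = fn;
    q->items[tail].event = event;
    q->count++;
    return true;
}

/* "Process side": drain the queue and run the callbacks; returns how many ran. */
size_t toy_run_deferred(struct toy_queue *q)
{
    size_t ran = 0;
    while (q->count > 0) {
        struct toy_deferred d = q->items[q->head];
        q->head = (q->head + 1) % TOY_QUEUE_LEN;
        q->count--;
        d.fn(d.event);
        ran++;
    }
    return ran;
}
```

With this split, anything the callback does (such as sending a netlink message) happens in the same context for both kinds of button device.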

Knut - can you please test the patch Lan pointed at? I realize it
doesn't seem to be entirely consistent for you (which is a bit
surprising, I wonder why lockdep doesn't trigger it consistently), but
it would be good to have more testing. Even if that patch looks
"obviously good" (tm) at a quick glance.

                  Linus


* Re: [Bug 3.14.17] inconsistent lock state
  2014-08-25  3:13     ` Linus Torvalds
@ 2014-08-25  3:43       ` Lan Tianyu
       [not found]       ` <53FAE383.6050308@t-online.de>
  1 sibling, 0 replies; 9+ messages in thread
From: Lan Tianyu @ 2014-08-25  3:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Knut Petersen, Rafael J. Wysocki, Thomas Renninger,
	Linux Kernel Mailing List

On 2014-08-25 11:13, Linus Torvalds wrote:
> On Sun, Aug 24, 2014 at 7:53 PM, Lan Tianyu <tianyu.lan@intel.com> wrote:
>>
>> Sorry about this. We are resolving the issue in the other bug
>> report(https://lkml.org/lkml/2014/8/21/606) and I have proposed a fix
>> patch(http://marc.info/?l=linux-acpi&m=140869309231199&w=2).
> 
> Ahh. Good. That patch looks fine to me, and while it makes me worry a
> bit that some codepath expects the power/sleep button to be handled
> immediately in interrupt context, I guess the actual callbacks have
> never actually done anything but schedule other things to happen (ie
> add events to some queue), and making the context be the same as the
> other notify callbacks would seem to be a good thing regardless of
> this particular bug.

Yes, I have the same opinion: the callback just reports the power/sleep
button event to user space via the input layer or the ACPI netlink routines.

The button devices enumerated from the ACPI namespace and from the FADT
table share the same notify callback and do the same things, yet they run
in different contexts. That does not make sense.

> 
> Knut - can you please test the patch Lan pointed at? I realize it
> doesn't seem to be entirely consistent for you (which is a bit
> surprising, I wonder why lockdep doesn't trigger it consistently), but
> it would be good to have more testing. Even if that patch looks
> "obviously good" (tm) at a quick glance.

BTW, this bug only occurs on machines with a fixed button device.
This can be identified by checking whether there are LNXPWRBN:00 or
LNXSLPBN:00 device nodes under /sys/bus/acpi/devices.
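
A minimal sketch of that check, assuming nothing beyond the path and the two node names given above. The helper names `node_exists` and `has_fixed_acpi_button` are invented for illustration; the directory is passed in as a parameter so the function can also be tried against a scratch directory.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* True if <dir>/<name> exists; stat() follows symlinks, which is fine here. */
static bool node_exists(const char *dir, const char *name)
{
    char path[512];
    struct stat st;
    int n = snprintf(path, sizeof(path), "%s/%s", dir, name);

    if (n < 0 || (size_t)n >= sizeof(path))
        return false;               /* path did not fit in the buffer */
    return stat(path, &st) == 0;
}

/* True if either fixed-button node name is present under dir. */
bool has_fixed_acpi_button(const char *dir)
{
    return node_exists(dir, "LNXPWRBN:00") ||
           node_exists(dir, "LNXSLPBN:00");
}
```

Calling `has_fixed_acpi_button("/sys/bus/acpi/devices")` on the machine in question should then tell whether it is one of the affected systems.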

> 
>                   Linus
> 


-- 
Best regards
Tianyu Lan


* Re: [Bug 3.14.17] inconsistent lock state
       [not found]       ` <53FAE383.6050308@t-online.de>
@ 2014-08-25 16:36         ` Linus Torvalds
       [not found]           ` <53FBA94E.2080405@t-online.de>
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2014-08-25 16:36 UTC (permalink / raw)
  To: Knut Petersen
  Cc: Lan Tianyu, Rafael J. Wysocki, Thomas Renninger,
	Linux Kernel Mailing List

On Mon, Aug 25, 2014 at 12:19 AM, Knut Petersen
<Knut_Petersen@t-online.de> wrote:
>
> Testing some other kernels lurking around on the disk I realized that
> after kernel 3.11.5 and before kernel 3.12.9 both the power button
> and "shutdown -h now" lost the ability to power off the machine - the
> system is halted instead and needs a reset / 4 second power button pressing.

Hmm. Does "shutdown -p" work? But it might be interesting to see where
the behavior changed.

           Linus


* Re: [REGRESSION] pci: power off broken by commit 4fc9bbf98 / stable 2ab0ff9b
       [not found]             ` <53FBBC26.2030501@oracle.com>
@ 2014-08-26  4:10               ` Bjorn Helgaas
  0 siblings, 0 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2014-08-26  4:10 UTC (permalink / raw)
  To: Khalid Aziz
  Cc: Knut Petersen, Linus Torvalds, cl91tp, koct9i, gregkh,
	Lan Tianyu, linux-kernel, linux-pci

[+cc linux-kernel, linux-pci]

On Mon, Aug 25, 2014 at 04:43:50PM -0600, Khalid Aziz wrote:
> On 08/25/2014 03:23 PM, Knut Petersen wrote:
> >On 25.08.2014 18:36, Linus Torvalds wrote:
> >>On Mon, Aug 25, 2014 at 12:19 AM, Knut Petersen
> >><Knut_Petersen@t-online.de> wrote:
> >>>Testing some other kernels lurking around on the disk I realized that
> >>>after kernel 3.11.5 and before kernel 3.12.9 both the power button
> >>>and "shutdown -h now" lost the ability to power off the machine - the
> >>>system is halted instead and needs a reset / 4 second power button
> >>>pressing.
> >>Hmm. Does "shutdown -p" work?
> >No. Suspending works as expected, but a normal power-off hangs, no
> >matter if
> >triggered by the power button or shutdown -h or -p.
> >>But it might be interesting to see where the behavior changed.
> >>
> >>            Linus
> >
> >Ok, I bisected and found the offending commit. Some people that authored
> >/ acked / were interested in
> >the commit are added to the cc. No cc to lkml and the pci list as
> >t-online.de is still banned from vger.
> >
> >After a regression report discussed in
> >https://bugzilla.kernel.org/show_bug.cgi?id=63861
> >a fix that was tested on several machines was introduced to the kernel.
> >Unfortunately
> >that fix (linux git 4fc9bbf98, linux stable git 2ab0ff9b) breaks
> >powering off on my
> >AOpen i915GMm-hfs / Pentium M Dothan machine reliably.
> >
> >Reverting is not really an option because it would break other machines,
> >e.g. the Acer Aspire V5-573G.
> 
> I would agree reverting is not a good option. There is a good number
> of machines that will not kexec a new kernel successfully or panic
> soon after successful kexec if ongoing DMAs are not stopped. That
> commit helps those machines without affecting the normal shutdown
> path. Your machine is the first one I have come across that requires
> the bus master bit to be cleared for a normal shutdown. A full reset
> going through BIOS reset should stop any ongoing DMA. This sounds
> more like a BIOS bug that can be worked around by clearing bus
> master bit on the offending device. Have you tried any kernels
> before 3.5.0? The first version of code to clear bus master bit went
> into 3.5.0 before it was refined to apply only to kexec path. My
> guess is power-off will hang with pre 3.5.0 kernels.
> 
> If we must clear bus master bit for kexec as well as normal
> shutdown, we need to do it in a better way than building
> blacklist/whitelist. A BIOS reset should never require bus master
> bit to be set or cleared, yet we have seen hangs doing it either
> way.

I'm not convinced we know what the real problem is.  I'm skeptical that
clearing Bus Master would be required for a simple power-off.

I repeated Khalid's analysis because I didn't read his email carefully
enough; sorry for the duplication.  According to Knut's bisection,

  - 4fc9bbf98fd6 ("PCI: Disable Bus Master only on kexec reboot") hangs
    during power-off.  Here we don't touch Bus Master because we're not
    doing a kexec.

  - 4fc9bbf98fd6^ ("PCI: mvebu: Return 'unsupported' for Interrupt Line and
    Interrupt Pin") powers off reliably.  Here we clear Bus Master if the
    device is in D0.

Prior to v3.5 (when b566a22c2332 ("PCI: disable Bus Master on PCI device
shutdown") first appeared), we didn't touch Bus Master in
pci_device_shutdown().  So power-off should hang on v3.4 and older kernels
as well (as Khalid suggested).
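
The three behaviours compared above can be written down as a small user-space model. This is a sketch of the description in this mail only, not the code of pci_device_shutdown(); the enum, the function name and the simplification that the kexec case ignores the power state are assumptions made for illustration. The one hardware fact it relies on is that Bus Master Enable is bit 2 (0x4) of the PCI Command register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus Master Enable: bit 2 of the PCI Command register. */
#define TOY_PCI_COMMAND_MASTER 0x4

/* Shutdown behaviours as described in the thread (names are hypothetical). */
enum toy_policy {
    TOY_NEVER_CLEAR,     /* before v3.5: Bus Master is not touched       */
    TOY_CLEAR_IF_D0,     /* 4fc9bbf98fd6^: cleared if the device is in D0 */
    TOY_CLEAR_ON_KEXEC   /* 4fc9bbf98fd6: cleared only on a kexec reboot  */
};

/* Returns the Command register value the device is left with at shutdown. */
uint16_t toy_shutdown_command(enum toy_policy p, uint16_t cmd,
                              bool in_d0, bool kexec_in_progress)
{
    bool clear = false;

    switch (p) {
    case TOY_NEVER_CLEAR:
        clear = false;
        break;
    case TOY_CLEAR_IF_D0:
        clear = in_d0;
        break;
    case TOY_CLEAR_ON_KEXEC:
        clear = kexec_in_progress;  /* power state ignored in this toy */
        break;
    }
    return clear ? (uint16_t)(cmd & ~TOY_PCI_COMMAND_MASTER) : cmd;
}
```

For a plain power-off (`kexec_in_progress == false`), `TOY_NEVER_CLEAR` and `TOY_CLEAR_ON_KEXEC` leave the register identical, which is why a hang with 4fc9bbf98fd6 would also be expected on v3.4 and older.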

But other AOpen i915GMm-HFS quirks were in the tree as early as v2.6.17, so
I would think a power-off hang would certainly have been reported sometime
between v2.6.17 (Jun 17, 2006) and v3.5 (Jul 21, 2012).

  - 22ab70d3262d ("drm/i915/lvds: Add AOpen i915GMm-HFS to the list of
    false-positive LVDS") appeared in v2.6.38.

  - 0b5bfa1cbefd ("ACPI: thermal: add DMI hooks to handle AOpen's broken
    Award BIOS") appeared in v2.6.23.

  - ede3531e8ce2 ("[ALSA] hda-codec - Fix Aopen i915GMm-HFS mobo") appeared
    in v2.6.17.

Maybe a driver bug was added some time after v3.4?  Some sort of bug that
makes power-off hang unless we clear Bus Master?  I know, I'm really
grasping at straws.

Knut, could you verify that power-off works on some v3.4 or older kernel,
and collect complete dmesg logs and "lspci -vv" output from 4fc9bbf98fd6
(where power-off hangs) and from that older kernel (if it exists)?

> >+ {
> >+ .callback = needs_busmaster_bit_switched_off_also_when_not_doing_kexec,
> >+ .ident = "AOpen motherboard i915GMm-HFS",
> >+ .matches = {
> >+ DMI_MATCH(DMI_BOARD_VENDOR, "AOpen"),
> >+ DMI_MATCH(DMI_BOARD_NAME, "i915GMm-HFS"),
> >+ },
> >+ },
> >
> >might be part of a solution if nobody has a better idea ... ok, probably
> >it would also be possible
> >to fix a driver for one of the devices listed below:
> >
> >00:00.0 Host bridge: Intel Corporation Mobile 915GM/PM/GMS/910GML
> >Express Processor to DRAM Controller (rev 04)
> >00:02.0 VGA compatible controller: Intel Corporation Mobile
> >915GM/GMS/910GML Express Graphics Controller (rev 04)
> >00:02.1 Display controller: Intel Corporation Mobile 915GM/GMS/910GML
> >Express Graphics Controller (rev 04)
> >00:1b.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
> >Family) High Definition Audio Controller (rev 04)
> >00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
> >Family) PCI Express Port 1 (rev 04)
> >00:1c.1 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
> >Family) PCI Express Port 2 (rev 04)
> >00:1c.2 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
> >Family) PCI Express Port 3 (rev 04)
> >00:1c.3 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
> >Family) PCI Express Port 4 (rev 04)
> >00:1d.0 USB controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
> >Family) USB UHCI #1 (rev 04)
> >00:1d.1 USB controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
> >Family) USB UHCI #2 (rev 04)
> >00:1d.2 USB controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
> >Family) USB UHCI #3 (rev 04)
> >00:1d.3 USB controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
> >Family) USB UHCI #4 (rev 04)
> >00:1d.7 USB controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
> >Family) USB2 EHCI Controller (rev 04)
> >00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev d4)
> >00:1f.0 ISA bridge: Intel Corporation 82801FBM (ICH6M) LPC Interface
> >Bridge (rev 04)
> >00:1f.2 IDE interface: Intel Corporation 82801FBM (ICH6M) SATA
> >Controller (rev 04)
> >00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
> >SMBus Controller (rev 04)
> >02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E
> >Gigabit Ethernet Controller (rev 19)
> >03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E
> >Gigabit Ethernet Controller (rev 19)
> >04:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA
> >Raid II Controller (rev 01)
> >05:04.0 Network controller: Cologne Chip Designs GmbH ISDN network
> >controller [HFC-PCI] (rev 02)
> >05:05.0 Multimedia video controller: Conexant Systems, Inc.
> >CX23880/1/2/3 PCI Video and Audio Decoder (rev 05)
> >05:05.1 Multimedia controller: Conexant Systems, Inc. CX23880/1/2/3 PCI
> >Video and Audio Decoder [Audio Port] (rev 05)
> >05:05.2 Multimedia controller: Conexant Systems, Inc. CX23880/1/2/3 PCI
> >Video and Audio Decoder [MPEG Port] (rev 05)
> >05:05.4 Multimedia controller: Conexant Systems, Inc. CX23880/1/2/3 PCI
> >Video and Audio Decoder [IR Port] (rev 05)
> >
> >cu,
> >  knut
> 
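
For readers unfamiliar with DMI quirk tables, the entry Knut proposes in the quoted text can be illustrated with a user-space toy. The sketch assumes that matching is a substring comparison of the board vendor and board name against the table strings; the struct and function names are invented, and the callback from the quoted fragment is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Identification strings of the running board (would come from DMI data). */
struct toy_board {
    const char *vendor;
    const char *name;
};

/* One quirk entry: matches when both substrings are found. */
struct toy_quirk {
    const char *ident;
    const char *vendor_substr;
    const char *name_substr;
};

static const struct toy_quirk toy_quirks[] = {
    { "AOpen motherboard i915GMm-HFS", "AOpen", "i915GMm-HFS" },
};

/* Returns the first matching quirk, or NULL if the board is not listed. */
const struct toy_quirk *toy_find_quirk(const struct toy_board *b)
{
    for (size_t i = 0; i < sizeof(toy_quirks) / sizeof(toy_quirks[0]); i++) {
        const struct toy_quirk *q = &toy_quirks[i];

        if (strstr(b->vendor, q->vendor_substr) &&
            strstr(b->name, q->name_substr))
            return q;
    }
    return NULL;
}
```

Such a table grows by one entry per affected board, which is the blacklist/whitelist maintenance burden Khalid objects to in the quoted text.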
