All of lore.kernel.org
 help / color / mirror / Atom feed
* Regression in next with "mfd: twl6040: The chip does not support bulk access"
@ 2016-09-22 18:07 ` Tony Lindgren
  0 siblings, 0 replies; 16+ messages in thread
From: Tony Lindgren @ 2016-09-22 18:07 UTC (permalink / raw)
  To: Peter Ujfalusi, Lee Jones; +Cc: linux-omap, linux-arm-kernel

Hi,

Looks like commit 7a17e47f6403 ("mfd: twl6040: The chip does not
support bulk access") breaks at least omap4-duovero. I now get
tons of errors:

Skipping twl internal clock init and using bootloader value (unknown osc rate)
twl 0-0048: PIH (irq 332) nested IRQs
of_get_named_gpiod_flags: parsed 'ti,audpwron-gpio' property of node '/ocp/i2c@48070000/t
wl@4b[0]' - status (0)
omap_i2c 48070000.i2c: bus 0 rev0.10 at 400 kHz
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16

It seems the regmap irqs don't work with use_single_rw?

Also seems that twl6040 does support bulk access as things have been
working earlier?

Anyways, can you please revert?

Regards,

Tony

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Regression in next with "mfd: twl6040: The chip does not support bulk access"
@ 2016-09-22 18:07 ` Tony Lindgren
  0 siblings, 0 replies; 16+ messages in thread
From: Tony Lindgren @ 2016-09-22 18:07 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

Looks like commit 7a17e47f6403 ("mfd: twl6040: The chip does not
support bulk access") breaks at least omap4-duovero. I now get
tons of errors:

Skipping twl internal clock init and using bootloader value (unknown osc rate)
twl 0-0048: PIH (irq 332) nested IRQs
of_get_named_gpiod_flags: parsed 'ti,audpwron-gpio' property of node '/ocp/i2c at 48070000/t
wl at 4b[0]' - status (0)
omap_i2c 48070000.i2c: bus 0 rev0.10 at 400 kHz
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16
twl6040 0-004b: Failed to read IRQ status: -16

It seems the regmap irqs don't work with use_single_rw?

Also seems that twl6040 does support bulk access as things have been
working earlier?

Anyways, can you please revert?

Regards,

Tony

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression in next with "mfd: twl6040: The chip does not support bulk access"
  2016-09-22 18:07 ` Tony Lindgren
@ 2016-09-23  7:20   ` Peter Ujfalusi
  -1 siblings, 0 replies; 16+ messages in thread
From: Peter Ujfalusi @ 2016-09-23  7:20 UTC (permalink / raw)
  To: Tony Lindgren, Lee Jones; +Cc: linux-omap, linux-arm-kernel

On 09/22/16 21:07, Tony Lindgren wrote:
> Hi,
> 
> Looks like commit 7a17e47f6403 ("mfd: twl6040: The chip does not
> support bulk access") breaks at least omap4-duovero.

That's odd. I see no such errors on omap4 PandaBoard-ES nor on omap5 uEVM.
The IRQ status is one register so even in bulk access it is one read. So the
use_single_rw should have no effect on the access.

The only time when regmap would try to use bulk access to twl6040 is when we
execute regcache_sync() on it after resuming the chip and this would fail at
the first time when it tries to restore more than one consecutive registers.

I just tested things on next-20160915 and I see no errors at all.

> I now get tons of errors:
> 
> Skipping twl internal clock init and using bootloader value (unknown osc rate)
> twl 0-0048: PIH (irq 332) nested IRQs
> of_get_named_gpiod_flags: parsed 'ti,audpwron-gpio' property of node '/ocp/i2c@48070000/t
> wl@4b[0]' - status (0)
> omap_i2c 48070000.i2c: bus 0 rev0.10 at 400 kHz
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> 
> It seems the regmap irqs don't work with use_single_rw?
> 
> Also seems that twl6040 does support bulk access as things have been
> working earlier?

Things kind of worked as we had single register reads and writes, but when I
tried to power down/up the twl6040 the regcache_sync() would fail to restore
the registers at the first place when we had two or more registers to update
in bulk mode. In that case regmap gives an error, but I have not been checking
the return of the regcache_sync() so the chip comes up, but not all settings
were restored.

> Anyways, can you please revert?

Please don't.

On which linux-next version you are seeing this?

-- 
Péter

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Regression in next with "mfd: twl6040: The chip does not support bulk access"
@ 2016-09-23  7:20   ` Peter Ujfalusi
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Ujfalusi @ 2016-09-23  7:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/22/16 21:07, Tony Lindgren wrote:
> Hi,
> 
> Looks like commit 7a17e47f6403 ("mfd: twl6040: The chip does not
> support bulk access") breaks at least omap4-duovero.

That's odd. I see no such errors on omap4 PandaBoard-ES nor on omap5 uEVM.
The IRQ status is one register so even in bulk access it is one read. So the
use_single_rw should have no effect on the access.

The only time when regmap would try to use bulk access to twl6040 is when we
execute regcache_sync() on it after resuming the chip and this would fail at
the first time when it tries to restore more than one consecutive registers.

I just tested things on next-20160915 and I see no errors at all.

> I now get tons of errors:
> 
> Skipping twl internal clock init and using bootloader value (unknown osc rate)
> twl 0-0048: PIH (irq 332) nested IRQs
> of_get_named_gpiod_flags: parsed 'ti,audpwron-gpio' property of node '/ocp/i2c at 48070000/t
> wl at 4b[0]' - status (0)
> omap_i2c 48070000.i2c: bus 0 rev0.10 at 400 kHz
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> twl6040 0-004b: Failed to read IRQ status: -16
> 
> It seems the regmap irqs don't work with use_single_rw?
> 
> Also seems that twl6040 does support bulk access as things have been
> working earlier?

Things kind of worked as we had single register reads and writes, but when I
tried to power down/up the twl6040 the regcache_sync() would fail to restore
the registers at the first place when we had two or more registers to update
in bulk mode. In that case regmap gives an error, but I have not been checking
the return of the regcache_sync() so the chip comes up, but not all settings
were restored.

> Anyways, can you please revert?

Please don't.

On which linux-next version you are seeing this?

-- 
P?ter

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression in next with "mfd: twl6040: The chip does not support bulk access"
  2016-09-23  7:20   ` Peter Ujfalusi
@ 2016-09-23 10:19     ` Peter Ujfalusi
  -1 siblings, 0 replies; 16+ messages in thread
From: Peter Ujfalusi @ 2016-09-23 10:19 UTC (permalink / raw)
  To: Tony Lindgren, Lee Jones; +Cc: linux-omap, linux-arm-kernel

On 09/23/16 10:20, Peter Ujfalusi wrote:
> On 09/22/16 21:07, Tony Lindgren wrote:
>> Hi,
>>
>> Looks like commit 7a17e47f6403 ("mfd: twl6040: The chip does not
>> support bulk access") breaks at least omap4-duovero.
> 
> That's odd. I see no such errors on omap4 PandaBoard-ES nor on omap5 uEVM.
> The IRQ status is one register so even in bulk access it is one read. So the
> use_single_rw should have no effect on the access.
> 
> The only time when regmap would try to use bulk access to twl6040 is when we
> execute regcache_sync() on it after resuming the chip and this would fail at
> the first time when it tries to restore more than one consecutive registers.
> 
> I just tested things on next-20160915 and I see no errors at all.

Retested on next-20160923 and I see no issue on Panda-ES or omap5-uevm.

My guess is that you receive an interrupt from twl6040 while the chip is
powered down, the regmap is set to cache only. But this has nothing to do with
the bulk or non bulk access... Looking at the commit logs and git blame I see
no recent change in the driver(s) for twl6040 which changed the way it loads
and/or powers up/down.

-- 
Péter

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Regression in next with "mfd: twl6040: The chip does not support bulk access"
@ 2016-09-23 10:19     ` Peter Ujfalusi
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Ujfalusi @ 2016-09-23 10:19 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/23/16 10:20, Peter Ujfalusi wrote:
> On 09/22/16 21:07, Tony Lindgren wrote:
>> Hi,
>>
>> Looks like commit 7a17e47f6403 ("mfd: twl6040: The chip does not
>> support bulk access") breaks at least omap4-duovero.
> 
> That's odd. I see no such errors on omap4 PandaBoard-ES nor on omap5 uEVM.
> The IRQ status is one register so even in bulk access it is one read. So the
> use_single_rw should have no effect on the access.
> 
> The only time when regmap would try to use bulk access to twl6040 is when we
> execute regcache_sync() on it after resuming the chip and this would fail at
> the first time when it tries to restore more than one consecutive registers.
> 
> I just tested things on next-20160915 and I see no errors at all.

Retested on next-20160923 and I see no issue on Panda-ES or omap5-uevm.

My guess is that you receive an interrupt from twl6040 while the chip is
powered down, the regmap is set to cache only. But this has nothing to do with
the bulk or non bulk access... Looking at the commit logs and git blame I see
no recent change in the driver(s) for twl6040 which changed the way it loads
and/or powers up/down.

-- 
P?ter

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression in next with "mfd: twl6040: The chip does not support bulk access"
  2016-09-23  7:20   ` Peter Ujfalusi
@ 2016-09-23 14:26     ` Tony Lindgren
  -1 siblings, 0 replies; 16+ messages in thread
From: Tony Lindgren @ 2016-09-23 14:26 UTC (permalink / raw)
  To: Peter Ujfalusi; +Cc: linux-omap, Lee Jones, linux-arm-kernel, Santosh Shilimkar

* Peter Ujfalusi <peter.ujfalusi@ti.com> [160923 00:21]:
> On 09/22/16 21:07, Tony Lindgren wrote:
> > Hi,
> > 
> > Looks like commit 7a17e47f6403 ("mfd: twl6040: The chip does not
> > support bulk access") breaks at least omap4-duovero.
> 
> That's odd. I see no such errors on omap4 PandaBoard-ES nor on omap5 uEVM.
> The IRQ status is one register so even in bulk access it is one read. So the
> use_single_rw should have no effect on the access.

Oh it's just one register.

> The only time when regmap would try to use bulk access to twl6040 is when we
> execute regcache_sync() on it after resuming the chip and this would fail at
> the first time when it tries to restore more than one consecutive registers.
> 
> I just tested things on next-20160915 and I see no errors at all.

OK interesting.

> > I now get tons of errors:
> > 
> > Skipping twl internal clock init and using bootloader value (unknown osc rate)
> > twl 0-0048: PIH (irq 332) nested IRQs
> > of_get_named_gpiod_flags: parsed 'ti,audpwron-gpio' property of node '/ocp/i2c@48070000/t
> > wl@4b[0]' - status (0)
> > omap_i2c 48070000.i2c: bus 0 rev0.10 at 400 kHz
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > 
> > It seems the regmap irqs don't work with use_single_rw?

So obviously that's a wrong conclusion then.

> > Also seems that twl6040 does support bulk access as things have been
> > working earlier?
> 
> Things kind of worked as we had single register reads and writes, but when I
> tried to power down/up the twl6040 the regcache_sync() would fail to restore
> the registers at the first place when we had two or more registers to update
> in bulk mode. In that case regmap gives an error, but I have not been checking
> the return of the regcache_sync() so the chip comes up, but not all settings
> were restored.
> 
> > Anyways, can you please revert?
> 
> Please don't.
> 
> On which linux-next version you are seeing this?

This was with next-20160922. But testing it again this is just another
regression caused by "softirq: fix tasklet_kill() and its users", so adding
Santosh to Cc.

So no need to revert $subject patch.

Regards,

Tony

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Regression in next with "mfd: twl6040: The chip does not support bulk access"
@ 2016-09-23 14:26     ` Tony Lindgren
  0 siblings, 0 replies; 16+ messages in thread
From: Tony Lindgren @ 2016-09-23 14:26 UTC (permalink / raw)
  To: linux-arm-kernel

* Peter Ujfalusi <peter.ujfalusi@ti.com> [160923 00:21]:
> On 09/22/16 21:07, Tony Lindgren wrote:
> > Hi,
> > 
> > Looks like commit 7a17e47f6403 ("mfd: twl6040: The chip does not
> > support bulk access") breaks at least omap4-duovero.
> 
> That's odd. I see no such errors on omap4 PandaBoard-ES nor on omap5 uEVM.
> The IRQ status is one register so even in bulk access it is one read. So the
> use_single_rw should have no effect on the access.

Oh it's just one register.

> The only time when regmap would try to use bulk access to twl6040 is when we
> execute regcache_sync() on it after resuming the chip and this would fail at
> the first time when it tries to restore more than one consecutive registers.
> 
> I just tested things on next-20160915 and I see no errors at all.

OK interesting.

> > I now get tons of errors:
> > 
> > Skipping twl internal clock init and using bootloader value (unknown osc rate)
> > twl 0-0048: PIH (irq 332) nested IRQs
> > of_get_named_gpiod_flags: parsed 'ti,audpwron-gpio' property of node '/ocp/i2c at 48070000/t
> > wl at 4b[0]' - status (0)
> > omap_i2c 48070000.i2c: bus 0 rev0.10 at 400 kHz
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > twl6040 0-004b: Failed to read IRQ status: -16
> > 
> > It seems the regmap irqs don't work with use_single_rw?

So obviously that's a wrong conclusion then.

> > Also seems that twl6040 does support bulk access as things have been
> > working earlier?
> 
> Things kind of worked as we had single register reads and writes, but when I
> tried to power down/up the twl6040 the regcache_sync() would fail to restore
> the registers at the first place when we had two or more registers to update
> in bulk mode. In that case regmap gives an error, but I have not been checking
> the return of the regcache_sync() so the chip comes up, but not all settings
> were restored.
> 
> > Anyways, can you please revert?
> 
> Please don't.
> 
> On which linux-next version you are seeing this?

This was with next-20160922. But testing it again this is just another
regression caused by "softirq: fix tasklet_kill() and its users", so adding
Santosh to Cc.

So no need to revert $subject patch.

Regards,

Tony

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression in next with "mfd: twl6040: The chip does not support bulk access"
  2016-09-23 10:19     ` Peter Ujfalusi
@ 2016-09-23 14:27       ` Tony Lindgren
  -1 siblings, 0 replies; 16+ messages in thread
From: Tony Lindgren @ 2016-09-23 14:27 UTC (permalink / raw)
  To: Peter Ujfalusi; +Cc: linux-omap, Lee Jones, linux-arm-kernel

* Peter Ujfalusi <peter.ujfalusi@ti.com> [160923 03:20]:
> On 09/23/16 10:20, Peter Ujfalusi wrote:
> > On 09/22/16 21:07, Tony Lindgren wrote:
> >> Hi,
> >>
> >> Looks like commit 7a17e47f6403 ("mfd: twl6040: The chip does not
> >> support bulk access") breaks at least omap4-duovero.
> > 
> > That's odd. I see no such errors on omap4 PandaBoard-ES nor on omap5 uEVM.
> > The IRQ status is one register so even in bulk access it is one read. So the
> > use_single_rw should have no effect on the access.
> > 
> > The only time when regmap would try to use bulk access to twl6040 is when we
> > execute regcache_sync() on it after resuming the chip and this would fail at
> > the first time when it tries to restore more than one consecutive registers.
> > 
> > I just tested things on next-20160915 and I see no errors at all.
> 
> Retested on next-20160923 and I see no issue on Panda-ES or omap5-uevm.
> 
> My guess is that you receive an interrupt from twl6040 while the chip is
> powered down, the regmap is set to cache only. But this has nothing to do with
> the bulk or non bulk access... Looking at the commit logs and git blame I see
> no recent change in the driver(s) for twl6040 which changed the way it loads
> and/or powers up/down.

OK yeah this is another one caused by "softirq: fix tasklet_kill() and
its users".

Thanks,

Tony

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Regression in next with "mfd: twl6040: The chip does not support bulk access"
@ 2016-09-23 14:27       ` Tony Lindgren
  0 siblings, 0 replies; 16+ messages in thread
From: Tony Lindgren @ 2016-09-23 14:27 UTC (permalink / raw)
  To: linux-arm-kernel

* Peter Ujfalusi <peter.ujfalusi@ti.com> [160923 03:20]:
> On 09/23/16 10:20, Peter Ujfalusi wrote:
> > On 09/22/16 21:07, Tony Lindgren wrote:
> >> Hi,
> >>
> >> Looks like commit 7a17e47f6403 ("mfd: twl6040: The chip does not
> >> support bulk access") breaks at least omap4-duovero.
> > 
> > That's odd. I see no such errors on omap4 PandaBoard-ES nor on omap5 uEVM.
> > The IRQ status is one register so even in bulk access it is one read. So the
> > use_single_rw should have no effect on the access.
> > 
> > The only time when regmap would try to use bulk access to twl6040 is when we
> > execute regcache_sync() on it after resuming the chip and this would fail at
> > the first time when it tries to restore more than one consecutive registers.
> > 
> > I just tested things on next-20160915 and I see no errors at all.
> 
> Retested on next-20160923 and I see no issue on Panda-ES or omap5-uevm.
> 
> My guess is that you receive an interrupt from twl6040 while the chip is
> powered down, the regmap is set to cache only. But this has nothing to do with
> the bulk or non bulk access... Looking at the commit logs and git blame I see
> no recent change in the driver(s) for twl6040 which changed the way it loads
> and/or powers up/down.

OK yeah this is another one caused by "softirq: fix tasklet_kill() and
its users".

Thanks,

Tony

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression in next with "mfd: twl6040: The chip does not support bulk access"
  2016-09-23 14:26     ` Tony Lindgren
@ 2016-09-23 15:24       ` Santosh Shilimkar
  -1 siblings, 0 replies; 16+ messages in thread
From: Santosh Shilimkar @ 2016-09-23 15:24 UTC (permalink / raw)
  To: Tony Lindgren, Peter Ujfalusi
  Cc: linux-omap, Lee Jones, linux-arm-kernel, Santosh Shilimkar

On 9/23/2016 7:26 AM, Tony Lindgren wrote:
> * Peter Ujfalusi <peter.ujfalusi@ti.com> [160923 00:21]:
>> On 09/22/16 21:07, Tony Lindgren wrote:

[...]

>> On which linux-next version you are seeing this?
>
> This was with next-20160922. But testing it again this is just another
> regression caused by "softirq: fix tasklet_kill() and its users", so adding
> Santosh to Cc.
>
> So no need to revert $subject patch.
>
So your driver also looks like fiddling with tasklet core structures if
you got impacted because of the core change. After the merge window
am going to take stab at bad users of tasklet but below is
quick snap shot conversation I had with Thomas and Andrew...

------------------------------------------------------------------
B1;2802;0cOn Wed, 21 Sep 2016, Santosh Shilimkar wrote:
 > I requested you to include this patch but now am not sure anymore.
 > Looks like there are almost 30 more users which are directly
 > tweaking 'tasklet_struct' fields and calling other APIs. Hunting them
 > and fixing them probably would be an exercise and also those changes
 > needs those changed drivers to be tested.
 >
 > What do you suggest ? At least this patch needs to be dropped as of now
 > till we can have complete coverage for those bad users.

Yes, it needs to be dropped. Stephen, can you please revert it from next?

How to fix this: The only way is to review all tasklet usage sites for
creative abuse and then fix them one by one. This needs to be done anyway
because those are ticking timebombs even without changes in the core
code. I looked at one of the offenders and it's broken today, it's just
protected by the extremly low probablity to hit the wreckage case.

What you can do to coerce the developers/maintainers of offending code into
looking at the mess they created/merged is to implement accessors for the
tasklet struct fields and replace the open coded fiddling with them.

Once that is done, rename the struct fields to something which is absurd
enough to type.  But don't worry, you will find people doing that. I
catched a few brainwrecks who actually used:

  irqdesc->core_internal_state__do_not_mess_with_it

in their code.

Now after having everything converted to accessors, you can add sanity
checks into the accessors and emit WARN_ONCE() when they are used in the
wrong context. That'll make them look and explain why they think that
fiddling in the internals is a good idea.

Thanks,

	tglx
------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Regression in next with "mfd: twl6040: The chip does not support bulk access"
@ 2016-09-23 15:24       ` Santosh Shilimkar
  0 siblings, 0 replies; 16+ messages in thread
From: Santosh Shilimkar @ 2016-09-23 15:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 9/23/2016 7:26 AM, Tony Lindgren wrote:
> * Peter Ujfalusi <peter.ujfalusi@ti.com> [160923 00:21]:
>> On 09/22/16 21:07, Tony Lindgren wrote:

[...]

>> On which linux-next version you are seeing this?
>
> This was with next-20160922. But testing it again this is just another
> regression caused by "softirq: fix tasklet_kill() and its users", so adding
> Santosh to Cc.
>
> So no need to revert $subject patch.
>
So your driver also looks like fiddling with tasklet core structures if
you got impacted because of the core change. After the merge window
am going to take stab at bad users of tasklet but below is
quick snap shot conversation I had with Thomas and Andrew...

------------------------------------------------------------------
B1;2802;0cOn Wed, 21 Sep 2016, Santosh Shilimkar wrote:
 > I requested you to include this patch but now am not sure anymore.
 > Looks like there are almost 30 more users which are directly
 > tweaking 'tasklet_struct' fields and calling other APIs. Hunting them
 > and fixing them probably would be an exercise and also those changes
 > needs those changed drivers to be tested.
 >
 > What do you suggest ? At least this patch needs to be dropped as of now
 > till we can have complete coverage for those bad users.

Yes, it needs to be dropped. Stephen, can you please revert it from next?

How to fix this: The only way is to review all tasklet usage sites for
creative abuse and then fix them one by one. This needs to be done anyway
because those are ticking timebombs even without changes in the core
code. I looked at one of the offenders and it's broken today, it's just
protected by the extremly low probablity to hit the wreckage case.

What you can do to coerce the developers/maintainers of offending code into
looking at the mess they created/merged is to implement accessors for the
tasklet struct fields and replace the open coded fiddling with them.

Once that is done, rename the struct fields to something which is absurd
enough to type.  But don't worry, you will find people doing that. I
catched a few brainwrecks who actually used:

  irqdesc->core_internal_state__do_not_mess_with_it

in their code.

Now after having everything converted to accessors, you can add sanity
checks into the accessors and emit WARN_ONCE() when they are used in the
wrong context. That'll make them look and explain why they think that
fiddling in the internals is a good idea.

Thanks,

	tglx
------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression in next with "mfd: twl6040: The chip does not support bulk access"
  2016-09-23 15:24       ` Santosh Shilimkar
@ 2016-09-23 19:05         ` Peter Ujfalusi
  -1 siblings, 0 replies; 16+ messages in thread
From: Peter Ujfalusi @ 2016-09-23 19:05 UTC (permalink / raw)
  To: Santosh Shilimkar, Tony Lindgren
  Cc: linux-omap, Lee Jones, linux-arm-kernel, Santosh Shilimkar

On 09/23/2016 06:24 PM, Santosh Shilimkar wrote:
> On 9/23/2016 7:26 AM, Tony Lindgren wrote:
>> * Peter Ujfalusi <peter.ujfalusi@ti.com> [160923 00:21]:
>>> On 09/22/16 21:07, Tony Lindgren wrote:
> 
> [...]
> 
>>> On which linux-next version you are seeing this?
>>
>> This was with next-20160922. But testing it again this is just another
>> regression caused by "softirq: fix tasklet_kill() and its users", so adding
>> Santosh to Cc.
>>
>> So no need to revert $subject patch.
>>
> So your driver also looks like fiddling with tasklet core structures if
> you got impacted because of the core change.

How is that? None of the drivers in the twl6040 stack uses tasklets
explicitly. The core uses regmap_irq and the drivers use threaded irq when
they need interrupt handling, but no tasklet, no fiddling with internals.

-- 
Péter

> After the merge window
> am going to take stab at bad users of tasklet but below is
> quick snap shot conversation I had with Thomas and Andrew...
> 
> ------------------------------------------------------------------
> B1;2802;0cOn Wed, 21 Sep 2016, Santosh Shilimkar wrote:
>> I requested you to include this patch but now am not sure anymore.
>> Looks like there are almost 30 more users which are directly
>> tweaking 'tasklet_struct' fields and calling other APIs. Hunting them
>> and fixing them probably would be an exercise and also those changes
>> needs those changed drivers to be tested.
>>
>> What do you suggest ? At least this patch needs to be dropped as of now
>> till we can have complete coverage for those bad users.
> 
> Yes, it needs to be dropped. Stephen, can you please revert it from next?
> 
> How to fix this: The only way is to review all tasklet usage sites for
> creative abuse and then fix them one by one. This needs to be done anyway
> because those are ticking timebombs even without changes in the core
> code. I looked at one of the offenders and it's broken today, it's just
> protected by the extremly low probablity to hit the wreckage case.
> 
> What you can do to coerce the developers/maintainers of offending code into
> looking at the mess they created/merged is to implement accessors for the
> tasklet struct fields and replace the open coded fiddling with them.
> 
> Once that is done, rename the struct fields to something which is absurd
> enough to type.  But don't worry, you will find people doing that. I
> catched a few brainwrecks who actually used:
> 
>  irqdesc->core_internal_state__do_not_mess_with_it
> 
> in their code.
> 
> Now after having everything converted to accessors, you can add sanity
> checks into the accessors and emit WARN_ONCE() when they are used in the
> wrong context. That'll make them look and explain why they think that
> fiddling in the internals is a good idea.
> 
> Thanks,
> 
>     tglx
> ------------------------------------------------------------------
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Regression in next with "mfd: twl6040: The chip does not support bulk access"
@ 2016-09-23 19:05         ` Peter Ujfalusi
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Ujfalusi @ 2016-09-23 19:05 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/23/2016 06:24 PM, Santosh Shilimkar wrote:
> On 9/23/2016 7:26 AM, Tony Lindgren wrote:
>> * Peter Ujfalusi <peter.ujfalusi@ti.com> [160923 00:21]:
>>> On 09/22/16 21:07, Tony Lindgren wrote:
> 
> [...]
> 
>>> On which linux-next version you are seeing this?
>>
>> This was with next-20160922. But testing it again this is just another
>> regression caused by "softirq: fix tasklet_kill() and its users", so adding
>> Santosh to Cc.
>>
>> So no need to revert $subject patch.
>>
> So your driver also looks like fiddling with tasklet core structures if
> you got impacted because of the core change.

How is that? None of the drivers in the twl6040 stack uses tasklets
explicitly. The core uses regmap_irq and the drivers use threaded irq when
they need interrupt handling, but no tasklet, no fiddling with internals.

-- 
P?ter

> After the merge window
> am going to take stab at bad users of tasklet but below is
> quick snap shot conversation I had with Thomas and Andrew...
> 
> ------------------------------------------------------------------
> B1;2802;0cOn Wed, 21 Sep 2016, Santosh Shilimkar wrote:
>> I requested you to include this patch but now am not sure anymore.
>> Looks like there are almost 30 more users which are directly
>> tweaking 'tasklet_struct' fields and calling other APIs. Hunting them
>> and fixing them probably would be an exercise and also those changes
>> needs those changed drivers to be tested.
>>
>> What do you suggest ? At least this patch needs to be dropped as of now
>> till we can have complete coverage for those bad users.
> 
> Yes, it needs to be dropped. Stephen, can you please revert it from next?
> 
> How to fix this: The only way is to review all tasklet usage sites for
> creative abuse and then fix them one by one. This needs to be done anyway
> because those are ticking timebombs even without changes in the core
> code. I looked at one of the offenders and it's broken today, it's just
> protected by the extremly low probablity to hit the wreckage case.
> 
> What you can do to coerce the developers/maintainers of offending code into
> looking at the mess they created/merged is to implement accessors for the
> tasklet struct fields and replace the open coded fiddling with them.
> 
> Once that is done, rename the struct fields to something which is absurd
> enough to type.  But don't worry, you will find people doing that. I
> catched a few brainwrecks who actually used:
> 
>  irqdesc->core_internal_state__do_not_mess_with_it
> 
> in their code.
> 
> Now after having everything converted to accessors, you can add sanity
> checks into the accessors and emit WARN_ONCE() when they are used in the
> wrong context. That'll make them look and explain why they think that
> fiddling in the internals is a good idea.
> 
> Thanks,
> 
>     tglx
> ------------------------------------------------------------------
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression in next with "mfd: twl6040: The chip does not support bulk access"
  2016-09-23 19:05         ` Peter Ujfalusi
@ 2016-09-23 20:46           ` Santosh Shilimkar
  -1 siblings, 0 replies; 16+ messages in thread
From: Santosh Shilimkar @ 2016-09-23 20:46 UTC (permalink / raw)
  To: Peter Ujfalusi, Tony Lindgren
  Cc: linux-omap, Lee Jones, linux-arm-kernel, Santosh Shilimkar


On 9/23/2016 12:05 PM, Peter Ujfalusi wrote:
> On 09/23/2016 06:24 PM, Santosh Shilimkar wrote:
>> On 9/23/2016 7:26 AM, Tony Lindgren wrote:
>>> * Peter Ujfalusi <peter.ujfalusi@ti.com> [160923 00:21]:
>>>> On 09/22/16 21:07, Tony Lindgren wrote:
>>
>> [...]
>>
>>>> On which linux-next version you are seeing this?
>>>
>>> This was with next-20160922. But testing it again this is just another
>>> regression caused by "softirq: fix tasklet_kill() and its users", so adding
>>> Santosh to Cc.
>>>
>>> So no need to revert $subject patch.
>>>
>> So your driver also looks like fiddling with tasklet core structures if
>> you got impacted because of the core change.
>
> How is that? None of the drivers in the twl6040 stack uses tasklets
> explicitly. The core uses regmap_irq and the drivers use threaded irq when
> they need interrupt handling, but no tasklet, no fiddling with internals.
>
May be MFD core or some up dependency for this driver does that.
If not then you shouldn't be impacted because of the change.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Regression in next with "mfd: twl6040: The chip does not support bulk access"
@ 2016-09-23 20:46           ` Santosh Shilimkar
  0 siblings, 0 replies; 16+ messages in thread
From: Santosh Shilimkar @ 2016-09-23 20:46 UTC (permalink / raw)
  To: linux-arm-kernel


On 9/23/2016 12:05 PM, Peter Ujfalusi wrote:
> On 09/23/2016 06:24 PM, Santosh Shilimkar wrote:
>> On 9/23/2016 7:26 AM, Tony Lindgren wrote:
>>> * Peter Ujfalusi <peter.ujfalusi@ti.com> [160923 00:21]:
>>>> On 09/22/16 21:07, Tony Lindgren wrote:
>>
>> [...]
>>
>>>> On which linux-next version you are seeing this?
>>>
>>> This was with next-20160922. But testing it again this is just another
>>> regression caused by "softirq: fix tasklet_kill() and its users", so adding
>>> Santosh to Cc.
>>>
>>> So no need to revert $subject patch.
>>>
>> So your driver also looks like fiddling with tasklet core structures if
>> you got impacted because of the core change.
>
> How is that? None of the drivers in the twl6040 stack uses tasklets
> explicitly. The core uses regmap_irq and the drivers use threaded irq when
> they need interrupt handling, but no tasklet, no fiddling with internals.
>
May be MFD core or some up dependency for this driver does that.
If not then you shouldn't be impacted because of the change.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-09-23 20:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-22 18:07 Regression in next with "mfd: twl6040: The chip does not support bulk access" Tony Lindgren
2016-09-22 18:07 ` Tony Lindgren
2016-09-23  7:20 ` Peter Ujfalusi
2016-09-23  7:20   ` Peter Ujfalusi
2016-09-23 10:19   ` Peter Ujfalusi
2016-09-23 10:19     ` Peter Ujfalusi
2016-09-23 14:27     ` Tony Lindgren
2016-09-23 14:27       ` Tony Lindgren
2016-09-23 14:26   ` Tony Lindgren
2016-09-23 14:26     ` Tony Lindgren
2016-09-23 15:24     ` Santosh Shilimkar
2016-09-23 15:24       ` Santosh Shilimkar
2016-09-23 19:05       ` Peter Ujfalusi
2016-09-23 19:05         ` Peter Ujfalusi
2016-09-23 20:46         ` Santosh Shilimkar
2016-09-23 20:46           ` Santosh Shilimkar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.