From: Vladimir Zapolskiy <vz@mleia.com>
To: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
	linux-ide@vger.kernel.org
Subject: Re: [PATCH] ata: disable port while unloading ATA controller driver
Date: Tue, 29 Nov 2016 20:54:11 +0200
Message-ID: <09c7866c-ecd4-f48a-5112-6cf3c6786cd9@mleia.com>
In-Reply-To: <a498fb9f-822e-45aa-aac1-c7afae7a44e3@mleia.com>

Hello Tejun,

On 11/29/2016 01:51 AM, Vladimir Zapolskiy wrote:
> Hello Tejun,
> 
> On 11/28/2016 08:34 PM, Tejun Heo wrote:
>> Hello, Vladimir.
>>
>> On Mon, Nov 28, 2016 at 01:18:56AM +0200, Vladimir Zapolskiy wrote:
>>> While removing an ATA controller driver, ata_port_detach() sets the
>>> ATA_PFLAG_UNLOADING flag and charges the error handler; however,
>>> the port is not actually disabled because the
>>> ATA_PFLAG_EH_PENDING flag is not set.
>>>
>>> To take care of clean port removal and of setting the
>>> ATA_PFLAG_EH_PENDING flag, it is sufficient to replace the
>>> ata_port_schedule_eh() call with ata_port_freeze().
>>
>> Hmm... this explanation doesn't really make sense to me. 
>> ATA_PFLAG_EH_PENDING is set by ata_eh_set_pending() which is the same
>> for both ata_port_schedule_eh() and ata_port_freeze().
> 
> Correct, ATA_PFLAG_EH_PENDING is set by ata_eh_set_pending();
> you made me doubt it, and my analysis is crap...
> 
>> There's gotta be something else going on here.  Any chance you can
>> track down why EH isn't running?
>>
> 
> I've tested the unmodified master branch with a different kernel config
> on another but similar board (SabreSD) powered by the same iMX6Q SoC,
> and I cannot reproduce the problem there. I still experience it on the
> SabreAuto board, so I'll trace the kernel on it over JTAG tomorrow.
> 

Tracing on the board shows a race between driver initialization and
deinitialization: when async_port_probe() gets to run only after the
driver has been removed, the reported problem appears.

Since it is a race, it should be possible to widen the race window by
introducing a delay (e.g. in ata_port_probe()), which gives enough time
to reproduce the problem reliably and to verify a fix.

imx_ahci_probe()
  ahci_platform_init_host()
    ata_host_alloc_pinfo()
      ata_host_alloc()
        ata_port_alloc()    ---> sets ATA_PFLAG_INITIALIZING flag
          ata_link_init()
          ....
    ahci_host_activate()
      ata_host_activate()
        ata_host_start()
          ata_eh_freeze_port()
        ata_port_desc()
        ata_host_register() ---> schedules async_port_probe()
  ....

*** at this point the driver probe is completed, thus it can be removed ***

ata_platform_remove_one()    ==  imx_ahci_driver.remove()
  ata_port_detach()
    ata_port_schedule_eh()
      ata_std_sched_eh()    ---> return, ATA_PFLAG_EH_PENDING flag is not set
    ata_port_wait_eh()      ---> return, port cleanup work is not done

*** warning is printed out ***

async_port_probe()          ---- scheduled too late
  ata_port_probe()
    __ata_port_probe()      ---> only now is ATA_PFLAG_INITIALIZING cleared
      ata_port_schedule_eh()
        ata_std_sched_eh()
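
The two orderings in the trace above can be replayed in a small
user-space model. This is only a sketch, not libata code: the flag names
mirror the kernel ones, but the flag values, all model_* helpers and the
virtual-time parameters probe_delay and remove_at are made up for
illustration. The early return in model_sched_eh() follows the trace
(ata_std_sched_eh() returns without setting ATA_PFLAG_EH_PENDING while
the port is still initializing). The PFLAG_UNLOADED check is an
assumption about what the warning tests. A large probe_delay stands in
for a delay injected into the probe path.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names mirror libata; the values are arbitrary (toy model). */
enum {
	PFLAG_EH_PENDING   = 1 << 0,
	PFLAG_INITIALIZING = 1 << 1,
	PFLAG_UNLOADING    = 1 << 2,
	PFLAG_UNLOADED     = 1 << 3,	/* assumed: what the warning checks */
};

/* Hypothetical stand-in for struct ata_port: only the flags matter here. */
struct model_port {
	unsigned int pflags;
};

/* Models ata_std_sched_eh(): a no-op while the port is initializing. */
static void model_sched_eh(struct model_port *ap)
{
	if (ap->pflags & PFLAG_INITIALIZING)
		return;
	ap->pflags |= PFLAG_EH_PENDING;
}

/* Models the error handler: it does work only if EH is pending. */
static void model_run_eh(struct model_port *ap)
{
	if (!(ap->pflags & PFLAG_EH_PENDING))
		return;
	ap->pflags &= ~PFLAG_EH_PENDING;
	if (ap->pflags & PFLAG_UNLOADING)
		ap->pflags |= PFLAG_UNLOADED;
}

/* Models __ata_port_probe(): clears INITIALIZING, then kicks EH. */
static void model_port_probe(struct model_port *ap)
{
	assert(ap->pflags & PFLAG_INITIALIZING);	/* probe runs once */
	ap->pflags &= ~PFLAG_INITIALIZING;
	model_sched_eh(ap);
	model_run_eh(ap);
}

/* Models ata_port_detach(); returns true if the warning would fire. */
static bool model_port_detach(struct model_port *ap)
{
	ap->pflags |= PFLAG_UNLOADING;
	model_sched_eh(ap);
	model_run_eh(ap);	/* stands in for ata_port_wait_eh() */
	return !(ap->pflags & PFLAG_UNLOADED);
}

/*
 * Replays one ordering in virtual time: the async probe runs at
 * probe_delay, the driver is removed at remove_at.
 */
bool model_detach_warns(unsigned int probe_delay, unsigned int remove_at)
{
	struct model_port ap = { .pflags = PFLAG_INITIALIZING };
	bool warned;

	if (probe_delay < remove_at) {
		model_port_probe(&ap);			/* probe wins */
		warned = model_port_detach(&ap);
	} else {
		warned = model_port_detach(&ap);	/* removal wins */
		model_port_probe(&ap);			/* runs too late */
	}
	return warned;
}
```

Under these assumptions, model_detach_warns(0, 10) models the probe
finishing before removal, and model_detach_warns(100, 10) models the
delayed probe. Only the latter leaves the port not marked as unloaded
when detach returns.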


It also explains why replacing ata_port_schedule_eh() inside
ata_port_detach() with ata_port_abort() plus unconditional
ATA_PFLAG_EH_PENDING flag setting does not produce the warning. Still,
I'm not sure that resource and state clean-ups are done correctly under
the race.

If you buy this analysis sketch, it may take another day or two for
me to prepare a proper fix, or, if you have enough time and desire,
you may implement the fix on your own.

--
With best wishes,
Vladimir

