From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <aa26f5dc9e595feb3772495a6d0542c007948f48.camel@collabora.co.uk>
References: <20181106133007.12318-1-sjoerd.simons@collabora.co.uk>
 <9051c212-6e2a-bc39-3686-693e6cd87f1d@ti.com> <303b49cbb5b687d6b6a7ad4048eda459586c0806.camel@collabora.co.uk>
 <20181107084741.GA31092@kunai> <CAPDyKFpedP1f4XZYvebFCuooYrBa2ux9F9mYRNi1Q=M-5eJ0Rg@mail.gmail.com>
 <20181120102300.GA1056@kunai> <aa26f5dc9e595feb3772495a6d0542c007948f48.camel@collabora.co.uk>
From:   Ulf Hansson <ulf.hansson@linaro.org>
Date:   Tue, 20 Nov 2018 14:08:28 +0100
Message-ID: <CAPDyKFrFz9bHa5EmiBmQvXnoRibbL4-Gq9E7E5vDBbd=uj_2kA@mail.gmail.com>
Subject: Re: [PATCH] mmc: core: Remove timeout when enabling cache
To:     Sjoerd Simons <sjoerd.simons@collabora.co.uk>
Cc:     Wolfram Sang <wsa@the-dreams.de>, Faiz Abbas <faiz_abbas@ti.com>,
        "linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>,
        kernel@collabora.com,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Hongjie Fang <hongjiefang@asrmicro.com>,
        Bastian Stender <bst@pengutronix.de>,
        Kyle Roeschley <kyle.roeschley@ni.com>,
        Wolfram Sang <wsa+renesas@sang-engineering.com>,
        Shawn Lin <shawn.lin@rock-chips.com>,
        Harish Jenny K N <harish_kandiga@mentor.com>,
        Simon Horman <horms+renesas@verge.net.au>,
        Hal Emmerich <hal@halemmerich.com>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

+ Hal Emmerich

On 20 November 2018 at 12:38, Sjoerd Simons
<sjoerd.simons@collabora.co.uk> wrote:
> On Tue, 2018-11-20 at 11:23 +0100, Wolfram Sang wrote:
>> > > > That also happens to be one of the cards we deploy; however, I did
>> > > > wonder about adding a quirk but decided against it, as it was not
>> > > > clear to me from the specification that CACHE ON really is meant to
>> > > > complete within GENERIC_CMD6_TIMEOUT. That, and I fret about ending
>> > > > up in whack-a-mole games, as the failure is really quite tedious
>> > > > (boot failure).
>> > >
>> > > I agree that we should use the more defensive variant as a default. I
>> > > mean there should be no performance regression since most cards will
>> > > just respond faster, or? The only downside I could see is that we
>> > > might miss a real timeout with no bounds set and might get stuck?
>> >
>> > Well, you have a point, but still it's kind of nice to know which
>> > cards are behaving well and which ones don't. Hence I think I
>> > prefer to stick with using a quirk, unless you have a strong opinion.
>
> Not an incredibly strong opinion either; I just wonder if it's the
> right trade-off.
>
> If the quirk/workaround is not there when it should be, the impact is
> that you get an unusable card (which for eMMC is likely to mean a
> failure to boot the system), which is somewhat unfortunate.
>
> If the workaround is there when it's not needed, then there doesn't
> seem to be much of an impact at all, apart from it not being reported
> to the user/developer/kernel community?
>
> In which case it might make more sense to put in a warning if the card
> takes too long, along with a list of cards for which this is known?
>
>> No strong opinion. Especially not if you say it is in the spec
>> (although "must be sufficient" would be better than "should be" ;)).
>> Also, I assume this failure is reproducible and should turn up during
>> development? Compared to "happens once in a while randomly"?
>
> For the card in question it happens only on hard power off; the time it
> takes seems correlated to the state of the cache at hard power off (it
> takes substantially longer if there was a lot of I/O activity at the
> time of hard power off). With light I/O activity the current timeout is
> sometimes enough.
>
> So if you know the pattern, or just happen to hit it often in e.g.
> automated testing, it does show up during development. Otherwise it can
> appear to "happen once in a while randomly".

I don't quite follow. As far as I understand, the extended timeout is
needed when turning the cache on.

The above seems more related to flushing the cache, no? Flushing has
no timeout (also reported to be an issue [1]), and it happens either in
_mmc_hw_reset() or in _mmc_suspend().

What is the relation here?
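
To make the bounded vs. unbounded distinction concrete, here is a minimal,
self-contained sketch. It is not kernel code: struct fake_card,
fake_card_busy() and poll_for_busy() are hypothetical names, and a
simulated clock stands in for real time. It only shows a busy-poll that
gives up with -ETIMEDOUT once an explicit limit is reached.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a card that stays busy for busy_ms after a command. */
struct fake_card {
	unsigned int busy_ms;
};

static bool fake_card_busy(const struct fake_card *card, unsigned int elapsed_ms)
{
	return elapsed_ms < card->busy_ms;
}

/*
 * Poll in poll_ms steps on a simulated clock. Returns 0 once the card is
 * ready, or -ETIMEDOUT if it is still busy after timeout_ms.
 */
static int poll_for_busy(const struct fake_card *card, unsigned int timeout_ms,
			 unsigned int poll_ms)
{
	unsigned int elapsed_ms = 0;

	assert(poll_ms > 0);	/* a zero step would never advance the clock */
	while (fake_card_busy(card, elapsed_ms)) {
		if (elapsed_ms >= timeout_ms)
			return -ETIMEDOUT;
		elapsed_ms += poll_ms;
	}
	return 0;
}
```

With a card that stays busy for 850 ms, a 1700 ms limit succeeds while an
example 500 ms limit returns -ETIMEDOUT; a wait with no limit at all has
no such exit.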

>
> Unfortunately for me, it was really a case of getting reports that some
> boards had started failing at some point, which took a while to track
> back. Especially since it's a battery-powered device (thus hard
> power-offs are rather rare) and we allow the board manufacturer to select
> from various different eMMCs depending on price/availability at build
> time...
>
>> Yet, if we add a quirk for that, then we should probably mention it in
>> an error message when we hit -ETIMEDOUT for cache on ("does your card
>> need this quirk?")? It can be pretty time consuming to track this down
>> otherwise, I'd think.
>
> Yes please. It would be nice if someone happens to have the right
> contacts with Micron to see if it's a known issue for their cards in
> general or just this one.
>
> It would also be good to have a timeout higher than 1 second (or for
> these cards not to have one?); in our testing thus far we've seen
> timeouts of up to 850ms, but it's impossible to ensure that that's the
> true upper bound.

Using no timeout limit would mean we may hang for ~10 minutes
(MMC_OPS_TIMEOUT_MS) instead, no thanks.
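
For scale, a minimal arithmetic sketch, assuming (as stated here) that a
zero timeout falls back to MMC_OPS_TIMEOUT_MS, taken as 10 minutes;
effective_timeout_ms() is a hypothetical helper, not the core's actual
code path.

```c
#include <assert.h>

/* Assumed value: 10 minutes expressed in milliseconds. */
#define MMC_OPS_TIMEOUT_MS (10u * 60u * 1000u)

static_assert(MMC_OPS_TIMEOUT_MS == 600000u, "10 minutes in ms");

/*
 * Hypothetical helper: 0 means "no timeout requested", which falls back to
 * MMC_OPS_TIMEOUT_MS instead of a truly infinite wait.
 */
static unsigned int effective_timeout_ms(unsigned int requested_ms)
{
	return requested_ms ? requested_ms : MMC_OPS_TIMEOUT_MS;
}
```

So a "no limit" request can block for 600000 ms, roughly 350 times longer
than a 1700 ms limit.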

I am fine with, let's say, double the 850ms (1700ms), to have some room.
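
A minimal sketch of the quirk approach discussed in this thread, assuming
a per-card quirk bit and the 1700ms figure. QUIRK_LONG_CACHE_ON,
struct card_info and the helpers are hypothetical names, not existing MMC
core identifiers. It also assumes the EXT_CSD GENERIC_CMD6_TIME field is
expressed in 10 ms units, and prints the "does your card need this
quirk?" hint on -ETIMEDOUT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical quirk flag and limit: 1700 ms is 2 x the 850 ms observed. */
#define QUIRK_LONG_CACHE_ON		(1u << 0)
#define CACHE_ON_QUIRK_TIMEOUT_MS	1700u

struct card_info {
	unsigned int generic_cmd6_time_ms;	/* spec-provided CMD6 timeout */
	unsigned int quirks;			/* bitmask of QUIRK_* flags */
};

/* Assumes EXT_CSD GENERIC_CMD6_TIME is expressed in units of 10 ms. */
static unsigned int ext_csd_cmd6_time_to_ms(uint8_t raw)
{
	return 10u * raw;
}

/*
 * Timeout for the CACHE ON switch: the spec value by default, or the
 * larger of the spec value and the quirk limit for known-slow cards.
 */
static unsigned int cache_on_timeout_ms(const struct card_info *card)
{
	unsigned int t;

	assert(card != NULL);
	t = card->generic_cmd6_time_ms;
	if ((card->quirks & QUIRK_LONG_CACHE_ON) && t < CACHE_ON_QUIRK_TIMEOUT_MS)
		t = CACHE_ON_QUIRK_TIMEOUT_MS;
	return t;
}

/* True if CACHE ON timed out on a card that does not have the quirk. */
static bool cache_on_needs_quirk_hint(const struct card_info *card, int err)
{
	return err == -ETIMEDOUT && !(card->quirks & QUIRK_LONG_CACHE_ON);
}

/* Print the hint suggested in the thread when it applies. */
void report_cache_on_error(const struct card_info *card, int err)
{
	if (cache_on_needs_quirk_hint(card, err))
		fprintf(stderr,
			"cache on timed out after %u ms; does this card need the long cache-on quirk?\n",
			cache_on_timeout_ms(card));
}
```

Taking the larger of the two values means the quirk can only extend,
never shorten, the spec-provided timeout.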

Anyway, the point is, the timeouts in the spec are there for a reason.
Unfortunately, I think the spec is "lazy" in some other regards and
doesn't specify timeouts, which complicates things.

Kind regards
Uffe

[1]
https://www.spinics.net/lists/linux-mmc/msg51815.html