stable.vger.kernel.org archive mirror
From: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
To: Greg KH <greg@kroah.com>
Cc: CKI Project <cki-project@redhat.com>,
	Linux Stable maillist <stable@vger.kernel.org>
Subject: Re: ❌ FAIL: Stable queue: queue-5.2
Date: Mon, 26 Aug 2019 12:40:12 +0300
Message-ID: <8badf977-5af5-d5cb-82d1-61f3596f7ec8@redhat.com>
In-Reply-To: <1e9a3221-f044-a3a0-bbe1-34e6f8a468f0@redhat.com>

On 8/26/19 12:13 PM, Nikolai Kondrashov wrote:
> On 8/26/19 11:33 AM, Greg KH wrote:
>> On Mon, Aug 26, 2019 at 11:23:58AM +0300, Nikolai Kondrashov wrote:
>>> On 8/25/19 5:41 PM, Greg KH wrote:
>>>> On Sun, Aug 25, 2019 at 10:37:26AM -0400, CKI Project wrote:
>>>>> Merge testing
>>>>> -------------
>>>>>
>>>>> We cloned this repository and checked out the following commit:
>>>>>
>>>>>     Repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
>>>>>     Commit: f7d5b3dc4792 - Linux 5.2.10
>>>>>
>>>>>
>>>>> We grabbed the cc88f4442e50 commit of the stable queue repository.
>>>>>
>>>>> We then merged the patchset with `git am`:
>>>>>
>>>>>     keys-trusted-allow-module-init-if-tpm-is-inactive-or-deactivated.patch
>>>>
>>>> That file is not in the repo, I think your system is messed up :(
>>>
>>> Sorry for the trouble, Greg, but I think it's a race between the changes to
>>> the two repos.
>>>
>>> The job which triggered this message was started right before the moment this
>>> commit was made:
>>>
>>>      https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=af2f46e26e770b3aa0bc304a13ecd24763f3b452
>>>
>>> At that moment, the repo was still on this commit, about five hours old:
>>>
>>>      https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=cc88f4442e505e9f1f21c8c119debe89cbf63ab2
>>>
>>> which still had the file. And when the job finished, and the message reached
>>> you, yes, the repo no longer contained it.
>>>
>>> At the moment the job started, the latest commit to stable/linux.git
>>> was about 22 minutes old:
>>>
>>>      https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.2.y&id=f7d5b3dc4792a5fe0a4d6b8106a8f3eb20c3c24c
>>>
>>> and the repo already contained the patches from the queue, including the one
>>> the job tried to merge:
>>>
>>>      https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.2.y&id=f820ecf609cc38676071ec6c6d3e96b26c73b747
>>
>> How in the world are you seeing such a messed up tree?
>>
>> The 5.2.10 commit moved things around, in one single atomic move.
>>
>>> IIRC, we agreed to not start testing both of the repos until the latest
>>> commits are at least 5 minutes old. In this situation the latest commit was 22
>>> minutes old, so the system started testing.
>>>
>>> We could increase the window to, say, 30 minutes (or something else), to avoid
>>> misfires like this, but then the response time would be increased accordingly.
>>>
>>> It's your pick :)
>>
>> Why is there any race at all?
>>
>> Why do you not have a local mirror of the repo?  When it updates, then
>> run the tests.  Every commit in the tree is "stand alone" and things
>> should work at that point in time.  Don't use a commit as a "time to go
>> mirror something at a later point in time", as you are ending up with
>> trees that are obviously not correct at all.
>>
>> I think you need to rework your systems as no one else seems to have
>> this "stale random tree state" issue.
>>
>> Git does commits in an atomic fashion, how you all are messing that up
>> shows you are doing _way_ more work than you probably need to :)
> 
> Sorry, I'm not the one who implemented and maintains the system, I'm just
> generally aware of how it works and am looking at the code right now, so I
> could be misunderstanding something. Please bear with me :)
> 
> However, I don't see how anything could be done, if we have two git repos,
> which are inconsistent with each other, when CI comes to test them.
> 
> I'll try to draw the timeline of what was happening to explain what I think is
> the problem. All times are in my timezone (UTC+03:00).
> 
> Time            stable/linux.git    stable/stable-queue.git Comments
>                 branch linux-5.2.y  branch master
>                                     subdir queue-5.2
> --------------- ------------------- ----------------------- -----------------
> Aug 5 19:44:27  aad39e30fb9e6e72,                           Repos are
>                 "Linux 5.2.9",                              consistent
>                 *doesn't have* the
>                 patch that failed
>
> Aug 25 11:53:25                     cc88f4442e505e9f,       Repos are
>                                     "Linux 4.4.190",        consistent
>                                     *has* the patch
>                                     that failed
>
> Aug 25 17:13:54 f7d5b3dc4792a5,                             Repos are
>                 "Linux 5.2.10",                             inconsistent,
>                 contains patches                            both contain
>                 from the queue                              the same patches
>                 above, including
>                 the failed one
>
> Aug 25 17:36:18                                             Our CI job starts
>
> Aug 25 17:36:19                     af2f46e26e770b3a,       Repos are
>                                     "Linux 5.2.10",         consistent
>                                     "queue-5.2" dir is
>                                     removed, doesn't
>                                     have the failed
>                                     patch
>
> Aug 25 17:37:23                                             Our CI sends
>                                                             failure report
> 
> I.e. I think the problem was that both linux-5.2.y branch of stable/linux.git,
> and the queue-5.2 subdir of master branch of stable/stable-queue.git contained
> the same patches for about 22 minutes on Aug 25, when our CI started.
> 
> We sample the latest commits from both repos at the same time (well, as close
> as Python and HTTP allow us), and we update our clones to those before
> testing.
> 
> We also don't start testing if the latest commit in either repo is less than
> 5 minutes old, to avoid testing inconsistent repos, on the assumption that
> 5 minutes is enough to update both and bring them back into consistency. We
> can increase that time to whatever best fits your workflow, to avoid hitting
> these problems.
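
The check described in the quoted paragraph can be sketched roughly as below.
This is only an illustration, not the actual CKI code: the function name and
constant are made up, and the timestamps are the ones from the timeline above
(UTC+03:00).

```python
from datetime import datetime, timedelta, timezone

# Settle window mentioned in the thread; the name is hypothetical.
MIN_COMMIT_AGE = timedelta(minutes=5)


def ready_to_test(now, head_commit_times, min_age=MIN_COMMIT_AGE):
    """Return True only if every sampled head commit is at least min_age old."""
    return all(now - t >= min_age for t in head_commit_times)


TZ = timezone(timedelta(hours=3))
linux_head = datetime(2019, 8, 25, 17, 13, 54, tzinfo=TZ)  # f7d5b3dc4792
queue_head = datetime(2019, 8, 25, 11, 53, 25, tzinfo=TZ)  # cc88f4442e50
job_start = datetime(2019, 8, 25, 17, 36, 18, tzinfo=TZ)

# With a 5-minute window both heads look settled, so the job starts.
print(ready_to_test(job_start, [linux_head, queue_head]))  # True
# A 30-minute window would have held this particular job back.
print(ready_to_test(job_start, [linux_head, queue_head],
                    timedelta(minutes=30)))  # False
```

Note that this only compares commit timestamps, so it says nothing about when
either repo was actually pushed.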

OK, I keep forgetting that commit and push times are different, and I have no
idea what was pushed when. I'll go check our code and logs a little more
closely.
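
To illustrate why the difference matters: a commit object carries author and
committer timestamps, but nothing about when it was pushed, so a head can look
"old" by commit time while having just appeared on the server. The sketch
below assumes a poller that records when it first saw each head. The
HeadTracker class and the first-seen time are hypothetical (the actual push
times are unknown); only the commit time and job start come from the timeline.

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=3))  # UTC+03:00, as in the timeline
MIN_AGE = timedelta(minutes=5)     # settle window mentioned in the thread


class HeadTracker:
    """Record when a poller first saw each (repo, sha) head (hypothetical)."""

    def __init__(self):
        self._first_seen = {}

    def observe(self, repo, sha, now):
        # setdefault keeps the earliest observation for this head.
        return self._first_seen.setdefault((repo, sha), now)

    def settled(self, repo, sha, now, min_age=MIN_AGE):
        first_seen = self._first_seen.get((repo, sha))
        return first_seen is not None and now - first_seen >= min_age


commit_time = datetime(2019, 8, 25, 17, 13, 54, tzinfo=TZ)  # from the timeline
job_start = datetime(2019, 8, 25, 17, 36, 18, tzinfo=TZ)    # from the timeline
first_seen = datetime(2019, 8, 25, 17, 35, 0, tzinfo=TZ)    # HYPOTHETICAL

tracker = HeadTracker()
tracker.observe("linux", "f7d5b3dc4792", first_seen)

passes_commit_age = job_start - commit_time >= MIN_AGE
passes_seen_age = tracker.settled("linux", "f7d5b3dc4792", job_start)
print(passes_commit_age, passes_seen_age)  # True False
```

Under that made-up first-seen time, the commit-age check passes while the
observation-based check would still be waiting.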

Nick

