* CentOS builds failing in Shaman since Friday evening
From: Nathan Cutler @ 2017-06-03 12:52 UTC
  To: ceph-devel

CentOS builds in Shaman started failing with this error:

{standard input}: Assembler messages:
{standard input}:186778: Warning: end of file not at end of a line; 
newline inserted
{standard input}: Error: open CFI at the end of file; missing 
.cfi_endproc directive
c++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.

AFAICT the first occurrence was in [1] and the error has been haunting 
the build queue since then.

[1] 
https://shaman.ceph.com/builds/ceph/wip-sage-testing2/f93ad23a8fec219667a03695136842edb0cceace/default/45729/

Nathan


* Re: [sepia] CentOS builds failing in Shaman since Friday evening
From: Sage Weil @ 2017-06-03 20:45 UTC
  To: Gregory Farnum; +Cc: Nathan Cutler, ceph-devel, sepia


I'm seeing the builds all complete:

https://shaman.ceph.com/repos/ceph/wip-sage-testing/468be5dab6a2d8421a4fc35744463d47a80f47c2/

but it won't schedule:

$ teuthology-suite -s rados -c wip-sage-testing2 --subset 111/444 -p 100 -k distro
2017-06-03 20:44:27,112.112 INFO:teuthology.suite.run:kernel sha1: distro
2017-06-03 20:44:27,389.389 INFO:teuthology.suite.run:ceph sha1: 
b03f3062d40c35e4898d77604d62e7e7c4e88afd
Traceback (most recent call last):
  File "/home/sage/src/teuthology/virtualenv/bin/teuthology-suite", line 11, in <module>
    load_entry_point('teuthology', 'console_scripts', 'teuthology-suite')()
  File "/home/sage/src/teuthology/scripts/suite.py", line 137, in main
    return teuthology.suite.main(args)
  File "/home/sage/src/teuthology/teuthology/suite/__init__.py", line 86, in main
    run = Run(conf)
  File "/home/sage/src/teuthology/teuthology/suite/run.py", line 46, in __init__
    self.base_config = self.create_initial_config()
  File "/home/sage/src/teuthology/teuthology/suite/run.py", line 92, in create_initial_config
    self.choose_ceph_version(ceph_hash)
  File "/home/sage/src/teuthology/teuthology/suite/run.py", line 185, in choose_ceph_version
    util.schedule_fail(str(exc), self.name)
  File "/home/sage/src/teuthology/teuthology/suite/util.py", line 72, in schedule_fail
    raise ScheduleFailError(message, name)
teuthology.exceptions.ScheduleFailError: Scheduling sage-2017-06-03_20:44:27-rados-wip-sage-testing2-distro-basic-smithi 
failed: 'package_manager_version'

:/

On Sat, 3 Jun 2017, Gregory Farnum wrote:

> Adding sepia list for more infrastructure dev attention. (No idea where that
> problem is coming from.)
> 
> On Sat, Jun 3, 2017 at 5:52 AM Nathan Cutler <ncutler@suse.cz> wrote:
>       [...]


* Re: [sepia] CentOS builds failing in Shaman since Friday evening
From: John Spray @ 2017-06-04  9:26 UTC
  To: Sage Weil; +Cc: Gregory Farnum, sepia, ceph-devel, Nathan Cutler

On Sat, Jun 3, 2017 at 9:45 PM, Sage Weil <sage@newdream.net> wrote:
> I'm seeing the builds all complete:
>
> https://shaman.ceph.com/repos/ceph/wip-sage-testing/468be5dab6a2d8421a4fc35744463d47a80f47c2/
>
> but it won't schedule:
>
> [...]
> teuthology.exceptions.ScheduleFailError: Scheduling sage-2017-06-03_20:44:27-rados-wip-sage-testing2-distro-basic-smithi
> failed: 'package_manager_version'
>
> :/

Same here.

The tip of my branch is 50b0654e; I can see teuthology finding that
and going to query shaman at this URL:
https://shaman.ceph.com/api/search/?status=ready&project=ceph&flavor=default&distros=centos%2F7%2Fx86_64&sha1=50b0654ed19cb083af494878738c7debe80db31e

The result has an empty dict for the 'extra' field, where teuthology
is expecting to see package_manager_version.

That stuff is supposed to be populated by ceph-build/build/build-rpm
posting a repo-extra.json file to chacra.ceph.com.
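
For reference, the same check can be done by hand from a shell. This
is just a sketch -- it assumes the search endpoint returns a JSON list
of repo records (which is what the result above looks like), and jq is
only used for inspection:

    sha1=50b0654ed19cb083af494878738c7debe80db31e
    curl -s "https://shaman.ceph.com/api/search/?status=ready&project=ceph&flavor=default&distros=centos%2F7%2Fx86_64&sha1=${sha1}" \
        | jq '.[0].extra'
    # a healthy build should include "package_manager_version" here;
    # for this build it prints an empty object: {}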

I see my build log here:
https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos7,DIST=centos7,MACHINE_SIZE=huge/3848//consoleFull

And I see the POST to chacra failing here:

"""
build@2/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/repo-extra.json
-u admin:[*******]
https://1.chacra.ceph.com/repos/ceph/wip-jcsp-testing-20170604/50b0654ed19cb083af494878738c7debe80db31e/centos/7/flavors/default/extra/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   505  100    52  100   453    145   1267 --:--:-- --:--:-- --:--:--  1268
404 Not Found

The resource could not be found.
"""

So the ceph-build script is succeeding where it should be failing
(does curl not return an error, or is the script ignoring it?), and
something is wrong on chacra.ceph.com that's making it 404 here (I
don't know where to begin debugging that).
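
Answering the curl part myself: by default curl exits 0 even when the
server replies with an HTTP error like this 404 (it only fails on
transport-level errors), so a `set -e` script sails straight past it.
A minimal guard, sketched with placeholder variables rather than the
actual ceph-build invocation:

    # -f/--fail makes curl exit non-zero (22) on HTTP errors >= 400
    curl --fail --silent --show-error \
        -u admin:"${CHACRA_PASS}" \
        -X POST --data-binary @repo-extra.json \
        "${CHACRA_EXTRA_URL}" || exit 1

With --fail in place, the 404 above would have failed the build
instead of letting it report success.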

John

P.S. Probably a topic for another day, but I didn't love having to
traverse several different git repos to try to work out what was
happening during a build. Wouldn't it be simpler to have a single repo
for the build infrastructure?


* Re: [sepia] CentOS builds failing in Shaman since Friday evening
From: Alfredo Deza @ 2017-06-05 13:16 UTC
  To: John Spray; +Cc: Sage Weil, sepia, ceph-devel, Nathan Cutler

On Sun, Jun 4, 2017 at 5:26 AM, John Spray <jspray@redhat.com> wrote:
> On Sat, Jun 3, 2017 at 9:45 PM, Sage Weil <sage@newdream.net> wrote:
>> [...]
>> teuthology.exceptions.ScheduleFailError: Scheduling sage-2017-06-03_20:44:27-rados-wip-sage-testing2-distro-basic-smithi
>> failed: 'package_manager_version'
>>
>> :/
>
> Same here.

On Friday we had configuration, recently pushed to Jenkins, to build
nfs-ganesha and samba for every commit to Ceph release branches,
*including master*.

The effect of that change was not immediately apparent, since it
"reacts" to activity on the Ceph repo.

Locally, `git log` shows just 19 new commits for June 2nd (that
Friday), but GitHub shows about 15 merges with a *ton* of commits to
master (100+ commits).

This is not usually a problem, but the combinatorial effect meant that
those ~100 commits were really more like 300+ build triggers *that
appeared within minutes of each other* (each commit now kicked off
ceph, nfs-ganesha, and samba builds).

Trying to mitigate that, I manually changed a slave to consume more of
this "bookkeeping" backlog from the master Jenkins instance. That had
two problems: the slave ran up to 10 Ceph builds at the same time
(which we don't allow), and the builds ended up with mixed information
about where their output should go.

Builds follow this path: GitHub -> Jenkins trigger -> Jenkins jobs for
the different distros -> Jenkins asks Shaman which chacra server to
push to -> binaries are pushed to the selected chacra server.

Since I made this one server do several Ceph builds, the variables
used to answer "which chacra server should I push my binaries to?"
got polluted. This is why John's build POSTed to the wrong chacra
server (hence the 404).
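
To illustrate that failure mode generically -- this is not the actual
ceph-build code, and the file name and second hostname are made up:

    # two concurrent builds on one slave share a single state file
    echo 'chacra=https://1.chacra.ceph.com' > /tmp/build.props  # build A
    echo 'chacra=https://2.chacra.ceph.com' > /tmp/build.props  # build B, moments later
    . /tmp/build.props   # build A reads back B's value and POSTs
                         # its binaries to the wrong chacra server

With one build per slave the read-back is safe; with ten running at
once, whoever wrote last wins.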

On Friday we disabled the nfs-ganesha and samba builds, and we have a
tracker issue open to address the fact that we are (currently) unable
to digest several hundred commits at once:

    http://tracker.ceph.com/issues/20095

Apologies for the trouble. This unfortunately means you will need to
rebuild your branches (if they failed to schedule).


* Re: [sepia] CentOS builds failing in Shaman since Friday evening
From: Alfredo Deza @ 2017-06-05 13:23 UTC
  To: John Spray; +Cc: Sage Weil, sepia, ceph-devel, Nathan Cutler

And I just had to kill a few concurrent builds that got picked up by
the misconfigured slave.

If your build says 'Killed by Alfredo', it is because of the previous
email. Please reschedule!


On Mon, Jun 5, 2017 at 9:16 AM, Alfredo Deza <adeza@redhat.com> wrote:
> [...]

* Re: [sepia] CentOS builds failing in Shaman since Friday evening
From: John Spray @ 2017-06-05 17:29 UTC
  To: Alfredo Deza; +Cc: Sage Weil, sepia, ceph-devel, Nathan Cutler

On Mon, Jun 5, 2017 at 2:16 PM, Alfredo Deza <adeza@redhat.com> wrote:
> [...]
> Apologies for the trouble. This unfortunately means you will need to
> rebuild your branches (if they failed to schedule).

Thanks Alfredo -- we appear to be back in business!

John

