* long (never ending?) do_install for adwaita-icon-theme
@ 2017-08-22 17:32 Trevor Woerner
  2017-08-22 17:40 ` Richard Purdie
  0 siblings, 1 reply; 13+ messages in thread
From: Trevor Woerner @ 2017-08-22 17:32 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

Is anyone else seeing insanely long (never ending?) do_install times
for adwaita-icon-theme? The other day I manually killed a Jenkins job
that was still running after 3 days, waiting for this recipe's do_install.

This morning I started a job manually and it's currently stuck in

    0: adwaita-icon-theme-3.24.0-r0 do_install - 9241s (pid 25320)

I'm pretty sure it's not going to end without manual intervention.

These aren't the first or only times I've seen this issue; they're just
the most recent. It doesn't happen often, but often enough to be noticeable.



* Re: long (never ending?) do_install for adwaita-icon-theme
  2017-08-22 17:32 long (never ending?) do_install for adwaita-icon-theme Trevor Woerner
@ 2017-08-22 17:40 ` Richard Purdie
  2017-08-22 18:01   ` Alexander Kanavin
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Purdie @ 2017-08-22 17:40 UTC (permalink / raw)
  To: Trevor Woerner, Patches and discussions about the oe-core layer
  Cc: Alexander Kanavin

On Tue, 2017-08-22 at 13:32 -0400, Trevor Woerner wrote:
> Is anyone else seeing insanely long (never ending?) do_install times
> for adwaita-icon-theme? The other day I manually killed a jenkins
> jobs
> still running after 3 days waiting for this recipe's do_install.
> 
> This morning I started a job manually and it's currently stuck in
> 
>     0: adwaita-icon-theme-3.24.0-r0 do_install - 9241s (pid 25320)
> 
> I'm pretty sure it's not going to end without manual intervention.
> 
> These aren't the first/only times I've seen this issue, they're just
> the most recent. It doesn't happen often, but enough to be
> noticeable.

I suspect it may be http://git.yoctoproject.org/cgit.cgi/poky/commit/?id=231c9fe94d5b65bea345ffe9eb5e68b0db11cb07.

I've cc'd Alex. Ross and I have seen this very occasionally too.

Cheers,

Richard





* Re: long (never ending?) do_install for adwaita-icon-theme
  2017-08-22 17:40 ` Richard Purdie
@ 2017-08-22 18:01   ` Alexander Kanavin
  2017-08-22 18:46     ` Trevor Woerner
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Kanavin @ 2017-08-22 18:01 UTC (permalink / raw)
  To: Richard Purdie, Trevor Woerner,
	Patches and discussions about the oe-core layer
  Cc: Alexander Kanavin

On 08/22/2017 08:40 PM, Richard Purdie wrote:
>> Is anyone else seeing insanely long (never ending?) do_install times
>> for adwaita-icon-theme? The other day I manually killed a jenkins
>> jobs
>> still running after 3 days waiting for this recipe's do_install.
>>
>> This morning I started a job manually and it's currently stuck in
>>
>>      0: adwaita-icon-theme-3.24.0-r0 do_install - 9241s (pid 25320)
>>
>> I'm pretty sure it's not going to end without manual intervention.
>>
>> These aren't the first/only times I've seen this issue, they're just
>> the most recent. It doesn't happen often, but enough to be
>> noticeable.
> 
> I suspect it may be http://git.yoctoproject.org/cgit.cgi/poky/commit/?id=231c9fe94d5b65bea345ffe9eb5e68b0db11cb07.
> 
> I've cc'd Alex, Ross and I have seen this very occasionally too.

Do you have any kind of logs for when it happens?

Alex



* Re: long (never ending?) do_install for adwaita-icon-theme
  2017-08-22 18:01   ` Alexander Kanavin
@ 2017-08-22 18:46     ` Trevor Woerner
       [not found]       ` <e165f760-1c1e-1a5f-2fba-f8c293a16278@intel.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Trevor Woerner @ 2017-08-22 18:46 UTC (permalink / raw)
  To: Alexander Kanavin
  Cc: Alexander Kanavin, Patches and discussions about the oe-core layer

On Tue, Aug 22, 2017 at 2:01 PM, Alexander Kanavin
<alexander.kanavin@linux.intel.com> wrote:
> Do you have any kind of logs for when it happens?


Unfortunately not, I'll try to keep some the next time it happens.

But I have a suspicion it might be related to a build failure. Earlier
today I was doing a build which failed in a recipe that wasn't
adwaita-icon-theme, nor related to it. When I checked on my build
later I could see the tail of the error message from the failed
recipe, but the overall build was still waiting for
adwaita-icon-theme's do_install to finish (hours after the failure).

Maybe adwaita-icon-theme's do_install would have succeeded if
everything had gone well? Maybe to reproduce, it might be necessary to
time it such that another recipe fails while adwaita-icon-theme is
building?



* Re: long (never ending?) do_install for adwaita-icon-theme
       [not found]       ` <e165f760-1c1e-1a5f-2fba-f8c293a16278@intel.com>
@ 2017-08-23 12:48         ` Alexander Kanavin
  2017-08-27 16:01         ` Richard Purdie
  1 sibling, 0 replies; 13+ messages in thread
From: Alexander Kanavin @ 2017-08-23 12:48 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On 08/22/2017 09:46 PM, Trevor Woerner wrote:
>> Do you have any kind of logs for when it happens?
> 
> 
> Unfortunately not, I'll try to keep some the next time it happens.
> 
> But I have a suspicion it might be related to a build failure. Earlier
> today I was doing a build which failed in a recipe that wasn't
> adwaita-icon-theme, nor related to it. When I checked on my build
> later I could see the tail of the error message from the failed
> recipe, but the overall build was still waiting for
> adwaita-icon-theme's do_install to finish (hours after the failure).
> 
> Maybe adwaita-icon-theme's do_install would have succeeded if
> everything had gone well? Maybe to reproduce, it might be necessary to
> time it such that another recipe fails while adwaita-icon-theme is
> building?

We can certainly revert my patch that speeds up the do_install(), but I 
would prefer not to do it blindly: it's much better to first firmly 
establish that the patch is indeed the reason for the never ending 
do_install (I can't imagine why, but it certainly could be). So please 
do try to get some logs, or a stable way to reproduce.

Alex



* Re: long (never ending?) do_install for adwaita-icon-theme
       [not found]       ` <e165f760-1c1e-1a5f-2fba-f8c293a16278@intel.com>
  2017-08-23 12:48         ` Alexander Kanavin
@ 2017-08-27 16:01         ` Richard Purdie
  2017-08-27 16:07           ` Richard Purdie
  1 sibling, 1 reply; 13+ messages in thread
From: Richard Purdie @ 2017-08-27 16:01 UTC (permalink / raw)
  To: Alexander Kanavin, Trevor Woerner, Alexander Kanavin, seebs
  Cc: Patches and discussions about the oe-core layer

On Wed, 2017-08-23 at 15:46 +0300, Alexander Kanavin wrote:
> On 08/22/2017 09:46 PM, Trevor Woerner wrote:
> > 
> > > 
> > > Do you have any kind of logs for when it happens?
> > 
> > Unfortunately not, I'll try to keep some the next time it happens.
> > 
> > But I have a suspicion it might be related to a build failure.
> > Earlier
> > today I was doing a build which failed in a recipe that wasn't
> > adwaita-icon-theme, nor related to it. When I checked on my build
> > later I could see the tail of the error message from the failed
> > recipe, but the overall build was still waiting for
> > adwaita-icon-theme's do_install to finish (hours after the
> > failure).
> > 
> > Maybe adwaita-icon-theme's do_install would have succeeded if
> > everything had gone well? Maybe to reproduce, it might be necessary
> > to
> > time it such that another recipe fails while adwaita-icon-theme is
> > building?
> We can certainly revert my patch that speeds up the do_install(), but
> I 
> would prefer not to do it blindly: it's much better to first firmly 
> establish that the patch is indeed the reason for the never ending 
> do_install (I can't imagine why, but it certainly could be). So
> please 
> do try to get some logs, or a stable way to reproduce.

The autobuilder hung with this issue on the debian8 worker:

https://autobuilder.yocto.io/builders/nightly-rpm-non-rpm/builds/435

I was able to ssh in and take a look at what was going on. Basically
it's hung connecting to pseudo. I did a little debugging whilst it
was hung:

The process list is dominated by forked-off copies of install-sh, which
in turn call chmod and cp; a large number of these processes are
jammed.

$ ps ax | grep "cp " | wc
    272    1903   81859
$ ps ax | grep "chmod " | wc
     99     692   22472
$ ps ax | grep "install-sh " | wc
    508    5638  252481

$ grep 33196 /home/rpurdie/pslog
33196 ?        SN     0:00 chmod 644 /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/work/all-poky-linux/adwaita-icon-theme/3.24.0-r0/image/usr/share/icons/Adwaita//./32x32/status/_inst.29542_

$ strace -p 33196
Process 33196 attached
connect(22, {sa_family=AF_LOCAL, sun_path="pseudo.socket"}, 110

$ grep 29542 /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/work/all-poky-linux/adwaita-icon-theme/3.24.0-r0/pseudo/pseudo.log
[i.e. no match]

$ grep 33208 /home/rpurdie/pslog
33208 ?        SN     0:00 chmod 644 /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/work/all-poky-linux/adwaita-icon-theme/3.24.0-r0/image/usr/share/icons/Adwaita//./32x32/mimetypes/_inst.29870_

$ strace -p 33208
Process 33208 attached
connect(22, {sa_family=AF_LOCAL, sun_path="pseudo.socket"}, 110

$ ps ax | grep pseudo
19251 ?        S      0:00 /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/sysroots-components/x86_64/pseudo-native/usr/bin/pseudo bitbake-worker decafbadbeef
43675 ?        Ss     0:13 /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/sysroots-components/x86_64/pseudo-native/usr/bin/pseudo -d

$ strace -p 43675
Process 43675 attached
read(1025,

$ ls -la /proc/43675/fd | grep socket | wc
    486    5346   34471
$ ls -la /proc/43675/fd | grep -v socket
total 0
dr-x------ 2 pokybuild users  0 Aug 27 15:12 .
dr-xr-xr-x 9 pokybuild users  0 Aug 27 08:47 ..
lrwx------ 1 pokybuild users 64 Aug 27 15:12 0 -> /dev/null
lrwx------ 1 pokybuild users 64 Aug 27 15:12 1 -> /dev/null
l-wx------ 1 pokybuild users 64 Aug 27 15:12 2 -> /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/work/all-poky-linux/adwaita-icon-theme/3.24.0-r0/pseudo/pseudo.log
l-wx------ 1 pokybuild users 64 Aug 27 15:12 3 -> /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/work/all-poky-linux/adwaita-icon-theme/3.24.0-r0/pseudo/pseudo.log
lrwx------ 1 pokybuild users 64 Aug 27 15:12 4 -> /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/work/all-poky-linux/adwaita-icon-theme/3.24.0-r0/pseudo/pseudo.lock
lrwx------ 1 pokybuild users 64 Aug 27 15:12 6 -> /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/work/all-poky-linux/adwaita-icon-theme/3.24.0-r0/pseudo/logs.db
lrwx------ 1 pokybuild users 64 Aug 27 15:12 7 -> /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/work/all-poky-linux/adwaita-icon-theme/3.24.0-r0/pseudo/files.db

So it has 486 sockets connected and very little else open as fds.
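
The same check can be scripted. The following is a minimal sketch, not part of the original debugging session: it walks /proc/<pid>/fd, counts the socket entries and reports the highest fd number. The helper name fd_summary and the FD_SETSIZE value of 1024 are assumptions for illustration (1024 is the usual glibc value on Linux).

```python
import os
import socket

FD_SETSIZE = 1024  # assumption: the usual glibc value on Linux


def fd_summary(pid="self"):
    """Return (socket_count, highest_fd) for a process, read from /proc."""
    fd_dir = f"/proc/{pid}/fd"
    sockets = 0
    highest = -1
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the fd was closed while we were looking
        highest = max(highest, int(name))
        if target.startswith("socket:"):
            sockets += 1
    return sockets, highest


if __name__ == "__main__":
    count, highest = fd_summary()
    print(f"{count} sockets, highest fd {highest}, "
          f"below FD_SETSIZE: {highest < FD_SETSIZE}")
```

Passing a numeric pid (for example the pseudo server's) instead of "self" inspects another process, provided the caller has permission to read its /proc entry.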

$ ls -la /proc/43675/fd | grep 1025
lrwx------ 1 pokybuild users 64 Aug 27 15:12 1025 -> socket:[111861392]

$ ss  -p | grep 1392
u_str  ESTAB      0      0        pseudo.socket 111861392               * 111831542 users:(("pseudo",pid=43675,fd=1025))
u_str  ESTAB      0      0                    * 111831542               * 111861392 users:(("bash",pid=32133,fd=22))

$ strace -p 32133
Process 32133 attached
read(3, 

$ ls -la /proc/32133/fd
total 0
dr-x------ 2 pokybuild users  0 Aug 27 15:15 .
dr-xr-xr-x 9 pokybuild users  0 Aug 27 08:51 ..
lr-x------ 1 pokybuild users 64 Aug 27 15:16 0 -> /dev/null
l-wx------ 1 pokybuild users 64 Aug 27 15:16 1 -> pipe:[111501676]
l-wx------ 1 pokybuild users 64 Aug 27 15:16 2 -> pipe:[111501676]
lr-x------ 1 pokybuild users 64 Aug 27 15:16 20 -> /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/sysroots-components/x86_64/pseudo-native/usr
lr-x------ 1 pokybuild users 64 Aug 27 15:16 21 -> /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/work/all-poky-linux/adwaita-icon-theme/3.24.0-r0/pseudo
lrwx------ 1 pokybuild users 64 Aug 27 15:16 22 -> socket:[111831542]
lr-x------ 1 pokybuild users 64 Aug 27 15:16 255 -> /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/work/all-poky-linux/adwaita-icon-theme/3.24.0-r0/adwaita-icon-theme-3.24.0/install-sh
lr-x------ 1 pokybuild users 64 Aug 27 15:16 3 -> pipe:[111853934]

$ ls -la /proc/*/fd/ | grep 111853934

leads to process 33197

$ ps ax | grep 33197
33197 ?        SN     0:00 /bin/bash /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/work/all-poky-linux/adwaita-icon-theme/3.24.0-r0/adwaita-icon-theme-3.24.0/install-sh -c -m 644 ../../../adwaita-icon-theme-3.24.0/Adwaita//./32x32/actions/media-eject.png /home/pokybuild/yocto-autobuilder/yocto-worker/nightly-rpm-non-rpm/build/build/tmp/work/all-poky-linux/adwaita-icon-theme/3.24.0-r0/image/usr/share/icons/Adwaita//./32x32/actions/media-eject.png

$ strace -p 33197
Process 33197 attached
connect(22, {sa_family=AF_LOCAL, sun_path="pseudo.socket"}, 110

I then decided I couldn't really get much more from this and ran strace
against the pseudo server whilst I killed 32133. This gave:

read(1025, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\205}\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 76) = 76
read(1025, "/bin/bash\0", 10)           = 10
rt_sigaction(SIGPIPE, {0x40f420, [PIPE], SA_RESTORER|SA_RESTART, 0x7f2a17a18060}, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART, 0x7f2a17a18060}, 8) = 0
write(1025, "\4\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\372\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 76) = 76
rt_sigaction(SIGPIPE, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART, 0x7f2a17a18060}, {0x40f420, [PIPE], SA_RESTORER|SA_RESTART, 0x7f2a17a18060}, 8) = 0
accept(5, {sa_family=AF_LOCAL, NULL}, [2]) = 22
select(1026, [5 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 27 30 31 32 34 35 37 42 43 45 47 52 53 54 56 58 59 60 89 92 94 95 99 100 131 138 141 143 144 145 146 148 149 152 156 159 160 165 166 170 171 173 185 186 187 188 190 191 192 194 197 199 200 201 202 204 206 207 208 209 211 212 213 215 216 219 220 221 222 223 224 227 228 230 231 234 236 238 241 243 244 245 246 248 249 252 253 254 259 260 261 262 263 264 265 272 273 274 279 280 281 282 284 285 286 287 288 289 290 291 293 295 296 297 299 300 301 302 303 304 309 311 314 315 318 320 321 322 323 324 325 326 327 328 330 331 332 333 334 339 340 341 343 345 347 348 350 354 355 356 357 360 361 362 364 365 366 367 368 369 372 373 374 375 376 378 379 381 382 383 388 389 390 391 393 394 395 400 403 409 410 411 417 420 425 431 434 443 455 460 467 485 556 577 578 581 590 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 629 631 635 637 638 639 640 641 642 646 647 650 652 653 654 655 656 659 661 662 664 666 667 669 670 671 674 675 676 681 682 683 685 687 688 691 692 693 694 695 696 698 702 703 704 705 706 709 710 711 712 713 714 715 716 718 719 721 722 724 728 736 738 751 752 754 755 756 757 758 759 760 761 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 780 781 782 783 784 785 786 787 788 790 792 793 794 795 796 798 799 800 802 803 804 805 806 807 811 813 814 815 817 818 819 820 821 823 826 827 829 832 836 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 873 875 878 879 881 882 884 885 886 888 889 890 891 892 893 894 895 896 897 898 899 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 923 925 928 929 930 931 932 934 935 937 939 942 943 951 955 957 959 960 961 962 963 964 965 966 968 970 971 972 973 974 990 995 997 1005 1013 1017 1020 1022 1023 1025], [1], [5 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 27 30 31 32 34 35 
37 42 43 45 47 52 53 54 56 58 59 60 89 92 94 95 99 100 131 138 141 143 144 145 146 148 149 152 156 159 160 165 166 170 171 173 185 186 187 188 190 191 192 194 197 199 200 201 202 204 206 207 208 209 211 212 213 215 216 219 220 221 222 223 224 227 228 230 231 234 236 238 241 243 244 245 246 248 249 252 253 254 259 260 261 262 263 264 265 272 273 274 279 280 281 282 284 285 286 287 288 289 290 291 293 295 296 297 299 300 301 302 303 304 309 311 314 315 318 320 321 322 323 324 325 326 327 328 330 331 332 333 334 339 340 341 343 345 347 348 350 354 355 356 357 360 361 362 364 365 366 367 368 369 372 373 374 375 376 378 379 381 382 383 388 389 390 391 393 394 395 400 403 409 410 411 417 420 425 431 434 443 455 460 467 485 556 577 578 581 590 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 629 631 635 637 638 639 640 641 642 646 647 650 652 653 654 655 656 659 661 662 664 666 667 669 670 671 674 675 676 681 682 683 685 687 688 691 692 693 694 695 696 698 702 703 704 705 706 709 710 711 712 713 714 715 716 718 719 721 722 724 728 736 738 751 752 754 755 756 757 758 759 760 761 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 780 781 782 783 784 785 786 787 788 790 792 793 794 795 796 798 799 800 802 803 804 805 806 807 811 813 814 815 817 818 819 820 821 823 826 827 829 832 836 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 873 875 878 879 881 882 884 885 886 888 889 890 891 892 893 894 895 896 897 898 899 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 923 925 928 929 930 931 932 934 935 937 939 942 943 951 955 957 959 960 961 962 963 964 965 966 968 970 971 972 973 974 990 995 997 1005 1013 1017 1020 1022 1023 1025], {2, 0}) = 31 (in [5 15 17 18 19 20 22 89 131 149 159 187 434 460 467 485 556 635 654 675 696 990 995 997 1013 1017 1020 1022 1023 1025], out [1], left 
{1, 999780})
read(15, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0;a\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 76) = 76
read(15, "cp\0", 3)

and it then closed the connection on 1025 shortly thereafter.

I was thinking it might have been stuck with a large pseudo request,
maybe bigger than page size but that doesn't seem to be the case.

seebs: Any idea why pseudo could end up blocking in a read()?

Cheers,

Richard
  





* Re: long (never ending?) do_install for adwaita-icon-theme
  2017-08-27 16:01         ` Richard Purdie
@ 2017-08-27 16:07           ` Richard Purdie
  2017-08-28 10:38             ` Alexander Kanavin
  2017-08-28 17:53             ` Khem Raj
  0 siblings, 2 replies; 13+ messages in thread
From: Richard Purdie @ 2017-08-27 16:07 UTC (permalink / raw)
  To: Alexander Kanavin, Trevor Woerner, Alexander Kanavin, seebs
  Cc: Patches and discussions about the oe-core layer

On Sun, 2017-08-27 at 17:01 +0100, Richard Purdie wrote:
> On Wed, 2017-08-23 at 15:46 +0300, Alexander Kanavin wrote:
> > 
> > On 08/22/2017 09:46 PM, Trevor Woerner wrote:
> > > $ strace -p 43675
> Process 43675 attached
> read(1025,

And the answer is staring me in the face. select() only supports fds
below FD_SETSIZE, which is 1024, so the valid range is 0 to 1023.
fd 1025 is past that limit.

Therefore pseudo hangs when we run into large numbers of fds :/.

https://access.redhat.com/solutions/488623

So I think we at least understand what is breaking. Perhaps using
poll/epoll instead would work?
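
For illustration only, here is a minimal sketch of the poll() approach in Python. It is not pseudo's code (pseudo is written in C), and the helper name readable_fds is hypothetical. The point it shows is that poll() takes a list of fd numbers rather than a fixed-size bitmap, so it has no FD_SETSIZE ceiling.

```python
import select
import socket


def readable_fds(fds, timeout_ms=0):
    """Return the fds that poll() reports as readable within timeout_ms."""
    poller = select.poll()
    for fd in fds:
        # Each fd is registered by number; there is no bitmap to overflow.
        poller.register(fd, select.POLLIN)
    return [fd for fd, events in poller.poll(timeout_ms)
            if events & select.POLLIN]


if __name__ == "__main__":
    left, right = socket.socketpair()
    left.sendall(b"ping")
    # Only the receiving end has data waiting.
    print(readable_fds([left.fileno(), right.fileno()], timeout_ms=1000))
    left.close()
    right.close()
```

A server written this way only calls read() on fds that poll() actually reported, whatever their numeric value.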

Cheers,

Richard







* Re: long (never ending?) do_install for adwaita-icon-theme
  2017-08-27 16:07           ` Richard Purdie
@ 2017-08-28 10:38             ` Alexander Kanavin
  2017-08-28 14:43               ` Richard Purdie
  2017-08-28 17:53             ` Khem Raj
  1 sibling, 1 reply; 13+ messages in thread
From: Alexander Kanavin @ 2017-08-28 10:38 UTC (permalink / raw)
  To: Richard Purdie, Trevor Woerner, seebs
  Cc: Patches and discussions about the oe-core layer

On 08/27/2017 07:07 PM, Richard Purdie wrote:
>>> On 08/22/2017 09:46 PM, Trevor Woerner wrote:
>>>> $ strace -p 43675
>> Process 43675 attached
>> read(1025,
> 
> And the answer is staring me in the face. select() only supports FDs up
> to 1024. 1025 > 1024 which is > FD_SETSIZE.
> 
> Therefore pseudo hangs when we run into large numbers of fds :/.
> 
> https://access.redhat.com/solutions/488623
> 
> So I think we might at least understand what is breaking. It may be
> using poll/epoll would work instead?

What I don't understand is how exceeding the fd limit in select() leads 
to read() that never finishes. Can you clarify the sequence please?

Alex



* Re: long (never ending?) do_install for adwaita-icon-theme
  2017-08-28 10:38             ` Alexander Kanavin
@ 2017-08-28 14:43               ` Richard Purdie
  2017-08-28 15:14                 ` Alexander Kanavin
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Purdie @ 2017-08-28 14:43 UTC (permalink / raw)
  To: Alexander Kanavin, Trevor Woerner, seebs
  Cc: Patches and discussions about the oe-core layer

On Mon, 2017-08-28 at 13:38 +0300, Alexander Kanavin wrote:
> On 08/27/2017 07:07 PM, Richard Purdie wrote:
> > 
> > > 
> > > > 
> > > > On 08/22/2017 09:46 PM, Trevor Woerner wrote:
> > > > > 
> > > > > $ strace -p 43675
> > > Process 43675 attached
> > > read(1025,
> > And the answer is staring me in the face. select() only supports
> > FDs up
> > to 1024. 1025 > 1024 which is > FD_SETSIZE.
> > 
> > Therefore pseudo hangs when we run into large numbers of fds :/.
> > 
> > https://access.redhat.com/solutions/488623
> > 
> > So I think we might at least understand what is breaking. It may be
> > using poll/epoll would work instead?
> What I don't understand is how exceeding the fd limit in select()
> leads 
> to read() that never finishes. Can you clarify the sequence please?

The bitfield operation that says "can I read this fd?" probably
overflows into whatever data follows the structure (I haven't checked
exactly how it fails). That bit may happen to be set; in the case I
debugged it must have been, so the code thinks "fd 1025 is ready for
reading" when in fact it may not be. The code then calls read() on it
and, if it wasn't actually ready for reading, it blocks.
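
To make the overflow concrete, here is a small model of the bitmap arithmetic. It is a sketch under stated assumptions, not pseudo's code: it assumes the glibc layout on 64-bit Linux, where an fd_set is 1024 bits stored as 16 words of 64 bits. The names fd_set_slot and fits_in_fd_set are hypothetical.

```python
FD_SETSIZE = 1024  # assumption: glibc default
NFDBITS = 64       # assumption: bits per fd_set word on 64-bit Linux
WORDS = FD_SETSIZE // NFDBITS  # 16 words, valid indexes 0..15


def fd_set_slot(fd):
    """Return the (word_index, bit_index) that FD_SET/FD_ISSET would touch."""
    return fd // NFDBITS, fd % NFDBITS


def fits_in_fd_set(fd):
    """True if the fd can legally be stored in an fd_set."""
    return 0 <= fd < FD_SETSIZE


if __name__ == "__main__":
    for fd in (22, 1023, 1025):
        word, bit = fd_set_slot(fd)
        where = "in bounds" if word < WORDS else "past the end of the fd_set"
        print(f"fd {fd}: word {word}, bit {bit} ({where})")
```

Under these assumptions fd 1025 lands in word 16, one word past the end of the array, so testing that bit reads memory that select() never wrote.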

Cheers,

Richard





* Re: long (never ending?) do_install for adwaita-icon-theme
  2017-08-28 14:43               ` Richard Purdie
@ 2017-08-28 15:14                 ` Alexander Kanavin
  2017-08-28 15:27                   ` Alexander Kanavin
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Kanavin @ 2017-08-28 15:14 UTC (permalink / raw)
  To: Richard Purdie, Trevor Woerner, seebs
  Cc: Patches and discussions about the oe-core layer

On 08/28/2017 05:43 PM, Richard Purdie wrote:

> The bitfield operation that says "can I read this fd?" probably
> overflows into the next data item in the structure (I haven't checked
> exactly how it fails). It may be "set", certainly the case I debugged
> had to have thought it was and the code thinks "fd 1025 is ready for
> reading" when in fact it may not be. The code then calls read() on it
> and if it wasn't ready for reading, it would block.

Right; what I am getting at is that we might have a bug here 
(specifically the absence of check against FD_SETSIZE), rather than 
something that needs to be solved by switching to poll().

Does the code use that constant anywhere?

Alex



* Re: long (never ending?) do_install for adwaita-icon-theme
  2017-08-28 15:14                 ` Alexander Kanavin
@ 2017-08-28 15:27                   ` Alexander Kanavin
  2017-08-28 16:22                     ` Richard Purdie
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Kanavin @ 2017-08-28 15:27 UTC (permalink / raw)
  To: Richard Purdie, Trevor Woerner, seebs
  Cc: Patches and discussions about the oe-core layer

On 08/28/2017 06:14 PM, Alexander Kanavin wrote:
> On 08/28/2017 05:43 PM, Richard Purdie wrote:
> 
>> The bitfield operation that says "can I read this fd?" probably
>> overflows into the next data item in the structure (I haven't checked
>> exactly how it fails). It may be "set", certainly the case I debugged
>> had to have thought it was and the code thinks "fd 1025 is ready for
>> reading" when in fact it may not be. The code then calls read() on it
>> and if it wasn't ready for reading, it would block.
> 
> Right; what I am getting at is that we might have a bug here 
> (specifically the absence of check against FD_SETSIZE), rather than 
> something that needs to be solved by switching to poll().
> 
> Does the code use that constant anywhere?

Actually, wait: "Executing FD_CLR() or FD_SET() with a value of fd that 
is negative or is equal to or larger than FD_SETSIZE will result in 
undefined behavior." (man select)

This means that select() cannot be used with fd >= FD_SETSIZE at all?

Alex



* Re: long (never ending?) do_install for adwaita-icon-theme
  2017-08-28 15:27                   ` Alexander Kanavin
@ 2017-08-28 16:22                     ` Richard Purdie
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Purdie @ 2017-08-28 16:22 UTC (permalink / raw)
  To: Alexander Kanavin, Trevor Woerner, seebs
  Cc: Patches and discussions about the oe-core layer

On Mon, 2017-08-28 at 18:27 +0300, Alexander Kanavin wrote:
> On 08/28/2017 06:14 PM, Alexander Kanavin wrote:
> > 
> > On 08/28/2017 05:43 PM, Richard Purdie wrote:
> > 
> > > 
> > > The bitfield operation that says "can I read this fd?" probably
> > > overflows into the next data item in the structure (I haven't
> > > checked
> > > exactly how it fails). It may be "set", certainly the case I
> > > debugged
> > > had to have thought it was and the code thinks "fd 1025 is ready
> > > for
> > > reading" when in fact it may not be. The code then calls read()
> > > on it
> > > and if it wasn't ready for reading, it would block.
> > Right; what I am getting at is that we might have a bug here 
> > (specifically the absence of check against FD_SETSIZE), rather
> > than 
> > something that needs to be solved by switching to poll().
> > 
> > Does the code use that constant anywhere?
> 
>
> Actually, wait: "Executing FD_CLR() or FD_SET() with a value of fd
> that is negative or is equal to or larger than FD_SETSIZE will result
> in undefined behavior." (man select)
> 
> This means that select() cannot be used with fd >= FD_SETSIZE at all?

Correct. The code has no guards on this happening.
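
As an illustration of what such a guard could look like, here is a hedged Python sketch. It is not a proposed patch to pseudo: the function names accept_guarded and make_listener, the fd_limit parameter and the policy of closing the connection are all assumptions made for the example.

```python
import os
import socket
import tempfile

FD_SETSIZE = 1024  # assumption: the usual glibc value on Linux


def make_listener(path):
    """Create a listening AF_UNIX stream socket bound to path."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(8)
    return srv


def accept_guarded(listener, fd_limit=FD_SETSIZE):
    """Accept one client; refuse it if its fd cannot go into an fd_set."""
    conn, _ = listener.accept()
    if conn.fileno() >= fd_limit:
        # A select()-based loop must never FD_SET this fd, so drop it
        # here instead of invoking undefined behaviour later.
        conn.close()
        return None
    return conn


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.socket")
    srv = make_listener(path)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    print("accepted" if accept_guarded(srv) else "refused")
```

Refusing the client trades a hang for a visible failure on the client side; switching the event loop to poll() avoids the limit altogether.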

Cheers,

Richard



* Re: long (never ending?) do_install for adwaita-icon-theme
  2017-08-27 16:07           ` Richard Purdie
  2017-08-28 10:38             ` Alexander Kanavin
@ 2017-08-28 17:53             ` Khem Raj
  1 sibling, 0 replies; 13+ messages in thread
From: Khem Raj @ 2017-08-28 17:53 UTC (permalink / raw)
  To: Richard Purdie
  Cc: Patches and discussions about the oe-core layer, Alexander Kanavin

On Sun, Aug 27, 2017 at 9:07 AM, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
> On Sun, 2017-08-27 at 17:01 +0100, Richard Purdie wrote:
>> On Wed, 2017-08-23 at 15:46 +0300, Alexander Kanavin wrote:
>> >
>> > On 08/22/2017 09:46 PM, Trevor Woerner wrote:
>> > > $ strace -p 43675
>> Process 43675 attached
>> read(1025,
>
> And the answer is staring me in the face. select() only supports FDs up
> to 1024. 1025 > 1024 which is > FD_SETSIZE.
>
> Therefore pseudo hangs when we run into large numbers of fds :/.
>
> https://access.redhat.com/solutions/488623
>
> So I think we might at least understand what is breaking. It may be
> using poll/epoll would work instead?
>

I think your explanation looks plausible. I am also seeing builds hang
in do_package_data tasks after a few days on the box (make -j20 and
44 tasks in parallel). Then I reboot the box, resume the build, and it
works again.

> Cheers,
>
> Richard
>

