Re: [Bug 205937] New: BUG: unable to handle page fault for address: f3170000

* Re: [Bug 205937] New: BUG: unable to handle page fault for address: f3170000
       [not found] <bug-205937-27@https.bugzilla.kernel.org/>
@ 2019-12-23 22:14 ` Andrew Morton
  2019-12-24  3:45   ` Dennis Clarke
  2019-12-26 21:41   ` Christopher Lameter
  0 siblings, 2 replies; 4+ messages in thread
From: Andrew Morton @ 2019-12-23 22:14 UTC
  To: dclarke
  Cc: bugzilla-daemon, penberg, Christopher Lameter, David Rientjes,
	Joonsoo Kim, linux-mm, Qian Cai

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

Thanks.

On Sat, 21 Dec 2019 03:08:17 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=205937
> 
>             Bug ID: 205937
>            Summary: BUG: unable to handle page fault for address: f3170000
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.5-rc2
>           Hardware: i386
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: dclarke@blastwave.org
>         Regression: No

"yes"

Looks like the asynchronous sysfs file removal code is failing. 
sysfs_slab_remove_workfn().

Guys, did we make recent changes in this area?

> Created attachment 286393
>   --> https://bugzilla.kernel.org/attachment.cgi?id=286393&action=edit
> kernel config for 5.5.0-rc2
> 
> Testing a system under excessive memory pressure with some trivial
> code wherein a set of 16 pthreads are dispatched and each merely fills
> an array : 
> 
> 
> void *big_array_fill(void *recv_parm)
> {
>     thread_parm_t *p = (thread_parm_t *)recv_parm;
> 
>     printf("TRD  : %d filling the big_array.\n", p->tnum);
>     for ( p->loop0 = 0; p->loop0 < BIG_ARRAY_DIM0; p->loop0++ ) {
>         for ( p->loop1 = 0; p->loop1 < BIG_ARRAY_DIM1; p->loop1++ ) {
>             p->big_array[p->loop0][p->loop1] = (uint64_t)(p->loop0 * p->loop1);
>         }
>     }
>     printf("TRD  : %d big_array full.\n", p->tnum);
> 
>     /* return some random data */
>     p->ret_val = drand48();
> 
>     return (NULL);
> }
> 
> The received parameters for each thread were in a struct thus : 
> 
> titan$ cat p0.h 
> 
> #define NUM_THREADS 16
> #define BIG_ARRAY_DIM0 384
> #define BIG_ARRAY_DIM1 65536
> 
> /*
>  * struct to pass parameters to a dispatched thread
>  */
> typedef struct {
>   uint32_t tnum;     /* thread number */
>   int sleep_time, loop0, loop1;
>   double ret_val;    /* some sort of a return data value */
>   uint64_t big_array[BIG_ARRAY_DIM0][BIG_ARRAY_DIM1]; /* memory abuse */
> } thread_parm_t;
> 
> These threads were fired off as a test while doing a teaching demo :
> 
>     printf("\n-------------- begin dispatch -----------------------\n");
>     for ( i = 0; i < NUM_THREADS; i++) {
>         parm[i] = calloc( (size_t) 1 , (size_t) sizeof(thread_parm_t) );
> 
>         if ( parm[i] == NULL ) {
>             if ( errno == ENOMEM ) {
>                 fprintf(stderr,"FAIL : calloc returns ENOMEM at %s:%d\n",
>                         __FILE__, __LINE__ );
>             } else {
>                 fprintf(stderr,"FAIL : calloc fails at %s:%d\n",
>                         __FILE__, __LINE__ );
>             }
>             perror("FAIL ");
>             /* gee .. before we bail out did we allocate any of the
>              * previous thread parameter memory regions? If so then
>              * clean up before bailing out. In fact we may have 
>              * already dispatched out threads. */
> 
>             if (i == 0 ) return ( EXIT_FAILURE );
> 
>             for ( j = 0; j < i; j++ ) {
>                 /* lets ask those threads to just be nice and 
>                  * we call them in with a join */
>                 pthread_join(tid[j], NULL);
>                 fprintf(stderr,"BAIL : pthread_join(%i) done.\n", j);
>                 free(parm[j]);
>                 parm[j] = NULL;
>             }
>             fprintf(stderr,"BAIL : cleanup done.\n");
>             ru();
> 
>             return ( EXIT_FAILURE );
> 
>         }
> 
>         parm[i]->tnum = i;
>         parm[i]->sleep_time = 1 + (int)( drand48() * 10.0 );
> 
>         pthread_create( &tid[i], NULL, big_array_fill, (void *)parm[i] );
> 
>         printf("INFO : pthread_create %2i called for %2i secs.\n",
>                                                i, parm[i]->sleep_time );
>     }
>     printf("\n-------------- end dispatch -------------------------\n");
> 
> 
> All very nice, and it does what it does on most systems: even on a very
> old and slow Pentium II with very little memory, everything just works
> fine so long as there is some swap.
> 
> However, on Linux 5.5-rc2 I see a warning that the CPU is busy, which is
> fine, but the process then seems to get "stuck", for lack of a better
> word. A kill -HUP on the pid has no effect. A kill -9 also seems to have
> no effect. A kill -9 of the PPID merely makes PID 1 the new parent, and
> I see a zombie that won't go away.
> 
> esther# 
> esther# ps -efl | grep -E "UID|dclarke|init"
> F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
> 4 S root         1     0  0  80   0 -  9079 do_epo Dec20 ?        00:01:02 /sbin/init verbose
> 4 S dclarke    382     1  0  80   0 -  4320 do_epo Dec20 ?        00:00:03 /lib/systemd/systemd --user
> 5 S dclarke    384   382  0  80   0 -  9424 do_sig Dec20 ?        00:00:00 (sd-pam)
> 0 Z dclarke    914     1  3  95  15 -     0 -      01:13 ?        00:03:13 [p0] <defunct>
> 4 S root       959   338  0  80   0 -  3256 poll_s 01:55 ?        00:00:01 sshd: dclarke [priv]
> 5 S dclarke    965   959  0  80   0 -  3256 poll_s 01:55 ?        00:00:03 sshd: dclarke@pts/2
> 0 S dclarke    966   965  0  80   0 -  2458 do_wai 01:55 pts/2    00:00:02 -bash
> 0 S root      1188  1107  6  80   0 -  1958 pipe_r 02:57 pts/2    00:00:00 grep -E UID|dclarke|init
> esther# 
> 
> Looking in /proc I see : 
> 
> esther#                                                                         
> esther# cat /proc/914/status                                                    
> Name:   p0                                                                      
> State:  Z (zombie)                                                              
> Tgid:   914                                                                     
> Ngid:   0                                                                       
> Pid:    914                                                                     
> PPid:   1                                                                       
> TracerPid:      0                                                               
> Uid:    16411   16411   16411   16411                                           
> Gid:    20002   20002   20002   20002                                           
> FDSize: 0                                                                       
> Groups: 20002                                                                   
> NStgid: 914                                                                     
> NSpid:  914                                                                     
> NSpgid: 913                                                                     
> NSsid:  398                                                                     
> Threads:        2                                                               
> SigQ:   2/7323                                                                  
> SigPnd: 0000000000000000                                                        
> ShdPnd: 0000000000000103                                                        
> SigBlk: 0000000000000000                                                        
> SigIgn: 0000000000000000                                                        
> SigCgt: 0000000180000000                                                        
> CapInh: 0000000000000000                                                        
> CapPrm: 0000000000000000                                                        
> CapEff: 0000000000000000                                                        
> CapBnd: 0000003fffffffff                                                        
> CapAmb: 0000000000000000                                                        
> NoNewPrivs:     0                                                               
> Seccomp:        0                                                               
> Speculation_Store_Bypass:       vulnerable                                      
> Cpus_allowed:   1                                                               
> Cpus_allowed_list:      0                                                       
> voluntary_ctxt_switches:        13                                              
> nonvoluntary_ctxt_switches:     74                                              
> esther# 
> 
> However dmesg reveals far more information : 
> 
> .
> .
> .
> [44540.046308] kobject: '(null)' (5fcda702): kobject_cleanup, parent 2a0c29d5   
> [44540.060815] kobject: '(null)' (5fcda702): calling ktype release              
> [44540.230679] kobject: '(null)' (0cf40105): kobject_cleanup, parent 2a0c29d5   
> [44540.244669] kobject: '(null)' (0cf40105): calling ktype release              
> [44540.430165] kobject: '(null)' (1eed3f2a): kobject_cleanup, parent 2a0c29d5   
> [44540.444359] kobject: '(null)' (1eed3f2a): calling ktype release              
> [44540.612080] kobject: '(null)' (b9893805): kobject_cleanup, parent 2a0c29d5   
> [44540.625521] kobject: '(null)' (b9893805): calling ktype release              
> [44540.777358] kobject: '(null)' (6e8d4424): kobject_cleanup, parent 2a0c29d5   
> [44540.792340] kobject: '(null)' (6e8d4424): calling ktype release              
> [44540.902623] kobject: '(null)' (07ba38b5): kobject_cleanup, parent 2a0c29d5   
> [44540.916637] kobject: '(null)' (07ba38b5): calling ktype release              
> [44545.033382] kobject: '(null)' (dbf42766): kobject_cleanup, parent 2a0c29d5   
> [44545.048144] kobject: '(null)' (dbf42766): calling ktype release              
> [44545.242257] kobject: '(null)' (e64a3d73): kobject_cleanup, parent 2a0c29d5   
> [44545.255661] kobject: '(null)' (e64a3d73): calling ktype release              
> [44545.402036] kobject: '(null)' (e43ef4d7): kobject_cleanup, parent 2a0c29d5   
> [44545.415573] kobject: '(null)' (e43ef4d7): calling ktype release              
> [44545.566126] kobject: '(null)' (2c27ba6b): kobject_cleanup, parent 2a0c29d5   
> [44545.579740] kobject: '(null)' (2c27ba6b): calling ktype release              
> [44546.186101] kobject: '(null)' (da4ac031): kobject_cleanup, parent 2a0c29d5   
> [44546.188957] BUG: unable to handle page fault for address: f3170000           
> [44546.188965] #PF: supervisor read access in kernel mode                       
> [44546.188973] #PF: error_code(0x0000) - not-present page                       
> [44546.188979] *pde = 36f4a067 *pte = 33170060                                  
> [44546.188995] Oops: 0000 [#1] DEBUG_PAGEALLOC                                  
> [44546.189004] CPU: 0 PID: 680 Comm: kworker/0:1 Not tainted 5.5.0-rc2-genunix #1
> [44546.189072] Hardware name:  /CN700-8237, BIOS 6.00 PG 11/13/2006             
> [44546.189079] Workqueue: events sysfs_slab_remove_workfn                       
> [44546.189090] EIP: hw_bitblt_1+0x240/0x310 [viafb]                             
> [44546.189108] Code: 08 80 fa 02 0f 84 d8 00 00 00 0f b6 55 ec c0 ea 03 0f b6 d2 0f af ca 83 c1 03 c1 e9 02 74 17 81 c3 00 00 20 00 8d 74 26 00 90 <8b> 14 87 89 13 83 c0 01 39 c8 72 f4 8d 65 f4 31 c0 5b 5e 5f 5d c3
> [44546.189116] EAX: 00000994 EBX: f8600000 ECX: 000009a0 EDX: 00000000          
> [44546.189124] ESI: 00000002 EDI: f316d9b0 EBP: eb98bc70 ESP: eb98bc50          
> [44546.189193] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010083    
> [44546.189201] CR0: 80050033 CR2: f3170000 CR3: 2b25b000 CR4: 00000690          
> [44546.189206] Call Trace:                                                      
> [44546.189213]  ? hw_bitblt_2+0x2b0/0x2b0 [viafb]                               
> [44546.189219]  viafb_imageblit+0x90/0xf0 [viafb]                               
> [44546.189225]  bit_putcs+0x215/0x430                                           
> [44546.189231]  ? bit_clear+0x120/0x120                                         
> [44546.189236]  fbcon_putcs+0xcb/0xe0                                           
> [44546.189242]  ? bit_clear+0x120/0x120                                         
> [44546.189248]  ? fb_flashcursor+0x100/0x100                                    
> [44546.189315]  vt_console_print+0x353/0x400                                    
> [44546.189321]  ? insert_char+0xd0/0xd0                                         
> [44546.189327]  console_unlock+0x35e/0x4e0                                      
> [44546.189333]  vprintk_emit+0x23a/0x2f0                                        
> [44546.189339]  vprintk_default+0x17/0x20                                       
> [44546.189345]  vprintk_func+0x36/0xb7                                          
> [44546.189350]  printk+0x13/0x15                                                
> [44546.189356]  __dynamic_pr_debug+0x46/0x70                                    
> [44546.189363]  ? __lock_acquire.isra.0+0xfe/0x4e0                              
> [44546.189369]  kobject_put+0x7b/0x190                                          
> [44546.189376]  sysfs_slab_remove_workfn+0x30/0x40                              
> [44546.189382]  process_one_work+0x1e4/0x3c0                                    
> [44546.189388]  worker_thread+0x14e/0x3b0                                       
> [44546.189395]  ? process_one_work+0x3c0/0x3c0                                  
> [44546.189401]  kthread+0xdb/0x110                                              
> [44546.189407]  ? process_one_work+0x3c0/0x3c0                                  
> [44546.189414]  ? kthread_create_on_node+0x20/0x20                              
> [44546.189419]  ret_from_fork+0x2e/0x38                                         
> [44546.189424] Modules linked in: via_camera videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc evdev padlock_sha padlock_aes snd_pcm uhci_hcd via_cputemp ehci_pci hwmon_vid ehci_hcd snd_timer via_rng viafb snd rng_core usbcore soundcore serio_raw pcspkr i2c_viapro sg i2c_algo_bit acpi_cpufreq button ip_tables x_tables autofs4 sd_mod ata_generic fan
> [44546.189481] CR2: 00000000f3170000                                            
> [44546.189481] ---[ end trace 5d021d89c9f5c08d ]---                             
> [44546.189481] EIP: hw_bitblt_1+0x240/0x310 [viafb]                             
> [44546.189481] Code: 08 80 fa 02 0f 84 d8 00 00 00 0f b6 55 ec c0 ea 03 0f b6 d2 0f af ca 83 c1 03 c1 e9 02 74 17 81 c3 00 00 20 00 8d 74 26 00 90 <8b> 14 87 89 13 83 c0 01 39 c8 72 f4 8d 65 f4 31 c0 5b 5e 5f 5d c3
> [44546.189481] EAX: 00000994 EBX: f8600000 ECX: 000009a0 EDX: 00000000          
> [44546.189481] ESI: 00000002 EDI: f316d9b0 EBP: eb98bc70 ESP: eb98bc50          
> [44546.189481] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010083    
> [44546.189481] CR0: 80050033 CR2: f3170000 CR3: 2b25b000 CR4: 00000690          
> [44571.433760] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [p0:918]       
> [44571.433768] Modules linked in: via_camera videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc evdev padlock_sha padlock_aes snd_pcm uhci_hcd via_cputemp ehci_pci hwmon_vid ehci_hcd snd_timer via_rng viafb snd rng_core usbcore soundcore serio_raw pcspkr i2c_viapro sg i2c_algo_bit acpi_cpufreq button ip_tables x_tables autofs4 sd_mod ata_generic fan
> [44571.434034] CPU: 0 PID: 918 Comm: p0 Tainted: G      D           5.5.0-rc2-genunix #1
> [44571.434042] Hardware name:  /CN700-8237, BIOS 6.00 PG 11/13/2006             
> [44571.434047] EIP: 0x437636                                                    
> [44571.434066] Code: 83 c4 10 8b 45 e4 c7 40 08 00 00 00 00 eb 6a 8b 45 e4 c7 40 0c 00 00 00 00 eb 42 8b 45 e4 8b 50 08 8b 45 e4 8b 40 0c 0f af c2 <8b> 55 e4 8b 7a 08 8b 55 e4 8b 72 0c 89 c2 c1 fa 1f 8b 4d e4 c1 e7
> [44571.434074] EAX: 003f2551 EBX: 0043a000 ECX: 9652a010 EDX: 00000079          
> [44571.434082] ESI: 0079859a EDI: 00790000 EBP: 96529358 ESP: 96529330          
> [44571.434151] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296    
> [44599.433696] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [htop:893]     
> [44599.433765] Modules linked in: via_camera videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc evdev padlock_sha padlock_aes snd_pcm uhci_hcd via_cputemp ehci_pci hwmon_vid ehci_hcd snd_timer via_rng viafb snd rng_core usbcore soundcore serio_raw pcspkr i2c_viapro sg i2c_algo_bit acpi_cpufreq button ip_tables x_tables autofs4 sd_mod ata_generic fan
> [44599.434032] CPU: 0 PID: 893 Comm: htop Tainted: G      D      L    5.5.0-rc2-genunix #1
> [44599.434040] Hardware name:  /CN700-8237, BIOS 6.00 PG 11/13/2006             
> [44599.434045] EIP: 0xb7f564a7                                                  
> [44599.434063] Code: 24 04 89 1a 89 6a 08 89 42 04 8b 44 24 0c 89 4a 18 89 42 0c 8b 44 24 10 89 42 10 8b 44 24 08 89 42 14 83 c4 18 89 d0 5b 5e 5f <5d> c2 04 00 8d 74 26 00 90 f7 c7 00 ff 00 00 74 10 c6 44 24 17 00
> [44599.434132] EAX: bfa3e8a0 EBX: b7f8dafc ECX: 00000000 EDX: bfa3e8a0          
> [44599.434140] ESI: bfa4146c EDI: 000004b4 EBP: 00000000 ESP: bfa3e828          
> [44599.434148] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000282    
> esther# 
> 
> Not sure what other information to include however : 
> 
> esther# 
> esther# cat /proc/version 
> Linux version 5.5.0-rc2-genunix (root@esther) (gcc version 9.2.1 20191130 (Debian 9.2.1-21)) #1 Tue Dec 17 01:57:17 UTC 2019
> esther# 
> esther# cat /proc/cpuinfo 
> processor       : 0
> vendor_id       : CentaurHauls
> cpu family      : 6
> model           : 10
> model name      : VIA Esther processor 1200MHz
> stepping        : 9
> cpu MHz         : 400.000
> cache size      : 128 KB
> fdiv_bug        : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 1
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge cmov pat clflush acpi mmx fxsr sse sse2 tm nx cpuid pni est tm2 rng rng_en ace ace_en ace2 ace2_en phe phe_en pmm pmm_en
> bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
> bogomips        : 800.02
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 32 bits virtual
> power management:
> 
> esther# 
> esther# cat /proc/meminfo 
> MemTotal:         937412 kB
> MemFree:           70200 kB
> MemAvailable:      31728 kB
> Buffers:           11400 kB
> Cached:            43532 kB
> SwapCached:        55872 kB
> Active:           385352 kB
> Inactive:         400988 kB
> Active(anon):     352888 kB
> Inactive(anon):   379860 kB
> Active(file):      32464 kB
> Inactive(file):    21128 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> HighTotal:         76680 kB
> HighFree:           1552 kB
> LowTotal:         860732 kB
> LowFree:           68648 kB
> SwapTotal:      31250428 kB
> SwapFree:       29862396 kB
> Dirty:                16 kB
> Writeback:             0 kB
> AnonPages:        676316 kB
> Mapped:            16560 kB
> Shmem:              1340 kB
> KReclaimable:      13748 kB
> Slab:              54152 kB
> SReclaimable:      13748 kB
> SUnreclaim:        40404 kB
> KernelStack:         632 kB
> PageTables:         2932 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:    31719132 kB
> Committed_AS:    2333036 kB
> VmallocTotal:     122880 kB
> VmallocUsed:       11532 kB
> VmallocChunk:          0 kB
> Percpu:              192 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:         0 kB
> ShmemHugePages:        0 kB
> ShmemPmdMapped:        0 kB
> FileHugePages:         0 kB
> FilePmdMapped:         0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       4096 kB
> Hugetlb:               0 kB
> DirectMap4k:      905208 kB
> DirectMap4M:           0 kB
> esther# 
> esther# swapon
> NAME      TYPE       SIZE USED PRIO
> /dev/sda2 partition 29.8G 1.3G   -2
> esther# 
> 
> Also I will attach the kernel config from /boot for 5.5.0-rc2-genunix.
> 
> 
> -- 
> Dennis Clarke
> RISC-V/SPARC/PPC/ARM/CISC
> UNIX and Linux spoken
> GreyBeard and suspenders optional
