* Extermeley large PageTables over 500G on kernel 2.6.32.49 (SLES11 SP1)
@ 2012-01-11 10:18 Borzenkov, Andrey
  2012-01-11 10:36 ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Borzenkov, Andrey @ 2012-01-11 10:18 UTC (permalink / raw)
  To: linux-kernel

I am trying to understand whether what I see is a kernel bug (maybe in accounting) or some other problem.

Server with 1TB memory (slightly less due to 2 DIMMs disabled) running SLES11 SP1:

Linux rx900-01 2.6.32.49-0.3-default #1 SMP 2011-12-02 11:28:04 +0100 x86_64 x86_64 x86_64 GNU/Linux

The server runs an Oracle database and an SAP central instance. The Oracle SGA is ~500GB; there are over 2000 Oracle client processes (connections from dialog instances).

This is the second time the server has experienced a slowdown. CPU usage goes to nearly 100% system; from yesterday's SAR statistics:

00:00:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
14:20:01        all      8.37      0.00      1.50      8.32      0.00     81.82
14:30:01        all      9.76      0.00     11.91      9.53      0.00     68.80
14:40:02        all      7.73      0.00     14.94      8.35      0.00     68.98
14:50:01        all      4.46      0.00     64.24      4.71      0.00     26.60
15:04:10        all      3.92      0.00     71.64      3.91      0.00     20.53
15:14:06        all      4.26      0.06     73.70      3.75      0.00     18.22
15:21:43        all      5.80      0.00     58.29      6.39      0.00     29.51
15:33:13        all      0.57      0.00     98.44      0.22      0.00      0.77
15:40:01        all      2.11      0.00     92.75      1.38      0.00      3.76
15:53:05        all      4.65      0.00     67.29      4.62      0.00     23.43
16:00:02        all      0.22      0.00     99.73      0.01      0.00      0.03
16:10:01        all      6.77      0.00     62.23      5.45      0.00     25.55
16:22:36        all      1.09      0.00     96.75      0.60      0.00      1.56
16:35:00        all      1.00      0.00     98.32      0.23      0.00      0.46


14:00:01     70890.08  32610.94 293683.34      0.66 302353.74   1511.70      0.00    553.15     36.59
14:10:01     62785.00  41756.85 248404.91      1.39 262783.25    526.27      0.00     13.80      2.62
14:20:01     45202.79  14421.25 247825.24      0.57 263555.49      0.00      0.00      0.00      0.00
14:30:01     55258.89  19961.76 320001.06      6.67 292015.15   4642.89    421.42   1939.95     38.31
14:40:02     39944.66  13820.21 265597.61     18.52 225282.13   3609.07    827.76    983.08     22.16
14:40:02     pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
14:50:01     32186.97   6173.29 290924.86     18.14 159640.32    320.29   1047.12    357.74     26.16
15:04:10     21821.44   5088.61 204284.90      8.49 136538.28    167.06   1166.24    320.86     24.06
15:14:06     26446.08  10471.94 210644.97     13.09 134230.69    704.49   1810.80    790.93     31.44
15:21:43     39126.74   7556.64 342544.63     31.48 180565.12    285.73   1354.19    451.70     27.54
15:33:13      2796.56   2909.25  47558.06      8.24  22382.27     88.62   2052.66    531.01     24.80
15:40:01      6200.17   5969.42 161088.56     21.97  42077.69    120.03   1832.63    438.99     22.48
15:53:05     23803.80   9406.65 258179.22     60.35 118982.69    211.06   1679.50    512.08     27.09
16:00:02       728.41   3156.68  16022.08      3.30   6382.78     88.78   2142.13    653.90     29.31
16:10:01     27009.53   8949.01 330194.26    142.67 126540.73    209.47   1883.92    735.54     35.14
16:22:36      2546.53   4279.80  64544.54     14.25  17826.19    148.21   2405.98    840.27     32.90
16:35:00      2038.20   4231.65  61680.28     19.28  19653.35    114.72   2416.81    936.73     37.00


I just received information about the same situation; here is /proc/meminfo:

MemTotal:       992606568 kB
MemFree:          209064 kB
Buffers:            8144 kB
Cached:         435401824 kB
SwapCached:      1098968 kB
Active:         440527496 kB
Inactive:       13550356 kB
Active(anon):   440436992 kB
Inactive(anon): 13467304 kB
Active(file):      90504 kB
Inactive(file):    83052 kB
Unevictable:         124 kB
Mlocked:               0 kB
SwapTotal:      292421588 kB
SwapFree:       290932124 kB
Dirty:                32 kB
Writeback:            32 kB
AnonPages:      17599492 kB
Mapped:         434635928 kB
Shmem:          435236256 kB
Slab:            2423828 kB
SReclaimable:    1957952 kB
SUnreclaim:       465876 kB
KernelStack:       39536 kB
PageTables:     519555856 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    788724872 kB
Committed_AS:   514649400 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     3662244 kB
VmallocChunk:   33484035568 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       22528 kB
DirectMap2M:     2058240 kB
DirectMap1G:    1004535808 kB


What could cause the system to consume half of physical memory for page tables?
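
The "half of physical memory" figure can be checked against the numbers above. The following is a minimal sketch, not part of the original message: the helper names `parse_meminfo` and `page_table_share` are made up for illustration, and `SAMPLE` holds three lines copied from the /proc/meminfo output above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are in kB; a few (such as HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def page_table_share(values):
    """Fraction of MemTotal accounted to PageTables."""
    return values["PageTables"] / values["MemTotal"]


# Three lines copied from the meminfo output in this message.
SAMPLE = """\
MemTotal:       992606568 kB
PageTables:     519555856 kB
Shmem:          435236256 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(f"PageTables share of MemTotal: {page_table_share(info):.1%}")
```

On a live system the same functions could be fed `open("/proc/meminfo").read()` instead of `SAMPLE`.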


---
With best regards

Andrey Borzenkov



* Re: Extermeley large PageTables over 500G on kernel 2.6.32.49 (SLES11 SP1)
  2012-01-11 10:18 Extermeley large PageTables over 500G on kernel 2.6.32.49 (SLES11 SP1) Borzenkov, Andrey
@ 2012-01-11 10:36 ` Eric Dumazet
  2012-01-11 10:52   ` Borzenkov, Andrey
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2012-01-11 10:36 UTC (permalink / raw)
  To: Borzenkov, Andrey; +Cc: linux-kernel

On Wednesday, 11 January 2012 at 11:18 +0100, Borzenkov, Andrey wrote:

> What can be the reason for system consuming half of physical memory for page tables?
> 

Why don't you use hugepages to map the Oracle SGA?

If not, it's normal to consume so much memory for page tables. You can check per-process usage with:

grep VmPTE /proc/*/status
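
The grep above prints one VmPTE line per process; summing those lines gives a figure to compare with PageTables in /proc/meminfo. A minimal sketch, assuming a Linux /proc layout; the function names `vmpte_kb` and `total_vmpte_kb` are hypothetical, and kernel threads (which have no VmPTE line) are counted as zero.

```python
import glob


def vmpte_kb(status_text):
    """Return the VmPTE value in kB from /proc/<pid>/status text, or 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmPTE:"):
            return int(line.split()[1])
    return 0


def total_vmpte_kb(pattern="/proc/[0-9]*/status"):
    """Sum VmPTE over every status file matching pattern."""
    total = 0
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                total += vmpte_kb(f.read())
        except OSError:
            continue  # the process exited while we were scanning
    return total


if __name__ == "__main__":
    print(f"sum of VmPTE: {total_vmpte_kb()} kB")
```

The sum is only a snapshot, since processes can start or exit while /proc is being scanned.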





* RE: Extermeley large PageTables over 500G on kernel 2.6.32.49 (SLES11 SP1)
  2012-01-11 10:36 ` Eric Dumazet
@ 2012-01-11 10:52   ` Borzenkov, Andrey
  2012-01-11 11:19     ` Eric Dumazet
  2012-01-11 11:26     ` Avi Kivity
  0 siblings, 2 replies; 8+ messages in thread
From: Borzenkov, Andrey @ 2012-01-11 10:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel




> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Wednesday, January 11, 2012 2:37 PM
> To: Borzenkov, Andrey
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: Extermeley large PageTables over 500G on kernel 2.6.32.49
> (SLES11 SP1)
> 
> On Wednesday, 11 January 2012 at 11:18 +0100, Borzenkov, Andrey wrote:
> 
> > What can be the reason for system consuming half of physical memory for
> page tables?
> >
> 
> Why dont you use hugepages to map oracle SGA ?
> 

Colleagues responsible for Oracle are reconfiguring it now.

> If not, its normal to eat so much memory for page tables
> 
> grep VmPTE /proc/*/status
> 
> 

Forgive my ignorance. I thought that:

1. A PTE is 8 bytes per page, and a page is 4K, which gives 2K of page table per 1M of memory.
2. All processes sharing the same shared memory share the same page table.

So the page table for a 500G Oracle SGA would be around 1G and shared by all Oracle clients. Is my assumption incorrect?
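
The arithmetic in point 1 can be written out as a short sketch. It counts only last-level PTEs for 4K pages on x86_64 (the upper page-table levels add a fraction of a percent on top), and the function name `pte_bytes` is made up for illustration.

```python
PTE_BYTES = 8      # one x86_64 page-table entry
PAGE_BYTES = 4096  # 4K base page


def pte_bytes(mapped_bytes):
    """Bytes of last-level page-table entries needed to map mapped_bytes."""
    pages = -(-mapped_bytes // PAGE_BYTES)  # ceiling division
    return pages * PTE_BYTES


if __name__ == "__main__":
    one_mib = 1 << 20
    sga = 500 << 30  # the ~500G SGA mentioned in the thread
    print(f"per 1M mapped:   {pte_bytes(one_mib)} bytes")
    print(f"per 500G mapped: {pte_bytes(sga) / (1 << 30):.2f} GiB")
```

This covers a single address space; whether several processes can share that one table is a separate question.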


* RE: Extermeley large PageTables over 500G on kernel 2.6.32.49 (SLES11 SP1)
  2012-01-11 10:52   ` Borzenkov, Andrey
@ 2012-01-11 11:19     ` Eric Dumazet
  2012-01-11 11:23       ` Borzenkov, Andrey
  2012-01-11 11:26     ` Avi Kivity
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2012-01-11 11:19 UTC (permalink / raw)
  To: Borzenkov, Andrey; +Cc: linux-kernel

On Wednesday, 11 January 2012 at 11:52 +0100, Borzenkov, Andrey wrote:

> Colleagues responsible for Oracle are reconfiguring it now.
> 

OK. Since your hardware supports 1GB hugepages, you might try to use
them as well. Not sure if your kernel is recent enough...

cat /proc/cmdline 
ro root=LABEL=/ hugepagesz=1GB hugepages=512 

> > If not, its normal to eat so much memory for page tables
> > 
> > grep VmPTE /proc/*/status
> > 
> > 
> 
> Forgive my ignorance. I thought that
> 
> 1. PTE is 8 bytes per page, which is 4K which gives 2K per 1M of memory
> 2. All processes sharing the same shared memory share the same page table

It depends on how Oracle maps its SGA.

It probably uses a method that does not allow page table sharing.

Anyway, hugepages are a must for this kind of workload.
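
To illustrate why the page size matters so much here, the sketch below compares the last-level table size needed to map a ~500G region with 4K, 2M and 1G pages. It is a rough estimate under stated assumptions: 8-byte entries at every level on x86_64, upper levels ignored, and the name `leaf_table_bytes` is hypothetical.

```python
ENTRY_BYTES = 8  # x86_64 page-table entries are 8 bytes at every level

PAGE_SIZES = {"4K": 4 << 10, "2M": 2 << 20, "1G": 1 << 30}


def leaf_table_bytes(mapped_bytes, page_bytes):
    """Bytes of last-level entries needed to map mapped_bytes with one page size."""
    entries = -(-mapped_bytes // page_bytes)  # ceiling division
    return entries * ENTRY_BYTES


if __name__ == "__main__":
    sga = 500 << 30  # ~500G SGA, as stated in the thread
    for name, size in PAGE_SIZES.items():
        print(f"{name} pages: {leaf_table_bytes(sga, size)} bytes per process")
```

Even with a private copy of the tables in each of 2000 processes, 2M pages bring the total down from terabytes to a few gigabytes.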





* RE: Extermeley large PageTables over 500G on kernel 2.6.32.49 (SLES11 SP1)
  2012-01-11 11:19     ` Eric Dumazet
@ 2012-01-11 11:23       ` Borzenkov, Andrey
  0 siblings, 0 replies; 8+ messages in thread
From: Borzenkov, Andrey @ 2012-01-11 11:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel


> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Wednesday, January 11, 2012 3:20 PM
> To: Borzenkov, Andrey
> Cc: linux-kernel@vger.kernel.org
> Subject: RE: Extermeley large PageTables over 500G on kernel 2.6.32.49
> (SLES11 SP1)
> 
> On Wednesday, 11 January 2012 at 11:52 +0100, Borzenkov, Andrey wrote:
> 
> > Colleagues responsible for Oracle are reconfiguring it now.
> >
> 
> OK. Since your hardware supports 1GB hugepages, you might try to use
> them as well. Not sure if your kernel is recent enough...
> 
> cat /proc/cmdline
> ro root=LABEL=/ hugepagesz=1GB hugepages=512
> 

It says 2M, so that is probably the limit.

> > > If not, its normal to eat so much memory for page tables
> > >
> > > grep VmPTE /proc/*/status
> > >
> > >
> >
> > Forgive my ignorance. I thought that
> >
> > 1. PTE is 8 bytes per page, which is 4K which gives 2K per 1M of memory
> > 2. All processes sharing the same shared memory share the same page
> table
> 
> It depends how oracle maps its SGA.
> 
> It probably uses a method disallowing page table sharing.

Do you have a reference describing these methods so we can check?

Thank you!

> 
> Anyway, hugetables for this kind of workload is a must.
> 




* Re: Extermeley large PageTables over 500G on kernel 2.6.32.49 (SLES11 SP1)
  2012-01-11 10:52   ` Borzenkov, Andrey
  2012-01-11 11:19     ` Eric Dumazet
@ 2012-01-11 11:26     ` Avi Kivity
  2012-01-13  5:20       ` Borzenkov, Andrey
  1 sibling, 1 reply; 8+ messages in thread
From: Avi Kivity @ 2012-01-11 11:26 UTC (permalink / raw)
  To: Borzenkov, Andrey; +Cc: Eric Dumazet, linux-kernel

On 01/11/2012 12:52 PM, Borzenkov, Andrey wrote:
> > If not, its normal to eat so much memory for page tables
> > 
> > grep VmPTE /proc/*/status
> > 
> > 
>
> Forgive my ignorance. I thought that
>
> 1. PTE is 8 bytes per page, which is 4K which gives 2K per 1M of memory
> 2. All processes sharing the same shared memory share the same page table
>
> So page table for Oracle SGA 500G would be around 1G and shared by all Oracle clients. Is my assumption incorrect?
>

The second assumption is incorrect.  Each process has its own page tables, so fully populated, the 2000 processes would consume 2T (2000 x 1G); they just haven't accessed all of the SGA yet.
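
The 2T figure, and a rough idea of how much of the SGA each process has touched so far, can be estimated as follows. This is a sketch under assumptions taken from earlier in the thread: exactly 2000 processes, a 500G SGA, 4K pages with 8-byte PTEs, and all of the reported PageTables value attributed evenly to those processes. The function names are made up for illustration.

```python
GIB = 1 << 30
PAGE_BYTES = 4096
PTE_BYTES = 8
MAPPED_PER_TABLE_BYTE = PAGE_BYTES // PTE_BYTES  # 512 bytes mapped per table byte


def full_page_table_bytes(sga_bytes, nproc):
    """Total PTE bytes if every process had touched the whole SGA."""
    return nproc * sga_bytes // MAPPED_PER_TABLE_BYTE


def avg_mapped_bytes(page_tables_kb, nproc):
    """Average memory each process has mapped, inferred from PageTables."""
    return page_tables_kb * 1024 * MAPPED_PER_TABLE_BYTE // nproc


if __name__ == "__main__":
    total = full_page_table_bytes(500 * GIB, 2000)
    print(f"fully populated: {total / (1 << 40):.2f} TiB of page tables")
    # PageTables value from the /proc/meminfo output earlier in the thread
    touched = avg_mapped_bytes(519555856, 2000)
    print(f"average touched per process: {touched / GIB:.0f} GiB")
```

Under these assumptions each process has touched roughly a quarter of the SGA on average, which is consistent with page-table usage still being able to grow.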

-- 
error compiling committee.c: too many arguments to function



* RE: Extermeley large PageTables over 500G on kernel 2.6.32.49 (SLES11 SP1)
  2012-01-11 11:26     ` Avi Kivity
@ 2012-01-13  5:20       ` Borzenkov, Andrey
  2012-01-16 15:43         ` Avi Kivity
  0 siblings, 1 reply; 8+ messages in thread
From: Borzenkov, Andrey @ 2012-01-13  5:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Eric Dumazet, linux-kernel


> -----Original Message-----
> From: Avi Kivity [mailto:avi@redhat.com]
> Sent: Wednesday, January 11, 2012 3:27 PM
> To: Borzenkov, Andrey
> Cc: Eric Dumazet; linux-kernel@vger.kernel.org
> Subject: Re: Extermeley large PageTables over 500G on kernel 2.6.32.49
> (SLES11 SP1)
> 
> On 01/11/2012 12:52 PM, Borzenkov, Andrey wrote:
> > > If not, its normal to eat so much memory for page tables
> > >
> > > grep VmPTE /proc/*/status
> > >
> > >
> >
> > Forgive my ignorance. I thought that
> >
> > 1. PTE is 8 bytes per page, which is 4K which gives 2K per 1M of memory
> > 2. All processes sharing the same shared memory share the same page
> table
> >
> > So page table for Oracle SGA 500G would be around 1G and shared by all
> Oracle clients. Is my assumption incorrect?
> >
> 
> The second assumption is incorrect.  So fully populated the 2000
> processes would consume 2T; they just haven't accessed all the SGA yet.
> 

Does the kernel ever free allocated page tables, or do they stick around for the process lifetime once allocated?


* Re: Extermeley large PageTables over 500G on kernel 2.6.32.49 (SLES11 SP1)
  2012-01-13  5:20       ` Borzenkov, Andrey
@ 2012-01-16 15:43         ` Avi Kivity
  0 siblings, 0 replies; 8+ messages in thread
From: Avi Kivity @ 2012-01-16 15:43 UTC (permalink / raw)
  To: Borzenkov, Andrey; +Cc: Eric Dumazet, linux-kernel

On 01/13/2012 07:20 AM, Borzenkov, Andrey wrote:
> > > 1. PTE is 8 bytes per page, which is 4K which gives 2K per 1M of memory
> > > 2. All processes sharing the same shared memory share the same page
> > table
> > >
> > > So page table for Oracle SGA 500G would be around 1G and shared by all
> > Oracle clients. Is my assumption incorrect?
> > >
> > 
> > The second assumption is incorrect.  So fully populated the 2000
> > processes would consume 2T; they just haven't accessed all the SGA yet.
> > 
>
> Does kernel ever free allocated page tables, or once allocated they stick for the process lifetime? 

munmap() will free them, but that doesn't apply here.  Linux doesn't
swap page tables.
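
The effect of unmapping can be observed from user space by watching VmPTE in /proc/self/status. The following is only a rough sketch: it assumes a Linux system that reports VmPTE (the helper returns 0 otherwise), it uses Python's `mmap.mmap(-1, size)` for an anonymous mapping whose `close()` unmaps the region, and the exact numbers depend on the kernel and its configuration. The names `vmpte_kb` and `measure` are hypothetical.

```python
import mmap


def vmpte_kb():
    """VmPTE of the current process in kB, or 0 if it is not reported."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmPTE:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return 0


def measure(size=256 << 20):
    """Return VmPTE before mapping, after touching every page, and after unmapping."""
    before = vmpte_kb()
    region = mmap.mmap(-1, size)  # anonymous mapping
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1        # touch each page so its PTE gets populated
    mapped = vmpte_kb()
    region.close()                # unmaps the region
    after = vmpte_kb()
    return before, mapped, after


if __name__ == "__main__":
    before, mapped, after = measure()
    print(f"VmPTE before={before} kB, mapped={mapped} kB, after unmap={after} kB")
```

A long-lived process that keeps its shared memory attached never reaches the unmap step, so its page tables stay resident.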

-- 
error compiling committee.c: too many arguments to function

