On 3/6/20 3:12 PM, Mike Kravetz wrote:
> On 3/5/20 10:36 PM, Longpeng (Mike) wrote:
>> On 2020/3/6 8:09, Mike Kravetz wrote:
>>> On 3/4/20 7:30 PM, Longpeng(Mike) wrote:
>>>> From: Longpeng
>>>
>>> I am thinking we may want to have a more generic solution by allowing
>>> the default_hugepagesz= processing code to verify the passed size and
>>> set up the corresponding hstate.  This would require more cooperation
>>> between architecture specific and independent code.  This could be
>>> accomplished with a simple arch_hugetlb_valid_size() routine provided
>>> by the architectures.  Below is an untested patch to add such support
>>> to the architecture independent code and x86.  Other architectures
>>> would be similar.
>>>
>>> In addition, with architectures providing arch_hugetlb_valid_size() it
>>> should be possible to have a common routine in architecture independent
>>> code to read/process hugepagesz= command line arguments.
>>>
>> I just wanted to make the minimal changes needed to address this issue,
>> which is why I chose the approach in my patch.
>>
>> To be honest, the approach you suggested above is much better, though it
>> needs more changes.
>>
>>> Of course, another approach would be to simply require ALL architectures
>>> to set up hstates for ALL supported huge page sizes.
>>>
>> I think this is also needed; then we can request all supported sizes of
>> hugepages through sysfs (e.g. /sys/kernel/mm/hugepages/*) dynamically.
>> Currently (on x86) we can only request 1G hugepages through sysfs if we
>> boot with 'default_hugepagesz=1G', even with the first approach.
>
> I 'think' you can use sysfs for 1G huge pages on x86 today.  Just booted
> a system without any hugepage options on the command line.
>
> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> 0
> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/^Cugepages
> # echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> 1
> # cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> 1
>
> x86 and riscv will set up an hstate for PUD_SIZE by default if
> CONFIG_CONTIG_ALLOC is enabled.  This is because of a somewhat recent
> feature that allowed dynamic allocation of gigantic (page order >=
> MAX_ORDER) pages.  Before that feature, it made no sense to set up an
> hstate for gigantic pages if they were not allocated at boot time and
> could not be dynamically added later.
>
> I'll code up a proposal that does the following:
> - Have arch specific code provide a list of supported huge page sizes
> - Arch independent code uses list to create all hstates
> - Move processing of "hugepagesz=" to arch independent code
> - Validate "default_hugepagesz=" when value is read from command line
>
> It may take a few days.  When ready, I will pull in the architecture
> specific people.

Hi Mike,

On platforms that support multiple huge page sizes, when 'hugepagesz=' is
not specified before 'hugepages=', hugepages are not allocated (for
example, when requesting 1GB hugepages).  In terms of reporting, meminfo
and /sys/kernel/../nr_hugepages report the expected results, but sysctl
vm.nr_hugepages reports a non-zero value because it reads max_huge_pages
from the default hstate instead of nr_huge_pages.  AFAIK nr_huge_pages is
the counter that indicates the number of huge pages that were successfully
allocated.

Is vm.nr_hugepages expected to report the maximum number of hugepages?  If
so, would it not make sense to rename the procname?  However, if we expect
nr_hugepages to report the number of successfully allocated hugepages,
then we should use nr_huge_pages in hugetlb_sysctl_handler_common().
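Something along these lines is what I have in mind -- just a rough sketch,
paraphrased from mm/hugetlb.c rather than a tested patch, so the
surrounding code may not match the current tree exactly.  Only the
initialization of 'tmp' changes; writes would still go through
__nr_hugepages_store_common() as before:

static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
			 struct ctl_table *table, int write,
			 void __user *buffer, size_t *length, loff_t *ppos)
{
	struct hstate *h = &default_hstate;
	unsigned long tmp;
	int ret;

	if (!hugepages_supported())
		return -EOPNOTSUPP;

	/*
	 * Report the number of huge pages actually allocated
	 * (nr_huge_pages) instead of the requested maximum
	 * (max_huge_pages).
	 */
	tmp = h->nr_huge_pages;		/* was: h->max_huge_pages */

	table->data = &tmp;
	table->maxlen = sizeof(unsigned long);
	ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
	if (ret)
		goto out;

	if (write)
		ret = __nr_hugepages_store_common(obey_mempolicy, h,
						  NUMA_NO_NODE, tmp, *length);
out:
	return ret;
}

With that change, 'sysctl vm.nr_hugepages' would report 0 in the scenario
above, matching what the corresponding sysfs nr_hugepages file shows,
while a write would behave exactly as it does today.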
>
>> BTW, because it's not easy to discuss with you due to the time
>> difference, I have another question about the default hugepages to
>> consult you on here.  Why does /proc/meminfo only show the info about
>> the default hugepages, but not the others?  meminfo is better known than
>> sysfs; some ordinary users know meminfo but don't know how to use sysfs
>> to get the hugepage status (e.g. total, free).
>
> I believe that is simply history.  In the beginning there was only the
> default huge page size and that was added to meminfo.  People then wrote
> scripts to parse huge page information in meminfo.  When support for
> other huge pages was added, it was not added to meminfo as it could break
> user scripts parsing the file.  Adding information for all potential
> huge page sizes may create lots of entries that are unused.  I was not
> around when these decisions were made, but that is my understanding.
>
> BTW - A recently added meminfo field 'Hugetlb' displays the amount of
> memory consumed by huge pages of ALL sizes.

-- 
Nitesh