Shaohua Li <shli@kernel.org> writes:

> On Fri, Sep 23, 2016 at 10:32:39AM +0800, Huang, Ying wrote:
>> Rik van Riel <riel@redhat.com> writes:
>> 
>> > On Thu, 2016-09-22 at 15:56 -0700, Shaohua Li wrote:
>> >> On Wed, Sep 07, 2016 at 09:45:59AM -0700, Huang, Ying wrote:
>> >> >
>> >> > - It will help the memory fragmentation, especially when the THP is
>> >> >   heavily used by the applications.  The 2M continuous pages will be
>> >> >   free up after THP swapping out.
>> >> 
>> >> So this is impossible without THP swapin. While 2M swapout makes a lot of
>> >> sense, I doubt 2M swapin is really useful. What kind of application is
>> >> 'optimized' to do sequential memory access?
>> >
>> > I suspect a lot of this will depend on the ratio of storage
>> > speed to CPU & RAM speed.
>> >
>> > When swapping to a spinning disk, it makes sense to avoid
>> > extra memory use on swapin, and work in 4kB blocks.
>> 
>> For spinning disk, the THP swap optimization will be turned off in the
>> current implementation, because huge swap cluster allocation is based on
>> swap cluster management, which is available only for non-rotating block
>> devices (blk_queue_nonrot()).
>
> For 2m swapin, as long as one byte is changed in the 2m, next time we must do
> 2m swapout. This is a huge waste of memory and IO bandwidth and adds
> unnecessary memory pressure. 2M IO will very easily saturate a very fast SSD
> and make IO the bottleneck. Not sure about NVRAM though.
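
To put a rough number on that concern: assuming 4 KiB base pages and
2 MiB huge pages (as on x86-64), dirtying a single base page inside a
swapped-in THP means 2 MiB has to be written back instead of 4 KiB.  The
following is only a back-of-the-envelope sketch in userspace C.  The
function names are made up for illustration, and it ignores effects such
as swap cache reuse and IO merging.

```c
#include <stddef.h>

#define BASE_PAGE_SIZE (4UL * 1024)          /* assumed 4 KiB base page */
#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)   /* assumed 2 MiB huge page */

/* Bytes written if each dirtied 4 KiB page is swapped out on its own. */
unsigned long writeback_bytes_4k(unsigned long dirty_base_pages)
{
	return dirty_base_pages * BASE_PAGE_SIZE;
}

/* Bytes written if the huge page is swapped out as one unit once any
 * part of it is dirty. */
unsigned long writeback_bytes_2m(unsigned long dirty_base_pages)
{
	return dirty_base_pages ? HUGE_PAGE_SIZE : 0;
}

/* Ratio of 2M writeback to 4K writeback; 0 if nothing is dirty. */
unsigned long write_amplification(unsigned long dirty_base_pages)
{
	if (!dirty_base_pages)
		return 0;
	return writeback_bytes_2m(dirty_base_pages) /
	       writeback_bytes_4k(dirty_base_pages);
}
```

With one dirty base page the ratio is 512; it only drops to 1 when all
512 base pages of the huge page have been modified.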

One solution is to make 2M swapin configurable, maybe via a sysfs file
in /sys/kernel/mm/transparent_hugepage/, so that we can turn it on only
for really fast storage devices, such as NVRAM.
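
As a rough illustration of how such a switch could be consumed, here is
a small userspace C sketch.  The knob name "swapin_enabled" is purely
hypothetical (no such file is implied to exist), and the helper names
are invented for this example.  The only real interface assumed is
/sys/block/<dev>/queue/rotational, which reports 0 for non-rotating
devices.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical knob, used here only for illustration. */
#define THP_SWAPIN_KNOB \
	"/sys/kernel/mm/transparent_hugepage/swapin_enabled"

/* Read a single integer flag from a sysfs-style file.
 * Returns 0 or 1 on success, -1 if the file is missing or malformed. */
int read_flag(const char *path)
{
	FILE *f = fopen(path, "r");
	int v;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &v) != 1)
		v = -1;
	fclose(f);
	return v < 0 ? -1 : (v != 0);
}

/* Policy sketch: do 2M swapin only when the admin has opted in and the
 * swap device is known to be non-rotating.  Unknown state (-1) is
 * treated as "off". */
bool use_2m_swapin(int knob, int rotational)
{
	return knob == 1 && rotational == 0;
}
```

A caller would pass read_flag(THP_SWAPIN_KNOB) and the result of reading
the swap device's queue/rotational file.  Defaulting to off when either
value is unknown keeps the 4kB swapin behavior unless it is explicitly
requested.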

Best Regards,
Huang, Ying