* [RFC v4 0/7] Implement Data Access Monitoring-based Memory Operation Schemes
@ 2020-03-03 12:13 SeongJae Park
  2020-03-03 12:14 ` [RFC v4 1/7] mm/madvise: Export madvise_common() to mm internal code SeongJae Park
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: SeongJae Park @ 2020-03-03 12:13 UTC
  To: akpm
  Cc: SeongJae Park, aarcange, acme, alexander.shishkin, amit,
	brendan.d.gregg, brendanhiggins, cai, colin.king, corbet, dwmw,
	jolsa, kirill, mark.rutland, mgorman, minchan, mingo, namhyung,
	peterz, rdunlap, riel, rientjes, rostedt, shuah, sj38.park,
	vbabka, vdavydov.dev, yang.shi, ying.huang, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

DAMON[1] can be used as a primitive for data access-aware memory management
optimizations.  That said, users who want such optimizations have to run DAMON,
read the monitoring results, analyze them, plan a new memory management scheme,
and apply the new scheme by themselves.  Such effort is unavoidable for some
complicated optimizations.

However, in many other cases, users simply want the system to apply a memory
management action to a memory region of a specific size that has shown a
specific access frequency for a specific time.  For example, "page out a memory
region larger than 100 MiB that has been only rarely accessed for more than 2
minutes", or "do not use THP for a memory region larger than 2 MiB that has
been rarely accessed for more than 1 second".

This RFC patchset makes DAMON handle such data access monitoring-based
operation schemes (DAMOS).  With this change, users can apply data access-aware
optimizations by simply specifying their schemes to DAMON.


Evaluations
===========

Efficient THP
-------------

The Transparent Huge Pages (THP) subsystem can waste memory space in some cases
because it aggressively promotes regular pages to huge pages.  For this reason,
a number of memory-intensive programs such as Redis[1] and MongoDB[2] recommend
disabling THP.

The following two simple data access monitoring-based operation schemes might
help with this problem:

    # format: <min/max size> <min/max frequency (0-100)> <min/max age> <action>

    # If a memory region larger than 2 MiB is showing access rate higher than
    # 5%, apply MADV_HUGEPAGE to the region.
    2M	null	5	null	null	null	hugepage

    # If a memory region larger than 2 MiB is showing access rate lower than 5%
    # for more than 1 second, apply MADV_NOHUGEPAGE to the region.
    2M	null	null	5	1s	null	nohugepage

We can expect the schemes to reduce the memory space overhead while preserving
some of the performance benefit of THP.  I call these schemes Efficient THP
(ETHP).

Please note that these schemes are neither highly tuned nor intended for
general use cases.  I wrote them based on simple intuition, only as a
demonstration of DAMOS.
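
The seven-field scheme format used by the two ETHP lines above can be read
mechanically.  The following is a minimal Python sketch of such a reader, not
the converter shipped with the patchset.  The names ``parse_bound``,
``parse_scheme`` and ``Scheme`` are hypothetical, and the unit tables are
assumptions: only the 'M' size suffix and the 's' age suffix appear in this
cover letter.

```python
from dataclasses import dataclass
from typing import Optional

# Unit tables are assumptions for this sketch; the cover letter itself only
# shows the 'M' size suffix and the 's' age suffix.
SIZE_UNITS = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
TIME_UNITS_US = {"us": 1, "ms": 1_000, "s": 1_000_000, "m": 60_000_000}


def parse_bound(token, units):
    """Return None for 'null' (no bound), else the value in base units."""
    if token == "null":
        return None
    # Try longer suffixes first so that 'ms' is not mistaken for 's'.
    for suffix in sorted(units, key=len, reverse=True):
        if token.endswith(suffix):
            return int(float(token[: -len(suffix)]) * units[suffix])
    return int(token)  # bare number: already in base units


@dataclass
class Scheme:
    min_size: Optional[int]    # bytes
    max_size: Optional[int]    # bytes
    min_freq: Optional[int]    # percent, 0-100
    max_freq: Optional[int]    # percent, 0-100
    min_age_us: Optional[int]  # microseconds
    max_age_us: Optional[int]  # microseconds
    action: str


def parse_scheme(line):
    """Parse '<min/max size> <min/max freq> <min/max age> <action>'."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    return Scheme(
        parse_bound(fields[0], SIZE_UNITS),
        parse_bound(fields[1], SIZE_UNITS),
        parse_bound(fields[2], {}),
        parse_bound(fields[3], {}),
        parse_bound(fields[4], TIME_UNITS_US),
        parse_bound(fields[5], TIME_UNITS_US),
        fields[6],
    )


# The two ETHP schemes from this cover letter.
ethp_schemes = [
    parse_scheme("2M\tnull\t5\tnull\tnull\tnull\thugepage"),
    parse_scheme("2M\tnull\tnull\t5\t1s\tnull\tnohugepage"),
]
```

Representing 'null' as ``None`` keeps "no bound" distinct from a bound of
zero, which matters for the maximum fields.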


Setup
-----

On my personal QEMU/KVM-based virtual machine on an Intel i7 host machine
running Ubuntu 18.04, I measure the runtime and consumed memory space of
various realistic workloads with several configurations.  I use 13 and 12
workloads from the PARSEC3[3] and SPLASH-2X[4] benchmark suites, respectively.
I use my own wrapper scripts[5] to set up and run the workloads.

To measure the amount of memory consumed system-wide, I drop caches before
starting each workload and monitor 'MemFree' in the '/proc/meminfo' file.
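
Sampling 'MemFree' can be done with a few lines of Python.  This is a minimal
sketch and not the measurement script actually used for the numbers in this
mail; the function names are hypothetical.  Note that the kernel labels the
values 'kB' but reports them in units of 1024 bytes, which is why the tables
in this mail use KiB.

```python
def meminfo_kib(field, text):
    """Return the value of a /proc/meminfo field in KiB, parsed from text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            # The value is the first token after the colon, e.g. '1234567 kB'.
            return int(rest.split()[0])
    raise KeyError(field)


def read_memfree_kib(path="/proc/meminfo"):
    """Read the current MemFree value from the running system."""
    with open(path) as f:
        return meminfo_kib("MemFree", f.read())
```

How the periodic 'MemFree' samples are turned into a per-workload consumption
number is not described here, so that step is left out of the sketch.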

The configurations I use are as below:

    orig: Linux v5.5 with 'madvise' THP policy
    thp: Linux v5.5 with 'always' THP policy
    ethp: Linux v5.5 applying the above schemes

To minimize measurement error, I repeat each run 5 times and average the
results.  The stdev, min, and max of the numbers across the repeated runs are
in the appendix below.


[1] "Redis latency problems troubleshooting", https://redis.io/topics/latency
[2] "Disable Transparent Huge Pages (THP)",
    https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
[3] "The PARSEC Benchmark Suite", https://parsec.cs.princeton.edu/index.htm
[4] "SPLASH-2x", https://parsec.cs.princeton.edu/parsec3-doc.htm#splash2x
[5] "parsec3_on_ubuntu", https://github.com/sjp38/parsec3_on_ubuntu


Results
-------

TL;DR: 'ethp' removes 97.61% of 'thp' memory space overhead while preserving
25.40% (up to 88.36%) of 'thp' performance improvement in total.

The following sections show the results of the measurements with raw numbers
and 'orig'-relative overheads (percent) of each configuration.


Memory Space Overheads
~~~~~~~~~~~~~~~~~~~~~~

The table below shows the measured memory space overheads.  Raw numbers are in
KiB, and the overheads in parentheses are in percent.  For example,
'parsec3/blackscholes' consumes about 1819486 KiB and 1824921 KiB (roughly
1.735 GiB and 1.740 GiB) with the 'orig' and 'thp' configurations,
respectively.  The overhead of 'thp' compared to 'orig' for the workload is
0.3%.

              workloads  orig         thp (overhead)        ethp (overhead)
   parsec3/blackscholes  1819486.000  1824921.400 (  0.30)  1829070.600 (  0.53)
      parsec3/bodytrack  1417885.800  1417077.600 ( -0.06)  1427560.800 (  0.68)
        parsec3/canneal  1043876.800  1039773.000 ( -0.39)  1048445.200 (  0.44)
          parsec3/dedup  2400000.400  2434625.600 (  1.44)  2417374.400 (  0.72)
        parsec3/facesim  540206.400   542422.400 (  0.41)   551485.400 (  2.09)
         parsec3/ferret  320480.200   320157.000 ( -0.10)   331470.400 (  3.43)
   parsec3/fluidanimate  573961.400   572329.600 ( -0.28)   581836.000 (  1.37)
       parsec3/freqmine  983981.200   994839.600 (  1.10)   996124.600 (  1.23)
       parsec3/raytrace  1745175.200  1742756.400 ( -0.14)  1751706.000 (  0.37)
  parsec3/streamcluster  120558.800   120309.800 ( -0.21)   131997.800 (  9.49)
      parsec3/swaptions  14820.400    23388.800 ( 57.81)    24698.000 ( 66.65)
           parsec3/vips  2956319.200  2955803.600 ( -0.02)  2977506.200 (  0.72)
           parsec3/x264  3187699.000  3184944.000 ( -0.09)  3198462.800 (  0.34)
        splash2x/barnes  1212774.800  1221892.400 (  0.75)  1212100.800 ( -0.06)
           splash2x/fft  9364725.000  9267074.000 ( -1.04)  8997901.200 ( -3.92)
         splash2x/lu_cb  515242.400   519881.400 (  0.90)   526621.600 (  2.21)
        splash2x/lu_ncb  517308.000   520396.400 (  0.60)   521732.400 (  0.86)
      splash2x/ocean_cp  3348189.400  3380799.400 (  0.97)  3328473.400 ( -0.59)
     splash2x/ocean_ncp  3908599.800  7072076.800 ( 80.94)  4449410.400 ( 13.84)
     splash2x/radiosity  1469087.800  1482244.400 (  0.90)  1471781.000 (  0.18)
         splash2x/radix  1712487.400  1385972.800 (-19.07)  1420461.800 (-17.05)
      splash2x/raytrace  45030.600    50946.600 ( 13.14)    58586.200 ( 30.10)
       splash2x/volrend  151037.800   151188.000 (  0.10)   163213.600 (  8.06)
splash2x/water_nsquared  47442.400    47257.000 ( -0.39)    59285.800 ( 24.96)
 splash2x/water_spatial  667355.200   666824.400 ( -0.08)   673274.400 (  0.89)
                  total  40083800.000 42939900.000 (  7.13) 40150600.000 (  0.17)

In total, 'thp' shows 7.13% memory space overhead while 'ethp' shows only 0.17%
overhead.  In other words, 'ethp' removed 97.61% of 'thp' memory space
overhead.
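
The percentages in the table follow from the raw numbers by simple
arithmetic.  The sketch below (hypothetical helper names) recomputes the
'parsec3/blackscholes' overhead and the totals from the table above.  It
derives the "removed" figure from the two-decimal percentages printed in the
total row, which gives about 97.6%.

```python
def overhead_percent(orig, measured):
    """Overhead of 'measured' relative to 'orig', in percent."""
    return (measured - orig) / orig * 100.0


def removed_percent(thp_overhead, ethp_overhead):
    """Share of the 'thp' overhead that 'ethp' avoids, in percent."""
    return (1.0 - ethp_overhead / thp_overhead) * 100.0


# Raw numbers (KiB) copied from the memory space overhead table.
blackscholes_thp = overhead_percent(1819486.000, 1824921.400)
total_thp = overhead_percent(40083800.000, 42939900.000)
total_ethp = overhead_percent(40083800.000, 40150600.000)

# Use the percentages as printed in the table (two decimals).
removed = removed_percent(round(total_thp, 2), round(total_ethp, 2))
```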

For almost every workload, 'ethp' consistently shows about 10-15 MiB of memory
space overhead, mainly due to the Python wrapper I used for convenient test
runs.  Using DAMON's raw interface would remove this overhead.

For 'parsec3/swaptions' and 'splash2x/raytrace', 'ethp' shows an even higher
memory space overhead than 'thp'.  This is mainly due to the small size of the
workloads and the constant memory overhead of 'ethp', which comes from the
Python wrapper.  The workloads consume only about 14 MiB and 45 MiB,
respectively.  Because the constant memory consumption of the Python wrapper
(about 10-15 MiB) is large relative to such small working sets, the relative
overhead becomes high.  Nonetheless, such small workloads are not appropriate
targets for 'ethp', and the overhead can be removed by avoiding use of the
wrapper.


Runtime Overheads
~~~~~~~~~~~~~~~~~

The table below shows the measured runtimes in the same way.  The raw numbers
are in seconds and the overheads are in percent.  Negative runtime overheads
mean speedup.

                runtime  orig      thp (overhead)     ethp (overhead)
   parsec3/blackscholes  107.003   106.468 ( -0.50)   107.260 (  0.24)
      parsec3/bodytrack  78.854    78.757 ( -0.12)    79.261 (  0.52)
        parsec3/canneal  137.520   120.854 (-12.12)   132.427 ( -3.70)
          parsec3/dedup  11.873    11.665 ( -1.76)    11.883 (  0.09)
        parsec3/facesim  207.895   204.215 ( -1.77)   206.170 ( -0.83)
         parsec3/ferret  190.507   189.972 ( -0.28)   190.818 (  0.16)
   parsec3/fluidanimate  211.064   208.862 ( -1.04)   211.874 (  0.38)
       parsec3/freqmine  290.157   288.831 ( -0.46)   292.495 (  0.81)
       parsec3/raytrace  118.460   118.741 (  0.24)   119.808 (  1.14)
  parsec3/streamcluster  324.524   283.709 (-12.58)   307.209 ( -5.34)
      parsec3/swaptions  154.458   154.894 (  0.28)   155.307 (  0.55)
           parsec3/vips  58.588    58.622 (  0.06)    59.037 (  0.77)
           parsec3/x264  66.493    66.604 (  0.17)    67.051 (  0.84)
        splash2x/barnes  79.769    73.886 ( -7.38)    78.737 ( -1.29)
           splash2x/fft  32.857    22.960 (-30.12)    25.808 (-21.45)
         splash2x/lu_cb  85.113    84.939 ( -0.20)    85.344 (  0.27)
        splash2x/lu_ncb  92.408    90.103 ( -2.49)    93.585 (  1.27)
      splash2x/ocean_cp  44.374    42.876 ( -3.37)    43.613 ( -1.71)
     splash2x/ocean_ncp  80.710    51.831 (-35.78)    71.498 (-11.41)
     splash2x/radiosity  90.626    90.398 ( -0.25)    91.238 (  0.68)
         splash2x/radix  30.875    25.226 (-18.30)    25.882 (-16.17)
      splash2x/raytrace  84.114    82.602 ( -1.80)    85.124 (  1.20)
       splash2x/volrend  86.796    86.347 ( -0.52)    88.223 (  1.64)
splash2x/water_nsquared  230.781   220.667 ( -4.38)   232.664 (  0.82)
 splash2x/water_spatial  88.719    90.187 (  1.65)    89.228 (  0.57)
                  total  2984.530  2854.220 ( -4.37)  2951.540 ( -1.11)

In total, 'thp' shows 4.37% speedup while 'ethp' shows 1.11% speedup.  In other
words, 'ethp' preserves about 25.40% of THP performance benefit.

In the best case (splash2x/radix), 'ethp' preserves 88.36% of the benefit
(16.17% of the 18.30% speedup).

If we narrow down to the workloads showing high THP performance benefits
(splash2x/fft, splash2x/ocean_ncp, and splash2x/radix), 'thp' and 'ethp' show
30.75% and 14.71% speedup in total, respectively.  In other words, 'ethp'
preserves about 47.83% of the benefit.

Even in the worst case (splash2x/volrend), 'ethp' incurs only 1.64% runtime
overhead, which is similar to that of 'thp' (1.65% for
'splash2x/water_spatial').
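
The "preserved benefit" figures are ratios of speedups.  The sketch below
(hypothetical helper names) recomputes them from the runtime table: the
overall figure from the total row, the per-workload figure for the
splash2x/radix row, and the figure for the three high-benefit workloads from
their summed runtimes.  Results can differ from the quoted numbers in the
second decimal place because of rounding.

```python
def speedup_percent(orig, measured):
    """Runtime reduction relative to 'orig', in percent."""
    return (orig - measured) / orig * 100.0


def preserved_percent(thp_speedup, ethp_speedup):
    """Share of the 'thp' speedup that 'ethp' keeps, in percent."""
    return ethp_speedup / thp_speedup * 100.0


# Speedups (percent) as printed in the runtime table.
total_preserved = preserved_percent(4.37, 1.11)
radix_preserved = preserved_percent(18.30, 16.17)

# (orig, thp, ethp) runtimes in seconds for the high-benefit workloads.
runtimes = {
    "splash2x/fft": (32.857, 22.960, 25.808),
    "splash2x/ocean_ncp": (80.710, 51.831, 71.498),
    "splash2x/radix": (30.875, 25.226, 25.882),
}
orig_sum = sum(v[0] for v in runtimes.values())
thp_sum = sum(v[1] for v in runtimes.values())
ethp_sum = sum(v[2] for v in runtimes.values())

subset_thp = speedup_percent(orig_sum, thp_sum)
subset_ethp = speedup_percent(orig_sum, ethp_sum)
subset_preserved = preserved_percent(subset_thp, subset_ethp)
```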


Sequence Of Patches
===================

The patches are based on v5.5 plus the v5 DAMON patchset[1] and Minchan's
``madvise()`` factor-out patch[2].  Minchan's patch is necessary for reusing
``madvise()`` code in DAMON.  You can also clone the complete git tree:

    $ git clone git://github.com/sjp38/linux -b damos/rfc/v4

The tree is also available on the web:
https://github.com/sjp38/linux/releases/tag/damos/rfc/v4


[1] https://lore.kernel.org/linux-mm/20200217103110.30817-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-mm/20200128001641.5086-2-minchan@kernel.org/

The first patch allows DAMON to reuse ``madvise()`` code for the actions.  The
second patch accounts the age of each region.  The third patch implements the
handling of the schemes in DAMON and exports a kernel space programming
interface for it.  The fourth patch implements a debugfs interface for
privileged users and programs.  The fifth and sixth patches add kunit tests
and selftests for these changes, respectively, and finally the seventh patch
modifies the user space tool for DAMON to support describing and applying
schemes in a human-friendly way.


Patch History
=============

Changes from RFC v3
(https://lore.kernel.org/linux-mm/20200225102300.23895-1-sjpark@amazon.com/)
 - Add Reviewed-by from Brendan Higgins
 - Code cleanup: Modularize madvise() call
 - Fix a trivial bug in the wrapper python script
 - Add more stable and detailed evaluation results with updated ETHP scheme

Changes from RFC v2
(https://lore.kernel.org/linux-mm/20200218085309.18346-1-sjpark@amazon.com/)
 - Fix aging mechanism for better 'old region' selection
 - Add more kunit tests and kselftests for this patchset
 - Support more human-friendly description and application of 'schemes'

Changes from RFC v1
(https://lore.kernel.org/linux-mm/20200210150921.32482-1-sjpark@amazon.com/)
 - Properly adjust age accounting related properties after splitting, merging,
   and action applying

SeongJae Park (7):
  mm/madvise: Export madvise_common() to mm internal code
  mm/damon: Account age of target regions
  mm/damon: Implement data access monitoring-based operation schemes
  mm/damon/schemes: Implement a debugfs interface
  mm/damon-test: Add kunit test case for regions age accounting
  mm/damon/selftests: Add 'schemes' debugfs tests
  damon/tools: Support more human friendly 'schemes' control

 include/linux/damon.h                         |  29 ++
 mm/damon-test.h                               |   5 +
 mm/damon.c                                    | 391 +++++++++++++++++-
 mm/internal.h                                 |   4 +
 mm/madvise.c                                  |   3 +-
 tools/damon/_convert_damos.py                 | 125 ++++++
 tools/damon/_damon.py                         | 143 +++++++
 tools/damon/damo                              |   7 +
 tools/damon/record.py                         | 135 +-----
 tools/damon/schemes.py                        | 105 +++++
 .../testing/selftests/damon/debugfs_attrs.sh  |  29 ++
 11 files changed, 845 insertions(+), 131 deletions(-)
 create mode 100755 tools/damon/_convert_damos.py
 create mode 100644 tools/damon/_damon.py
 create mode 100644 tools/damon/schemes.py

-- 
2.17.1

==================================== >8 =======================================

Appendix: Stdev / min / max numbers among the repeated runs
===========================================================

Below are stdev/min/max of each number in the 5 repeated runs.

runtime_stdev	orig	thp	ethp
parsec3/blackscholes	0.884	0.932	0.693
parsec3/bodytrack	0.672	0.501	0.470
parsec3/canneal	3.434	1.278	4.112
parsec3/dedup	0.074	0.032	0.070
parsec3/facesim	1.079	0.572	0.688
parsec3/ferret	1.674	0.498	0.801
parsec3/fluidanimate	1.422	1.804	1.273
parsec3/freqmine	2.285	2.735	3.852
parsec3/raytrace	1.240	0.821	1.407
parsec3/streamcluster	2.226	2.221	2.778
parsec3/swaptions	1.760	2.164	1.650
parsec3/vips	0.071	0.113	0.433
parsec3/x264	4.972	4.732	5.464
splash2x/barnes	0.149	0.434	0.944
splash2x/fft	0.186	0.074	2.053
splash2x/lu_cb	0.358	0.674	0.054
splash2x/lu_ncb	0.694	0.586	0.301
splash2x/ocean_cp	0.214	0.181	0.163
splash2x/ocean_ncp	0.738	0.574	5.860
splash2x/radiosity	0.447	0.786	0.493
splash2x/radix	0.183	0.195	0.250
splash2x/raytrace	0.869	1.248	1.071
splash2x/volrend	0.896	0.801	0.759
splash2x/water_nsquared	3.050	3.032	1.750
splash2x/water_spatial	0.497	1.607	0.665


memused.avg_stdev	orig	thp	ethp
parsec3/blackscholes	6837.158	4942.183	5531.310
parsec3/bodytrack	5591.783	5771.259	3959.415
parsec3/canneal	4034.250	5205.223	3294.782
parsec3/dedup	56582.594	12462.196	49390.950
parsec3/facesim	1879.070	3572.512	2407.374
parsec3/ferret	1686.811	4110.648	3050.263
parsec3/fluidanimate	5252.273	3550.694	3577.428
parsec3/freqmine	2634.481	12225.383	2220.963
parsec3/raytrace	5652.660	5615.677	4645.947
parsec3/streamcluster	2296.864	1906.081	2189.578
parsec3/swaptions	1100.155	18202.456	1689.923
parsec3/vips	5260.607	9104.494	2508.632
parsec3/x264	14892.433	18097.263	16853.532
splash2x/barnes	3055.563	2552.379	3749.773
splash2x/fft	115636.847	18058.645	193864.925
splash2x/lu_cb	2266.989	2495.620	9615.377
splash2x/lu_ncb	4816.990	3106.290	3406.873
splash2x/ocean_cp	5597.264	2189.592	40420.686
splash2x/ocean_ncp	6962.524	5038.039	352254.041
splash2x/radiosity	6151.433	1561.840	6976.647
splash2x/radix	12938.174	4141.470	64272.890
splash2x/raytrace	912.177	1473.169	1812.460
splash2x/volrend	1866.708	1527.107	1881.400
splash2x/water_nsquared	2126.581	4481.707	2471.129
splash2x/water_spatial	1495.886	3564.505	3182.864


runtime_min	orig	thp	ethp
parsec3/blackscholes	106.073	105.724	106.799
parsec3/bodytrack	78.361	78.327	78.994
parsec3/canneal	130.735	118.456	125.902
parsec3/dedup	11.816	11.631	11.781
parsec3/facesim	206.358	203.462	205.526
parsec3/ferret	189.118	189.461	190.130
parsec3/fluidanimate	209.879	207.381	210.656
parsec3/freqmine	287.349	285.988	288.519
parsec3/raytrace	117.320	118.014	118.021
parsec3/streamcluster	322.404	280.907	304.489
parsec3/swaptions	153.017	153.133	154.307
parsec3/vips	58.480	58.518	58.496
parsec3/x264	61.569	61.987	62.333
splash2x/barnes	79.595	73.170	77.782
splash2x/fft	32.588	22.838	24.391
splash2x/lu_cb	84.897	84.229	85.300
splash2x/lu_ncb	91.640	89.480	93.192
splash2x/ocean_cp	44.216	42.661	43.403
splash2x/ocean_ncp	79.912	50.717	63.298
splash2x/radiosity	90.332	89.911	90.786
splash2x/radix	30.617	25.012	25.569
splash2x/raytrace	82.972	81.291	83.608
splash2x/volrend	86.205	85.414	86.772
splash2x/water_nsquared	228.749	216.488	230.019
splash2x/water_spatial	88.326	88.636	88.469


memused.avg_min	orig	thp	ethp
parsec3/blackscholes	1809578.000	1815893.000	1821555.000
parsec3/bodytrack	1407270.000	1408774.000	1422950.000
parsec3/canneal	1037996.000	1029491.000	1042278.000
parsec3/dedup	2290578.000	2419128.000	2322004.000
parsec3/facesim	536908.000	539368.000	548194.000
parsec3/ferret	317173.000	313275.000	325452.000
parsec3/fluidanimate	566148.000	566925.000	578031.000
parsec3/freqmine	979565.000	985279.000	992844.000
parsec3/raytrace	1737270.000	1735498.000	1745751.000
parsec3/streamcluster	117213.000	118264.000	127825.000
parsec3/swaptions	13012.000	10753.000	21858.000
parsec3/vips	2946474.000	2941690.000	2975157.000
parsec3/x264	3171581.000	3170872.000	3184577.000
splash2x/barnes	1208476.000	1218535.000	1205510.000
splash2x/fft	9160132.000	9250818.000	8835513.000
splash2x/lu_cb	511850.000	515668.000	519205.000
splash2x/lu_ncb	512127.000	514471.000	518500.000
splash2x/ocean_cp	3342506.000	3377932.000	3290066.000
splash2x/ocean_ncp	3901749.000	7063386.000	3962171.000
splash2x/radiosity	1457419.000	1479232.000	1467156.000
splash2x/radix	1690840.000	1380921.000	1344838.000
splash2x/raytrace	43518.000	48571.000	55468.000
splash2x/volrend	147356.000	148650.000	159562.000
splash2x/water_nsquared	43685.000	38495.000	54409.000
splash2x/water_spatial	665912.000	660742.000	669843.000


runtime_max	orig	thp	ethp
parsec3/blackscholes	108.322	108.141	108.641
parsec3/bodytrack	80.166	79.687	80.200
parsec3/canneal	140.219	122.073	137.615
parsec3/dedup	12.014	11.723	12.000
parsec3/facesim	209.291	205.234	207.192
parsec3/ferret	193.589	190.830	192.235
parsec3/fluidanimate	213.730	212.390	213.867
parsec3/freqmine	293.634	292.283	299.323
parsec3/raytrace	120.096	120.346	121.437
parsec3/streamcluster	327.827	287.094	311.657
parsec3/swaptions	157.661	158.341	158.589
parsec3/vips	58.648	58.815	59.611
parsec3/x264	73.389	73.856	75.369
splash2x/barnes	79.975	74.413	80.244
splash2x/fft	33.168	23.043	29.852
splash2x/lu_cb	85.825	85.914	85.446
splash2x/lu_ncb	93.717	91.074	93.902
splash2x/ocean_cp	44.789	43.190	43.882
splash2x/ocean_ncp	81.981	52.296	80.782
splash2x/radiosity	91.509	91.966	92.180
splash2x/radix	31.130	25.546	26.299
splash2x/raytrace	85.347	84.163	86.881
splash2x/volrend	88.575	87.389	88.957
splash2x/water_nsquared	236.851	224.982	235.537
splash2x/water_spatial	89.689	92.978	90.276


memused.avg_max	orig	thp	ethp
parsec3/blackscholes	1827350.000	1830922.000	1836584.000
parsec3/bodytrack	1423070.000	1422588.000	1434832.000
parsec3/canneal	1048155.000	1043151.000	1051713.000
parsec3/dedup	2446661.000	2452237.000	2459532.000
parsec3/facesim	542340.000	547457.000	554321.000
parsec3/ferret	321678.000	325083.000	333474.000
parsec3/fluidanimate	579067.000	576389.000	587029.000
parsec3/freqmine	986759.000	1018980.000	998800.000
parsec3/raytrace	1750980.000	1749291.000	1757761.000
parsec3/streamcluster	123761.000	122647.000	133602.000
parsec3/swaptions	16305.000	59605.000	26835.000
parsec3/vips	2961299.000	2964746.000	2982101.000
parsec3/x264	3209871.000	3219818.000	3230036.000
splash2x/barnes	1217047.000	1224832.000	1215995.000
splash2x/fft	9505048.000	9302095.000	9378025.000
splash2x/lu_cb	518393.000	522739.000	545540.000
splash2x/lu_ncb	526380.000	522996.000	528341.000
splash2x/ocean_cp	3358820.000	3384581.000	3383533.000
splash2x/ocean_ncp	3920669.000	7079011.000	4937246.000
splash2x/radiosity	1474991.000	1483739.000	1485635.000
splash2x/radix	1731625.000	1393183.000	1498907.000
splash2x/raytrace	46122.000	52292.000	61116.000
splash2x/volrend	152488.000	153180.000	164793.000
splash2x/water_nsquared	49449.000	50555.000	60859.000
splash2x/water_spatial	669943.000	669815.000	679012.000


* [RFC v4 1/7] mm/madvise: Export madvise_common() to mm internal code
  2020-03-03 12:13 [RFC v4 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
@ 2020-03-03 12:14 ` SeongJae Park
  2020-03-03 12:14 ` [RFC v4 2/7] mm/damon: Account age of target regions SeongJae Park
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: SeongJae Park @ 2020-03-03 12:14 UTC
  To: akpm
  Cc: SeongJae Park, aarcange, acme, alexander.shishkin, amit,
	brendan.d.gregg, brendanhiggins, cai, colin.king, corbet, dwmw,
	jolsa, kirill, mark.rutland, mgorman, minchan, mingo, namhyung,
	peterz, rdunlap, riel, rientjes, rostedt, shuah, sj38.park,
	vbabka, vdavydov.dev, yang.shi, ying.huang, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit exports ``madvise_common()`` to ``mm/`` code for future
reuse.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 mm/internal.h | 4 ++++
 mm/madvise.c  | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/internal.h b/mm/internal.h
index 3cf20ab3ca01..dcdfe00e02ff 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -576,4 +576,8 @@ static inline bool is_migrate_highatomic_page(struct page *page)
 
 void setup_zone_pageset(struct zone *zone);
 extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
+
+
+int madvise_common(struct task_struct *task, struct mm_struct *mm,
+			unsigned long start, size_t len_in, int behavior);
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/madvise.c b/mm/madvise.c
index 0c901de531e4..4fa9dfc770bc 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1005,7 +1005,7 @@ madvise_behavior_valid(int behavior)
  * @task could be a zombie leader if it calls sys_exit so accessing mm_struct
  * via task->mm is prohibited. Please use @mm instead of task->mm.
  */
-static int madvise_common(struct task_struct *task, struct mm_struct *mm,
+int madvise_common(struct task_struct *task, struct mm_struct *mm,
 			unsigned long start, size_t len_in, int behavior)
 {
 	unsigned long end, tmp;
@@ -1103,6 +1103,7 @@ static int madvise_common(struct task_struct *task, struct mm_struct *mm,
 
 	return error;
 }
+EXPORT_SYMBOL_GPL(madvise_common);
 
 /*
  * The madvise(2) system call.
-- 
2.17.1



* [RFC v4 2/7] mm/damon: Account age of target regions
  2020-03-03 12:13 [RFC v4 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
  2020-03-03 12:14 ` [RFC v4 1/7] mm/madvise: Export madvise_common() to mm internal code SeongJae Park
@ 2020-03-03 12:14 ` SeongJae Park
  2020-03-04 15:21   ` Rik van Riel
  2020-03-03 12:14 ` [RFC v4 3/7] mm/damon: Implement data access monitoring-based operation schemes SeongJae Park
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: SeongJae Park @ 2020-03-03 12:14 UTC
  To: akpm
  Cc: SeongJae Park, aarcange, acme, alexander.shishkin, amit,
	brendan.d.gregg, brendanhiggins, cai, colin.king, corbet, dwmw,
	jolsa, kirill, mark.rutland, mgorman, minchan, mingo, namhyung,
	peterz, rdunlap, riel, rientjes, rostedt, shuah, sj38.park,
	vbabka, vdavydov.dev, yang.shi, ying.huang, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

DAMON can be used as a primitive for data access pattern-aware memory
management optimizations.  However, users who want such optimizations
have to run DAMON, read the monitoring results, analyze them, plan a new
memory management scheme, and apply the new scheme by themselves.  That
would not be too hard, but it still requires some effort.  For
complicated optimizations, this effort is inevitable.

That said, in many cases, users simply want to apply an action to a
memory region of a specific size having a specific access frequency for
a specific time.  For example, "page out a memory region larger than
100 MiB but having a low access frequency for more than 10 minutes", or
"Use THP for a memory region larger than 2 MiB having a high access
frequency for more than 2 seconds".

For such optimizations, users would first need to account the age of
each region themselves.  To reduce that effort, this commit implements
simple age accounting for each region in DAMON.  At each aggregation
step, DAMON compares the access frequency and start/end addresses of
each region with those from the last aggregation and resets the age of
the region if the change is significant.  Otherwise, the age is
incremented.
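
The accounting rule can be modeled in user space for experimentation.  The
following Python sketch mirrors the logic of ``damon_do_count_age()`` and the
size-weighted age merge in the diff below; it is an illustration, not kernel
code.  The ``Region`` class and the function names are hypothetical.  In the
patch, the access-count threshold passed in is ``max_nr_accesses / 10`` and
the address threshold is one fifth of the region size.

```python
from dataclasses import dataclass


@dataclass
class Region:
    vm_start: int
    vm_end: int
    nr_accesses: int
    last_vm_start: int
    last_vm_end: int
    last_nr_accesses: int
    age: int = 0


def count_age(r, threshold):
    """Reset the age on a significant change, otherwise increment it."""
    # The region has "moved" if its boundaries shifted by more than 1/5
    # of its current size since the last aggregation.
    sz_threshold = (r.vm_end - r.vm_start) // 5
    moved = abs(r.vm_start - r.last_vm_start) + abs(r.vm_end - r.last_vm_end)
    if moved > sz_threshold:
        r.age = 0
    elif abs(r.nr_accesses - r.last_nr_accesses) > threshold:
        r.age = 0
    else:
        r.age += 1


def merged_age(age_l, sz_l, age_r, sz_r):
    """Size-weighted average age of two regions being merged."""
    return (age_l * sz_l + age_r * sz_r) // (sz_l + sz_r)
```

Integer division is used to match the unsigned integer arithmetic of the C
code.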

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 include/linux/damon.h |  5 +++
 mm/damon.c            | 80 ++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 78785cb88d42..50fbe308590e 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -22,6 +22,11 @@ struct damon_region {
 	unsigned long sampling_addr;
 	unsigned int nr_accesses;
 	struct list_head list;
+
+	unsigned int age;
+	unsigned long last_vm_start;
+	unsigned long last_vm_end;
+	unsigned int last_nr_accesses;
 };
 
 /* Represents a monitoring target task */
diff --git a/mm/damon.c b/mm/damon.c
index ff150ae7532a..c292ddd36c86 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -87,6 +87,10 @@ static struct damon_region *damon_new_region(struct damon_ctx *ctx,
 	ret->sampling_addr = damon_rand(ctx, vm_start, vm_end);
 	INIT_LIST_HEAD(&ret->list);
 
+	ret->age = 0;
+	ret->last_vm_start = vm_start;
+	ret->last_vm_end = vm_end;
+
 	return ret;
 }
 
@@ -600,11 +604,44 @@ static void kdamond_flush_aggregated(struct damon_ctx *c)
 			damon_write_rbuf(c, &r->vm_end, sizeof(r->vm_end));
 			damon_write_rbuf(c, &r->nr_accesses,
 					sizeof(r->nr_accesses));
+			r->last_nr_accesses = r->nr_accesses;
 			r->nr_accesses = 0;
 		}
 	}
 }
 
+#define diff_of(a, b) (a > b ? a - b : b - a)
+
+/*
+ * Increase or reset the age of the given monitoring target region
+ *
+ * If the area or '->nr_accesses' has changed significantly, reset the '->age'.
+ * Else, increase the age.
+ */
+static void damon_do_count_age(struct damon_region *r, unsigned int threshold)
+{
+	unsigned long sz_threshold = (r->vm_end - r->vm_start) / 5;
+
+	if (diff_of(r->vm_start, r->last_vm_start) +
+			diff_of(r->vm_end, r->last_vm_end) > sz_threshold)
+		r->age = 0;
+	else if (diff_of(r->nr_accesses, r->last_nr_accesses) > threshold)
+		r->age = 0;
+	else
+		r->age++;
+}
+
+static void kdamond_count_age(struct damon_ctx *c, unsigned int threshold)
+{
+	struct damon_task *t;
+	struct damon_region *r;
+
+	damon_for_each_task(c, t) {
+		damon_for_each_region(r, t)
+			damon_do_count_age(r, threshold);
+	}
+}
+
 #define sz_damon_region(r) (r->vm_end - r->vm_start)
 
 /*
@@ -613,15 +650,15 @@ static void kdamond_flush_aggregated(struct damon_ctx *c)
 static void damon_merge_two_regions(struct damon_region *l,
 				struct damon_region *r)
 {
-	l->nr_accesses = (l->nr_accesses * sz_damon_region(l) +
-			r->nr_accesses * sz_damon_region(r)) /
-			(sz_damon_region(l) + sz_damon_region(r));
+	unsigned long sz_l = sz_damon_region(l), sz_r = sz_damon_region(r);
+
+	l->nr_accesses = (l->nr_accesses * sz_l + r->nr_accesses * sz_r) /
+			(sz_l + sz_r);
+	l->age = (l->age * sz_l + r->age * sz_r) / (sz_l + sz_r);
 	l->vm_end = r->vm_end;
 	damon_destroy_region(r);
 }
 
-#define diff_of(a, b) (a > b ? a - b : b - a)
-
 /*
  * Merge adjacent regions having similar access frequencies
  *
@@ -631,17 +668,43 @@ static void damon_merge_two_regions(struct damon_region *l,
 static void damon_merge_regions_of(struct damon_task *t, unsigned int thres)
 {
 	struct damon_region *r, *prev = NULL, *next;
+	unsigned long sz_subregion, last_last_vm = 0;
+	unsigned long sz_biggest = 0;	/* size of the biggest subregion */
+	struct region last_biggest;	/* last region of the biggest sub */
 
 	damon_for_each_region_safe(r, next, t) {
 		if (!prev || prev->vm_end != r->vm_start)
 			goto next;
 		if (diff_of(prev->nr_accesses, r->nr_accesses) > thres)
 			goto next;
+		if (!sz_biggest) {
+			sz_biggest = sz_damon_region(prev);
+			last_biggest.start = prev->last_vm_start;
+			last_biggest.end = prev->last_vm_end;
+		}
+		if (last_last_vm != r->last_vm_start)
+			sz_subregion = 0;
+		sz_subregion += sz_damon_region(r);
+		last_last_vm = r->last_vm_start;
+		if (sz_subregion > sz_biggest) {
+			sz_biggest = sz_subregion;
+			last_biggest.start = r->last_vm_start;
+			last_biggest.end = r->last_vm_end;
+		}
 		damon_merge_two_regions(prev, r);
 		continue;
 next:
+		if (sz_biggest) {
+			sz_biggest = 0;
+			prev->last_vm_start = last_biggest.start;
+			prev->last_vm_end = last_biggest.end;
+		}
 		prev = r;
 	}
+	if (sz_biggest) {
+		prev->last_vm_start = last_biggest.start;
+		prev->last_vm_end = last_biggest.end;
+	}
 }
 
 /*
@@ -674,6 +737,12 @@ static void damon_split_region_at(struct damon_ctx *ctx,
 	struct damon_region *new;
 
 	new = damon_new_region(ctx, r->vm_start + sz_r, r->vm_end);
+	new->age = r->age;
+	new->last_vm_start = r->vm_start;
+	new->last_nr_accesses = r->last_nr_accesses;
+
+	r->last_vm_start = r->vm_start;
+	r->last_vm_end = r->vm_end;
 	r->vm_end = new->vm_start;
 
 	damon_add_region(new, r, damon_next_region(r));
@@ -865,6 +934,7 @@ static int kdamond_fn(void *data)
 
 		if (kdamond_aggregate_interval_passed(ctx)) {
 			kdamond_merge_regions(ctx, max_nr_accesses / 10);
+			kdamond_count_age(ctx, max_nr_accesses / 10);
 			if (ctx->aggregate_cb)
 				ctx->aggregate_cb(ctx);
 			kdamond_flush_aggregated(ctx);
-- 
2.17.1



* [RFC v4 3/7] mm/damon: Implement data access monitoring-based operation schemes
  2020-03-03 12:13 [RFC v4 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
  2020-03-03 12:14 ` [RFC v4 1/7] mm/madvise: Export madvise_common() to mm internal code SeongJae Park
  2020-03-03 12:14 ` [RFC v4 2/7] mm/damon: Account age of target regions SeongJae Park
@ 2020-03-03 12:14 ` SeongJae Park
  2020-03-03 12:14 ` [RFC v4 4/7] mm/damon/schemes: Implement a debugfs interface SeongJae Park
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: SeongJae Park @ 2020-03-03 12:14 UTC
  To: akpm
  Cc: SeongJae Park, aarcange, acme, alexander.shishkin, amit,
	brendan.d.gregg, brendanhiggins, cai, colin.king, corbet, dwmw,
	jolsa, kirill, mark.rutland, mgorman, minchan, mingo, namhyung,
	peterz, rdunlap, riel, rientjes, rostedt, shuah, sj38.park,
	vbabka, vdavydov.dev, yang.shi, ying.huang, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

In many cases, users might use DAMON for simple data access-aware
memory management optimizations such as applying an operation scheme to
a memory region of a specific size having a specific access frequency
for a specific time.  For example, "page out a memory region larger than
100 MiB but having a low access frequency for more than 10 minutes", or
"Use THP for a memory region larger than 2 MiB having a high access
frequency for more than 2 seconds".

To save users from spending their time implementing such simple data
access monitoring-based operation schemes, this commit makes DAMON
handle such schemes directly.  With this commit, users can simply
specify their desired schemes to DAMON.

Each scheme is composed of conditions for filtering the target memory
regions and the desired memory management action for the targets.
Specifically, the format is::

    <min/max size> <min/max access frequency> <min/max age> <action>

The filtering conditions are the size of the memory region, the number
of accesses to the region monitored by DAMON, and the age of the region.
The age of a region is incremented periodically but reset when its
addresses or access frequency have significantly changed or the action
of a scheme has been applied.  For the action, the current
implementation supports only a few madvise() hints: ``MADV_WILLNEED``,
``MADV_COLD``, ``MADV_PAGEOUT``, ``MADV_HUGEPAGE``, and
``MADV_NOHUGEPAGE``.
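
For illustration only (this is not part of the patch), the filtering
described above can be sketched in user-space Python.  The field names
mirror 'struct damos' in the diff below; the 'Scheme' class, the helper
names and the example values are hypothetical.  As in the kernel code, a
bound of zero is treated as "not checked".

```python
from dataclasses import dataclass


@dataclass
class Scheme:
    # Field names mirror 'struct damos'; a bound of 0 means "no limit".
    min_sz_region: int
    max_sz_region: int
    min_nr_accesses: int
    max_nr_accesses: int
    min_age_region: int
    max_age_region: int
    action: str


def in_bounds(value, lo, hi):
    """Return True if value is within [lo, hi]; zero bounds are skipped."""
    if lo and value < lo:
        return False
    if hi and hi < value:
        return False
    return True


def matches(s, size, nr_accesses, age):
    """Return True if a region with the given properties fulfills scheme s."""
    return (in_bounds(size, s.min_sz_region, s.max_sz_region) and
            in_bounds(nr_accesses, s.min_nr_accesses, s.max_nr_accesses) and
            in_bounds(age, s.min_age_region, s.max_age_region))


# Hypothetical scheme: regions of at least 2 MiB that were accessed at most
# twice per aggregation interval for at least 5 aggregation intervals.
pageout = Scheme(2 * 1024 * 1024, 0, 0, 2, 5, 0, 'pageout')

print(matches(pageout, 4 * 1024 * 1024, 1, 10))  # True
print(matches(pageout, 4096, 1, 10))             # False (region too small)
```

In the patch itself, the age of a region is also reset to zero right
after the action of a matching scheme has been applied to it.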

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 include/linux/damon.h |  24 ++++++++
 mm/damon.c            | 140 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 164 insertions(+)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 50fbe308590e..8cb2452579ee 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -36,6 +36,27 @@ struct damon_task {
 	struct list_head list;
 };
 
+/* Data Access Monitoring-based Operation Scheme */
+enum damos_action {
+	DAMOS_WILLNEED,
+	DAMOS_COLD,
+	DAMOS_PAGEOUT,
+	DAMOS_HUGEPAGE,
+	DAMOS_NOHUGEPAGE,
+	DAMOS_ACTION_LEN,
+};
+
+struct damos {
+	unsigned int min_sz_region;
+	unsigned int max_sz_region;
+	unsigned int min_nr_accesses;
+	unsigned int max_nr_accesses;
+	unsigned int min_age_region;
+	unsigned int max_age_region;
+	enum damos_action action;
+	struct list_head list;
+};
+
 struct damon_ctx {
 	unsigned long sample_interval;
 	unsigned long aggr_interval;
@@ -58,6 +79,7 @@ struct damon_ctx {
 	struct rnd_state rndseed;
 
 	struct list_head tasks_list;	/* 'damon_task' objects */
+	struct list_head schemes_list;	/* 'damos' objects */
 
 	/* callbacks */
 	void (*sample_cb)(struct damon_ctx *context);
@@ -66,6 +88,8 @@ struct damon_ctx {
 
 int damon_set_pids(struct damon_ctx *ctx,
 			unsigned long *pids, ssize_t nr_pids);
+int damon_set_schemes(struct damon_ctx *ctx,
+			struct damos **schemes, ssize_t nr_schemes);
 int damon_set_recording(struct damon_ctx *ctx,
 			unsigned int rbuf_len, char *rfile_path);
 int damon_set_attrs(struct damon_ctx *ctx, unsigned long s, unsigned long a,
diff --git a/mm/damon.c b/mm/damon.c
index c292ddd36c86..338e7ea76c7f 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -11,6 +11,7 @@
 
 #define CREATE_TRACE_POINTS
 
+#include <asm-generic/mman-common.h>
 #include <linux/damon.h>
 #include <linux/debugfs.h>
 #include <linux/delay.h>
@@ -24,6 +25,8 @@
 #include <linux/slab.h>
 #include <trace/events/damon.h>
 
+#include "internal.h"
+
 #define damon_get_task_struct(t) \
 	(get_pid_task(find_vpid(t->pid), PIDTYPE_PID))
 
@@ -45,6 +48,12 @@
 #define damon_for_each_task_safe(ctx, t, next) \
 	list_for_each_entry_safe(t, next, &(ctx)->tasks_list, list)
 
+#define damon_for_each_schemes(ctx, r) \
+	list_for_each_entry(r, &(ctx)->schemes_list, list)
+
+#define damon_for_each_schemes_safe(ctx, s, next) \
+	list_for_each_entry_safe(s, next, &(ctx)->schemes_list, list)
+
 #define MAX_RFILE_PATH_LEN	256
 
 /* Get a random number in [l, r) */
@@ -190,6 +199,27 @@ static void damon_destroy_task(struct damon_task *t)
 	damon_free_task(t);
 }
 
+static void damon_add_scheme(struct damon_ctx *ctx, struct damos *s)
+{
+	list_add_tail(&s->list, &ctx->schemes_list);
+}
+
+static void damon_del_scheme(struct damos *s)
+{
+	list_del(&s->list);
+}
+
+static void damon_free_scheme(struct damos *s)
+{
+	kfree(s);
+}
+
+static void damon_destroy_scheme(struct damos *s)
+{
+	damon_del_scheme(s);
+	damon_free_scheme(s);
+}
+
 /*
  * Returns number of monitoring target tasks
  */
@@ -642,6 +672,93 @@ static void kdamond_count_age(struct damon_ctx *c, unsigned int threshold)
 	}
 }
 
+static int damos_madvise(struct damon_task *task, struct damon_region *r,
+			int behavior)
+{
+	struct task_struct *t;
+	struct mm_struct *mm;
+	int ret = -ENOMEM;
+
+	t = damon_get_task_struct(task);
+	if (!t)
+		goto out;
+	mm = damon_get_mm(task);
+	if (!mm)
+		goto put_task_out;
+
+	ret = madvise_common(t, mm, PAGE_ALIGN(r->vm_start),
+			PAGE_ALIGN(r->vm_end - r->vm_start), behavior);
+	mmput(mm);
+put_task_out:
+	put_task_struct(t);
+out:
+	return ret;
+}
+
+static int damos_do_action(struct damon_task *task, struct damon_region *r,
+			enum damos_action action)
+{
+	int madv_action;
+
+	switch (action) {
+	case DAMOS_WILLNEED:
+		madv_action = MADV_WILLNEED;
+		break;
+	case DAMOS_COLD:
+		madv_action = MADV_COLD;
+		break;
+	case DAMOS_PAGEOUT:
+		madv_action = MADV_PAGEOUT;
+		break;
+	case DAMOS_HUGEPAGE:
+		madv_action = MADV_HUGEPAGE;
+		break;
+	case DAMOS_NOHUGEPAGE:
+		madv_action = MADV_NOHUGEPAGE;
+		break;
+	default:
+		pr_warn("Wrong action %d\n", action);
+		return -EINVAL;
+	}
+
+	return damos_madvise(task, r, madv_action);
+}
+
+static void damon_do_apply_schemes(struct damon_ctx *c, struct damon_task *t,
+				struct damon_region *r)
+{
+	struct damos *s;
+	unsigned long sz;
+
+	damon_for_each_schemes(c, s) {
+		sz = r->vm_end - r->vm_start;
+		if ((s->min_sz_region && sz < s->min_sz_region) ||
+				(s->max_sz_region && s->max_sz_region < sz))
+			continue;
+		if ((s->min_nr_accesses && r->nr_accesses < s->min_nr_accesses)
+				|| (s->max_nr_accesses &&
+					s->max_nr_accesses < r->nr_accesses))
+			continue;
+		if ((s->min_age_region && r->age < s->min_age_region) ||
+				(s->max_age_region &&
+				 s->max_age_region < r->age))
+			continue;
+		damos_do_action(t, r, s->action);
+		r->age = 0;
+	}
+}
+
+static void kdamond_apply_schemes(struct damon_ctx *c)
+{
+	struct damon_task *t;
+	struct damon_region *r;
+
+	damon_for_each_task(c, t) {
+		damon_for_each_region(r, t)
+			damon_do_apply_schemes(c, t, r);
+	}
+}
+
 #define sz_damon_region(r) (r->vm_end - r->vm_start)
 
 /*
@@ -937,6 +1054,7 @@ static int kdamond_fn(void *data)
 			kdamond_count_age(ctx, max_nr_accesses / 10);
 			if (ctx->aggregate_cb)
 				ctx->aggregate_cb(ctx);
+			kdamond_apply_schemes(ctx);
 			kdamond_flush_aggregated(ctx);
 			kdamond_split_regions(ctx);
 		}
@@ -1011,6 +1129,27 @@ int damon_stop(struct damon_ctx *ctx)
 	return damon_turn_kdamond(ctx, false);
 }
 
+/*
+ * Set the data access monitoring oriented schemes
+ *
+ * NOTE: This function should not be called while the kdamond of the context is
+ * running.
+ *
+ * Returns 0 if success, or negative error code otherwise.
+ */
+int damon_set_schemes(struct damon_ctx *ctx, struct damos **schemes,
+			ssize_t nr_schemes)
+{
+	struct damos *s, *next;
+	ssize_t i;
+
+	damon_for_each_schemes_safe(ctx, s, next)
+		damon_destroy_scheme(s);
+	for (i = 0; i < nr_schemes; i++)
+		damon_add_scheme(ctx, schemes[i]);
+	return 0;
+}
+
 /*
  * This function should not be called while the kdamond is running.
  */
@@ -1456,6 +1595,7 @@ static int __init damon_init_user_ctx(void)
 
 	prandom_seed_state(&ctx->rndseed, 42);
 	INIT_LIST_HEAD(&ctx->tasks_list);
+	INIT_LIST_HEAD(&ctx->schemes_list);
 
 	ctx->sample_cb = NULL;
 	ctx->aggregate_cb = NULL;
-- 
2.17.1



* [RFC v4 4/7] mm/damon/schemes: Implement a debugfs interface
  2020-03-03 12:13 [RFC v4 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
                   ` (2 preceding siblings ...)
  2020-03-03 12:14 ` [RFC v4 3/7] mm/damon: Implement data access monitoring-based operation schemes SeongJae Park
@ 2020-03-03 12:14 ` SeongJae Park
  2020-03-03 12:14 ` [RFC v4 5/7] mm/damon-test: Add kunit test case for regions age accounting SeongJae Park
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: SeongJae Park @ 2020-03-03 12:14 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, aarcange, acme, alexander.shishkin, amit,
	brendan.d.gregg, brendanhiggins, cai, colin.king, corbet, dwmw,
	jolsa, kirill, mark.rutland, mgorman, minchan, mingo, namhyung,
	peterz, rdunlap, riel, rientjes, rostedt, shuah, sj38.park,
	vbabka, vdavydov.dev, yang.shi, ying.huang, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit implements a debugfs interface for the data access
monitoring-based memory management schemes.  It is intended to be used
by administrators and/or privileged user space programs.  Users can
read and update the schemes using the ``<debugfs>/damon/schemes`` file.
The format is::

    <min/max size> <min/max access frequency> <min/max age> <action>
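
As a rough sketch (not part of this patch), a privileged user space
program could build one line of that file as below.  It assumes the
seven-number layout parsed by str_to_schemes() in the diff, with the
action given as its position in 'enum damos_action'.  The names
'DAMOS_ACTIONS' and 'scheme_line' are hypothetical.

```python
# Action numbers follow the order of 'enum damos_action' in this series.
DAMOS_ACTIONS = ['willneed', 'cold', 'pageout', 'hugepage', 'nohugepage']


def scheme_line(min_sz, max_sz, min_nr, max_nr, min_age, max_age, action):
    """Build one scheme line: six unsigned numbers and an action number."""
    fields = (min_sz, max_sz, min_nr, max_nr, min_age, max_age)
    if any(f < 0 for f in fields):
        raise ValueError('the bounds are parsed as unsigned integers')
    # list.index() raises ValueError for an unknown action name.
    return '%d %d %d %d %d %d %d' % (fields + (DAMOS_ACTIONS.index(action),))


line = scheme_line(2097152, 0, 18, 19, 1, 72000, 'hugepage')
print(line)  # 2097152 0 18 19 1 72000 3

# Applying it needs root and a kernel with this series; the path assumes
# debugfs is mounted at /sys/kernel/debug:
#     with open('/sys/kernel/debug/damon/schemes', 'w') as f:
#         f.write(line + '\n')
```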

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 mm/damon.c | 171 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 169 insertions(+), 2 deletions(-)

diff --git a/mm/damon.c b/mm/damon.c
index 338e7ea76c7f..c573a0290234 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -199,6 +199,29 @@ static void damon_destroy_task(struct damon_task *t)
 	damon_free_task(t);
 }
 
+static struct damos *damon_new_scheme(
+		unsigned int min_sz_region, unsigned int max_sz_region,
+		unsigned int min_nr_accesses, unsigned int max_nr_accesses,
+		unsigned int min_age_region, unsigned int max_age_region,
+		enum damos_action action)
+{
+	struct damos *ret;
+
+	ret = kmalloc(sizeof(struct damos), GFP_KERNEL);
+	if (!ret)
+		return NULL;
+	ret->min_sz_region = min_sz_region;
+	ret->max_sz_region = max_sz_region;
+	ret->min_nr_accesses = min_nr_accesses;
+	ret->max_nr_accesses = max_nr_accesses;
+	ret->min_age_region = min_age_region;
+	ret->max_age_region = max_age_region;
+	ret->action = action;
+	INIT_LIST_HEAD(&ret->list);
+
+	return ret;
+}
+
 static void damon_add_scheme(struct damon_ctx *ctx, struct damos *s)
 {
 	list_add_tail(&s->list, &ctx->schemes_list);
@@ -1306,6 +1329,144 @@ static ssize_t debugfs_monitor_on_write(struct file *file,
 	return ret;
 }
 
+static ssize_t sprint_schemes(struct damon_ctx *c, char *buf, ssize_t len)
+{
+	struct damos *s;
+	int written = 0;
+	int rc;
+
+	damon_for_each_schemes(c, s) {
+		rc = snprintf(&buf[written], len - written,
+				"%u %u %u %u %u %u %d\n",
+				s->min_sz_region, s->max_sz_region,
+				s->min_nr_accesses, s->max_nr_accesses,
+				s->min_age_region, s->max_age_region,
+				s->action);
+		if (!rc)
+			return -ENOMEM;
+		written += rc;
+	}
+	return written;
+}
+
+static ssize_t debugfs_schemes_read(struct file *file, char __user *buf,
+		size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	char *kbuf;
+	ssize_t ret;
+
+	kbuf = kmalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	ret = sprint_schemes(ctx, kbuf, count);
+	if (ret < 0)
+		goto out;
+	ret = simple_read_from_buffer(buf, count, ppos, kbuf, ret);
+
+out:
+	kfree(kbuf);
+	return ret;
+}
+
+static void free_schemes_arr(struct damos **schemes, ssize_t nr_schemes)
+{
+	ssize_t i;
+
+	for (i = 0; i < nr_schemes; i++)
+		kfree(schemes[i]);
+	kfree(schemes);
+}
+
+/*
+ * Converts a string into an array of struct damos pointers
+ *
+ * Returns an array of struct damos pointers that converted if the conversion
+ * success, or NULL otherwise.
+ */
+static struct damos **str_to_schemes(const char *str, ssize_t len,
+				ssize_t *nr_schemes)
+{
+	struct damos *scheme, **schemes;
+	const int max_nr_schemes = 256;
+	int pos = 0, parsed, ret;
+	unsigned int min_sz, max_sz, min_nr_a, max_nr_a, min_age, max_age;
+	int action;
+
+	schemes = kmalloc_array(max_nr_schemes, sizeof(struct damos *),
+			GFP_KERNEL);
+	if (!schemes)
+		return NULL;
+
+	*nr_schemes = 0;
+	while (pos < len && *nr_schemes < max_nr_schemes) {
+		ret = sscanf(&str[pos], "%u %u %u %u %u %u %d%n",
+				&min_sz, &max_sz, &min_nr_a, &max_nr_a,
+				&min_age, &max_age, &action, &parsed);
+		pos += parsed;
+		if (ret != 7)
+			break;
+		if (action >= DAMOS_ACTION_LEN) {
+			pr_err("wrong action %d\n", action);
+			goto fail;
+		}
+
+		scheme = damon_new_scheme(min_sz, max_sz, min_nr_a, max_nr_a,
+				min_age, max_age, action);
+		if (!scheme)
+			goto fail;
+
+		schemes[*nr_schemes] = scheme;
+		*nr_schemes += 1;
+	}
+	if (!*nr_schemes)
+		goto fail;
+	return schemes;
+fail:
+	free_schemes_arr(schemes, *nr_schemes);
+	return NULL;
+}
+
+static ssize_t debugfs_schemes_write(struct file *file, const char __user *buf,
+		size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = &damon_user_ctx;
+	char *kbuf;
+	struct damos **schemes;
+	ssize_t nr_schemes = 0, ret;
+
+	if (*ppos)
+		return -EINVAL;
+
+	kbuf = kmalloc_array(count, sizeof(char), GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	ret = simple_write_to_buffer(kbuf, count, ppos, buf, count);
+	if (ret < 0)
+		goto out;
+
+	schemes = str_to_schemes(kbuf, ret, &nr_schemes);
+
+	spin_lock(&ctx->kdamond_lock);
+	if (ctx->kdamond)
+		goto monitor_running;
+
+	damon_set_schemes(ctx, schemes, nr_schemes);
+	spin_unlock(&ctx->kdamond_lock);
+	goto out;
+
+monitor_running:
+	spin_unlock(&ctx->kdamond_lock);
+	pr_err("%s: kdamond is running. Turn it off first.\n", __func__);
+	ret = -EINVAL;
+	free_schemes_arr(schemes, nr_schemes);
+out:
+	kfree(kbuf);
+	return ret;
+}
+
 static ssize_t damon_sprint_pids(struct damon_ctx *ctx, char *buf, ssize_t len)
 {
 	struct damon_task *t;
@@ -1536,6 +1697,12 @@ static const struct file_operations pids_fops = {
 	.write = debugfs_pids_write,
 };
 
+static const struct file_operations schemes_fops = {
+	.owner = THIS_MODULE,
+	.read = debugfs_schemes_read,
+	.write = debugfs_schemes_write,
+};
+
 static const struct file_operations record_fops = {
 	.owner = THIS_MODULE,
 	.read = debugfs_record_read,
@@ -1552,10 +1719,10 @@ static struct dentry *debugfs_root;
 
 static int __init debugfs_init(void)
 {
-	const char * const file_names[] = {"attrs", "record",
+	const char * const file_names[] = {"attrs", "record", "schemes",
 		"pids", "monitor_on"};
 	const struct file_operations *fops[] = {&attrs_fops, &record_fops,
-		&pids_fops, &monitor_on_fops};
+		&schemes_fops, &pids_fops, &monitor_on_fops};
 	int i;
 
 	debugfs_root = debugfs_create_dir("damon", NULL);
-- 
2.17.1



* [RFC v4 5/7] mm/damon-test: Add kunit test case for regions age accounting
  2020-03-03 12:13 [RFC v4 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
                   ` (3 preceding siblings ...)
  2020-03-03 12:14 ` [RFC v4 4/7] mm/damon/schemes: Implement a debugfs interface SeongJae Park
@ 2020-03-03 12:14 ` SeongJae Park
  2020-03-03 12:14 ` [RFC v4 6/7] mm/damon/selftests: Add 'schemes' debugfs tests SeongJae Park
  2020-03-03 12:14 ` [RFC v4 7/7] damon/tools: Support more human friendly 'schemes' control SeongJae Park
  6 siblings, 0 replies; 10+ messages in thread
From: SeongJae Park @ 2020-03-03 12:14 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, aarcange, acme, alexander.shishkin, amit,
	brendan.d.gregg, brendanhiggins, cai, colin.king, corbet, dwmw,
	jolsa, kirill, mark.rutland, mgorman, minchan, mingo, namhyung,
	peterz, rdunlap, riel, rientjes, rostedt, shuah, sj38.park,
	vbabka, vdavydov.dev, yang.shi, ying.huang, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

After regions are merged, each region should correctly remember its
last shape so that the changes since the last modification can be
measured and the age can be reset if the changes are significant.  This
commit adds kunit test cases checking whether the regions correctly
track their last shape after merges of regions.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
---
 mm/damon-test.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/damon-test.h b/mm/damon-test.h
index c7dc21325c77..2ba757357211 100644
--- a/mm/damon-test.h
+++ b/mm/damon-test.h
@@ -540,6 +540,8 @@ static void damon_test_merge_regions_of(struct kunit *test)
 
 	unsigned long saddrs[] = {0, 114, 130, 156, 170};
 	unsigned long eaddrs[] = {112, 130, 156, 170, 230};
+	unsigned long lsa[] = {0, 114, 130, 156, 184};
+	unsigned long lea[] = {100, 122, 156, 170, 230};
 	int i;
 
 	t = damon_new_task(42);
@@ -556,6 +558,9 @@ static void damon_test_merge_regions_of(struct kunit *test)
 		r = damon_nth_region_of(t, i);
 		KUNIT_EXPECT_EQ(test, r->vm_start, saddrs[i]);
 		KUNIT_EXPECT_EQ(test, r->vm_end, eaddrs[i]);
+		KUNIT_EXPECT_EQ(test, r->last_vm_start, lsa[i]);
+		KUNIT_EXPECT_EQ(test, r->last_vm_end, lea[i]);
+
 	}
 	damon_free_task(t);
 }
-- 
2.17.1



* [RFC v4 6/7] mm/damon/selftests: Add 'schemes' debugfs tests
  2020-03-03 12:13 [RFC v4 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
                   ` (4 preceding siblings ...)
  2020-03-03 12:14 ` [RFC v4 5/7] mm/damon-test: Add kunit test case for regions age accounting SeongJae Park
@ 2020-03-03 12:14 ` SeongJae Park
  2020-03-03 12:14 ` [RFC v4 7/7] damon/tools: Support more human friendly 'schemes' control SeongJae Park
  6 siblings, 0 replies; 10+ messages in thread
From: SeongJae Park @ 2020-03-03 12:14 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, aarcange, acme, alexander.shishkin, amit,
	brendan.d.gregg, brendanhiggins, cai, colin.king, corbet, dwmw,
	jolsa, kirill, mark.rutland, mgorman, minchan, mingo, namhyung,
	peterz, rdunlap, riel, rientjes, rostedt, shuah, sj38.park,
	vbabka, vdavydov.dev, yang.shi, ying.huang, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit adds simple selftests for the 'schemes' debugfs file of DAMON.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 .../testing/selftests/damon/debugfs_attrs.sh  | 29 +++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh
index d5188b0f71b1..82a98c81975b 100755
--- a/tools/testing/selftests/damon/debugfs_attrs.sh
+++ b/tools/testing/selftests/damon/debugfs_attrs.sh
@@ -97,6 +97,35 @@ fi
 
 echo $ORIG_CONTENT > $file
 
+# Test schemes file
+file="$DBGFS/schemes"
+
+ORIG_CONTENT=$(cat $file)
+echo "1 2 3 4 5 6 3" > $file
+if [ $? -ne 0 ]
+then
+	echo "$file write fail"
+	echo $ORIG_CONTENT > $file
+	exit 1
+fi
+
+echo "1 2
+3 4 5 6 3" > $file
+if [ $? -eq 0 ]
+then
+	echo "$file splitted write success (expected fail)"
+	echo $ORIG_CONTENT > $file
+	exit 1
+fi
+
+echo > $file
+if [ $? -ne 0 ]
+then
+	echo "$file empty string writing fail"
+	echo $ORIG_CONTENT > $file
+	exit 1
+fi
+
 # Test pids file
 file="$DBGFS/pids"
 
-- 
2.17.1



* [RFC v4 7/7] damon/tools: Support more human friendly 'schemes' control
  2020-03-03 12:13 [RFC v4 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
                   ` (5 preceding siblings ...)
  2020-03-03 12:14 ` [RFC v4 6/7] mm/damon/selftests: Add 'schemes' debugfs tests SeongJae Park
@ 2020-03-03 12:14 ` SeongJae Park
  6 siblings, 0 replies; 10+ messages in thread
From: SeongJae Park @ 2020-03-03 12:14 UTC (permalink / raw)
  To: akpm
  Cc: SeongJae Park, aarcange, acme, alexander.shishkin, amit,
	brendan.d.gregg, brendanhiggins, cai, colin.king, corbet, dwmw,
	jolsa, kirill, mark.rutland, mgorman, minchan, mingo, namhyung,
	peterz, rdunlap, riel, rientjes, rostedt, shuah, sj38.park,
	vbabka, vdavydov.dev, yang.shi, ying.huang, linux-mm, linux-doc,
	linux-kernel

From: SeongJae Park <sjpark@amazon.de>

This commit implements the 'schemes' subcommand of the damon userspace
tool.  It can be used to describe and apply the data access
monitoring-based operation schemes in a more human-friendly fashion.
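
As a worked example of the conversion, the condensed sketch below
follows the logic of _convert_damos.py in this patch, using the default
5 ms sampling and 100 ms aggregation intervals.  The function and table
names here are hypothetical, and integer division is used where the
script relies on '%d' formatting.

```python
SIZE_UNITS = {'B': 1, 'K': 1 << 10, 'M': 1 << 20, 'G': 1 << 30, 'T': 1 << 40}
TIME_UNITS = {'us': 1, 'ms': 1000, 's': 10**6, 'm': 60 * 10**6,
              'h': 3600 * 10**6, 'd': 86400 * 10**6}
ACTIONS = ['willneed', 'cold', 'pageout', 'hugepage', 'nohugepage']


def to_bytes(txt):
    return 0 if txt == 'null' else int(txt[:-1]) * SIZE_UNITS[txt[-1]]


def to_usecs(txt):
    if txt == 'null':
        return 0
    # 'us' and 'ms' are two-character units; the others are one character.
    unit = txt[-2:] if txt[-2:] in ('us', 'ms') else txt[-1]
    return int(txt[:-len(unit)]) * TIME_UNITS[unit]


def to_nr_accesses(txt, limit):
    # The frequency is a percentage of the maximum number of accesses that
    # can be observed within one aggregation interval.
    return 0 if txt == 'null' else int(txt) * limit // 100


def convert_line(line, sample_us=5000, aggr_us=100000):
    min_sz, max_sz, min_freq, max_freq, min_age, max_age, action = line.split()
    limit = aggr_us // sample_us
    values = (to_bytes(min_sz), to_bytes(max_sz),
              to_nr_accesses(min_freq, limit), to_nr_accesses(max_freq, limit),
              to_usecs(min_age) // aggr_us, to_usecs(max_age) // aggr_us,
              ACTIONS.index(action))
    return ' '.join(str(v) for v in values)


print(convert_line('2M null 90 99 100ms 2h hugepage'))
# 2097152 0 18 19 1 72000 3
```

With these intervals at most 20 accesses can be counted per aggregation
interval, so 90% and 99% become 18 and 19, and the ages are expressed in
units of the aggregation interval.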

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 tools/damon/_convert_damos.py | 125 +++++++++++++++++++++++++++++
 tools/damon/_damon.py         | 143 ++++++++++++++++++++++++++++++++++
 tools/damon/damo              |   7 ++
 tools/damon/record.py         | 135 +++-----------------------------
 tools/damon/schemes.py        | 105 +++++++++++++++++++++++++
 5 files changed, 392 insertions(+), 123 deletions(-)
 create mode 100755 tools/damon/_convert_damos.py
 create mode 100644 tools/damon/_damon.py
 create mode 100644 tools/damon/schemes.py

diff --git a/tools/damon/_convert_damos.py b/tools/damon/_convert_damos.py
new file mode 100755
index 000000000000..0f1e7e3d4ccc
--- /dev/null
+++ b/tools/damon/_convert_damos.py
@@ -0,0 +1,125 @@
+#!/usr/bin/env python3
+
+"""
+Change human readable data access monitoring-based operation schemes to the low
+level input for the '<debugfs>/damon/schemes' file.  Below is an example of the
+schemes written in the human readable format:
+
+# format is: <min/max size> <min/max frequency (0-100)> <min/max age> <action>
+# lines starting with '#' and blank lines are ignored.
+# B/K/M/G/T for Bytes/KiB/MiB/GiB/TiB
+# us/ms/s/m/h/d for micro-seconds/milli-seconds/seconds/minutes/hours/days
+# 'null' means zero, which passes the check
+
+# if a region (regardless of its size) keeps a high access frequency for more
+# than 100ms, put the region on the head of the LRU list (call madvise() with
+# MADV_WILLNEED).
+null	null	80	null	100ms	null	willneed
+
+# if a region keeps a low access frequency for more than 200ms, put the
+# region on the tail of the LRU list (call madvise() with MADV_COLD).
+0B	0B	10	20	200ms	1h cold
+
+# if a region keeps a very low access frequency for more than 100ms, swap
+# out the region immediately (call madvise() with MADV_PAGEOUT).
+0B	null	0	10	100ms	2h pageout
+
+# if a region of a size bigger than 2MiB keeps a very high access frequency
+# for more than 100ms, let the region use huge pages (call madvise()
+# with MADV_HUGEPAGE).
+2M	null	90	99	100ms	2h hugepage
+
+# if a region of a size bigger than 2MiB does not keep a high access
+# frequency for more than 100ms, prevent the region from using huge pages
+# (call madvise() with MADV_NOHUGEPAGE).
+2M	null	0	25	100ms	2h nohugepage
+"""
+
+import argparse
+
+unit_to_bytes = {'B': 1, 'K': 1024, 'M': 1024 * 1024, 'G': 1024 * 1024 * 1024,
+        'T': 1024 * 1024 * 1024 * 1024}
+
+def text_to_bytes(txt):
+    if txt == 'null':
+        return 0
+    unit = txt[-1]
+    number = int(txt[:-1])
+    return number * unit_to_bytes[unit]
+
+unit_to_usecs = {'us': 1, 'ms': 1000, 's': 1000 * 1000, 'm': 60 * 1000 * 1000,
+        'h': 60 * 60 * 1000 * 1000, 'd': 24 * 60 * 60 * 1000 * 1000}
+
+def text_to_us(txt):
+    if txt == 'null':
+        return 0
+    unit = txt[-2:]
+    if unit in ['us', 'ms']:
+        number = int(txt[:-2])
+    else:
+        unit = txt[-1]
+        number = int(txt[:-1])
+    return number * unit_to_usecs[unit]
+
+damos_action_to_int = {'DAMOS_WILLNEED': 0, 'DAMOS_COLD': 1,
+        'DAMOS_PAGEOUT': 2, 'DAMOS_HUGEPAGE': 3, 'DAMOS_NOHUGEPAGE': 4}
+
+def text_to_damos_action(txt):
+    return damos_action_to_int['DAMOS_' + txt.upper()]
+
+def text_to_nr_accesses(txt, max_nr_accesses):
+    if txt == 'null':
+        return 0
+    return int(int(txt) * max_nr_accesses / 100)
+
+def debugfs_scheme(line, sample_interval, aggr_interval):
+    fields = line.split()
+    if len(fields) != 7:
+        print('wrong input line: %s' % line)
+        exit(1)
+
+    limit_nr_accesses = aggr_interval / sample_interval
+    try:
+        min_sz = text_to_bytes(fields[0])
+        max_sz = text_to_bytes(fields[1])
+        min_nr_accesses = text_to_nr_accesses(fields[2], limit_nr_accesses)
+        max_nr_accesses = text_to_nr_accesses(fields[3], limit_nr_accesses)
+        min_age = text_to_us(fields[4]) / aggr_interval
+        max_age = text_to_us(fields[5]) / aggr_interval
+        action = text_to_damos_action(fields[6])
+    except:
+        print('wrong input field')
+        raise
+    return '%d\t%d\t%d\t%d\t%d\t%d\t%d' % (min_sz, max_sz, min_nr_accesses,
+            max_nr_accesses, min_age, max_age, action)
+
+def convert(schemes_file, sample_interval, aggr_interval):
+    lines = []
+    with open(schemes_file, 'r') as f:
+        for line in f:
+            if line.startswith('#'):
+                continue
+            line = line.strip()
+            if line == '':
+                continue
+            lines.append(debugfs_scheme(line, sample_interval, aggr_interval))
+    return '\n'.join(lines)
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('input', metavar='<file>',
+            help='input file describing the schemes')
+    parser.add_argument('-s', '--sample', metavar='<interval>', type=int,
+            default=5000, help='sampling interval (us)')
+    parser.add_argument('-a', '--aggr', metavar='<interval>', type=int,
+            default=100000, help='aggregation interval (us)')
+    args = parser.parse_args()
+
+    schemes_file = args.input
+    sample_interval = args.sample
+    aggr_interval = args.aggr
+
+    print(convert(schemes_file, sample_interval, aggr_interval))
+
+if __name__ == '__main__':
+    main()
diff --git a/tools/damon/_damon.py b/tools/damon/_damon.py
new file mode 100644
index 000000000000..0a703ec7471a
--- /dev/null
+++ b/tools/damon/_damon.py
@@ -0,0 +1,143 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Contains core functions for DAMON debugfs control.
+"""
+
+import os
+import subprocess
+
+debugfs_attrs = None
+debugfs_record = None
+debugfs_schemes = None
+debugfs_pids = None
+debugfs_monitor_on = None
+
+def set_target_pid(pid):
+    return subprocess.call('echo %s > %s' % (pid, debugfs_pids), shell=True,
+            executable='/bin/bash')
+
+def turn_damon(on_off):
+    return subprocess.call("echo %s > %s" % (on_off, debugfs_monitor_on),
+            shell=True, executable="/bin/bash")
+
+def is_damon_running():
+    with open(debugfs_monitor_on, 'r') as f:
+        return f.read().strip() == 'on'
+
+class Attrs:
+    sample_interval = None
+    aggr_interval = None
+    regions_update_interval = None
+    min_nr_regions = None
+    max_nr_regions = None
+    rbuf_len = None
+    rfile_path = None
+    schemes = None
+
+    def __init__(self, s, a, r, n, x, l, f, c):
+        self.sample_interval = s
+        self.aggr_interval = a
+        self.regions_update_interval = r
+        self.min_nr_regions = n
+        self.max_nr_regions = x
+        self.rbuf_len = l
+        self.rfile_path = f
+        self.schemes = c
+
+    def __str__(self):
+        return "%s %s %s %s %s %s %s\n%s" % (self.sample_interval,
+                self.aggr_interval, self.regions_update_interval,
+                self.min_nr_regions, self.max_nr_regions, self.rbuf_len,
+                self.rfile_path, self.schemes)
+
+    def attr_str(self):
+        return "%s %s %s %s %s " % (self.sample_interval, self.aggr_interval,
+                self.regions_update_interval, self.min_nr_regions,
+                self.max_nr_regions)
+
+    def record_str(self):
+        return '%s %s ' % (self.rbuf_len, self.rfile_path)
+
+    def apply(self):
+        ret = subprocess.call('echo %s > %s' % (self.attr_str(), debugfs_attrs),
+                shell=True, executable='/bin/bash')
+        if ret:
+            return ret
+        ret = subprocess.call('echo %s > %s' % (self.record_str(),
+            debugfs_record), shell=True, executable='/bin/bash')
+        if ret:
+            return ret
+        return subprocess.call('echo %s > %s' % (
+            self.schemes.replace('\n', ' '), debugfs_schemes), shell=True,
+            executable='/bin/bash')
+
+def current_attrs():
+    with open(debugfs_attrs, 'r') as f:
+        attrs = f.read().split()
+    attrs = [int(x) for x in attrs]
+
+    with open(debugfs_record, 'r') as f:
+        rattrs = f.read().split()
+    attrs.append(int(rattrs[0]))
+    attrs.append(rattrs[1])
+
+    with open(debugfs_schemes, 'r') as f:
+        schemes = f.read()
+    attrs.append(schemes)
+
+    return Attrs(*attrs)
+
+def chk_update_debugfs(debugfs):
+    global debugfs_attrs
+    global debugfs_record
+    global debugfs_schemes
+    global debugfs_pids
+    global debugfs_monitor_on
+
+    debugfs_damon = os.path.join(debugfs, 'damon')
+    debugfs_attrs = os.path.join(debugfs_damon, 'attrs')
+    debugfs_record = os.path.join(debugfs_damon, 'record')
+    debugfs_schemes = os.path.join(debugfs_damon, 'schemes')
+    debugfs_pids = os.path.join(debugfs_damon, 'pids')
+    debugfs_monitor_on = os.path.join(debugfs_damon, 'monitor_on')
+
+    if not os.path.isdir(debugfs_damon):
+        print("damon debugfs dir (%s) not found" % debugfs_damon)
+        exit(1)
+
+    for f in [debugfs_attrs, debugfs_record, debugfs_schemes, debugfs_pids,
+            debugfs_monitor_on]:
+        if not os.path.isfile(f):
+            print("damon debugfs file (%s) not found" % f)
+            exit(1)
+
+def cmd_args_to_attrs(args):
+    "Generate attributes with specified arguments"
+    sample_interval = args.sample
+    aggr_interval = args.aggr
+    regions_update_interval = args.updr
+    min_nr_regions = args.minr
+    max_nr_regions = args.maxr
+    rbuf_len = args.rbuf
+    if not os.path.isabs(args.out):
+        args.out = os.path.join(os.getcwd(), args.out)
+    rfile_path = args.out
+    schemes = args.schemes
+    return Attrs(sample_interval, aggr_interval, regions_update_interval,
+            min_nr_regions, max_nr_regions, rbuf_len, rfile_path, schemes)
+
+def set_attrs_argparser(parser):
+    parser.add_argument('-d', '--debugfs', metavar='<debugfs>', type=str,
+            default='/sys/kernel/debug', help='debugfs mounted path')
+    parser.add_argument('-s', '--sample', metavar='<interval>', type=int,
+            default=5000, help='sampling interval')
+    parser.add_argument('-a', '--aggr', metavar='<interval>', type=int,
+            default=100000, help='aggregate interval')
+    parser.add_argument('-u', '--updr', metavar='<interval>', type=int,
+            default=1000000, help='regions update interval')
+    parser.add_argument('-n', '--minr', metavar='<# regions>', type=int,
+            default=10, help='minimal number of regions')
+    parser.add_argument('-m', '--maxr', metavar='<# regions>', type=int,
+            default=1000, help='maximum number of regions')
diff --git a/tools/damon/damo b/tools/damon/damo
index 58e1099ae5fc..ce7180069bef 100755
--- a/tools/damon/damo
+++ b/tools/damon/damo
@@ -5,6 +5,7 @@ import argparse
 
 import record
 import report
+import schemes
 
 class SubCmdHelpFormatter(argparse.RawDescriptionHelpFormatter):
     def _format_action(self, action):
@@ -25,6 +26,10 @@ parser_record = subparser.add_parser('record',
         help='record data accesses of the given target processes')
 record.set_argparser(parser_record)
 
+parser_schemes = subparser.add_parser('schemes',
+        help='apply operation schemes to the given target process')
+schemes.set_argparser(parser_schemes)
+
 parser_report = subparser.add_parser('report',
         help='report the recorded data accesses in the specified form')
 report.set_argparser(parser_report)
@@ -33,5 +38,7 @@ args = parser.parse_args()
 
 if args.command == 'record':
     record.main(args)
+elif args.command == 'schemes':
+    schemes.main(args)
 elif args.command == 'report':
     report.main(args)
diff --git a/tools/damon/record.py b/tools/damon/record.py
index a547d479a103..3bbf7b8359da 100644
--- a/tools/damon/record.py
+++ b/tools/damon/record.py
@@ -6,28 +6,12 @@ Record data access patterns of the target process.
 """
 
 import argparse
-import copy
 import os
 import signal
 import subprocess
 import time
 
-debugfs_attrs = None
-debugfs_record = None
-debugfs_pids = None
-debugfs_monitor_on = None
-
-def set_target_pid(pid):
-    return subprocess.call('echo %s > %s' % (pid, debugfs_pids), shell=True,
-            executable='/bin/bash')
-
-def turn_damon(on_off):
-    return subprocess.call("echo %s > %s" % (on_off, debugfs_monitor_on),
-            shell=True, executable="/bin/bash")
-
-def is_damon_running():
-    with open(debugfs_monitor_on, 'r') as f:
-        return f.read().strip() == 'on'
+import _damon
 
 def do_record(target, is_target_cmd, attrs, old_attrs):
     if os.path.isfile(attrs.rfile_path):
@@ -36,93 +20,29 @@ def do_record(target, is_target_cmd, attrs, old_attrs):
     if attrs.apply():
         print('attributes (%s) failed to be applied' % attrs)
         cleanup_exit(old_attrs, -1)
-    print('# damon attrs: %s' % attrs)
+    print('# damon attrs: %s %s' % (attrs.attr_str(), attrs.record_str()))
     if is_target_cmd:
         p = subprocess.Popen(target, shell=True, executable='/bin/bash')
         target = p.pid
-    if set_target_pid(target):
+    if _damon.set_target_pid(target):
         print('pid setting (%s) failed' % target)
         cleanup_exit(old_attrs, -2)
-    if turn_damon('on'):
+    if _damon.turn_damon('on'):
         print('could not turn on damon' % target)
         cleanup_exit(old_attrs, -3)
     if is_target_cmd:
         p.wait()
     while True:
         # damon will turn it off by itself if the target tasks are terminated.
-        if not is_damon_running():
+        if not _damon.is_damon_running():
             break
         time.sleep(1)
 
     cleanup_exit(old_attrs, 0)
 
-class Attrs:
-    sample_interval = None
-    aggr_interval = None
-    regions_update_interval = None
-    min_nr_regions = None
-    max_nr_regions = None
-    rbuf_len = None
-    rfile_path = None
-
-    def __init__(self, s, a, r, n, x, l, f):
-        self.sample_interval = s
-        self.aggr_interval = a
-        self.regions_update_interval = r
-        self.min_nr_regions = n
-        self.max_nr_regions = x
-        self.rbuf_len = l
-        self.rfile_path = f
-
-    def __str__(self):
-        return "%s %s %s %s %s %s %s" % (self.sample_interval, self.aggr_interval,
-                self.regions_update_interval, self.min_nr_regions,
-                self.max_nr_regions, self.rbuf_len, self.rfile_path)
-
-    def attr_str(self):
-        return "%s %s %s %s %s " % (self.sample_interval, self.aggr_interval,
-                self.regions_update_interval, self.min_nr_regions,
-                self.max_nr_regions)
-
-    def record_str(self):
-        return '%s %s ' % (self.rbuf_len, self.rfile_path)
-
-    def apply(self):
-        ret = subprocess.call('echo %s > %s' % (self.attr_str(), debugfs_attrs),
-                shell=True, executable='/bin/bash')
-        if ret:
-            return ret
-        return subprocess.call('echo %s > %s' % (self.record_str(),
-            debugfs_record), shell=True, executable='/bin/bash')
-
-def current_attrs():
-    with open(debugfs_attrs, 'r') as f:
-        attrs = f.read().split()
-    attrs = [int(x) for x in attrs]
-
-    with open(debugfs_record, 'r') as f:
-        rattrs = f.read().split()
-    attrs.append(int(rattrs[0]))
-    attrs.append(rattrs[1])
-    return Attrs(*attrs)
-
-def cmd_args_to_attrs(args):
-    "Generate attributes with specified arguments"
-    sample_interval = args.sample
-    aggr_interval = args.aggr
-    regions_update_interval = args.updr
-    min_nr_regions = args.minr
-    max_nr_regions = args.maxr
-    rbuf_len = args.rbuf
-    if not os.path.isabs(args.out):
-        args.out = os.path.join(os.getcwd(), args.out)
-    rfile_path = args.out
-    return Attrs(sample_interval, aggr_interval, regions_update_interval,
-            min_nr_regions, max_nr_regions, rbuf_len, rfile_path)
-
 def cleanup_exit(orig_attrs, exit_code):
-    if is_damon_running():
-        if turn_damon('off'):
+    if _damon.is_damon_running():
+        if _damon.turn_damon('off'):
             print('failed to turn damon off!')
     if orig_attrs:
         if orig_attrs.apply():
@@ -133,51 +53,19 @@ def sighandler(signum, frame):
     print('\nsignal %s received' % signum)
     cleanup_exit(orig_attrs, signum)
 
-def chk_update_debugfs(debugfs):
-    global debugfs_attrs
-    global debugfs_record
-    global debugfs_pids
-    global debugfs_monitor_on
-
-    debugfs_damon = os.path.join(debugfs, 'damon')
-    debugfs_attrs = os.path.join(debugfs_damon, 'attrs')
-    debugfs_record = os.path.join(debugfs_damon, 'record')
-    debugfs_pids = os.path.join(debugfs_damon, 'pids')
-    debugfs_monitor_on = os.path.join(debugfs_damon, 'monitor_on')
-
-    if not os.path.isdir(debugfs_damon):
-        print("damon debugfs dir (%s) not found", debugfs_damon)
-        exit(1)
-
-    for f in [debugfs_attrs, debugfs_record, debugfs_pids, debugfs_monitor_on]:
-        if not os.path.isfile(f):
-            print("damon debugfs file (%s) not found" % f)
-            exit(1)
-
 def chk_permission():
     if os.geteuid() != 0:
         print("Run as root")
         exit(1)
 
 def set_argparser(parser):
+    _damon.set_attrs_argparser(parser)
     parser.add_argument('target', type=str, metavar='<target>',
             help='the target command or the pid to record')
-    parser.add_argument('-s', '--sample', metavar='<interval>', type=int,
-            default=5000, help='sampling interval')
-    parser.add_argument('-a', '--aggr', metavar='<interval>', type=int,
-            default=100000, help='aggregate interval')
-    parser.add_argument('-u', '--updr', metavar='<interval>', type=int,
-            default=1000000, help='regions update interval')
-    parser.add_argument('-n', '--minr', metavar='<# regions>', type=int,
-            default=10, help='minimal number of regions')
-    parser.add_argument('-m', '--maxr', metavar='<# regions>', type=int,
-            default=1000, help='maximum number of regions')
     parser.add_argument('-l', '--rbuf', metavar='<len>', type=int,
             default=1024*1024, help='length of record result buffer')
     parser.add_argument('-o', '--out', metavar='<file path>', type=str,
             default='damon.data', help='output file path')
-    parser.add_argument('-d', '--debugfs', metavar='<debugfs>', type=str,
-            default='/sys/kernel/debug', help='debugfs mounted path')
 
 def main(args=None):
     global orig_attrs
@@ -187,13 +75,14 @@ def main(args=None):
         args = parser.parse_args()
 
     chk_permission()
-    chk_update_debugfs(args.debugfs)
+    _damon.chk_update_debugfs(args.debugfs)
 
     signal.signal(signal.SIGINT, sighandler)
     signal.signal(signal.SIGTERM, sighandler)
-    orig_attrs = current_attrs()
+    orig_attrs = _damon.current_attrs()
 
-    new_attrs = cmd_args_to_attrs(args)
+    args.schemes = ''
+    new_attrs = _damon.cmd_args_to_attrs(args)
     target = args.target
 
     target_fields = target.split()
diff --git a/tools/damon/schemes.py b/tools/damon/schemes.py
new file mode 100644
index 000000000000..408a73813234
--- /dev/null
+++ b/tools/damon/schemes.py
@@ -0,0 +1,105 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Apply given operation schemes to the target process.
+"""
+
+import argparse
+import os
+import signal
+import subprocess
+import time
+
+import _convert_damos
+import _damon
+
+def run_damon(target, is_target_cmd, attrs, old_attrs):
+    if os.path.isfile(attrs.rfile_path):
+        os.rename(attrs.rfile_path, attrs.rfile_path + '.old')
+
+    if attrs.apply():
+        print('attributes (%s) failed to be applied' % attrs)
+        cleanup_exit(old_attrs, -1)
+    print('# damon attrs: %s %s' % (attrs.attr_str(), attrs.record_str()))
+    for line in attrs.schemes.split('\n'):
+        print('# scheme: %s' % line)
+    if is_target_cmd:
+        p = subprocess.Popen(target, shell=True, executable='/bin/bash')
+        target = p.pid
+    if _damon.set_target_pid(target):
+        print('pid setting (%s) failed' % target)
+        cleanup_exit(old_attrs, -2)
+    if _damon.turn_damon('on'):
+        print('could not turn on damon')
+        cleanup_exit(old_attrs, -3)
+    if is_target_cmd:
+        p.wait()
+    while True:
+        # damon will turn it off by itself if the target tasks are terminated.
+        if not _damon.is_damon_running():
+            break
+        time.sleep(1)
+
+    cleanup_exit(old_attrs, 0)
+
+def cleanup_exit(orig_attrs, exit_code):
+    if _damon.is_damon_running():
+        if _damon.turn_damon('off'):
+            print('failed to turn damon off!')
+    if orig_attrs:
+        if orig_attrs.apply():
+            print('original attributes (%s) restoration failed!' % orig_attrs)
+    exit(exit_code)
+
+def sighandler(signum, frame):
+    print('\nsignal %s received' % signum)
+    cleanup_exit(orig_attrs, signum)
+
+def chk_permission():
+    if os.geteuid() != 0:
+        print("Run as root")
+        exit(1)
+
+def set_argparser(parser):
+    _damon.set_attrs_argparser(parser)
+    parser.add_argument('target', type=str, metavar='<target>',
+            help='the target command or the pid to apply the schemes to')
+    parser.add_argument('-c', '--schemes', metavar='<file>', type=str,
+            default='damon.schemes',
+            help='data access monitoring-based operation schemes')
+
+def main(args=None):
+    global orig_attrs
+    if not args:
+        parser = argparse.ArgumentParser()
+        set_argparser(parser)
+        args = parser.parse_args()
+
+    chk_permission()
+    _damon.chk_update_debugfs(args.debugfs)
+
+    signal.signal(signal.SIGINT, sighandler)
+    signal.signal(signal.SIGTERM, sighandler)
+    orig_attrs = _damon.current_attrs()
+
+    args.rbuf = 0
+    args.out = 'null'
+    args.schemes = _convert_damos.convert(args.schemes, args.sample, args.aggr)
+    new_attrs = _damon.cmd_args_to_attrs(args)
+    target = args.target
+
+    target_fields = target.split()
+    if not subprocess.call('which %s > /dev/null' % target_fields[0],
+            shell=True, executable='/bin/bash'):
+        run_damon(target, True, new_attrs, orig_attrs)
+    else:
+        try:
+            pid = int(target)
+        except ValueError:
+            print('target \'%s\' is neither a command, nor a pid' % target)
+            exit(1)
+        run_damon(target, False, new_attrs, orig_attrs)
+
+if __name__ == '__main__':
+    main()
-- 
2.17.1
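
A note on the target handling in `main()` above: the patch decides whether `<target>` is a command or a pid by shelling out to `which` and falling back to `int()`. The same decision could perhaps be sketched without a subshell, as below. This is only an illustration, not part of the posted patch: `classify_target` is a hypothetical helper name, and `shutil.which` is assumed to be an acceptable stand-in for the `which` command, although the two can differ in corner cases.

```python
import shutil

def classify_target(target):
    """Classify a <target> string as a command or a pid.

    Hypothetical helper, not part of the patch.  Returns ('cmd', target)
    if the first word resolves to an executable on PATH, ('pid', <int>)
    if the whole string parses as an integer, and raises ValueError
    otherwise.
    """
    fields = target.split()
    # Mirrors the 'which <first word>' check done in main().
    if fields and shutil.which(fields[0]):
        return ('cmd', target)
    try:
        return ('pid', int(target))
    except ValueError:
        raise ValueError(
            "target '%s' is neither a command, nor a pid" % target)
```

With such a helper, `main()` would call `run_damon()` with `is_target_cmd` set according to the first element of the returned tuple, and print the error and exit on `ValueError`.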



* Re: [RFC v4 2/7] mm/damon: Account age of target regions
  2020-03-03 12:14 ` [RFC v4 2/7] mm/damon: Account age of target regions SeongJae Park
@ 2020-03-04 15:21   ` Rik van Riel
  2020-03-04 16:07     ` SeongJae Park
  0 siblings, 1 reply; 10+ messages in thread
From: Rik van Riel @ 2020-03-04 15:21 UTC (permalink / raw)
  To: SeongJae Park, akpm
  Cc: SeongJae Park, aarcange, acme, alexander.shishkin, amit,
	brendan.d.gregg, brendanhiggins, cai, colin.king, corbet, dwmw,
	jolsa, kirill, mark.rutland, mgorman, minchan, mingo, namhyung,
	peterz, rdunlap, rientjes, rostedt, shuah, sj38.park, vbabka,
	vdavydov.dev, yang.shi, ying.huang, linux-mm, linux-doc,
	linux-kernel

On Tue, 2020-03-03 at 13:14 +0100, SeongJae Park wrote:
> From: SeongJae Park <sjpark@amazon.de>

> --- a/mm/damon.c
> +++ b/mm/damon.c
> @@ -87,6 +87,10 @@ static struct damon_region
> *damon_new_region(struct damon_ctx *ctx,
>  	ret->sampling_addr = damon_rand(ctx, vm_start, vm_end);
>  	INIT_LIST_HEAD(&ret->list);
>  
> +	ret->age = 0;
> +	ret->last_vm_start = vm_start;
> +	ret->last_vm_end = vm_end;

Wait, what tree is this supposed to apply against?

I see no mm/damon.c file in current Linus upstream.

-- 
All Rights Reversed.



* Re: Re: [RFC v4 2/7] mm/damon: Account age of target regions
  2020-03-04 15:21   ` Rik van Riel
@ 2020-03-04 16:07     ` SeongJae Park
  0 siblings, 0 replies; 10+ messages in thread
From: SeongJae Park @ 2020-03-04 16:07 UTC (permalink / raw)
  To: Rik van Riel
  Cc: SeongJae Park, akpm, SeongJae Park, aarcange, acme,
	alexander.shishkin, amit, brendan.d.gregg, brendanhiggins, cai,
	colin.king, corbet, dwmw, jolsa, kirill, mark.rutland, mgorman,
	minchan, mingo, namhyung, peterz, rdunlap, rientjes, rostedt,
	shuah, sj38.park, vbabka, vdavydov.dev, yang.shi, ying.huang,
	linux-mm, linux-doc, linux-kernel

Hello Rik,

Thank you for the question :)

On Wed, 04 Mar 2020 10:21:29 -0500 Rik van Riel <riel@surriel.com> wrote:

> [-- Attachment #1: Type: text/plain, Size: 558 bytes --]
> 
> On Tue, 2020-03-03 at 13:14 +0100, SeongJae Park wrote:
> > From: SeongJae Park <sjpark@amazon.de>
> 
> > --- a/mm/damon.c
> > +++ b/mm/damon.c
> > @@ -87,6 +87,10 @@ static struct damon_region
> > *damon_new_region(struct damon_ctx *ctx,
> >  	ret->sampling_addr = damon_rand(ctx, vm_start, vm_end);
> >  	INIT_LIST_HEAD(&ret->list);
> >  
> > +	ret->age = 0;
> > +	ret->last_vm_start = vm_start;
> > +	ret->last_vm_end = vm_end;
> 
> Wait, what tree is this supposed to apply against?
> 
> I see no mm/damon.c file in current Linus upstream.

This patchset is supposed to apply against v5.5 plus the DAMON patchset[1] plus a
patch from Minchan.  You can get the tree with this patchset applied via:

    $ git clone git://github.com/sjp38/linux -b damos/rfc/v4

Or, it is also available on the web:
https://github.com/sjp38/linux/releases/tag/damos/rfc/v4

I am posting this as a separate RFC patchset because 1) this patchset is based
on a tree other than Linus' or other maintainers' upstream trees, 2) I
want to keep the original patchset small for the convenience of reviewers,
3) this patchset was made relatively recently and thus might be unstable
compared to the DAMON patchset[1], and 4) I want to share my plan and get as
much early feedback as possible.

Sorry if this confused you.  Also, if you have any opinions regarding
these separate postings, please let me know.


[1] https://lore.kernel.org/linux-mm/20200224123047.32506-1-sjpark@amazon.com


Thanks,
SeongJae Park

> 
> -- 
> All Rights Reversed.


Thread overview: 10+ messages
2020-03-03 12:13 [RFC v4 0/7] Implement Data Access Monitoring-based Memory Operation Schemes SeongJae Park
2020-03-03 12:14 ` [RFC v4 1/7] mm/madvise: Export madvise_common() to mm internal code SeongJae Park
2020-03-03 12:14 ` [RFC v4 2/7] mm/damon: Account age of target regions SeongJae Park
2020-03-04 15:21   ` Rik van Riel
2020-03-04 16:07     ` SeongJae Park
2020-03-03 12:14 ` [RFC v4 3/7] mm/damon: Implement data access monitoring-based operation schemes SeongJae Park
2020-03-03 12:14 ` [RFC v4 4/7] mm/damon/schemes: Implement a debugfs interface SeongJae Park
2020-03-03 12:14 ` [RFC v4 5/7] mm/damon-test: Add kunit test case for regions age accounting SeongJae Park
2020-03-03 12:14 ` [RFC v4 6/7] mm/damon/selftests: Add 'schemes' debugfs tests SeongJae Park
2020-03-03 12:14 ` [RFC v4 7/7] damon/tools: Support more human friendly 'schemes' control SeongJae Park
