* Memory Trace Project Help
       [not found] <CAOFJiu2p-Wa_g5Pt18RyDFKAVtyfY9S0kp7Fs7=bQs8sWP8UsQ@mail.gmail.com>
@ 2011-08-29  9:25 ` Sameer Pramod Niphadkar
  2011-08-30  4:36   ` Todd Deshane
  0 siblings, 1 reply; 4+ messages in thread
From: Sameer Pramod Niphadkar @ 2011-08-29  9:25 UTC (permalink / raw)
  To: Xen-devel

Hi guys,

I hope to get your valuable inputs to this pet project of mine, please
do feel free to mention your ideas, suggestions and recommendations
for the same.

I've collected a large number of memory traces, almost 10 GB of data.
These memory traces were gathered from a set of servers, desktops, and
laptops in a university CS Department. Each trace file contains a list
of hashes representing the contents of the machine's memory, as well
as some meta information about the running processes and OS type.

The traces have been grouped by type and date. Traces were recorded
approximately every 30 minutes, although if machines were turned off
or away from an internet connection for a long period, no traces were
acquired. Each trace file is split into two portions. The top segment
is ASCII text containing the system meta data about operating system
type and a list of running processes. This is followed by binary data
containing the list of hashes generated for each page in the system.
Hashes are stored as consecutive 32-bit values. There is a simple tool
called "traceReader" for extracting the hashes from a trace file. It
takes the file to be parsed as its argument and outputs the hash
list as a series of integer values. If you would like to compare two
traces to estimate the amount of sharing between them, you could run:

./traceReader trace-x.dat > trace-all
./traceReader trace-y.dat >> trace-all
cat trace-all | sort | uniq -c

This will tell you the number of times that each hash occurs across
the two traces.
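
The same counting can be sketched in Python. This is a minimal sketch,
assuming traceReader prints one decimal integer per line (the
description above only says "a series of integer values"). The function
names count_hashes and sharing_summary are illustrative and are not
part of the trace tools.

```python
from collections import Counter


def count_hashes(lines):
    """Count how often each hash value appears in traceReader-style output.

    Assumption: one decimal integer per line; blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[int(line)] += 1
    return counts


def sharing_summary(counts):
    """Return (total_pages, distinct_hashes, duplicate_pages).

    duplicate_pages counts every page beyond the first copy of each hash,
    which is a rough upper bound on what content-based sharing could save
    (hash collisions are ignored).
    """
    total = sum(counts.values())
    distinct = len(counts)
    return total, distinct, total - distinct
```

To combine two traces, the Counters returned for each file can be
merged with Counter.update() before calling sharing_summary().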

Now my idea is to take the trace for every interval (every 30 minutes)
for each of the systems and find the frequency of each memory hash. I
then plan to collect the most frequently occurring hashes over the
entire hour (60 minutes) and divide the memory into 'k' different
patterns based on the count of these frequencies. For instance, if
hashes 14F430C8, 1550068, 15AD480A, 161384B6, 16985213, 17CA274B,
18E5F038 and 1A3329 have the highest frequencies, then I might divide
the memory into 8 patterns (k=8).

I plan to use the Approximate Nearest Neighbor (ANN) library
http://www.cs.umd.edu/~mount/ANN/ for this division. In ANN one needs
to provide a set of query points, a set of data points and the number
of dimensions. I guess in my case the query points can be all the
remaining hashes other than the highest-frequency ones, the data points
are all the hashes for the hour, and the dimension can be 1.

I can thus formulate the memory patterns for every hour. I then plan
to formulate memory patterns for every 3 hours, 6 hours, 12 hours and
finally the full 24 hours. Armed with these statistics, I plan to
compare the patterns based on the time of day. I hope to show some
overlap between the patterns and create what I call "heat zones" for
memory based on the time of day, and finally come up with a suitable
report on the same.
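
One possible reading of the division into k patterns can be sketched as
follows. This is only a sketch under stated assumptions: it uses an
exact 1-D nearest-neighbour lookup with Python's bisect module instead
of the ANN library, and it treats the k most frequent hashes as the
points that every other hash is assigned to. The function names are
hypothetical. Whether numeric distance between hash values is
meaningful is a separate question that the sketch does not address.

```python
from bisect import bisect_left
from collections import Counter


def top_k_hashes(counts, k):
    """Return the k most frequent hashes, sorted by numeric value."""
    return sorted(h for h, _ in counts.most_common(k))


def nearest_center(centers, h):
    """Exact 1-D nearest neighbour of h among the sorted list centers."""
    i = bisect_left(centers, h)
    if i == 0:
        return centers[0]
    if i == len(centers):
        return centers[-1]
    before, after = centers[i - 1], centers[i]
    # Ties go to the smaller center.
    return before if h - before <= after - h else after


def partition_by_nearest(counts, k):
    """Group every hash under the nearest of the k most frequent hashes."""
    centers = top_k_hashes(counts, k)
    if not centers:
        raise ValueError("need at least one hash and k >= 1")
    groups = {c: [] for c in centers}
    for h in counts:
        groups[nearest_center(centers, h)].append(h)
    return groups
```

Here counts is a Counter mapping each hash to its frequency over the
hour, and the result maps each of the k frequent hashes to the list of
hashes assigned to it.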

The overall objective of this project is to establish a relation
between memory page access and the time of day, so that specific
intervals have certain memory "heat zones". I understand that these
"heat zones" might change and may not be consistent across systems
and users. The study intends only to establish this relationship and
doesn't do any kind of qualitative or quantitative analysis of these
heat zones per system and user. Such an analysis can be considered
an extension of this work.
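
The comparison across time windows could be sketched roughly as below.
It is an illustration only: it assumes one Counter of hash frequencies
per 30-minute trace, in time order and without gaps (the real traces
have gaps when machines were off), and it uses Jaccard overlap of the
top-k hash sets as a stand-in measure for how similar two windows are.
The function names and the choice of Jaccard overlap are assumptions,
not something the project description specifies.

```python
from collections import Counter


def merge_windows(interval_counts, intervals_per_window):
    """Sum consecutive per-interval Counters into larger windows.

    With 30-minute traces, intervals_per_window=2 gives hourly windows,
    6 gives 3-hour windows, and so on. A trailing partial window is kept.
    """
    windows = []
    for start in range(0, len(interval_counts), intervals_per_window):
        merged = Counter()
        for c in interval_counts[start:start + intervals_per_window]:
            merged.update(c)
        windows.append(merged)
    return windows


def hot_set(counts, k):
    """The k most frequent hashes in a window, as a set."""
    return {h for h, _ in counts.most_common(k)}


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b|; 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Comparing hot_set() results for the same hour on different days, or
for adjacent windows on the same day, gives one number per pair that
can be tabulated against the time of day.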

Please feel free to comment and suggest any new insights.

regards
Sameer


* Re: Memory Trace Project Help
  2011-08-29  9:25 ` Memory Trace Project Help Sameer Pramod Niphadkar
@ 2011-08-30  4:36   ` Todd Deshane
  2011-08-30 10:51     ` Sameer Pramod Niphadkar
  0 siblings, 1 reply; 4+ messages in thread
From: Todd Deshane @ 2011-08-30  4:36 UTC (permalink / raw)
  To: Sameer Pramod Niphadkar; +Cc: Xen-devel

On Mon, Aug 29, 2011 at 5:25 AM, Sameer Pramod Niphadkar
<spniphadkar@gmail.com> wrote:

> Please feel free to comment and suggest any new insights.
>

You may find this research on Satori interesting and perhaps related:
http://www.cl.cam.ac.uk/~dgm36/publications/2009-milos2009satori.pdf

Hope that helps.

Thanks,
Todd


-- 
Todd Deshane
http://www.linkedin.com/in/deshantm
http://www.xen.org/products/cloudxen.html
http://runningxen.com/


* Re: Memory Trace Project Help
  2011-08-30  4:36   ` Todd Deshane
@ 2011-08-30 10:51     ` Sameer Pramod Niphadkar
  2011-08-31  4:54       ` Todd Deshane
  0 siblings, 1 reply; 4+ messages in thread
From: Sameer Pramod Niphadkar @ 2011-08-30 10:51 UTC (permalink / raw)
  To: Todd Deshane; +Cc: Xen-devel

On Tue, Aug 30, 2011 at 10:06 AM, Todd Deshane <todd.deshane@xen.org> wrote:
> On Mon, Aug 29, 2011 at 5:25 AM, Sameer Pramod Niphadkar
> <spniphadkar@gmail.com> wrote:
>
>> Please feel free to comment and suggest any new insights.
>>
>
> You may find this research on Satori interesting and perhaps related:
> http://www.cl.cam.ac.uk/~dgm36/publications/2009-milos2009satori.pdf
>
> Hope that helps.
>
> Thanks,
> Todd
>
>
> --
> Todd Deshane
> http://www.linkedin.com/in/deshantm
> http://www.xen.org/products/cloudxen.html
> http://runningxen.com/
>

Thanks Todd...

I knew about the Satori research before and basically got the idea for
my "heat zones" from their "enlightenment factor", which is a means
of devising the sharing element to be used by the guest VMs. The paper
also mentions that:

"When an operating system loads data from disk, it is stored in the
page cache, and other researchers have noted that between 63.8% and
93.0% of shareable pages between VMs are part of the page cache.  For
example, VMs based on the same operating system will load identical
program binaries, configuration files and data files. In these
systems, the kernel text will also be identical, but this is loaded by
Xen domain builder (bootloader), and does not appear in the page
cache"

So the basic idea of my project can be summarized as:

1. Find out which processes run most frequently during a particular
time interval on different systems (this may be the easier option).

2. Go deeper into the physical memory (PM) trace and find the
relationship between the PM addresses and the most frequent accesses
per universal clock time, per system.
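
Step 1 could be sketched along these lines. The layout of the ASCII
header in the trace files is not given here, so the sketch assumes the
process lists have already been parsed into (timestamp, process names)
pairs with ISO 8601 timestamps; that input shape and the function names
are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def processes_by_hour(snapshots):
    """Count, per hour of day, in how many snapshots each process appears.

    snapshots: iterable of (iso_timestamp, process_names) pairs, e.g.
    ("2011-08-29T09:25:00", ["sshd", "bash"]).
    """
    by_hour = defaultdict(Counter)
    for iso_timestamp, names in snapshots:
        hour = datetime.fromisoformat(iso_timestamp).hour
        # set() so a process listed twice in one snapshot counts once.
        by_hour[hour].update(set(names))
    return by_hour


def most_frequent(by_hour, hour, n=5):
    """Top-n (process, snapshot_count) pairs for one hour of day."""
    return by_hour.get(hour, Counter()).most_common(n)
```

Running processes_by_hour() separately per machine keeps the per-system
view, while feeding it snapshots from all machines gives the combined
view across systems.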

I understand that with randomized address space mappings and with
different systems running different processes it might be hard to find
any suitable pattern emerging from this study. But, as most of us know,
identical systems in a particular network might end up accessing
similar PM blocks during a given time frame (a block here being a
group of pages). I intend to find out whether there is any kind of
correlation between this time frame and the access. According to the
working set model of a process, there exists temporal and spatial
locality of memory page access, and hence we end up using the
appropriate page replacement algorithms. Now I intend to see if the
same analogy can be applied to the entire memory address space of a
system, that is, whether some sort of pattern emerges for physical
memory access based on time and space.
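
For reference, the working set idea can be illustrated with a short
sketch: the working set W(t, tau) is the set of distinct pages among
the last tau references up to time t. The traces described in this
thread are snapshots of page-content hashes, not access sequences, so
the reference list below is a hypothetical input and the function name
is illustrative.

```python
from collections import Counter


def working_set_sizes(references, tau):
    """Size of the working set W(t, tau) after each reference.

    references: a list of page identifiers in access order.
    tau: window length, counted in references.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    in_window = Counter()
    sizes = []
    for t, page in enumerate(references):
        in_window[page] += 1
        if t >= tau:
            # Drop the reference that just slid out of the window.
            old = references[t - tau]
            in_window[old] -= 1
            if in_window[old] == 0:
                del in_window[old]
        sizes.append(len(in_window))
    return sizes
```

A working set size that stays small relative to the number of pages
touched overall indicates strong temporal locality for that window
length.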

I hope to know if there has been any similar work done before with
memory traces or if there are any other areas which I need to look
into before I can begin this study.

regards
Sameer


* Re: Memory Trace Project Help
  2011-08-30 10:51     ` Sameer Pramod Niphadkar
@ 2011-08-31  4:54       ` Todd Deshane
  0 siblings, 0 replies; 4+ messages in thread
From: Todd Deshane @ 2011-08-31  4:54 UTC (permalink / raw)
  To: Sameer Pramod Niphadkar; +Cc: Xen-devel

On Tue, Aug 30, 2011 at 6:51 AM, Sameer Pramod Niphadkar
<spniphadkar@gmail.com> wrote:
> I hope to know if there has been any similar work done before with
> memory traces or if there are any other areas which I need to look
> into before I can begin this study.
>

According to Google Scholar the Satori paper was referenced 29 times.
You may want to scan through those papers to see if there is any
newer work that is relevant:

http://scholar.google.com/scholar?cites=14134863493193489553&as_sdt=5,33&sciodt=0,33&hl=en

Hope that helps.

Best of luck with your research.

Thanks,
Todd

-- 
Todd Deshane
http://www.linkedin.com/in/deshantm
http://www.xen.org/products/cloudxen.html
http://runningxen.com/
