From: "Nag Avadhanam (nag)"
To: "Theodore Ts'o", "Daniel Walker (danielwa)"
CC: Dave Chinner, Alexander Viro, "Khalid Mughal (khalidm)",
    "xe-kernel@external.cisco.com", "dave.hansen@intel.com",
    "hannes@cmpxchg.org", "riel@redhat.com", Jonathan Corbet,
    "linux-doc@vger.kernel.org", "linux-kernel@vger.kernel.org",
    "linux-fsdevel@vger.kernel.org", "linux-mm@kvack.org"
Subject: Re: [PATCH] kernel: fs: drop_caches: add dds drop_caches_count
Date: Tue, 16 Feb 2016 02:58:04 +0000
References: <1455308080-27238-1-git-send-email-danielwa@cisco.com>
    <20160214211856.GT19486@dastard> <56C216CA.7000703@cisco.com>
    <20160215230511.GU19486@dastard> <56C264BF.3090100@cisco.com>
    <20160216004531.GA28260@thunk.org>
In-Reply-To: <20160216004531.GA28260@thunk.org>

We have a class of platforms that are essentially swap-less embedded systems with limited memory resources (2GB or less).

There is a need to raise early alerts (before the OOM killer kicks in) based on current memory usage, so that admins can take appropriate steps depending on the extent of memory usage in the system (for example, stop initiating provisioning operations while still supporting existing services, or de-provision certain services).

There is also a general need to let end users know how much memory is available, so they can determine whether they can enable new services (this helps with planning).

Both of these depend on knowing the memory usage within the system approximately, to within a few tens of MB. We want to alert admins before the system exhibits any thrashing behavior.

We find the source of the accounting anomalies to be the page cache accounting; anonymous page accounting is fine. Page cache usage on our systems can be attributed to three things: the file system cache, the shared memory store (non-reclaimable), and the in-memory file systems (non-reclaimable). We know the sizes of the shared memory stores and of the in-memory file systems. If we can determine the amount of reclaimable file system cache (to within a few tens of MB), we can improve the serviceability of these systems.

The following then gives us a measure of the available memory (all terms in bytes):

    Total - (anon pages + shared memory/tmpfs pages
             + non-reclaimable file system cache pages)

It's the calculation of the number of bytes of non-reclaimable file system cache pages that has been troubling us.
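
A minimal sketch of the estimate above, assuming the non-reclaimable file system cache figure is already known (that is exactly the number described as hard to obtain, so here it is simply a parameter). The function names `available_bytes` and `alert_level`, the tier names, and the 20%/10% thresholds are hypothetical, chosen only to illustrate the tiered responses mentioned earlier (stop provisioning vs. de-provision services).

```python
"""Sketch: available-memory estimate plus hypothetical early-alert tiers.

Assumptions (not from the original message): function names, tier names
and threshold fractions are illustrative only.
"""

MIB = 1024 * 1024


def available_bytes(total, anon, shmem_tmpfs, nonreclaimable_fs_cache):
    """Total - (anon + shared memory/tmpfs + non-reclaimable fs cache), >= 0."""
    used = anon + shmem_tmpfs + nonreclaimable_fs_cache
    return max(total - used, 0)


def alert_level(available, total, warn_frac=0.20, crit_frac=0.10):
    """Map available memory to a hypothetical alert tier.

    "ok"       -> new provisioning is allowed
    "warn"     -> stop new provisioning, keep supporting existing services
    "critical" -> consider de-provisioning some services
    """
    frac = available / total
    if frac < crit_frac:
        return "critical"
    if frac < warn_frac:
        return "warn"
    return "ok"


if __name__ == "__main__":
    total = 2048 * MIB  # a 2GB platform, as in the message
    avail = available_bytes(total, anon=950 * MIB, shmem_tmpfs=400 * MIB,
                            nonreclaimable_fs_cache=300 * MIB)
    print(avail // MIB, alert_level(avail, total))  # prints: 398 warn
```

With 2048 MiB total and 950 + 400 + 300 MiB accounted for, 398 MiB (about 19%) remains, which lands in the hypothetical "warn" tier. The whole result is only as good as the non-reclaimable cache input.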

We do not want to count inactive file pages (of programs/binaries) that were once mapped by any process in the system as reclaimable, because that might lead to thrashing under memory pressure (we want to alert admins before the system starts dropping text pages).

From our experiments, we determined that running a VM scan looking for droppable pages came close to establishing that number. If there are cheaper ways of determining this stat, please let us know.

Thanks,
nag

On 2/15/16, 4:45 PM, "Theodore Ts'o" wrote:

>On Mon, Feb 15, 2016 at 03:52:31PM -0800, Daniel Walker wrote:
>> >>We need it to determine accurately what the free memory in the
>> >>system is. If you know where we can get this information already
>> >>please tell, we aren't aware of it. For instance /proc/meminfo isn't
>> >>accurate enough.
>>
>> Approximate point-in-time indication is an accurate characterization
>> of what we are doing. This is good enough for us. No matter what we
>> do, we are never going to be able to address the "time of check to
>> time of use" window. But, this approximation works reasonably well
>> for our use case.
>
>Why do you need such accuracy, and what do you consider "good enough"?
>Having something which iterates over all of the inodes in the system
>is something that really shouldn't be in a general production kernel.
>At the very least it should only be accessible by root (so now only a
>careless system administrator can DOS attack the system) but
>Dave's original question still stands. Why do you need a certain
>level of accuracy regarding how much memory is available after
>dropping all of the caches? What problem are you trying to
>solve/avoid?
>
>It may be that you are going about things completely the wrong way,
>which is why understanding the higher order problem you are trying to
>solve might be helpful in finding something which is safer,
>architecturally cleaner, and something that could go into the upstream
>kernel.
>
>Cheers,
>
> - Ted
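
The conservative counting rule described in the message (file pages mapped by a process are not treated as reclaimable) can be roughly approximated from userspace using /proc/meminfo alone. This sketch assumes that Cached - Shmem - Mapped is an acceptable lower-bound proxy. It is not what the proposed drop_caches_count patch computes: meminfo only reports pages that are mapped right now, not pages that were once mapped, so the estimate may fall short of the accuracy asked for above. Mapped may also include mapped shared-memory pages that Shmem already covers, which would undercount further (the safe direction for alerting). The helper names `parse_meminfo`, `reclaimable_file_cache` and `read_meminfo` are hypothetical.

```python
"""Rough /proc/meminfo-based estimate of reclaimable file cache.

Assumption: every currently mapped file page (Mapped) and every
shmem/tmpfs page (Shmem) is treated as non-reclaimable. Pages that were
only *once* mapped are invisible here, so this is an approximation.
"""


def parse_meminfo(text):
    """Return {field: value} from lines like 'Cached:   123456 kB'.

    Values with a 'kB' suffix are converted to bytes (meminfo's kB is
    KiB); unitless fields such as HugePages_Total are kept as-is.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        fields[name.strip()] = value
    return fields


def reclaimable_file_cache(fields):
    """Cached - Shmem - Mapped in bytes, clamped at zero.

    Cached includes shmem/tmpfs pages, so Shmem is subtracted; Mapped is
    subtracted so that pages in use by processes are not counted.
    """
    est = fields["Cached"] - fields["Shmem"] - fields["Mapped"]
    return max(est, 0)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


if __name__ == "__main__":
    sample = "Cached:   800000 kB\nShmem:    300000 kB\nMapped:   150000 kB\n"
    print(reclaimable_file_cache(parse_meminfo(sample)) // 1024, "KiB")
    # prints: 350000 KiB
```

On a live system, `reclaimable_file_cache(read_meminfo())` could feed the non-reclaimable term of the available-memory formula, with the caveats above.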