linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	David Rientjes <rientjes@google.com>,
	Cong Wang <cwang@twopensource.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.12 03/82] fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem
Date: Thu, 20 Feb 2014 15:51:46 -0800	[thread overview]
Message-ID: <20140220235019.588988495@linuxfoundation.org> (raw)
In-Reply-To: <20140220235019.483734207@linuxfoundation.org>

3.12-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Eric W. Biederman" <ebiederm@xmission.com>

commit 96c7a2ff21501691587e1ae969b83cbec8b78e08 upstream.

Recently due to a spike in connections per second memcached on 3
separate boxes triggered the OOM killer from accept.  At the time the
OOM killer was triggered there was 4GB out of 36GB free in zone 1.  The
problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
hold a bitmap, and there was sufficient fragmentation that the largest
page available was 8KiB.

I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
but I do agree that order 3 allocations are very likely to succeed.

There are always pathologies where order > 0 allocations can fail when
there are copious amounts of free memory available.  Using the pigeon
hole principle it is easy to show that it requires 1 page more than 50%
of the pages being free to guarantee an order 1 (8KiB) allocation will
succeed, 1 page more than 75% of the pages being free to guarantee an
order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
the pages being free to guarantee an order 3 allocate will succeed.

A server churning memory with a lot of small requests and replies like
memcached is a common case that if anything can will skew the odds
against large pages being available.

Therefore let's not give external applications a practical way to kill
linux server applications, and specify __GFP_NORETRY to the kmalloc in
alloc_fdmem.  Unless I am misreading the code and by the time the code
reaches should_alloc_retry in __alloc_pages_slowpath (where
__GFP_NORETRY becomes signification).  We have already tried everything
reasonable to allocate a page and the only thing left to do is wait.  So
not waiting and falling back to vmalloc immediately seems like the
reasonable thing to do even if there wasn't a chance of triggering the
OOM killer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/file.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/file.c
+++ b/fs/file.c
@@ -34,7 +34,7 @@ static void *alloc_fdmem(size_t size)
 	 * vmalloc() if the allocation size will be considered "large" by the VM.
 	 */
 	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
+		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
 		if (data != NULL)
 			return data;
 	}



  parent reply	other threads:[~2014-02-21  0:39 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-20 23:51 [PATCH 3.12 00/82] 3.12.13-stable review Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 01/82] xen: properly account for _PAGE_NUMA during xen pte translations Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 02/82] xen-blkfront: handle backend CLOSED without CLOSING Greg Kroah-Hartman
2014-02-20 23:51 ` Greg Kroah-Hartman [this message]
2014-02-20 23:51 ` [PATCH 3.12 04/82] mm: fix page leak at nfs_symlink() Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 05/82] mm/memory-failure.c: move refcount only in !MF_COUNT_INCREASED Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 06/82] CIFS: Fix SMB2 mounts so they dont try to set or get xattrs via cifs Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 07/82] Add protocol specific operation for CIFS xattrs Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 08/82] retrieving CIFS ACLs when mounted with SMB2 fails dropping session Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 09/82] mac80211: move roc cookie assignment earlier Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 10/82] mac80211: release the channel in error path in start_ap Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 11/82] mac80211: fix fragmentation code, particularly for encryption Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 12/82] ath9k_htc: make ->sta_rc_update atomic for most calls Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 13/82] ath9k_htc: Do not support PowerSave by default Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 14/82] ath9k: " Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 15/82] ar5523: fix usb id for Gigaset Greg Kroah-Hartman
2014-02-20 23:51 ` [PATCH 3.12 16/82] s390/dump: Fix dump memory detection Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 17/82] s390: fix kernel crash due to linkage stack instructions Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 18/82] nl80211: Reset split_start when netlink skb is exhausted Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 19/82] spi: Fix crash with double message finalisation on error handling Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 20/82] iwlwifi: mvm: dont allow A band if SKU forbids it Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 21/82] iwlwifi: mvm: print the version of the firmware when it asserts Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 22/82] iwlwifi: mvm: BT Coex - disable BT when TXing probe request in scan Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 23/82] of: fix PCI bus match for PCIe slots Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 24/82] spi: nuc900: Set SPI_LSB_FIRST for master->mode_bits if hw->pdata->lsb is true Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 25/82] serial: sirf: fix kernel panic caused by unpaired spinlock Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 26/82] raw: test against runtime value of max_raw_minors Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 27/82] hwmon: (ntc_thermistor) Avoid math overflow Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 28/82] lockd: send correct lock when granting a delayed lock Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 29/82] tty: n_gsm: Fix for modems with brk in modem status control Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 30/82] tty: Set correct tty name in active sysfs attribute Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 31/82] drm/radeon: fix UVD IRQ support on 7xx Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 33/82] drm/i915: Pair va_copy with va_end in i915_error_vprintf Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 37/82] staging: r8188eu: Fix typo in USB_DEVICE list Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 38/82] staging: comedi: adv_pci1710: fix analog output readback value Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 39/82] staging:iio:ad799x fix error_free_irq which was freeing an irq that may not have been requested Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 40/82] iio: max1363: Use devm_regulator_get_optional for optional regulator Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 41/82] iio: adis16400: Set timestamp as the last element in chan_spec Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 42/82] iio: ak8975: Fix calculation formula for convert micro tesla to gauss unit Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 43/82] x86, smap: Dont enable SMAP if CONFIG_X86_SMAP is disabled Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 44/82] x86, smap: smap_violation() is bogus if CONFIG_X86_SMAP is off Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 45/82] ftrace/x86: Use breakpoints for converting function graph caller Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 46/82] ALSA: hda - Fix mic capture on Sony VAIO Pro 11 Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 47/82] mei: clear write cb from waiting list on reset Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 48/82] mei: dont unset read cb ptr " Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 49/82] VME: Correct read/write alignment algorithm Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 50/82] Drivers: hv: vmbus: Dont timeout during the initial connection with host Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 52/82] USB: ftdi_sio: add Tagsys RFID Reader IDs Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 53/82] usb-storage: add unusual-devs entry for BlackBerry 9000 Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 54/82] usb-storage: restrict bcdDevice range for Super Top in Cypress ATACB Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 55/82] usb-storage: enable multi-LUN scanning when needed Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 56/82] usb: option: blacklist ZTE MF667 net interface Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 59/82] Revert "xhci: Set scatter-gather limit to avoid failed block writes." Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 60/82] Revert "xhci: Avoid infinite loop when sg urb requires too many trbs" Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 61/82] Revert "usb: xhci: Link TRB must not occur within a USB payload burst" Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 62/82] Revert "usbcore: set lpm_capable field for LPM capable root hubs" Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 63/82] Modpost: fixed USB alias generation for ranges including 0x9 and 0xA Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 64/82] block: __elv_next_request() shouldnt call into the elevator if bypassing Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 65/82] block: Fix nr_vecs for inline integrity vectors Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 66/82] block: add cond_resched() to potentially long running ioctl discard loop Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 67/82] ACPI / hotplug / PCI: Relax the checking of _STA return values Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 68/82] compiler/gcc4: Make quirk for asm_volatile_goto() unconditional Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 69/82] IB/qib: Add missing serdes init sequence Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 70/82] KVM: return an error code in kvm_vm_ioctl_register_coalesced_mmio() Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 71/82] tick: Clear broadcast pending bit when switching to oneshot Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 72/82] md/raid1: restore ability for check and repair to fix read errors Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 73/82] md/raid5: Fix CPU hotplug callback registration Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 74/82] i2c: mv64xxx: refactor message start to ensure proper initialization Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 75/82] time: Fix overflow when HZ is smaller than 60 Greg Kroah-Hartman
2014-02-20 23:52 ` [PATCH 3.12 76/82] power: max17040: Fix NULL pointer dereference when there is no platform_data Greg Kroah-Hartman
2014-02-20 23:53 ` [PATCH 3.12 77/82] ring-buffer: Fix first commit on sub-buffer having non-zero delta Greg Kroah-Hartman
2014-02-20 23:53 ` [PATCH 3.12 78/82] target: Fix free-after-use regression in PR unregister Greg Kroah-Hartman
2014-02-20 23:53 ` [PATCH 3.12 79/82] genirq: Add missing irq_to_desc export for CONFIG_SPARSE_IRQ=n Greg Kroah-Hartman
2014-02-20 23:53 ` [PATCH 3.12 80/82] drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zero Greg Kroah-Hartman
2014-02-20 23:53 ` [PATCH 3.12 81/82] EDAC: Poll timeout cannot be zero, p2 Greg Kroah-Hartman
2014-02-20 23:53 ` [PATCH 3.12 82/82] EDAC: Correct workqueue setup path Greg Kroah-Hartman
2014-02-21  5:04 ` [PATCH 3.12 00/82] 3.12.13-stable review Guenter Roeck
2014-02-21 23:41 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140220235019.588988495@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=cwang@twopensource.com \
    --cc=ebiederm@xmission.com \
    --cc=eric.dumazet@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rientjes@google.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).