linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
To: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dennis Zhou <dennis@kernel.org>,
	Josef Bacik <josef@toxicpanda.com>, Tejun Heo <tj@kernel.org>,
	Huang Ying <ying.huang@intel.com>,
	Oleg Nesterov <oleg@redhat.com>, Omar Sandoval <osandov@fb.com>,
	Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Subject: mm/swap: possible problem introduced when replacing REQ_NOIDLE with REQ_IDLE
Date: Fri, 9 Aug 2019 08:39:17 +0000	[thread overview]
Message-ID: <d5faac47-8a8c-90ff-877d-b793b715ac4d@virtuozzo.com> (raw)

Hi, all.

Then porting patches from mainstream I've found some strange code:

 > commit a2b809672ee6fcb4d5756ea815725b3dbaea654e
 > Author: Christoph Hellwig <hch@lst.de>
 > Date:   Tue Nov 1 07:40:09 2016 -0600
 >
 >     block: replace REQ_NOIDLE with REQ_IDLE
 >
 >     Noidle should be the default for writes as seen by all the compounds
 >     definitions in fs.h using it.  In fact only direct I/O really should
 >     be using NODILE, so turn the whole flag around to get the defaults
 >     right, which will make our life much easier especially onces the
 >     WRITE_* defines go away.
 >
 >     This assumes all the existing "raw" users of REQ_SYNC for writes
 >     want noidle behavior, which seems to be spot on from a quick audit.
 >
 >     Signed-off-by: Christoph Hellwig <hch@lst.de>
 >     Signed-off-by: Jens Axboe <axboe@fb.com>
 >
 > diff --git a/include/linux/fs.h b/include/linux/fs.h
 > index ccedccb28ec8..46a74209917f 100644
 > --- a/include/linux/fs.h
 > +++ b/include/linux/fs.h
 > @@ -197,11 +197,11 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, 
loff_t offset,
 >  #define WRITE                  REQ_OP_WRITE
 >
 >  #define READ_SYNC              0
 > -#define WRITE_SYNC             (REQ_SYNC | REQ_NOIDLE)
 > -#define WRITE_ODIRECT          REQ_SYNC
 > -#define WRITE_FLUSH            (REQ_NOIDLE | REQ_PREFLUSH)
 > -#define WRITE_FUA              (REQ_NOIDLE | REQ_FUA)
 > -#define WRITE_FLUSH_FUA                (REQ_NOIDLE | REQ_PREFLUSH | 
REQ_FUA)
 > +#define WRITE_SYNC             REQ_SYNC
 > +#define WRITE_ODIRECT          (REQ_SYNC | REQ_IDLE)
 > +#define WRITE_FLUSH            REQ_PREFLUSH
 > +#define WRITE_FUA              REQ_FUA
 > +#define WRITE_FLUSH_FUA                (REQ_PREFLUSH | REQ_FUA)
 >
 >  /*
 >   * Attribute flags.  These should be or-ed together to figure out what

The above commit changes the meaning of the REQ_SYNC flag, before the 
patch it was equal to WRITE_ODIRECT and after the patch it is equal to 
WRITE_SYNC. And thus I think it became treated differently (I see only 
one place left in wbt_should_throttle.).

But in __swap_writepage() both before and after the mentioned patch we 
still pass a single REQ_SYNC without any REQ_IDLE/REQ_UNIDLE:

 > [snorch@snorch linux]$ git blame 
a2b809672ee6fcb4d5756ea815725b3dbaea654e^ mm/page_io.c | grep -a5 REQ_SYNC
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 319) 
         unlock_page(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 320) 
         ret = -ENOMEM;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 321) 
         goto out;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 322)   }
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 323) 
if (wbc->sync_mode == WB_SYNC_ALL)
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 324) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 325) 
else
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 326) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 > f8891e5e1f93a (Christoph Lameter     2006-06-30 01:55:45 -0700 327) 
count_vm_event(PSWPOUT);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 328) 
set_page_writeback(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 329) 
unlock_page(page);
 > [snorch@snorch linux]$ git blame 
a2b809672ee6fcb4d5756ea815725b3dbaea654e mm/page_io.c | grep -a5 REQ_SYNC
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 319) 
         unlock_page(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 320) 
         ret = -ENOMEM;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 321) 
         goto out;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 322)   }
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 323) 
if (wbc->sync_mode == WB_SYNC_ALL)
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 324) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 325) 
else
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 326) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 > f8891e5e1f93a (Christoph Lameter     2006-06-30 01:55:45 -0700 327) 
count_vm_event(PSWPOUT);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 328) 
set_page_writeback(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 329) 
unlock_page(page);

It looks like we've changed the way how we handle swap page writes from 
"odirect" way to "regular" sync write way, these can be wrong. This may 
also affect deprecated cfq io-scheduler on older kernels.

Thanks in advance for any advice on what to do with these, may be I miss 
something.

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

             reply	other threads:[~2019-08-09  8:39 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-09  8:39 Pavel Tikhomirov [this message]
2019-11-11 14:52 ` mm/swap: possible problem introduced when replacing REQ_NOIDLE with REQ_IDLE Pavel Tikhomirov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d5faac47-8a8c-90ff-877d-b793b715ac4d@virtuozzo.com \
    --to=ptikhomirov@virtuozzo.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=dennis@kernel.org \
    --cc=hch@lst.de \
    --cc=josef@toxicpanda.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=oleg@redhat.com \
    --cc=osandov@fb.com \
    --cc=tj@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).