2016-08-23 16:24:13 +03:00
|
|
|
Budget Fair Queueing I/O Scheduler
|
|
|
|
==================================
|
|
|
|
|
2016-09-09 17:55:10 +03:00
|
|
|
This patchset introduces BFQ-v8r3 into Linux 4.7.0.
|
2016-08-23 16:24:13 +03:00
|
|
|
For further information: http://algogroup.unimore.it/people/paolo/disk_sched/.
|
|
|
|
|
|
|
|
The overall diffstat is the following:
|
|
|
|
|
|
|
|
block/Kconfig.iosched | 30 +
|
|
|
|
block/Makefile | 1 +
|
|
|
|
block/bfq-cgroup.c | 1178 +++++++++++++++++++++
|
|
|
|
block/bfq-ioc.c | 36 +
|
|
|
|
block/bfq-iosched.c | 4895 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|
|
|
|
block/bfq-sched.c | 1450 ++++++++++++++++++++++++++
|
|
|
|
block/bfq.h | 848 +++++++++++++++
|
|
|
|
include/linux/blkdev.h | 2 +-
|
|
|
|
8 files changed, 8439 insertions(+), 1 deletion(-)
|
|
|
|
|
|
|
|
CHANGELOG
|
|
|
|
|
2016-09-09 17:55:10 +03:00
|
|
|
v8r3
|
|
|
|
|
|
|
|
. BUGFIX Update weight-raising coefficient when switching from
|
|
|
|
interactive to soft real-time.
|
|
|
|
|
2016-08-23 16:24:13 +03:00
|
|
|
v8r2
|
|
|
|
|
|
|
|
. BUGFIX Removed variables that are not used if tracing is
|
|
|
|
disabled. Reported by Lee Tibbert <lee.tibbert@gmail.com>
|
|
|
|
|
|
|
|
. IMPROVEMENT Ported commit ae11889636: turned blkg_lookup_create into
|
|
|
|
blkg_lookup. As a side benefit, this finally enables BFQ to be used
|
|
|
|
as a module even with full hierarchical support.
|
|
|
|
|
|
|
|
v8r1
|
|
|
|
|
|
|
|
. BUGFIX Fixed incorrect invariant check
|
|
|
|
|
|
|
|
. IMPROVEMENT Privileged soft real-time applications against
|
|
|
|
interactive ones, to guarantee a lower and more stable latency to
|
|
|
|
the former
|
|
|
|
|
|
|
|
v8
|
|
|
|
|
|
|
|
. BUGFIX: Fixed incorrect rcu locking in bfq_bic_update_cgroup
|
|
|
|
|
|
|
|
. BUGFIX Fixed a few cgroups-related bugs, causing sporadic crashes
|
|
|
|
|
|
|
|
. BUGFIX Fixed wrong computation of queue weights as a function of ioprios
|
|
|
|
|
|
|
|
. BUGFIX Fixed wrong Kconfig.iosched dependency for BFQ_GROUP_IOSCHED
|
|
|
|
|
|
|
|
. IMPROVEMENT Preemption-based, idle-less service guarantees. If
|
|
|
|
several processes are competing for the device at the same time, but
|
|
|
|
all processes and groups have the same weight, then the mechanism
|
|
|
|
introduced by this improvement enables BFQ to guarantee the expected
|
|
|
|
throughput distribution without ever idling the device. Throughput
|
|
|
|
is then much higher in this common scenario.
|
|
|
|
|
|
|
|
. IMPROVEMENT Made burst handling more robust
|
|
|
|
|
|
|
|
. IMPROVEMENT Reduced false positives in EQM
|
|
|
|
|
|
|
|
. IMPROVEMENT Let queues preserve weight-raising also when shared
|
|
|
|
|
|
|
|
. IMPROVEMENT Improved peak-rate estimation and autotuning of the
|
|
|
|
parameters related to the device rate
|
|
|
|
|
|
|
|
. IMPROVEMENT Improved the weight-raising mechanism so as to further
|
|
|
|
reduce latency and to increase robustness
|
|
|
|
|
|
|
|
. IMPROVEMENT Added a strict-guarantees tunable. If this tunable is
|
|
|
|
set, then device-idling is forced whenever needed to provide
|
|
|
|
accurate service guarantees. CAVEAT: idling unconditionally may even
|
|
|
|
increase latencies, in case of processes that did stop doing I/O.
|
|
|
|
|
|
|
|
. IMPROVEMENT Improved handling of async (write) I/O requests
|
|
|
|
|
|
|
|
. IMPROVEMENT Ported several good CFQ commits
|
|
|
|
|
|
|
|
. CHANGE Changed default group weight to 100
|
|
|
|
|
|
|
|
. CODE IMPROVEMENT Refactored I/O-request-insertion code
|
|
|
|
|
|
|
|
v7r11:
|
|
|
|
. BUGFIX Remove the group_list data structure, which ended up in an
|
|
|
|
inconsistent state if BFQ happened to be activated for some device
|
|
|
|
when some blkio groups already existed (these groups where not added
|
|
|
|
to the list). The blkg list for the request queue is now used where
|
|
|
|
the removed group_list was used.
|
|
|
|
|
|
|
|
. BUGFIX Init and reset also dead_stats.
|
|
|
|
|
|
|
|
. BUGFIX Added, in __bfq_deactivate_entity, the correct handling of the
|
|
|
|
case where the entity to deactivate has not yet been activated at all.
|
|
|
|
|
|
|
|
. BUGFIX Added missing free of the root group for the case where full
|
|
|
|
hierarchical support is not activated.
|
|
|
|
|
|
|
|
. IMPROVEMENT Removed the now useless bfq_disconnect_groups
|
|
|
|
function. The same functionality is achieved through multiple
|
|
|
|
invocations of bfq_pd_offline (which are in their turn guaranteed to
|
|
|
|
be executed, when needed, by the blk-cgroups code).
|
|
|
|
|
|
|
|
v7r10 <VERSION RETIRED, BECAUSE OF THE BUGS FIXED IN v7r11!!!>:
|
|
|
|
. BUGFIX: Fixed wrong check on whether cooperating processes belong
|
|
|
|
to the same cgroup.
|
|
|
|
|
|
|
|
v7r9:
|
|
|
|
. IMPROVEMENT: Changed BFQ to use the blkio controller instead of its
|
|
|
|
own controller. BFQ now registers itself as a policy to the blkio
|
|
|
|
controller and implements its hierarchical scheduling support using
|
|
|
|
data structures that already exist in blk-cgroup. The bfqio
|
|
|
|
controller's code is completely removed.
|
|
|
|
|
|
|
|
. CODE IMPROVEMENTS: Applied all suggestions from Tejun Heo, received
|
|
|
|
on the last submission to lkml: https://lkml.org/lkml/2014/5/27/314.
|
|
|
|
|
|
|
|
v7r8:
|
|
|
|
. BUGFIX: Let weight-related fields of a bfq_entity be correctly initialized
|
|
|
|
(also) when the I/O priority of the entity is changed before the first
|
|
|
|
request is inserted into the bfq_queue associated to the entity.
|
|
|
|
. BUGFIX: When merging requests belonging to different bfq_queues, avoid
|
|
|
|
repositioning the surviving request. In fact, in this case the repositioning
|
|
|
|
may result in the surviving request being moved across bfq_queues, which
|
|
|
|
would ultimately cause bfq_queues' data structures to become inconsistent.
|
|
|
|
. BUGFIX: When merging requests belonging to the same bfq_queue, reposition
|
|
|
|
the surviving request so that it gets in the correct position, namely the
|
|
|
|
position of the dropped request, instead of always being moved to the head
|
|
|
|
of the FIFO of the bfq_queue (which means to let the request be considered
|
|
|
|
the eldest one).
|
|
|
|
. BUGFIX: Reduce the idling slice for seeky queues only if the scenario is
|
|
|
|
symmetric. This guarantees that also processes associated to seeky queues
|
|
|
|
do receive their reserved share of the throughput.
|
|
|
|
Contributed by Riccardo Pizzetti and Samuele Zecchini.
|
|
|
|
. IMPROVEMENT: Always perform device idling if the scenario is asymmetric in
|
|
|
|
terms of throughput distribution among processes.
|
|
|
|
This extends throughput-distribution guarantees to any process, regardless
|
|
|
|
of the properties of its request pattern and of the request patterns of the
|
|
|
|
other processes, and regardless of whether the device is NCQ-capable.
|
|
|
|
. IMPROVEMENT: Remove the current limitation on the maximum number of in-flight
|
|
|
|
requests allowed for a sync queue (limitation set in place for fairness
|
|
|
|
issues in CFQ, inherited by the first version of BFQ, but made unnecessary
|
|
|
|
by the latest accurate fairness strategies added to BFQ). Removing this
|
|
|
|
limitation enables devices with long internal queues to fill their queues
|
|
|
|
as much as they deem appropriate, also with sync requests. This avoids
|
|
|
|
throughput losses on these devices, because, to achieve a high throughput,
|
|
|
|
they often need to have a high number of requests queued internally.
|
|
|
|
. CODE IMPROVEMENT: Simplify I/O priority change logic by turning it into a
|
|
|
|
single-step procedure instead of a two-step one; improve readability by
|
|
|
|
rethinking the names of the functions involved in changing the I/O priority
|
|
|
|
of a bfq_queue.
|
|
|
|
|
|
|
|
v7r7:
|
|
|
|
. BUGFIX: Prevent the OOM queue from being involved in the queue
|
|
|
|
cooperation mechanism. In fact, since the requests temporarily
|
|
|
|
redirected to the OOM queue could be redirected again to dedicated
|
|
|
|
queues at any time, the state needed to correctly handle merging
|
|
|
|
with the OOM queue would be quite complex and expensive to
|
|
|
|
maintain. Besides, in such a critical condition as an out of
|
|
|
|
memory, the benefits of queue merging may be little relevant, or
|
|
|
|
even negligible.
|
|
|
|
. IMPROVEMENT: Let the OOM queue be initialized only once. Previously,
|
|
|
|
the OOM queue was reinitialized, at each request enqueue, with the
|
|
|
|
parameters related to the process that issued that request.
|
|
|
|
Depending on the parameters of the processes doing I/O, this could
|
|
|
|
easily cause the OOM queue to be moved continuously across service
|
|
|
|
trees, or even across groups. It also caused the parameters of the
|
|
|
|
OOM queue to be continuously reset in any case.
|
|
|
|
. CODE IMPROVEMENT. Performed some minor code cleanups, and added some
|
|
|
|
BUG_ON()s that, if the weight of an entity becomes inconsistent,
|
|
|
|
should better help understand why.
|
|
|
|
|
|
|
|
v7r6:
|
|
|
|
. IMPROVEMENT: Introduced a new mechanism that helps get the job done
|
|
|
|
more quickly with services and applications that create or reactivate
|
|
|
|
many parallel I/O-bound processes. This is the case, for example, with
|
|
|
|
systemd at boot, or with commands like git grep.
|
|
|
|
. CODE IMPROVEMENTS: Small code cleanups and improvements.
|
|
|
|
|
|
|
|
v7r5:
|
|
|
|
. IMPROVEMENT: Improve throughput boosting by idling the device
|
|
|
|
only for processes that, in addition to perform sequential I/O,
|
|
|
|
are I/O-bound (apart from weight-raised queues, for which idling
|
|
|
|
is always performed to guarantee them a low latency).
|
|
|
|
. IMPROVEMENT: Improve throughput boosting by depriving processes
|
|
|
|
that cooperate often of weight-raising.
|
|
|
|
. CODE IMPROVEMENT: Pass of improvement of the readability of both
|
|
|
|
comments and actual code.
|
|
|
|
|
|
|
|
v7r4:
|
|
|
|
. BUGFIX. Modified the code so as to be robust against late detection of
|
|
|
|
NCQ support for a rotational device.
|
|
|
|
. BUGFIX. Removed a bug that hindered the correct throughput distribution
|
|
|
|
on flash-based devices when not every process had to receive the same
|
|
|
|
fraction of the throughput. This fix entailed also a little efficiency
|
|
|
|
improvement, because it implied the removal of a short function executed
|
|
|
|
in a hot path.
|
|
|
|
. CODESTYLE IMPROVEMENT: removed quoted strings split across lines.
|
|
|
|
|
|
|
|
v7r3:
|
|
|
|
. IMPROVEMENT: Improved throughput boosting with NCQ-capable HDDs and
|
|
|
|
random workloads. The mechanism that further boosts throghput with
|
|
|
|
these devices and workloads is activated only in the cases where it
|
|
|
|
does not cause any violation of throughput-distribution and latency
|
|
|
|
guarantees.
|
|
|
|
. IMPROVEMENT: Generalized the computation of the parameters of the
|
|
|
|
low-latency heuristic for interactive applications, so as to fit also
|
|
|
|
slower storage devices. The purpose of this improvement is to preserve
|
|
|
|
low-latency guarantees for interactive applications also on slower
|
|
|
|
devices, such as portable hard disks, multimedia and SD cards.
|
|
|
|
. BUGFIX: Re-added MODULE_LICENSE macro.
|
|
|
|
. CODE IMPROVEMENTS: Small code cleanups; introduced a coherent naming
|
|
|
|
scheme for all identifiers related to weight raising; refactored and
|
|
|
|
optimized a few hot paths.
|
|
|
|
|
|
|
|
v7r2:
|
|
|
|
. BUGFIX/IMPROVEMENT. One of the requirements for an application to be
|
|
|
|
deemed as soft real-time is that it issues its requests in batches, and
|
|
|
|
stops doing I/O for a well-defined amount of time before issuing a new
|
|
|
|
batch. Imposing this minimum idle time allows BFQ to filter out I/O-bound
|
|
|
|
applications that may otherwise be incorrectly deemed as soft real-time
|
|
|
|
(under the circumstances described in detail in the comments to the
|
|
|
|
function bfq_bfqq_softrt_next_start()). Unfortunately, BFQ could however
|
|
|
|
start counting this idle time from two different events: either from the
|
|
|
|
expiration of the queue, if all requests of the queue had also been already
|
|
|
|
completed when the queue expired, or, if the previous condition did not
|
|
|
|
hold, from the first completion of one of the still outstanding requests.
|
|
|
|
In the second case, an application had more chances to be deemed as soft
|
|
|
|
real-time.
|
|
|
|
Actually, there was no reason for this differentiated treatment. We
|
|
|
|
addressed this issue by defining more precisely the above requirement for
|
|
|
|
an application to be deemed as soft real-time, and changing the code
|
|
|
|
consequently: a well-defined amount of time must elapse between the
|
|
|
|
completion of *all the requests* of the current pending batch and the
|
|
|
|
issuing of the first request of the next batch (this is, in the end, what
|
|
|
|
happens with a true soft real-time application). This change further
|
|
|
|
reduced false positives, and, as such, improved responsiveness and reduced
|
|
|
|
latency for actual soft real-time applications.
|
|
|
|
. CODE IMPROVEMENT. We cleaned up the code a little bit and addressed
|
|
|
|
some issues pointed out by the checkpatch.pl script.
|
|
|
|
|
|
|
|
v7r1:
|
|
|
|
. BUGFIX. Replace the old value used to approximate 'infinity', with
|
|
|
|
the correct one to use in case times are compared through the macro
|
|
|
|
time_is_before_jiffies(). In fact, this macro, designed to take
|
|
|
|
wraparound issues into account, easily returns anomalous results if
|
|
|
|
its argument is equal to the value that we used as an approximation
|
|
|
|
of 'infinity', namely ((unsigned long) (-1)). The consequence was
|
|
|
|
that the logical expression used to determine whether a queue
|
|
|
|
belongs to a soft real-time application often yielded an incorrect
|
|
|
|
result. In the end, some application happened to be incorrectly
|
|
|
|
deemed as soft real-time and hence weight-raised. This affected both
|
|
|
|
throughput and latency guarantees.
|
|
|
|
. BUGFIX. Fixed a scriverner's error made in an attempt to use the
|
|
|
|
above macro in a logical expression.
|
|
|
|
. IMPROVEMENT/BUGFIX. On the expiration of a queue, use a more general
|
|
|
|
condition to allow a weight-raising period to start if the queue is
|
|
|
|
soft real-time. The previous condition could prevent an empty,
|
|
|
|
soft-real time queue from being correctly deemed as soft real-time.
|
|
|
|
. IMPROVEMENT/MINOR BUGFIX. Use jiffies-comparison macros also in the
|
|
|
|
following cases:
|
|
|
|
. to establish whether an application initially deemed as interactive
|
|
|
|
is now meeting the requirements for being classified as soft
|
|
|
|
real-time;
|
|
|
|
. to determine if a weight-raising period must be ended.
|
|
|
|
. CODE IMPROVEMENT. Change the type of the time quantities used in the
|
|
|
|
weight-raising heuristics to unsigned long, as the type of the time
|
|
|
|
(jiffies) is unsigned long.
|
|
|
|
|
|
|
|
v7:
|
|
|
|
- IMPROVEMENT: In the presence of weight-raised queues and if the
|
|
|
|
device is NCQ-enabled, device idling is now disabled for non-raised
|
|
|
|
readers, i.e., for their associated sync queues. Hence a sync queue
|
|
|
|
is expired immediately if it becomes empty, and a new queue is
|
|
|
|
served. As explained in detail in the papers about BFQ, not idling
|
|
|
|
the device for sync queues when the latter become empty causes BFQ to
|
|
|
|
assign higher timestamps to these queues when they get backlogged
|
|
|
|
again, and hence to serve these queues less frequently. This fact,
|
|
|
|
plus to the fact that, because of the immediate expiration itself,
|
|
|
|
these queues get less service while they are granted access to the
|
|
|
|
disk, reduces the relative rate at which the processes associated to
|
|
|
|
these queues ask for requests from the I/O request pool. If the pool
|
|
|
|
is saturated, as it happens in the presence of write hogs, reducing
|
|
|
|
the above relative rate increases the probability that a request is
|
|
|
|
available (soon) in the pool when a weight-raised process needs it.
|
|
|
|
This change does seem to mitigate the typical starvation problems
|
|
|
|
that occur in the presence of write hogs and NCQ, and hence to
|
|
|
|
guarantee a higher application and system responsiveness in these
|
|
|
|
hostile scenarios.
|
|
|
|
- IMPROVEMENT/BUGFIX: Introduced a new classification rule to the soft
|
|
|
|
real-time heuristic, which takes into account also the isochronous
|
|
|
|
nature of such applications. The computation of next_start has been
|
|
|
|
fixed as well. Now it is correctly done from the time of the last
|
|
|
|
transition from idle to backlogged; the next_start is therefore
|
|
|
|
computed from the service received by the queue from its last
|
|
|
|
transition from idle to backlogged. Finally, the code which
|
|
|
|
preserved weight-raising for a soft real-time queue even with no
|
|
|
|
idle->backlogged transition has been removed.
|
|
|
|
- IMPROVEMENT: Add a few jiffies to the reference time interval used to
|
|
|
|
establish whether an application is greedy or not. This reference
|
|
|
|
interval was, by default, HZ/125 seconds, which could generate false
|
|
|
|
positives in the following two cases (especially if both cases occur):
|
|
|
|
1) If HZ is so low that the duration of a jiffie is comparable to or
|
|
|
|
higher than the above reference time interval. This happens, e.g.,
|
|
|
|
on slow devices with HZ=100.
|
|
|
|
2) If jiffies, instead of increasing at a constant rate, may stop
|
|
|
|
increasing for some time, then suddenly 'jump' by several units to
|
|
|
|
recover the lost increments. This seems to happen, e.g., in virtual
|
|
|
|
machines.
|
|
|
|
The added number of jiffies has been found experimentally. In particular,
|
|
|
|
according to our experiments, adding this number of jiffies seems to make
|
|
|
|
the filter quite precise also in embedded systems and KVM/QEMU virtual
|
|
|
|
machines. Also contributed by
|
|
|
|
Alexander Spyridakis <a.spyridakis@virtualopensystems.com>.
|
|
|
|
- IMPROVEMENT/BUGFIX: Keep disk idling also for NCQ-provided
|
|
|
|
rotational devices, which boosts the throughput on NCQ-enabled
|
|
|
|
rotational devices.
|
|
|
|
- BUGFIX: The budget-timeout condition in the bfq_rq_enqueued() function
|
|
|
|
was checked only if the request is large enough to provoke an unplug. As
|
|
|
|
a consequence, for a process always issuing small I/O requests the
|
|
|
|
budget timeout was never checked. The queue associated to the process
|
|
|
|
therefore expired only when its budget was exhausted, even if the
|
|
|
|
queue had already incurred a budget timeout from a while.
|
|
|
|
This fix lets a queue be checked for budget timeout at each request
|
|
|
|
enqueue, and, if needed, expires the queue accordingly even if the
|
|
|
|
request is small.
|
|
|
|
- BUGFIX: Make sure that weight-raising is resumed for a split queue,
|
|
|
|
if it was merged when already weight-raised.
|
|
|
|
- MINOR BUGFIX: Let bfq_end_raising_async() correctly end weight-raising
|
|
|
|
also for the queues belonging to the root group.
|
|
|
|
- IMPROVEMENT: Get rid of the some_coop_idle flag, which in its turn
|
|
|
|
was used to decide whether to disable idling for an in-service
|
|
|
|
shared queue whose seek mean decreased. In fact, disabling idling
|
|
|
|
for such a queue turned out to be useless.
|
|
|
|
- CODE IMPROVEMENT: The bfq_bfqq_must_idle() function and the
|
|
|
|
bfq_select_queue() function may not change the current in-service
|
|
|
|
queue in various cases. We have cleaned up the involved conditions,
|
|
|
|
by factoring out the common parts and getting rid of the useless
|
|
|
|
ones.
|
|
|
|
- MINOR CODE IMPROVEMENT: The idle_for_long_time condition in the
|
|
|
|
bfq_add_rq_rb() function should be evaluated only on an
|
|
|
|
idle->backlogged transition. Now the condition is set to false
|
|
|
|
by default, evaluating it only if the queue was not busy on a
|
|
|
|
request insertion.
|
|
|
|
- MINOR CODE IMPROVEMENT: Added a comment describing the rationale
|
|
|
|
behind the condition evaluated in the function
|
|
|
|
bfq_bfqq_must_not_expire().
|
|
|
|
|
|
|
|
v6r2:
|
|
|
|
- Fairness fix: the case of queue expiration for budget timeout is
|
|
|
|
now correctly handled also for sync queues, thus allowing also
|
|
|
|
the processes corresponding to these queues to be guaranteed their
|
|
|
|
reserved share of the disk throughput.
|
|
|
|
- Fixed a bug that prevented group weights from being correctly
|
|
|
|
set via the sysfs interface.
|
|
|
|
- Fixed a bug that cleared a previously-set group weight if the
|
|
|
|
same value was re-inserted via the sysfs interface.
|
|
|
|
- Fixed an EQM bug that allowed a newly-started process to skip
|
|
|
|
its initial weight-raising period if its queue was merged before
|
|
|
|
its first request was inserted.
|
|
|
|
- Fixed a bug that preserved already-started weight-raising periods
|
|
|
|
even if the low_latency tunable was disabled.
|
|
|
|
- The raising_max_time tunable now shows, more user-friendly, the
|
|
|
|
maximum raising time in milliseconds.
|
|
|
|
|
|
|
|
v6r1:
|
|
|
|
- Fix use-after-free of queues in __bfq_bfqq_expire(). It may happen that
|
|
|
|
a call to bfq_del_bfqq_busy() puts the last reference taken on a queue
|
|
|
|
and frees it. Subsequent accesses to that same queue would result in a
|
|
|
|
use-after-free. Make sure that a queue that has just been deleted from
|
|
|
|
busy is no more touched.
|
|
|
|
- Use the uninitialized_var() macro when needed. It may happen that a
|
|
|
|
variable is initialized in a function that is called by the function
|
|
|
|
that defined it. Use the uninitialized_var() macro in these cases.
|
|
|
|
|
|
|
|
v6:
|
|
|
|
- Replacement of the cooperating-queue merging mechanism borrowed from
|
|
|
|
CFQ with Early Queue Merge (EQM), a unified mechanism to get a
|
|
|
|
sequential read pattern, and hence a high throughput, with any set of
|
|
|
|
processes performing interleaved I/O. EQM also preserves low latency.
|
|
|
|
(see http://algogroup.unimore.it/people/paolo/disk_sched/description.php
|
|
|
|
for more details). Contributed by Mauro Andreolini and Arianna Avanzini.
|
|
|
|
The code for detecting whether two queues have to be merged is a
|
|
|
|
slightly modified version of the CFQ code for detecting whether two
|
|
|
|
queues belong to cooperating processes and whether the service of a
|
|
|
|
queue should be preempted to boost the throughput.
|
|
|
|
- Fix a bug that caused the peak rate of a disk to be computed as zero
|
|
|
|
in case of multiple I/O errors. Subsequent estimations of the weight
|
|
|
|
raising duration caused a division-by-zero error.
|
|
|
|
|
|
|
|
v5r1:
|
|
|
|
- BUG FIX: Fixed stall occurring when the active queue is moved to
|
|
|
|
a different group while idling (this caused the idling timer to be
|
|
|
|
cancelled and hence no new queue to be selected, and no new
|
|
|
|
request to be dispatched).
|
|
|
|
- BUG FIX: Fixed wrong assignment of too high budgets to queues during
|
|
|
|
the first few seconds after initialization.
|
|
|
|
- BUG FIX: Added proper locking to the function handling the "weights"
|
|
|
|
tunable.
|
|
|
|
|
|
|
|
v5:
|
|
|
|
- Added an heuristic that, if the tunable raising_max_time is set to
|
|
|
|
0, automatically computes the duration of the weight raising
|
|
|
|
according to the estimated peak rate of the device. This enables
|
|
|
|
flash-based devices to reach maximum throughput as soon as possible,
|
|
|
|
without sacrificing latency.
|
|
|
|
|
|
|
|
v4:
|
|
|
|
- Throughput-boosting for flash-based devices: improved version of commits
|
|
|
|
a68bbdd and f7d7b7a, which boosts the throughput while still preserving
|
|
|
|
latency guarantees for interactive and soft real-time applications.
|
|
|
|
- Better identification of NCQ-capable disks: port of commit e459dd0.
|
|
|
|
|
|
|
|
v3-r4:
|
|
|
|
- Bugfixes
|
|
|
|
* Removed an important memory leak: under some circumstances the process references
|
|
|
|
to a queue were not decremented correctly, which prevented unused shared bfq_queue
|
|
|
|
to be correctly deallocated.
|
|
|
|
* Fixed various errors related to hierarchical scheduling:
|
|
|
|
* Removed an error causing tasks to be attached to the bfqio cgroup
|
|
|
|
controller even when BFQ was not the active scheduler
|
|
|
|
* Corrected wrong update of the budgets from the leaf to the root upon
|
|
|
|
forced selection of a service tree or a bfq_queue
|
|
|
|
* Fixed the way how active leaf entities are moved to the root group before
|
|
|
|
the group entity is deactivated when a cgroup is destroyed
|
|
|
|
- Throughput-boosting improvement for cooperating queues: close detection is now based
|
|
|
|
on a fixed threshold instead of the queue's average seek. This is a port of one of
|
|
|
|
the changes in the CFQ commit 3dde36d by Corrado Zoccolo.
|
|
|
|
|
|
|
|
v3-r3:
|
|
|
|
- Bugfix: removed an important error causing occasional kernel panics when
|
|
|
|
moving a process to a new cgroup. The panic occurred if:
|
|
|
|
1) the queue associated to the process was idle when the process was moved
|
|
|
|
and
|
|
|
|
2) a new disk request was inserted into the queue just after the move.
|
|
|
|
- Further latency improvement through a better treatment of low-bandwidth
|
|
|
|
async queues.
|
|
|
|
|
|
|
|
v3-r2:
|
|
|
|
- Bugfix: added a forgotten condition that prevents weights of low-bw async
|
|
|
|
queues from being raised when low_latency is off.
|
|
|
|
- Latency improvement: low-bw async queues are now better identified.
|
|
|
|
|
|
|
|
v3-r1:
|
|
|
|
- Fixed an important request-dispatch bug causing occasional IO hangs.
|
|
|
|
- Added a new mechanism to reduce the latency of low-bw async queues.
|
|
|
|
This reduces the latency of also the sync queues synchronized with
|
|
|
|
the above async queues.
|
|
|
|
- Fixed a minor bug in iocontext locking (port of commits 9b50902 and 3181faa
|
|
|
|
from CFQ).
|
|
|
|
|
|
|
|
v3:
|
|
|
|
|
|
|
|
- Improved low-latency mechanisms, including a more accurate criterion to
|
|
|
|
distinguish between greedy-but-seeky and soft real-time applications.
|
|
|
|
Interactive applications now enjoy noticeably lower latencies.
|
|
|
|
|
|
|
|
- Switch to the simpler one-request-dispatch-at-a-time scheme as in CFQ.
|
|
|
|
|
|
|
|
- Ported cooperating-queues merging from CFQ (6d048f5, 1afba04,
|
|
|
|
d9e7620, a36e71f, 04dc6e7, 26a2ac0, 3ac6c9f, f2d1f0a, 83096eb,
|
|
|
|
2e46e8b, df5fe3e, b3b6d04, e6c5bc7, c0324a0, f04a642, 8682e1f,
|
|
|
|
b9d8f4c, 2f7a2d8, ae54abe, e9ce335, 39c01b2, d02a2c0, c10b61f).
|
|
|
|
Contributed by Arianna Avanzini. Queues of processes performing IO
|
|
|
|
on interleaved, yet contiguous disk zones are merged to boost the
|
|
|
|
throughput. Some little optimizations to get a more stable throughput
|
|
|
|
have been added to the original CFQ version.
|
|
|
|
|
|
|
|
- Added static fallback queue for extreme OOM conditions (porting of
|
|
|
|
CFQ commits d5036d7, 6118b70, b706f64, 32f2e80). Port contributed by
|
|
|
|
Francesco Allertsen.
|
|
|
|
|
|
|
|
- Ported CFQ commits b0b78f8, 40bb54d, 30996f4, dddb745, ad5ebd2, cf7c25c;
|
|
|
|
mainly code cleanup and fix of minor bugs. Port contributed by
|
|
|
|
Francesco Allertsen.
|
|
|
|
|
|
|
|
v2:
|
|
|
|
|
|
|
|
- An issue that may cause little throughput loss on fast disks has been solved.
|
|
|
|
BFQ-v1 and CFQ may suffer from this problem.
|
|
|
|
- The disk-idling timeout has been better tuned to further file latency
|
|
|
|
(especially for the idle- or light-loaded-disk scenarios).
|
|
|
|
- One of the parameters of the low-latency heuristics has been tuned a little
|
|
|
|
bit more, so as to reduce the probability that a disk-bound process may
|
|
|
|
hamper the reduction of the latency of interactive and soft real-time
|
|
|
|
applications.
|
|
|
|
|
|
|
|
- Same low-latency guarantees with and without NCQ.
|
|
|
|
|
|
|
|
- Latency for interactive applications about halved with respect to BFQ-v1.
|
|
|
|
|
|
|
|
- When the low_latency tunable is set, also soft real-time applications
|
|
|
|
now enjoy reduced latency.
|
|
|
|
|
|
|
|
- A very little minimum bandwidth is now guaranteed to the
|
|
|
|
Idle IO-scheduling class also when the other classes are
|
|
|
|
backlogged, just to prevent them from starving.
|
|
|
|
|
|
|
|
v1:
|
|
|
|
|
|
|
|
This is a new version of BFQ with respect to the versions you can
|
|
|
|
find on Fabio's site: http://feanor.sssup.it/~fabio/linux/bfq.
|
|
|
|
Here is what we changed with respect to the previous versions:
|
|
|
|
|
|
|
|
1) re-tuned the budget feedback mechanism: it is now slighlty more
|
|
|
|
biased toward assigning high budgets, to boost the aggregated
|
|
|
|
throughput more, and more quickly as new processes are started
|
|
|
|
|
|
|
|
2) introduced more tolerance toward seeky queues (I verified that the
|
|
|
|
phenomena described below used to occur systematically):
|
|
|
|
|
|
|
|
2a: if a queue is expired after having received very little
|
|
|
|
service, then it is not punished as a seeky queue, even if it
|
|
|
|
occurred to consume that little service too slowly; the
|
|
|
|
rationale is that, if the new active queue has been served for
|
|
|
|
a too short time interval, then its possible sequential
|
|
|
|
accesses may not yet prevail on the initial latencies for
|
|
|
|
moving the disk head on the first sector requested
|
|
|
|
|
|
|
|
2b: the waiting time (disk idling) of a queue detected as seeky as
|
|
|
|
a function of the position of the requests it issued is reduced
|
|
|
|
to a very low value only after the queue has consumed a minimum
|
|
|
|
fraction of the assigned budget; this prevents processes
|
|
|
|
generating (partly) seeky workloads from being too ill-treated
|
|
|
|
|
|
|
|
2c: if a queue has consumed 'enough' budget upon a budget timeout, then,
|
|
|
|
even if it did not consume all of its budget, that queue is not punished
|
|
|
|
as any seeky queue; the rationale is that, depending on the disk zones,
|
|
|
|
a queue may be served at a lower rate than the estimated peak rate.
|
|
|
|
|
|
|
|
Changes 2a and 2b have been critical in lowering latencies, whereas
|
|
|
|
change 2c, in addition to change 1, helped a lot increase the disk
|
|
|
|
throughput.
|
|
|
|
|
|
|
|
3) slightly changed the peak rate estimator: a low-pass filter is now
|
|
|
|
used instead of just keeping the highest rate sampled; the rationale
|
|
|
|
is that the peak rate of a disk should be quite stable, so the filter
|
|
|
|
should converge more or less smoothly to the right value; it seemed to
|
|
|
|
correctly catch the peak rate with all disks we used
|
|
|
|
|
|
|
|
4) added the low latency mechanism described in detail in
|
|
|
|
http://algogroup.unimore.it/people/paolo/disk_sched/description.php.
|
|
|
|
|