docs: add Juno runtime instrumentation data
Add results from running the TFTF Runtime Instrumentation test suite on Juno.

Change-Id: I4c5b64e1a80b5b88e42835f0700294a02edc8032
Signed-off-by: Harrison Mutai <harrison.mutai@arm.com>
parent 08d7a10157
commit a3077ae1e9
1 changed file with 173 additions and 44 deletions
@@ -25,62 +25,189 @@ x Cortex-A57 clusters running at the following frequencies:
|
|||
Juno supports CPU, cluster and system power down states, corresponding to power
|
||||
levels 0, 1 and 2 respectively. It does not support any retention states.
|
||||
|
||||
We used the upstream `TF master as of 31/01/2017`_, building the platform using
|
||||
the ``ENABLE_RUNTIME_INSTRUMENTATION`` option:
|
||||
Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50 MHz on Juno.
|
||||
|
||||
.. code:: shell
|
||||
The following source trees and binaries were used:
|
||||
|
||||
make PLAT=juno ENABLE_RUNTIME_INSTRUMENTATION=1 \
|
||||
SCP_BL2=<path/to/scp-fw.bin> \
|
||||
BL33=<path/to/test-fw.bin> \
|
||||
all fip
|
||||
- TF-A [`v2.9-rc0`_]
|
||||
- TFTF [`v2.9-rc0`_]
|
||||
|
||||
When using the debug build of TF, there was no noticeable difference in the
|
||||
results.
|
||||
Please see the Runtime Instrumentation `Testing Methodology`_ page for more
|
||||
details.
|
||||
|
||||
The tests are based on an ARM-internal test framework. The release build of this
|
||||
framework was used because the results in the debug build became skewed; the
|
||||
console output prevented some of the tests from executing in parallel.
|
||||
Procedure
|
||||
---------
|
||||
|
||||
The tests consist of both parallel and sequential tests, which are broadly
|
||||
described as follows:
|
||||
#. Build TFTF with runtime instrumentation enabled:
|
||||
|
||||
- **Parallel Tests** This type of test powers on all the non-lead CPUs and
|
||||
brings them and the lead CPU to a common synchronization point. The lead CPU
|
||||
then initiates the test on all CPUs in parallel.
|
||||
.. code:: shell
|
||||
|
||||
- **Sequential Tests** This type of test powers on each non-lead CPU in
|
||||
sequence. The lead CPU initiates the test on a non-lead CPU then waits for the
|
||||
test to complete before proceeding to the next non-lead CPU. The lead CPU then
|
||||
executes the test on itself.
|
||||
make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
|
||||
TESTS=runtime-instrumentation all
|
||||
|
||||
#. Fetch Juno's SCP binary from TF-A's archive:

   .. code:: shell

      curl --fail --connect-timeout 5 --retry 5 -sLS -o scp_bl2.bin \
          https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin

#. Build TF-A with the following build options:

   .. code:: shell

      make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
          BL33="/path/to/tftf.bin" SCP_BL2="scp_bl2.bin" \
          ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip

#. Load the following images onto the development board: ``fip.bin``,
   ``scp_bl2.bin``.
|
||||
|
||||
Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in parallel

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    | 0       | 0    | 243.76    | 239.92  | 6.32        |
    +---------+------+-----------+---------+-------------+
    | 0       | 1    | 663.5     | 30.32   | 167.82      |
    +---------+------+-----------+---------+-------------+
    | 1       | 0    | 105.12    | 22.84   | 5.88        |
    +---------+------+-----------+---------+-------------+
    | 1       | 1    | 384.16    | 19.06   | 4.7         |
    +---------+------+-----------+---------+-------------+
    | 1       | 2    | 523.98    | 270.46  | 4.74        |
    +---------+------+-----------+---------+-------------+
    | 1       | 3    | 950.54    | 220.9   | 89.2        |
    +---------+------+-----------+---------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in serial

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    | 0       | 0    | 266.96    | 31.74   | 167.92      |
    +---------+------+-----------+---------+-------------+
    | 0       | 1    | 266.9     | 31.52   | 167.82      |
    +---------+------+-----------+---------+-------------+
    | 1       | 0    | 279.86    | 23.42   | 87.52       |
    +---------+------+-----------+---------+-------------+
    | 1       | 1    | 101.38    | 18.8    | 4.64        |
    +---------+------+-----------+---------+-------------+
    | 1       | 2    | 101.18    | 19.28   | 4.64        |
    +---------+------+-----------+---------+-------------+
    | 1       | 3    | 101.32    | 19.02   | 4.62        |
    +---------+------+-----------+---------+-------------+
|
||||
|
||||
``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in parallel

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    | 0       | 0    | 661.94    | 22.88   | 9.66        |
    +---------+------+-----------+---------+-------------+
    | 0       | 1    | 801.64    | 23.38   | 9.62        |
    +---------+------+-----------+---------+-------------+
    | 1       | 0    | 105.56    | 16.02   | 8.12        |
    +---------+------+-----------+---------+-------------+
    | 1       | 1    | 245.42    | 16.26   | 7.78        |
    +---------+------+-----------+---------+-------------+
    | 1       | 2    | 384.42    | 16.1    | 7.84        |
    +---------+------+-----------+---------+-------------+
    | 1       | 3    | 523.74    | 15.4    | 8.02        |
    +---------+------+-----------+---------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    | 0       | 0    | 102.16    | 23.64   | 6.7         |
    +---------+------+-----------+---------+-------------+
    | 0       | 1    | 101.66    | 23.78   | 6.6         |
    +---------+------+-----------+---------+-------------+
    | 1       | 0    | 277.74    | 15.96   | 4.66        |
    +---------+------+-----------+---------+-------------+
    | 1       | 1    | 98.0      | 15.88   | 4.64        |
    +---------+------+-----------+---------+-------------+
    | 1       | 2    | 97.66     | 15.88   | 4.62        |
    +---------+------+-----------+---------+-------------+
    | 1       | 3    | 97.76     | 15.38   | 4.64        |
    +---------+------+-----------+---------+-------------+
|
||||
|
||||
``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` is called on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` is
called on the lead core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    | 0       | 0    | 265.38    | 34.12   | 167.36      |
    +---------+------+-----------+---------+-------------+
    | 0       | 1    | 265.72    | 33.98   | 167.48      |
    +---------+------+-----------+---------+-------------+
    | 1       | 0    | 185.3     | 23.18   | 87.42       |
    +---------+------+-----------+---------+-------------+
    | 1       | 1    | 101.58    | 23.46   | 4.48        |
    +---------+------+-----------+---------+-------------+
    | 1       | 2    | 101.66    | 22.02   | 4.72        |
    +---------+------+-----------+---------+-------------+
    | 1       | 3    | 101.48    | 22.22   | 4.52        |
    +---------+------+-----------+---------+-------------+
|
||||
|
||||
``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores

    +-------------+--------+--------------+
    | Cluster     | Core   | Latency      |
    +=============+========+==============+
    | 0           | 0      | 1.22         |
    +-------------+--------+--------------+
    | 0           | 1      | 1.2          |
    +-------------+--------+--------------+
    | 1           | 0      | 0.6          |
    +-------------+--------+--------------+
    | 1           | 1      | 1.08         |
    +-------------+--------+--------------+
    | 1           | 2      | 1.04         |
    +-------------+--------+--------------+
    | 1           | 3      | 1.04         |
    +-------------+--------+--------------+
|
||||
|
||||
Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.
|
||||
|
||||
``PSCI_ENTRY`` refers to the time taken from entering the TF PSCI implementation
to the point the hardware enters the low power state (WFI). Referring to the TF
runtime instrumentation points, this corresponds to:
``(RT_INSTR_ENTER_HW_LOW_PWR - RT_INSTR_ENTER_PSCI)``.

``PSCI_EXIT`` refers to the time taken from the point the hardware exits the low
power state to exiting the TF PSCI implementation. This corresponds to:
``(RT_INSTR_EXIT_PSCI - RT_INSTR_EXIT_HW_LOW_PWR)``.

``CFLUSH_OVERHEAD`` refers to the part of ``PSCI_ENTRY`` taken to flush the
caches. This corresponds to: ``(RT_INSTR_EXIT_CFLUSH - RT_INSTR_ENTER_CFLUSH)``.
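Each of the three intervals defined above is a plain difference between two instrumentation timestamps. The following is a minimal sketch in POSIX shell, assuming the six timestamps have already been extracted as integer counter ticks. The variable names, the ``interval`` helper and the sample values are hypothetical and are not measured data.

```shell
#!/bin/sh
# Hypothetical timestamps in generic-counter ticks (illustrative values only,
# not taken from any measurement in this document).
enter_psci=1000          # stands in for RT_INSTR_ENTER_PSCI
enter_cflush=1200        # stands in for RT_INSTR_ENTER_CFLUSH
exit_cflush=1500         # stands in for RT_INSTR_EXIT_CFLUSH
enter_hw_low_pwr=6000    # stands in for RT_INSTR_ENTER_HW_LOW_PWR
exit_hw_low_pwr=9000     # stands in for RT_INSTR_EXIT_HW_LOW_PWR
exit_psci=10100          # stands in for RT_INSTR_EXIT_PSCI

# Hypothetical helper: elapsed ticks between a start and an end timestamp.
interval() {
    echo $(( $2 - $1 ))
}

psci_entry=$(interval "$enter_psci" "$enter_hw_low_pwr")
psci_exit=$(interval "$exit_hw_low_pwr" "$exit_psci")
cflush_overhead=$(interval "$enter_cflush" "$exit_cflush")

echo "PSCI_ENTRY=$psci_entry PSCI_EXIT=$psci_exit CFLUSH_OVERHEAD=$cflush_overhead"
# prints "PSCI_ENTRY=5000 PSCI_EXIT=1100 CFLUSH_OVERHEAD=300"
```

Because the cache flush happens inside the PSCI entry path, ``cflush_overhead`` is a portion of ``psci_entry`` and not an additional cost on top of it.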
|
||||
|
||||
Note there is very little variance observed in the values given (~1 µs), although
the values for each CPU are sometimes interchanged, depending on the order in
which locks are acquired. Also, there is very little variance observed between
executing the tests sequentially in a single boot or rebooting between tests.
|
||||
|
||||
Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50 MHz on Juno.
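A 50 MHz counter advances once every 20 ns, so latencies derived from it have a resolution of 0.02 µs. The following is a minimal sketch of the tick-to-microsecond conversion in POSIX shell, using integer arithmetic only. The ``ticks_to_us`` helper and the sample tick count are hypothetical, and the sketch assumes the counter frequency divides one billion exactly, which holds for 50 MHz.

```shell
#!/bin/sh
# Generic counter frequency on Juno, as stated in the text above.
COUNTER_HZ=50000000

# Hypothetical helper: convert a tick count to microseconds with two decimals.
# Assumes COUNTER_HZ divides 1000000000 exactly (20 ns per tick at 50 MHz).
ticks_to_us() {
    ns_per_tick=$(( 1000000000 / COUNTER_HZ ))
    ns=$(( $1 * ns_per_tick ))
    # Whole microseconds, then the remainder in hundredths of a microsecond.
    printf '%d.%02d\n' $(( ns / 1000 )) $(( (ns % 1000) / 10 ))
}

# Illustrative tick count, not a measured value.
ticks_to_us 5000   # prints "100.00"
```

Integer arithmetic keeps the sketch free of external tools such as ``bc`` or ``awk``. At 20 ns per tick every result is an exact multiple of 0.02 µs, so the two-decimal output loses no precision.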
|
||||
|
||||
Results and Commentary
----------------------

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup
latency, and ``CFLUSH_OVERHEAD`` the latency of the cache flush operation.

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
@@ -290,3 +417,5 @@ effects, given that these measurements are at the nano-second level.

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _v2.9-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.9-rc0
.. _Testing Methodology: ../perf/psci-performance-methodology.html