Errata application is painful for performance. For a start, it's done
when the core has just come out of reset, which means branch predictors
and caches will be empty so a branch to a workaround function must be
fetched from memory and that round trip is very slow. Then it also runs
with the I-cache off, which means that the loop to iterate over the
workarounds must also be fetched from memory on each iteration.
We can remove both branches. First, we can simply apply every erratum
directly instead of defining a workaround function and jumping to it.
Currently, no errata that need to be applied at both reset and runtime,
with the same workaround function, exist. If the need arose in future,
this should be achievable with a reset + runtime wrapper combo.
Then, we can construct a function that applies each erratum linearly
instead of looping over the list. If this function is part of the reset
function, then the only "far" branches at reset will be for the checker
functions. Importantly, this mitigates the slowdown even when an erratum
is disabled.
The result is ~50% speedup on N1SDP and ~20% on AArch64 Juno on wakeup
from PSCI calls that end in powerdown. This is roughly back to the
baseline of v2.9, before the errata framework regressed on performance
(or a little better). It is important to note that there are other
slowdowns since then that remain unknown.
Change-Id: Ie4d5288a331b11fd648e5c4a0b652b74160b07b9
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
We strive to apply errata as close to reset as possible with as few
things enabled as possible. Importantly, the I-cache will not be
enabled. This means that repeated branches to these tiny functions must
be re-fetched all the way from memory each time which has glacial speed.
Cores are allowed to fetch things ahead of time though as long as
execution is fairly linear. So we can trade a little bit of space (3 to
7 instructions per erratum) to keep things linear and not have to go to
memory.
While we're at it, optimise the the cpu_rev_var_{ls, hs, range}
functions to take up less space. Dropping the moves allows for a bit of
assembly magic that produces the same result in 2 and 3 instructions
respectively.
Change-Id: I51608352f23b2244ea7a99e76c10892d257f12bf
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
The existing DSU errata workarounds hijack the errata framework's inner
workings to register with it. However, that is undesirable as any change
to the framework may end up missing these workarounds. So convert the
checks and workarounds to macros and have them included with the
standard wrappers.
The only problem with this is the is_scu_present_in_dsu weak function.
Fortunately, it is only needed for 2 of the errata and only on 3 cores.
So drop it, assuming the default behaviour and have the callers handle
the exception.
Change-Id: Iefa36325804ea093e938f867b9a6f49a6984b8ae
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
The errata framework has a helper to invoke workarounds, complete with a
cpu rev_var check. We can use that directly instead of the
apply_cpu_pwr_dwn_errata to save on some code, as well as an extra
branch. It's also more readable.
Also, apply_erratum invocation in cpu files don't need to check the
rev_var as that was already done by the cpu_ops dispatcher for us to end
up in the file.
Finally, X2 erratum 2768515 only applies in the powerdown sequence, i.e.
at runtime. It doesn't achieve anything at reset, so we can label it
accordingly.
Change-Id: I02f9dd7d0619feb54c870938ea186be5e3a6ca7b
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
Errata printing is done directly via generic_errata_report.
This commit removes the unused \_cpu\()_errata_report
functions for all cores, and removes errata_func from cpu_ops.
Change-Id: I04fefbde5f0ff63b1f1cd17c864557a14070d68c
Signed-off-by: Ryan Everett <ryan.everett@arm.com>
Testing:
- Manual comparison of disassembly with and without conversion.
- Using the test script in gerrit - 19136
- Building with errata and stepping through from ArmDS and running tftf.
Change-Id: I126f09de44b16e8bbb7e32477b880b4650eef23b
Signed-off-by: Govindraj Raja <govindraj.raja@arm.com>
Cortex-A76 erratum 2743102 is a Cat B erratum that applies to
all revisions <=r4p1 and is still open. The workaround is to
insert a dsb before the isb in the power down sequence.
SDEN documentation:
https://developer.arm.com/documentation/SDEN885749/latest
Signed-off-by: Bipin Ravi <bipin.ravi@arm.com>
Change-Id: Ie2cd73bd91417d30b5633d80b2fbee32944bc2de
fix to support ARM CPU errata based on core used.
Signed-off-by: Saurabh Gorecha <quic_sgorecha@quicinc.com>
Change-Id: If1a438f98f743435a7a0b683a32ccf14164db37e
This patch applies CVE-2022-23960 workarounds for Cortex-A75,
Cortex-A73, Cortex-A72 & Cortex-A57. This patch also implements
the new SMCCC_ARCH_WORKAROUND_3 and enables necessary discovery
hooks for Coxtex-A72, Cortex-A57, Cortex-A73 and Cortex-A75 to
enable discovery of this SMC via SMC_FEATURES. SMCCC_ARCH_WORKAROUND_3
is implemented for A57/A72 because some revisions are affected by both
CVE-2022-23960 and CVE-2017-5715 and this allows callers to replace
SMCCC_ARCH_WORKAROUND_1 calls with SMCCC_ARCH_WORKAROUND_3. For details
of SMCCC_ARCH_WORKAROUND_3, please refer SMCCCv1.4 specification.
Signed-off-by: Bipin Ravi <bipin.ravi@arm.com>
Signed-off-by: John Powell <john.powell@arm.com>
Change-Id: Ifa6d9c7baa6764924638efe3c70468f98d60ed7c
Re-factored the prior implementation of workaround for CVE-2018-3639
using branch and link instruction to save vector space to include the
workaround for CVE-2022-23960.
Signed-off-by: Bipin Ravi <bipin.ravi@arm.com>
Change-Id: Ib3fe949583160429b5de8f0a4a8e623eb91d87d4
Cortex A76 erratum 1946160 is a Cat B erratum, present in some revisions
of the A76 processor core. The workaround is to insert a DMB ST before
acquire atomic instructions without release semantics. This issue is
present in revisions r0p0 - r4p1 but this workaround only applies to
revisions r3p0 - r4p1, there is no workaround for older versions.
SDEN can be found here:
https://documentation-service.arm.com/static/5fbb77d7d77dd807b9a80cc1
Signed-off-by: John Powell <john.powell@arm.com>
Change-Id: Ief33779ee76a89ce2649812ae5214b86a139e327
This errata workaround did not work as intended and was revised in
subsequent SDEN releases so we are reverting this change.
This is the patch being reverted:
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/4684
Signed-off-by: John Powell <john.powell@arm.com>
Change-Id: I560749a5b55e22fbe49d3f428a8b9545d6bdaaf0
Cortex A76 erratum 1868343 is a Cat B erratum, present in older
revisions of the Cortex A76 processor core. The workaround is to
set a bit in the CPUACTLR_EL1 system register, which delays instruction
fetch after branch misprediction. This workaround will have a small
impact on performance.
This workaround is the same as workarounds for errata 1262606 and
1275112, so all 3 have been combined into one function call.
SDEN can be found here:
https://documentation-service.arm.com/static/5f2bed6d60a93e65927bc8e7
Signed-off-by: John Powell <john.powell@arm.com>
Change-Id: I7f2f9965f495540a1f84bb7dcc28aff45d6cee5d
Reported the status (applies, missing) of AT speculative workaround
which is applicable for below CPUs.
+---------+--------------+
| Errata | CPU |
+=========+==============+
| 1165522 | Cortex-A76 |
+---------+--------------+
| 1319367 | Cortex-A72 |
+---------+--------------+
| 1319537 | Cortex-A57 |
+---------+--------------+
| 1530923 | Cortex-A55 |
+---------+--------------+
| 1530924 | Cortex-A53 |
+---------+--------------+
Also, changes are done to enable common macro 'ERRATA_SPECULATIVE_AT'
if AT speculative errata workaround is enabled for any of the above
CPUs using 'ERRATA_*' CPU specific build macro.
Signed-off-by: Manish V Badarkhe <Manish.Badarkhe@arm.com>
Change-Id: I3e6a5316a2564071f3920c3ce9ae9a29adbe435b
Cortex A76 erratum 1800710 is a Cat B erratum, present in older
revisions of the Cortex A76 processor core. The workaround is to
set a bit in the ECTLR_EL1 system register, which disables allocation
of splintered pages in the L2 TLB.
This errata is explained in this SDEN:
https://static.docs.arm.com/sden885749/g/Arm_Cortex_A76_MP052_Software_Developer_Errata_Notice_v20.pdf
Signed-off-by: John Powell <john.powell@arm.com>
Change-Id: Ifc34f2e9e053dcee6a108cfb7df7ff7f497c9493
Cortex A76 erratum 1791580 is a Cat B erratum present in earlier
revisions of the Cortex A76. The workaround is to set a bit in the
implementation defined CPUACTLR2 register, which forces atomic store
operations to write-back memory to be performed in the L1 data cache.
This errata is explained in this SDEN:
https://static.docs.arm.com/sden885749/g/Arm_Cortex_A76_MP052_Software_Developer_Errata_Notice_v20.pdf
Signed-off-by: John Powell <john.powell@arm.com>
Change-Id: Iefd58159b3f2e2286138993317b98e57dc361925
Even though ERET always causes a jump to another address, aarch64 CPUs
speculatively execute following instructions as if the ERET
instruction was not a jump instruction.
The speculative execution does not cross privilege-levels (to the jump
target as one would expect), but it continues on the kernel privilege
level as if the ERET instruction did not change the control flow -
thus execution anything that is accidentally linked after the ERET
instruction. Later, the results of this speculative execution are
always architecturally discarded, however they can leak data using
microarchitectural side channels. This speculative execution is very
reliable (seems to be unconditional) and it manages to complete even
relatively performance-heavy operations (e.g. multiple dependent
fetches from uncached memory).
This was fixed in Linux, FreeBSD, OpenBSD and Optee OS:
679db7080129fb48ace43a08873eceabfd092aa1
It is demonstrated in a SafeSide example:
https://github.com/google/safeside/blob/master/demos/eret_hvc_smc_wrapper.cchttps://github.com/google/safeside/blob/master/kernel_modules/kmod_eret_hvc_smc/eret_hvc_smc_module.c
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Change-Id: Iead39b0b9fb4b8d8b5609daaa8be81497ba63a0f
Some cores support only AArch64 mode. In those cores, only a limited
subset of the AArch32 system registers are implemented. Hence, if TF-A
is supposed to run on AArch64-only cores, it must be compiled with
CTX_INCLUDE_AARCH32_REGS=0.
Currently, the default settings for compiling TF-A are with the AArch32
system registers included. So, if we compile TF-A the default way and
attempt to run it on an AArch64-only core, we only get a runtime panic.
Now a compile-time check has been added to ensure that this flag has the
appropriate value when AArch64-only cores are included in the build.
Change-Id: I298ec550037fafc9347baafb056926d149197d4c
Signed-off-by: John Tsichritzis <john.tsichritzis@arm.com>
The workaround for Cortex-A76 errata #1286807 is implemented
in this patch.
Change-Id: I6c15af962ac99ce223e009f6d299cefb41043bed
Signed-off-by: Soby Mathew <soby.mathew@arm.com>
The workarounds for errata 1257314, 1262606, 1262888 and 1275112 are
added to the Cortex-A76 cpu specific file. The workarounds are disabled
by default and have to be explicitly enabled by the platform integrator.
Change-Id: I70474927374cb67725f829d159ddde9ac4edc343
Signed-off-by: Soby Mathew <soby.mathew@arm.com>
This patch fixes this issue:
https://github.com/ARM-software/tf-issues/issues/660
The introduced changes are the following:
1) Some cores implement cache coherency maintenance operation on the
hardware level. For those cores, such as - but not only - the DynamIQ
cores, it is mandatory that TF-A is compiled with the
HW_ASSISTED_COHERENCY flag. If not, the core behaviour at runtime is
unpredictable. To prevent this, compile time checks have been added and
compilation errors are generated, if needed.
2) To enable this change for FVP, a logical separation has been done for
the core libraries. A system cannot contain cores of both groups, i.e.
cores that manage coherency on hardware and cores that don't do it. As
such, depending on the HW_ASSISTED_COHERENCY flag, FVP includes the
libraries only of the relevant cores.
3) The neoverse_e1.S file has been added to the FVP sources.
Change-Id: I787d15819b2add4ec0d238249e04bf0497dc12f3
Signed-off-by: John Tsichritzis <john.tsichritzis@arm.com>
Under certain near idle conditions, DSU may miss response transfers on
the ACE master or Peripheral port, leading to deadlock. This workaround
disables high-level clock gating of the DSU to prevent this.
Change-Id: I820911d61570bacb38dd325b3519bc8d12caa14b
Signed-off-by: Louis Mayencourt <louis.mayencourt@arm.com>
Switched from a static check to a runtime assert to make sure a
workaround is implemented for CVE_2018_3639.
This allows platforms that know they have the SSBS hardware workaround
in the CPU to compile out code under DYNAMIC_WORKAROUND_CVE_2018_3639.
The gain in memory size without the dynamic workaround is 4KB in bl31.
Change-Id: I61bb7d87c59964b0c7faac5d6bc7fc5c4651cbf3
Signed-off-by: Ambroise Vincent <ambroise.vincent@arm.com>
Concurrent instruction TLB miss and mispredicted return instruction
might fetch wrong instruction stream. Set bit 6 of CPUACTLR_EL1 to
prevent this.
Change-Id: I2da4f30cd2df3f5e885dd3c4825c557492d1ac58
Signed-off-by: Louis Mayencourt <louis.mayencourt@arm.com>
Streaming store under specific conditions might cause deadlock or data
corruption. Set bit 25:24 of CPUECTLR_EL1, which disables write
streaming to the L2 to prevent this.
Change-Id: Ib5cabb997b35ada78b27e75787afd610ea606dcf
Signed-off-by: Louis Mayencourt <louis.mayencourt@arm.com>
TLBI VAAE1 or TLBI VAALE1 targeting a page within hardware page
aggregated address translation data in the L2 TLB might cause
corruption of address translation data. Set bit 59 of CPUACTLR2_EL1 to
prevent this.
Change-Id: I59f3edea54e87d264e0794f5ca2a8c68a636e586
Signed-off-by: Louis Mayencourt <louis.mayencourt@arm.com>
Enforce full include path for includes. Deprecate old paths.
The following folders inside include/lib have been left unchanged:
- include/lib/cpus/${ARCH}
- include/lib/el3_runtime/${ARCH}
The reason for this change is that having a global namespace for
includes isn't a good idea. It defeats one of the advantages of having
folders and it introduces problems that are sometimes subtle (because
you may not know the header you are actually including if there are two
of them).
For example, this patch had to be created because two headers were
called the same way: e0ea0928d5 ("Fix gpio includes of mt8173 platform
to avoid collision."). More recently, this patch has had similar
problems: 46f9b2c3a2 ("drivers: add tzc380 support").
This problem was introduced in commit 4ecca33988 ("Move include and
source files to logical locations"). At that time, there weren't too
many headers so it wasn't a real issue. However, time has shown that
this creates problems.
Platforms that want to preserve the way they include headers may add the
removed paths to PLAT_INCLUDES, but this is discouraged.
Change-Id: I39dc53ed98f9e297a5966e723d1936d6ccf2fc8f
Signed-off-by: Antonio Nino Diaz <antonio.ninodiaz@arm.com>
The Armv8.5 extensions introduces PSTATE.SSBS (Speculation Store Bypass
Safe) bit to mitigate against Variant 4 vulnerabilities. Although an
Armv8.5 feature, this can be implemented by CPUs implementing earlier
version of the architecture.
With this patch, when both PSTATE.SSBS is implemented and
DYNAMIC_WORKAROUND_CVE_2018_3639 is active, querying for
SMCCC_ARCH_WORKAROUND_2 via. SMCCC_ARCH_FEATURES call would return 1 to
indicate that mitigation on the PE is either permanently enabled or not
required.
When SSBS is implemented, SCTLR_EL3.DSSBS is initialized to 0 at reset
of every BL stage. This means that EL3 always executes with mitigation
applied.
For Cortex A76, if the PE implements SSBS, the existing mitigation (by
using a different vector table, and tweaking CPU ACTLR2) is not used.
Change-Id: Ib0386c5714184144d4747951751c2fc6ba4242b6
Signed-off-by: Jeenu Viswambharan <jeenu.viswambharan@arm.com>
If the system is in near idle conditions, this erratum could cause a
deadlock or data corruption. This patch applies the workaround that
prevents this.
This DSU erratum affects only the DSUs that contain the ACP interface
and it was fixed in r2p0. The workaround is applied only to the DSUs
that are actually affected.
Link to respective Arm documentation:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.epm138168/index.html
Change-Id: I033213b3077685130fc1e3f4f79c4d15d7483ec9
Signed-off-by: John Tsichritzis <john.tsichritzis@arm.com>
Check_vector_size checks if the size of the vector fits
in the size reserved for it. This check creates problems in
the Clang assembler. A new macro, end_vector_entry, is added
and check_vector_size is deprecated.
This new macro fills the current exception vector until the next
exception vector. If the size of the current vector is bigger
than 32 instructions then it gives an error.
Change-Id: Ie8545cf1003a1e31656a1018dd6b4c28a4eaf671
Signed-off-by: Roberto Vargas <roberto.vargas@arm.com>
The Cortex-A76 implements SMCCC_ARCH_WORKAROUND_2 as defined in
"Firmware interfaces for mitigating cache speculation vulnerabilities
System Software on Arm Systems"[0].
Dynamic mitigation for CVE-2018-3639 is enabled/disabled by
setting/clearning bit 16 (Disable load pass store) of `CPUACTLR2_EL1`.
NOTE: The generic code that implements dynamic mitigation does not
currently implement the expected semantics when dispatching an SDEI
event to a lower EL. This will be fixed in a separate patch.
[0] https://developer.arm.com/cache-speculation-vulnerability-firmware-specification
Change-Id: I8fb2862b9ab24d55a0e9693e48e8be4df32afb5a
Signed-off-by: Dimitris Papastamos <dimitris.papastamos@arm.com>
Both Cortex-Ares and Cortex-A76 CPUs use the ARM DynamIQ Shared Unit
(DSU). The power-down and power-up sequences are therefore mostly
managed in hardware, and required software operations are simple.
Change-Id: I3a9447b5bdbdbc5ed845b20f6564d086516fa161
Signed-off-by: Isla Mitchell <isla.mitchell@arm.com>