mirror of
https://github.com/ARM-software/arm-trusted-firmware.git
synced 2025-04-15 00:54:22 +00:00
docs: Add RAS framework documentation
Change-Id: Ibf2b21b12ebc0af5815fc6643532a3be9100bf02 Signed-off-by: Jeenu Viswambharan <jeenu.viswambharan@arm.com>
parent
e31d76fd16
commit
63eb2410c1
4 changed files with 263 additions and 2 deletions
2
docs/draw.io/ras.svg
Normal file
File diff suppressed because one or more lines are too long
1
docs/draw.io/ras.xml
Normal file
@@ -0,0 +1 @@
<mxfile userAgent="Mozilla/5.0 (X11; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0" version="9.4.6" editor="www.draw.io" type="device"><diagram id="f2d74f7d-2b47-d0f0-3260-3a0b726db48c" name="Page-1">5VtLc6M4EP41rto9rAs9eB2TjD0zVbtVqclhd05TipFtNhhRQk7i/fUrQGBAECsJYGcml0Ajgfi6++tWN56hm93zZ06S7V8soNEMWsHzDH2aQQgtx5H/MsmhkABgeYVkw8NAyY6Cu/A/qoSWku7DgKaNgYKxSIRJU7hicUxXoiEjnLOn5rA1i5pPTciGaoK7FYl06d9hILblazj+8cIXGm626tEedIsL92T1sOFsH6vnzSBa53/F5R0p76VeNN2SgD3VRGgxQzecMVEc7Z5vaJSBW8JWzFv2XK3WzWksjCaoGY8k2tNyyfnCxKEEI38dmk2wZuj6aRsKepeQVXb1SepfyrZiF8kzIA9TwdlDBZp8n+t1GEU3LGJcnscsltOuA5Ju8xtmM/Qll2uiXNDnmki9wmfKdlTwgxyirmKs4CztrYT36ag8WI7Z1vSGHCUkymA21b2PoMkDhVsPhug0hjSQ9qVOGRdbtmExiRZH6XUT5Rqi/1IhDspFyF4wKTre4U/Gkl7ke5FN2Z6v1MrUWgXhG1qiUoiyNb+IPqcREeFj02e6kFRTb1koF1JpDcGW1ryWMoo1qVktfVTLMFIRxh0qciKR2WL4KA83IserEN3ztkTevzFuOO2m8iXFVUZZUrCKSJqGq1K8DKNyGI2DcpDyISlR163JjGRsi2i78XgGATV7SOTKf1DyY0viIKL8t9/H5EEZFj7hBVjaw/AfaHuSo/NfNabOf3gA+kMGIWQoB1G2X/eOCawf6RRZDhvOI4zh1tDmJP2ohos9A8N1RjJcaH/kuI07jHJwmjaFUg+vmVFKmqac7xMxmW0ulwvb98exTeT7c3s668R6iDpD3vF2Iz6VsRgZuT1N3oH8uef6wIPAdgCSim4oHgNnXq5k+ExEJ6HF84omImSxFH/JHCeMN/JwycmOPjH+MJUPFSNrV6z8b6CUBTQx97yms3UFAsuaWxC7HgLYxbbvIt3zAIBzu1/9ps7nGPheFIVJSk8DTtKkKFGsw+dMSa8hfHM8oY3mwIE+9oGPbey6TgtPt7LhRlJYotUAcRAM3en4651poUSdH/5Rk/KT79nJ3DagsTdGb1uP3s65gje41DzoJb0YoezrKONRIoo0EXKoDUiywJD2Bxzc2rBhG9Y1dnp8s6wnD4oVvDUI+R0WoJVHvl3d9dVDtLEJDxkPRYZJRB/z8rDKAd9cZHlVlHs5ptWNbNQoJ2829yGq4hxuadHryigNwpw9RNnSugi3f8nJX0GudR/3p8kaoY/mjpur0cUetkHLpUE7ivYkja8lD+i0NyatGv+J8TI96CeP009rF+UKwh2kKNdFQ8NZZKPrkJ3cEiE3qHEugRZ8V6wy3E8ZhS3g6TYNBjdq44LTuDxxAfW9LrzPV+ArV1PDO4+8FpUg8fyJK8aD9H0BctJSiguaHFIVSCYp83X1fi4rvVU7jTL2GdttR35bvtzYwc/1W0GlrarhKiRADwuFQ1T1xXSWt/3J4QP7BOroWY/nEzqnF5Cue4tM8rVEF1yt1n5Ht59E4SYLsiua6UsKMpDCFYmu1IVdGARRXyWlqcEBgG/XdX1D3OEAuCOTwtJAGc4FBNau5BwNXtM1NvquRmWx9QzLXedXaaFE0OxJcZC9IGf3tLZBDYfZnk7aZGszzaRNNtRR31Z5TFUQGA9JZfYDgOibtNhHA1FPCX9q2oAdtOGejTbgadq4DtOsxl+yRsTYwz45A22MmqBM2/9Eev/gpyCOqWEcd/czLluYfSfa+FSngzrKDfUZSic6ddwtvuVajhLKx97
Cj/mlzqRbeGRQGHxze6CyWvPOwNDfL7/cJkCup0M9UZMA6xvFXwl5eEbkuz4MycrVJBt1dS8pVNPEZezRh9ZRqwEBbJ15ur7tH2K/jrv261qz82tZhzLsZP6carJbX5z4eDo16bliTSfWbdWM/iUUkX3OZjd10RGsR9PFhE28KfaqrazxLb+nsV9U1x/WHMCmvkqGe2eN3sa4ZQie39JwT/NWu5UD27eyDQv+r+1uA7c70exb2Ynxfd3tcjlsvU7pu7/h1LOkxZdlFqZvv+pJ/sWyjkYxHcbea8bAhU09uKNxjjw9/kKyUNfxd6ho8T8=</diagram></mxfile>
@@ -2646,8 +2646,8 @@ you can keep the sample implementation here (which calls ``console_flush()``).
 If you're trying to debug crashes in BL1, you can call the console_xx_core_flush
 function exported by some console drivers from here.
 
-Extternal Abort handling and RAS Support
-----------------------------------------
+External Abort handling and RAS Support
+---------------------------------------
 
 Function : plat_ea_handler
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
258
docs/ras.rst
Normal file
@@ -0,0 +1,258 @@
RAS support in Trusted Firmware-A
=================================

.. section-numbering::
   :suffix: .

.. contents::
   :depth: 2

.. |EHF| replace:: Exception Handling Framework
.. |TF-A| replace:: Trusted Firmware-A

This document describes |TF-A| support for the Arm Reliability, Availability,
and Serviceability (RAS) extensions. RAS is a mandatory extension for Armv8.2
and later CPUs, and also an optional extension to the base Armv8.0 architecture.

In conjunction with the |EHF|, support for the RAS extension enables the
firmware-first paradigm for handling platform errors, in which exceptions
resulting from errors are routed to and handled in EL3. These exceptions are
Synchronous External Aborts (SEA), Asynchronous External Aborts (signalled as
SErrors), and Fault Handling and Error Recovery interrupts. The |EHF| document
mentions various `error handling use-cases`__.

.. __: exception-handling.rst#delegation-use-cases

For the description of Arm RAS extensions, Standard Error Records, and the
precise definition of RAS terminology, please refer to the Arm Architecture
Reference Manual. The rest of this document assumes familiarity with the
architecture and terminology.

Overview
--------

As mentioned above, the RAS support in |TF-A| enables routing to and handling of
exceptions resulting from platform errors in EL3. It allows the platform to
define an External Abort handler, and to register RAS nodes and interrupts. The
RAS framework also provides `helpers`__ for accessing Standard Error Records as
introduced by the RAS extensions.

.. __: `Standard Error Record helpers`_

The build option ``RAS_EXTENSION``, when set to ``1``, includes the RAS
framework in the run time firmware; ``EL3_EXCEPTION_HANDLING`` and
``HANDLE_EA_EL3_FIRST`` must also be set to ``1``.

.. _ras-figure:

.. image:: draw.io/ras.svg

See more on `Engaging the RAS framework`_.

Platform APIs
-------------

The RAS framework allows the platform to define handlers for External Abort,
Uncontainable Errors, Double Fault, and errors arising from EL3 execution.
Please refer to the porting guide for the `RAS platform API descriptions`__.

.. __: porting-guide.rst#external-abort-handling-and-ras-support

Registering RAS error records
-----------------------------

RAS nodes are components in the system capable of signalling errors to PEs
through one of the notification mechanisms: SEAs, SErrors, or interrupts. RAS
nodes contain one or more error records, which are registers through which the
nodes advertise various properties of the signalled error. Arm recommends that
error records are implemented in the Standard Error Record format. The RAS
architecture allows for error records to be accessible via system or
memory-mapped registers.

The platform should enumerate the error records, providing for each of them:

- A handler to probe error records for errors;
- When the probing identifies an error, a handler to handle it;
- For a memory-mapped error record, its base address and size in KB; for a
  system register-accessed record, the start index of the record and number of
  continuous records from that index;
- Any node-specific auxiliary data.

With this information supplied, when the run time firmware receives a
notification through one of these mechanisms, the RAS framework can iterate
through and probe error records for errors, and invoke the appropriate handler
to handle it.
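
The iterate, probe, and handle sequence described above can be sketched as
follows. This is a minimal standalone illustration, not the framework's
implementation: the structure is a simplified stand-in for ``struct
err_record_info``, every ``demo_`` name is hypothetical, and returning the
number of handled errors is an assumption made for the example.

.. code:: c

    #include <stddef.h>

    /* Simplified stand-ins for the framework types, for illustration only. */
    struct err_record_info;
    struct err_handler_data;

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);
    typedef int (*err_record_handler_t)(const struct err_record_info *info,
                    int probe_data, const struct err_handler_data *const data);

    struct err_record_info {
        err_record_probe_t probe;
        err_record_handler_t handler;
        void *aux_data;
    };

    /*
     * Hypothetical dispatch loop: probe every record and invoke the handler
     * of each record that reports an error. Returns the number of errors
     * handled.
     */
    int demo_dispatch(const struct err_record_info *records, size_t num_records,
                    const struct err_handler_data *data)
    {
        int handled = 0;

        for (size_t i = 0; i < num_records; i++) {
            const struct err_record_info *info = &records[i];
            int probe_data = 0;

            /* A zero return from the probe means no error in this record. */
            if (info->probe(info, &probe_data) == 0)
                continue;

            /* Pass whatever the probe reported on to the error handler. */
            (void)info->handler(info, probe_data, data);
            handled++;
        }

        return handled;
    }

    /* Demo callbacks used to exercise the loop. */
    int demo_last_probe_data = -1;

    int demo_probe_clean(const struct err_record_info *info, int *probe_data)
    {
        (void)info;
        (void)probe_data;
        return 0;
    }

    int demo_probe_faulty(const struct err_record_info *info, int *probe_data)
    {
        (void)info;
        *probe_data = 7;    /* e.g. the index of the record in error */
        return 1;
    }

    int demo_handler(const struct err_record_info *info, int probe_data,
                    const struct err_handler_data *const data)
    {
        (void)info;
        (void)data;
        demo_last_probe_data = probe_data;
        return 0;
    }

With an array holding one record that uses ``demo_probe_clean`` and one that
uses ``demo_probe_faulty``, only the second record reaches ``demo_handler``,
which then sees the value the probe stored in ``probe_data``.
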

The RAS framework provides macros to populate error record information. The
macros are versioned, and the latest version as of this writing is 1. Each
macro creates a structure of type ``struct err_record_info`` from its
arguments; that structure is later passed to the probe and error handlers.

For memory-mapped error records:

.. code:: c

    ERR_RECORD_MEMMAP_V1(base_addr, size_num_k, probe, handler, aux)

And, for system register ones:

.. code:: c

    ERR_RECORD_SYSREG_V1(idx_start, num_idx, probe, handler, aux)

The probe handler must have the following prototype:

.. code:: c

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);

The probe handler must return a non-zero value if an error was detected, or 0
otherwise. The ``probe_data`` output parameter can be used to pass any useful
information resulting from probe to the error handler (see `below`__). For
example, it could return the index of the record.

.. __: `Standard Error Record helpers`_
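
As an illustration of the probe contract, the sketch below shows a custom
probe handler for an imaginary node that is not in the Standard Error Record
format. Everything here is an assumption made for the example: the one-field
stand-in for ``struct err_record_info``, the ``struct demo_node`` layout passed
as auxiliary data, and the convention that a non-zero status word means an
error is pending.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Simplified stand-in; the framework structure carries more fields. */
    struct err_record_info {
        void *aux_data;
    };

    /* Hypothetical node description supplied as auxiliary data. */
    struct demo_node {
        const volatile uint32_t *status;    /* one status word per record */
        unsigned int num_records;
    };

    /*
     * Return non-zero and report the record index through probe_data when
     * any status word is non-zero; return 0 and leave probe_data untouched
     * otherwise.
     */
    int demo_probe(const struct err_record_info *info, int *probe_data)
    {
        const struct demo_node *node = info->aux_data;

        for (unsigned int i = 0; i < node->num_records; i++) {
            if (node->status[i] != 0U) {
                *probe_data = (int)i;
                return 1;
            }
        }

        return 0;
    }

Passing the record index through ``probe_data`` means the error handler does
not have to scan the node again to find which record raised the error.
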

The error handler must have the following prototype:

.. code:: c

    typedef int (*err_record_handler_t)(const struct err_record_info *info,
                    int probe_data, const struct err_handler_data *const data);

The ``data`` constant parameter describes the various properties of the error,
viz. the reason for the error, exception syndrome, and also ``flags``,
``cookie``, and ``handle`` parameters from the `top-level exception handler`__.

.. __: interrupt-framework-design.rst#el3-interrupts
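
A matching error handler might look like the sketch below. It is only an
illustration: both structures are simplified stand-ins whose field names follow
the description above rather than the framework headers, ``struct
demo_error_log`` is hypothetical, and treating a return value of 0 as success
is an assumption.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Simplified stand-ins for the framework structures. */
    struct err_record_info {
        void *aux_data;
    };

    struct err_handler_data {
        unsigned int ea_reason;    /* reason for the error */
        uint32_t syndrome;         /* exception syndrome */
        uint64_t flags;            /* from the top-level exception handler */
        void *cookie;
        void *handle;
    };

    /* Hypothetical per-node log kept in the record's auxiliary data. */
    struct demo_error_log {
        unsigned int count;
        int last_index;
        uint32_t last_syndrome;
    };

    /*
     * Record which error record fired (as reported by the probe handler)
     * together with the syndrome, then report success to the caller.
     */
    int demo_handler(const struct err_record_info *info, int probe_data,
                    const struct err_handler_data *const data)
    {
        struct demo_error_log *log = info->aux_data;

        log->count++;
        log->last_index = probe_data;
        log->last_syndrome = data->syndrome;

        return 0;
    }

A real handler would typically go on to clear the error in the node and decide
whether the system can continue; the logging shown here only demonstrates how
the three parameters fit together.
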

The platform is expected to populate an array using the macros above, and
register it with the RAS framework using the macro
``REGISTER_ERR_RECORD_INFO()``, passing it the name of the array describing the
records. Note that the macro must be used in the same file where the array is
defined.

Standard Error Record helpers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The |TF-A| RAS framework provides probe handlers for Standard Error Records, for
both memory-mapped and System Register accesses:

.. code:: c

    int ras_err_ser_probe_memmap(const struct err_record_info *info,
                    int *probe_data);

    int ras_err_ser_probe_sysreg(const struct err_record_info *info,
                    int *probe_data);

When the platform enumerates error records, for those records in the Standard
Error Record format, these helpers may be used instead of the platform rolling
out its own. Both helpers above:

- Return a non-zero value when an error is detected in a Standard Error Record;
- Set ``probe_data`` to the index of the error record upon detecting an error.

Registering RAS interrupts
--------------------------

RAS nodes can signal errors to the PE by raising Fault Handling and/or Error
Recovery interrupts. For the firmware-first handling paradigm for interrupts to
work, the platform must set up and register with |EHF|. See `Interaction with
Exception Handling Framework`_.

For each RAS interrupt, the platform has to provide a structure of type
``struct ras_interrupt`` containing:

- Interrupt number;
- The associated error record information (pointer to the corresponding
  ``struct err_record_info``);
- Optionally, a cookie.

The platform is expected to define an array of ``struct ras_interrupt``, and
register it with the RAS framework using the macro
``REGISTER_RAS_INTERRUPTS()``, passing it the name of the array. Note that the
macro must be used in the same file where the array is defined.

The array of ``struct ras_interrupt`` must be sorted in the increasing order of
interrupt number. This allows for fast lookup of handlers in order to service
RAS interrupts.
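
The sketch below illustrates why the sort order matters: a sorted array can be
searched by bisection. It is a standalone example, not the framework's code;
``struct demo_ras_interrupt`` and ``demo_find_interrupt()`` are hypothetical
names, and the field layout is an assumption based on the list above.

.. code:: c

    #include <stddef.h>

    /* Hypothetical stand-in for the per-interrupt description. */
    struct demo_ras_interrupt {
        unsigned int intr_number;    /* interrupt number (the sort key) */
        const void *err_record;      /* associated error record information */
        void *cookie;                /* optional cookie */
    };

    /*
     * Binary search over an array sorted by increasing interrupt number.
     * Returns the matching entry, or NULL if the interrupt is not listed.
     */
    const struct demo_ras_interrupt *demo_find_interrupt(
                    const struct demo_ras_interrupt *intrs, size_t num,
                    unsigned int intr)
    {
        size_t lo = 0;
        size_t hi = num;    /* search the half-open range [lo, hi) */

        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;

            if (intrs[mid].intr_number == intr)
                return &intrs[mid];

            if (intrs[mid].intr_number < intr)
                lo = mid + 1;
            else
                hi = mid;
        }

        return NULL;
    }

If the array were not sorted, the search could miss an entry that is present,
which is why the ordering is a requirement rather than a recommendation.
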

Double-fault handling
---------------------

A Double Fault condition arises when an error is signalled to the PE while
handling of a previously signalled error is still underway. When a Double Fault
condition arises, the Arm RAS extensions only require the handler to perform an
orderly shutdown of the system, as recovery may be impossible.

The RAS extensions in Armv8.4 introduced new architectural features to deal
with Double Fault conditions, specifically the ``NMEA`` and ``EASE`` bits in the
``SCR_EL3`` register. These were introduced to assist EL3 software which runs
part of its entry/exit routines with exceptions momentarily masked. In such
systems, External Aborts/SErrors are not immediately handled when they occur,
but only after the exceptions are unmasked again.

|TF-A|, for legacy reasons, executes the whole of EL3 with all exceptions
unmasked. This means that all exceptions routed to EL3 are handled immediately.
|TF-A| is thus able to detect Double Fault conditions in software, without
needing the intended advantages of the Armv8.4 Double Fault architecture
extensions.

Double faults are fatal: handling terminates at the platform double fault
handler, which does not return.

Engaging the RAS framework
--------------------------

Enabling RAS support is a platform choice that requires three distinct but
related build options to be set together:

- ``RAS_EXTENSION=1`` includes the RAS framework in the run time firmware;

- ``EL3_EXCEPTION_HANDLING=1`` enables handling of exceptions at EL3. See
  `Interaction with Exception Handling Framework`_;

- ``HANDLE_EA_EL3_FIRST=1`` enables routing of External Aborts and SErrors to
  EL3.

The RAS support in |TF-A| introduces a default implementation of
``plat_ea_handler``, the External Abort handler in EL3. When ``RAS_EXTENSION``
is set to ``1``, it first calls the ``ras_ea_handler()`` function, which is the
top-level RAS exception handler. ``ras_ea_handler`` is responsible for iterating
through platform-supplied error records, probing them, and, when an error is
identified, looking up and invoking the corresponding error handler.

Note that, if the platform chooses to override the ``plat_ea_handler`` function
and intends to use the RAS framework, it must explicitly call
``ras_ea_handler()`` from within.
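
The shape of such an override can be sketched as below. This is a standalone
illustration under stated assumptions: ``demo_ras_ea_handler()`` is a stub
standing in for ``ras_ea_handler()``, ``demo_plat_ea_handler()`` stands in for
the platform's override, the parameter list is assumed rather than taken from
the porting guide, and a non-zero return is assumed to mean that the framework
handled the error.

.. code:: c

    #include <stdint.h>

    /* Test knobs for the sketch; not part of any real interface. */
    int demo_ras_result;        /* value the stub below returns */
    int demo_fallback_taken;    /* set when the platform fallback runs */

    /* Stub standing in for the framework's top-level RAS handler. */
    int demo_ras_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                    void *cookie, void *handle, uint64_t flags)
    {
        (void)ea_reason;
        (void)syndrome;
        (void)cookie;
        (void)handle;
        (void)flags;
        return demo_ras_result;
    }

    /* Hypothetical platform override of the External Abort handler. */
    void demo_plat_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                    void *cookie, void *handle, uint64_t flags)
    {
        /* Give the RAS framework the first chance to handle the error. */
        if (demo_ras_ea_handler(ea_reason, syndrome, cookie, handle,
                        flags) != 0)
            return;

        /* Platform-specific fallback, for example report and halt. */
        demo_fallback_taken = 1;
    }

The point of the sketch is the ordering: the override forwards all of its
arguments to the RAS handler first, and only falls back to platform-specific
handling when the framework did not deal with the error.
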

Similarly, for RAS interrupts, the framework defines
``ras_interrupt_handler()``. The RAS framework arranges for it to be invoked
when a RAS interrupt is taken at EL3. The function bisects the platform-supplied
sorted array of interrupts to look up the error record information associated
with the interrupt number. The error handler for that record is then invoked to
handle the error.

Interaction with Exception Handling Framework
---------------------------------------------

As mentioned in earlier sections, the RAS framework interacts with the |EHF| to
arbitrate handling of RAS exceptions with others that are routed to EL3. This
means that the platform must partition a `priority level`__ for handling RAS
exceptions. The platform must then define the macro ``PLAT_RAS_PRI`` to the
priority level used for RAS exceptions. Platforms would typically want to
allocate the highest secure priority for RAS handling.

.. __: exception-handling.rst#partitioning-priority-levels

Handling of both `interrupt`__ and `non-interrupt`__ exceptions follows the
sequences outlined in the |EHF| documentation. That is, for interrupts, the
priority management is implicit; but for non-interrupt exceptions, it is
explicit using `EHF APIs`__.

.. __: exception-handling.rst#interrupt-flow
.. __: exception-handling.rst#non-interrupt-flow
.. __: exception-handling.rst#activating-and-deactivating-priorities

----

*Copyright (c) 2018, Arm Limited and Contributors. All rights reserved.*