When conditions are right, eg a small memcpy of a known size and
alignment, the compiler may know of a sequence that is optimal for the
given constraints and inline it. If the compiler doesn't find one, it
will emit a call to the generic function (in the libc) which will
implement this in the most generic and unconstrained manner. That
generic function is rarely the most optimal when constraints are known.
So give the compiler a chance to do this. Replace calls to libc
functions that have builtins to the builtin and keep the generic
implementation if it decides to emit a call anyway.
And example of this in action is usage of FEAT_MOPS. When the compiler
is aware of the feature (-march=armv8.8-a) then it will emit the 3 MOPS
instructions instead of calls to our memcpy() and memset()
implementations.
Change-Id: I9860cfada1d941b613ebd4da068e9992c387952e
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>