To accelerate an algorithm on a multi-core system, you must first identify the parts of the application that can be parallelized, typically computations whose work items do not depend on one another, and then decide how to parallelize them.
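As a minimal sketch of these two steps, consider a sum of squares. Each term is independent of the others, so the loop is a candidate for parallelization. A running total in which every step needs the previous result would not be. One possible way to parallelize it is to split the input into chunks, compute a partial sum per chunk, and combine the partial sums. The function names and the default of four workers are illustrative assumptions, not taken from the text.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(values):
    """Serial baseline: each term is independent of the others."""
    return sum(v * v for v in values)


def split_into_chunks(values, num_chunks):
    """Split values into at most num_chunks contiguous, near-equal slices."""
    chunk_size = max(1, -(-len(values) // num_chunks))  # ceiling division
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def parallel_sum_of_squares(values, num_workers=4):
    """Compute one partial sum per chunk concurrently, then combine them.

    num_workers=4 is an arbitrary illustrative default.
    """
    chunks = split_into_chunks(values, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


data = list(range(1_000))
# The decomposed version must agree with the serial baseline.
print(parallel_sum_of_squares(data) == sum_of_squares(data))  # prints True
```

This sketch shows the decomposition rather than a guaranteed speedup. In standard CPython, threads usually do not accelerate CPU-bound pure-Python code because of the global interpreter lock. `ProcessPoolExecutor` from the same module offers the same `map` interface and runs the workers in separate processes, at the cost of copying the chunks between processes.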
Performance portability means that a single body of source code achieves good performance across a range of computer architectures.