Nested loops occur very often in numerical problems. The aim of this presentation is to show various strategies for parallelizing nested loops on modern architectures such as Intel Xeon and Intel Xeon Phi. We employ both parallelism and vectorization to accelerate nested loops. The runtime can be shortened by choosing an appropriate strategy combined with good scheduling.
We demonstrate this using the WZ factorization as an example. In the WZ factorization, the number of iterations of the parallel loop decreases with each step of the outer loop, so the amount of available parallelism changes from step to step, which makes the problem more interesting.
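As an illustration, here is a minimal C/OpenMP sketch of the WZ loop nest. It assumes a dense row-major matrix, no pivoting (so every 2x2 pivot block must be nonsingular, which holds for example for strictly diagonally dominant matrices), and W stored in place of the entries that are zero in Z. The function names `wz_factorize`, `wz_max_residual` and `z_band_start`, and the choice of `schedule(runtime)` (which defers the schedule to the `OMP_SCHEDULE` environment variable), are hypothetical and not taken from the presentation. The sketch shows the structure described above: a sequential outer loop over steps k, a parallel loop over the n-2k-2 independent rows that shrinks by two each step, and a vectorizable innermost loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: row r of Z is nonzero only in columns m..n-1-m,
   with m = min(r, n-1-r) (the "hourglass" shape of Z). */
static int z_band_start(int n, int r)
{
    return r < n - 1 - r ? r : n - 1 - r;
}

/* In-place WZ factorization (no pivoting) of an n x n row-major matrix.
   Returns 0 on success, -1 if a 2x2 pivot block is singular. */
int wz_factorize(int n, double *a)
{
    /* Sequential outer loop: step k eliminates columns k and k2 = n-1-k. */
    for (int k = 0; k < (n - 1) / 2; k++) {
        const int k2 = n - 1 - k;
        const double *rk = a + (size_t)k * n;
        const double *rk2 = a + (size_t)k2 * n;
        const double det = rk[k] * rk2[k2] - rk2[k] * rk[k2];
        if (det == 0.0)
            return -1;

        /* Rows k+1..k2-1 are independent, but there are only n-2k-2 of
           them, so this parallel loop loses two iterations every step. */
        #pragma omp parallel for schedule(runtime)
        for (int i = k + 1; i < k2; i++) {
            double *ri = a + (size_t)i * n;
            /* Solve the 2x2 system for the multipliers w_ik, w_ik2 (Cramer). */
            const double wk = (ri[k] * rk2[k2] - rk2[k] * ri[k2]) / det;
            const double wk2 = (rk[k] * ri[k2] - ri[k] * rk[k2]) / det;
            /* Innermost loop: unit-stride row update, a vectorization target. */
            #pragma omp simd
            for (int j = k + 1; j < k2; j++)
                ri[j] -= wk * rk[j] + wk2 * rk2[j];
            ri[k] = wk;   /* columns k and k2 of this row are zero in Z, */
            ri[k2] = wk2; /* so the W multipliers are stored there */
        }
    }
    return 0;
}

/* Check: unpack W (unit diagonal) and Z from the packed array and return
   the largest absolute entry of A - W*Z. */
double wz_max_residual(int n, const double *orig, const double *packed)
{
    assert(n > 0 && orig != NULL && packed != NULL);
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        const int mi = z_band_start(n, i);
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int p = 0; p < n; p++) {
                double w = 0.0;
                if (p == i)
                    w = 1.0;
                else if (p < mi || p > n - 1 - mi)
                    w = packed[(size_t)i * n + p];
                const int mp = z_band_start(n, p);
                const double z = (j >= mp && j <= n - 1 - mp)
                                     ? packed[(size_t)p * n + j] : 0.0;
                s += w * z;
            }
            const double d = orig[(size_t)i * n + j] - s;
            const double ad = d < 0.0 ? -d : d;
            if (ad > worst)
                worst = ad;
        }
    }
    return worst;
}
```

When compiled with OpenMP enabled (e.g. `gcc -O3 -fopenmp`), the loop schedule can be changed without recompiling, for instance by setting `OMP_SCHEDULE=static` or `OMP_SCHEDULE=dynamic,4`, to compare how each copes with the shrinking iteration count. Without OpenMP the pragmas are ignored and the same code runs sequentially.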