Improving processing performance with Parallel.ForEach
By Rod McBride
Parallelization allows computers to run more than one process at once, which is invaluable for business use cases. The applications we depend on every day rely on multitasking to handle data efficiently. Today, processing can also be distributed across cloud resources to harness remote computing power, which supports technological transformation in organizations.
In a .NET development context, making fuller use of the host computer's processors increases an application's throughput and helps it handle a larger data workload. Parallel.ForEach is part of the .NET toolkit and allows developers to leverage parallelization.
How Parallel.ForEach increases performance
The .NET Framework Task Parallel Library (TPL) can significantly increase processing performance by using all available cores on the host computer more efficiently. With the typical execution model, a task (i.e., a unit of work) executes sequentially on a single CPU core.
However, for a long-running task, you could also leverage parallelization to distribute work across multiple processors to improve processing time.
Let’s walk through an example.
Say a CPU has four cores and eight logical processors, and the task is record deletion. Over a million records were added to a table in Dynamics 365 online in error, and they need to be removed.
There are multiple ways to accomplish the task. However, the standard Bulk Delete process could take a long time or error out. You could use the CRM SDK and LINQPad to run a delete script. (That process would retrieve the list of IDs to delete, then loop through the result set and delete the records one at a time.) But that would also be slow work, taking about an hour to delete each batch of 50,000 records.
Instead, the deletion could be split into independent units of work that run concurrently on separate cores. That is the kind of parallelism the TPL provides.
The TPL provides a basic form of structured parallelism via three static methods in the Parallel class:
- Parallel.Invoke executes each of the provided actions, possibly in parallel.
- Parallel.For executes a for loop in which iterations may run in parallel.
- Parallel.ForEach executes a foreach loop in which iterations may run in parallel.
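The three methods above can be sketched side by side. This is a minimal illustration rather than code from the deletion scenario: the log collection and the sample strings are made up for the example, and it assumes a C# 9 or later project that supports top-level statements.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Thread-safe collection, because the loop bodies may run on different threads.
var log = new ConcurrentBag<string>();

// Parallel.Invoke: run independent actions, possibly at the same time.
Parallel.Invoke(
    () => log.Add("invoke: action A"),
    () => log.Add("invoke: action B"));

// Parallel.For: an index-based loop over 0, 1, 2; iterations may overlap.
Parallel.For(0, 3, i => log.Add($"for: index {i}"));

// Parallel.ForEach: the same idea over any IEnumerable<T>.
var names = new[] { "alpha", "beta", "gamma" };
Parallel.ForEach(names, name => log.Add($"foreach: {name}"));

// All 8 entries are present, but their order can vary from run to run.
Console.WriteLine($"{log.Count} entries recorded");
```

Each call blocks until all of its work has finished, so the count is only printed after every action and iteration has completed.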
For this task, the Parallel.ForEach method could be run with only minor changes to the original query:
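// Retrieve only the IDs of the records to delete.
var entities = history.Where(e => e.tls_User.Id == userId)
    .Select(e => new { e.Id })
    .ToList();

// Delete the records one at a time, sequentially.
foreach (var e in entities)
{
    Delete("history", e.Id);
}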
Listing 1: Original query
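// Retrieve only the IDs of the records to delete.
var entities = history.Where(e => e.tls_User.Id == userId)
    .Select(e => new { e.Id })
    .ToList();

// Delete the records with iterations that may run in parallel.
Parallel.ForEach(entities, (e) => {
    Delete("history", e.Id);
});
Listing 2: Query using Parallel.ForEach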
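When the loop body calls a remote service, it can be useful to limit how many iterations run at once and to handle failures from individual iterations. The following is a sketch under stated assumptions, not the script used in this scenario: the Delete function is a hypothetical stand-in that only simulates latency, the IDs are generated sample data, and the limit of 4 is an arbitrary illustrative value.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int deleted = 0;

// Hypothetical stand-in for Delete(entityName, id); a real version
// would call the CRM service instead of sleeping.
void Delete(string entityName, Guid id)
{
    Thread.Sleep(1);                      // simulated network round trip
    Interlocked.Increment(ref deleted);   // thread-safe counter update
}

// Made-up sample data standing in for the query result.
var ids = Enumerable.Range(0, 200).Select(_ => Guid.NewGuid()).ToList();

// Cap the number of concurrent iterations (4 is an arbitrary example value).
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

try
{
    Parallel.ForEach(ids, options, id => Delete("history", id));
}
catch (AggregateException ex)
{
    // Exceptions thrown inside iterations are collected and rethrown together.
    foreach (var inner in ex.InnerExceptions)
        Console.WriteLine(inner.Message);
}

Console.WriteLine($"Deleted {deleted} of {ids.Count} records");
```

Shared state such as the counter needs a thread-safe update (Interlocked here), because a plain increment from several threads can lose counts.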
Results of using Parallel.ForEach
Switching from the standard C# foreach to Parallel.ForEach made the query run almost 10x faster. Instead of ~833 records per minute (~50K an hour), the revised query processed ~8K records per minute (~480K an hour).
LINQ queries can also be parallelized. Add the AsParallel() method to the query, then replace the foreach with the ForAll() method, like this:
// Output order may vary between runs.
"abcdef".AsParallel()
    .Select(c => char.ToUpper(c))
    .ForAll(Console.Write);
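Because ForAll() runs its action on worker threads, the letters may print in any order. When order matters, one option is AsOrdered(), sketched below with the same sample string. This assumes a C# 9 or later project with top-level statements; the variable name is made up for the example.

```csharp
using System;
using System.Linq;

// ForAll runs the action on worker threads, so output order is not guaranteed.
"abcdef".AsParallel()
        .Select(c => char.ToUpper(c))
        .ForAll(Console.Write);
Console.WriteLine();

// AsOrdered preserves source order when the results are gathered
// back on the calling thread (here via ToArray).
char[] ordered = "abcdef".AsParallel()
                         .AsOrdered()
                         .Select(c => char.ToUpper(c))
                         .ToArray();
Console.WriteLine(new string(ordered)); // ABCDEF
```

Preserving order adds some overhead, so it is worth requesting only when the consumer actually depends on it.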
Performance improvements may not always be this dramatic. It depends on several factors, such as the number of CPUs, the iterations involved, the type of parallelism (e.g., data, task or dataflow) and whether it’s an embarrassingly parallel problem.
However, programs that are properly designed for parallelism can execute faster than their sequential counterparts, often significantly so.
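One way to check whether a particular workload benefits is to time the sequential and parallel versions on the target machine. The sketch below uses a made-up CPU-bound function (CountPrimes) and arbitrary input sizes purely for illustration. It uses Parallel.For because each result is written to an indexed slot, and it assumes a C# 9 or later project with top-level statements. Timings will differ from machine to machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical CPU-bound work item: count the primes below n by trial division.
static int CountPrimes(int n)
{
    int count = 0;
    for (int i = 2; i < n; i++)
    {
        bool isPrime = true;
        for (int d = 2; d * d <= i; d++)
            if (i % d == 0) { isPrime = false; break; }
        if (isPrime) count++;
    }
    return count;
}

// 64 identical work items; the sizes are arbitrary example values.
var inputs = Enumerable.Repeat(20_000, 64).ToArray();
var sequential = new int[inputs.Length];
var parallel = new int[inputs.Length];

var sw = Stopwatch.StartNew();
for (int i = 0; i < inputs.Length; i++)
    sequential[i] = CountPrimes(inputs[i]);
Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms");

sw.Restart();
// Each iteration writes only to its own slot, so no locking is needed.
Parallel.For(0, inputs.Length, i => parallel[i] = CountPrimes(inputs[i]));
Console.WriteLine($"Parallel:   {sw.ElapsedMilliseconds} ms on {Environment.ProcessorCount} logical processors");

Console.WriteLine(sequential.SequenceEqual(parallel) ? "Results match" : "Results differ");
```

Comparing the two result arrays is a cheap sanity check that the parallel version computes the same answers as the sequential one.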
How Wipfli can help
If you need to improve the efficiency of your business computing environment, Wipfli can help. Our technology professionals support organizations with digital solutions, custom programming and overall transformation. Contact Wipfli if you have questions about your computing environment and how to maximize performance.