C#: .NET Parallel Programming (TPL)

Introduction#

This article aims to provide an in-depth understanding of multithreading, asynchronous programming, tasks, and parallel computing, collectively referred to as Parallel Programming (Task Parallel Library).

Multithreading and Asynchronous Programming#

Multithreading and asynchronous programming are two different concepts. Confusing them can lead to the following erroneous code:

void button1_Click()
{
    new Thread(() =>
    {
        var client = new WebClient();
        var content = client.DownloadString("https://myesn.cn");
        Console.WriteLine(content);
    }).Start();
}

The above code creates a thread to download web content when the button is clicked, aiming to avoid blocking the UI thread.

However, this is an inefficient implementation. To understand why, we need to start with computer architecture. Many hardware subsystems support I/O operations with DMA (Direct Memory Access), which lets a device transfer data to and from memory without the CPU copying every byte. The asynchronous programming model provided by the CLR in .NET lets us take full advantage of DMA and reduce the load on the CPU.

The graphical representation of the above code is as follows:
[Figure: creating a thread to download the web content]

The inefficiency arises because the dedicated thread stays occupied for the entire download: it spends almost all of that time blocked, waiting for the network, while still holding its stack and other thread resources. If we implement it using an asynchronous approach:

void button1_Click()
{
    var client = new WebClient();
    client.DownloadStringCompleted += (sender, e) =>
    {
        Console.WriteLine(e.Result);
    };
    client.DownloadStringAsync(new Uri("https://myesn.cn"));
}

The modified code adopts an asynchronous model, managed underneath by the CLR thread pool. When the asynchronous operation starts, the CLR hands the web download to a thread-pool thread. Once the I/O operation is under way, that worker thread is returned to the pool and no longer occupies CPU resources. When the I/O completes, WebClient raises the completion event, and the CLR runs the handler in response. The asynchronous model therefore saves CPU resources by reusing thread-pool threads instead of parking a dedicated thread. The flow is illustrated below:
[Figure: using asynchrony to download web content]

Therefore, the execution flow of multithreading and asynchronous programming is roughly as follows:
[Figure: execution flow of multithreading versus asynchronous programming]

Applicable scenarios for both: Use multithreading for CPU-intensive tasks and asynchronous programming for I/O-intensive tasks (reading/writing and data transmission).

Anything related to reading/writing and data transmission falls under I/O-intensive tasks; otherwise, it is CPU-intensive, also known as compute-intensive.
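
The distinction can be sketched in a few lines. This is only an illustration under stated assumptions: FakeDownloadAsync is a hypothetical stand-in for real I/O (it uses Task.Delay instead of a network call), and the summation stands in for any CPU-heavy computation.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// CPU-bound: the work itself needs a core, so hand it to a thread-pool thread.
Task<long> cpuTask = Task.Run(() => Enumerable.Range(1, 1_000_000).Sum(x => (long)x));

// I/O-bound (simulated): Task.Delay stands in for a network or disk wait.
// No thread is blocked while the delay is pending.
async Task<string> FakeDownloadAsync()
{
    await Task.Delay(200);
    return "payload";
}

Task<string> ioTask = FakeDownloadAsync();

long total = await cpuTask;
string payload = await ioTask;
Console.WriteLine($"{total} {payload}");
```

Both operations overlap in time, but only the CPU-bound one occupies a thread while it runs.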

Thread Synchronization#

In a multithreaded environment, thread synchronization ensures safe access to shared resources, preventing multiple threads from simultaneously modifying shared resources, which can lead to data inconsistency or other issues. Locking mechanisms are typically used to achieve thread synchronization.

In object-oriented languages, data types can be divided into value types and reference types. Value types include integers, floating-point numbers, structures, etc., while reference types refer to references pointing to objects, such as classes and arrays.

In most cases, we implement thread synchronization on reference types: we ensure safe access to a shared resource by locking an object that represents it. This can be done with built-in locking mechanisms (such as the lock keyword in C# or synchronized in Java) or with other synchronization tools.

Value types, however, cannot be locked and waited on directly. A lock is associated with an object on the heap, and a value type has no such identity: in C#, using lock on a value-type expression is a compile-time error, and passing a value to Monitor.Enter boxes it into a new object on every call, so no two threads would ever contend for the same lock.

In C#, the lock keyword is syntactic sugar for the Monitor class. The compiler expands a lock statement into a call to Monitor.Enter, a try block containing the protected code, and a finally block that calls Monitor.Exit, so the lock is released even if an exception is thrown.
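
The equivalence can be sketched as follows. The names gate, counter, IncrementWithLock, and IncrementWithMonitor are chosen for illustration, and the Monitor version is an approximation of what the compiler emits, not its exact output.

```csharp
using System;
using System.Threading;

object gate = new object();
int counter = 0;

void IncrementWithLock()
{
    lock (gate)
    {
        counter++;
    }
}

// Roughly what the compiler generates for the lock statement above.
void IncrementWithMonitor()
{
    bool lockTaken = false;
    try
    {
        Monitor.Enter(gate, ref lockTaken);
        counter++;
    }
    finally
    {
        if (lockTaken) Monitor.Exit(gate);
    }
}

var threads = new Thread[4];
for (int t = 0; t < threads.Length; t++)
{
    threads[t] = new Thread(() =>
    {
        for (int n = 0; n < 1000; n++)
        {
            IncrementWithLock();
            IncrementWithMonitor();
        }
    });
    threads[t].Start();
}
foreach (var thread in threads) thread.Join();

// 4 threads * 1000 iterations * 2 increments each
Console.WriteLine(counter);
```

Because both methods lock the same gate object, they exclude each other as well as themselves, and no increment is lost.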

Signal synchronization is a mechanism for inter-thread communication, used to coordinate and synchronize operations among multiple threads. It ensures that threads wait under specific conditions and notify other threads to continue execution when conditions are met. The types involved in the signal synchronization mechanism inherit from the abstract class WaitHandle, and their relationships are as follows:

[Figure: WaitHandle class hierarchy: EventWaitHandle (AutoResetEvent, ManualResetEvent), Semaphore, Mutex]

EventWaitHandle wraps a boolean flag maintained by the operating system kernel that represents the blocking state (true means signaled, false means non-signaled). Calling the Set method changes it to true and releases blocked threads. Both AutoResetEvent and ManualResetEvent are subclasses of EventWaitHandle.
- AutoResetEvent automatically resets the state to false once Set has released a waiting thread, so each Set wakes only one waiting thread.
- ManualResetEvent does not reset the state after Set; all waiting threads are awakened, and the event stays signaled until the Reset method is called to set it back to false.

Semaphore maintains an integer counter managed by the system kernel. If the counter is 0, callers wait; if it is greater than 0, a caller passes through and the counter is decremented. The maximum count, and therefore the number of threads that may hold the semaphore at the same time, is fixed at construction.

Mutex can block and release threads across application domains (and, when named, across processes). It also maintains a kernel flag to synchronize access to a shared resource: only one thread can own the Mutex at a time, and other threads must wait until it is released.
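
As a hedged illustration of the counter behavior, the sketch below lets six threads compete for a Semaphore that admits two at a time. The variable names (inside, maxObserved, gate) and the 50 ms of simulated work are assumptions made for the example.

```csharp
using System;
using System.Threading;

// Semaphore(initialCount, maximumCount): at most 2 threads may hold it at once.
using var semaphore = new Semaphore(2, 2);
int inside = 0;        // threads currently inside the guarded region
int maxObserved = 0;   // highest value of 'inside' seen so far
object gate = new object();

var workers = new Thread[6];
for (int t = 0; t < workers.Length; t++)
{
    workers[t] = new Thread(() =>
    {
        semaphore.WaitOne();            // counter > 0: decrement and enter; otherwise wait
        try
        {
            lock (gate)
            {
                inside++;
                maxObserved = Math.Max(maxObserved, inside);
            }
            Thread.Sleep(50);           // simulated work
            lock (gate) { inside--; }
        }
        finally
        {
            semaphore.Release();        // increment the counter, letting one waiter in
        }
    });
    workers[t].Start();
}
foreach (var worker in workers) worker.Join();

Console.WriteLine($"max concurrent: {maxObserved}");
```

However the threads are scheduled, maxObserved never exceeds the maximum count passed to the constructor.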

An example using AutoResetEvent is as follows:

var test = new Test();
test.StartThread();

Console.ReadKey();

test.SendSignal();

Console.ReadKey();

class Test
{
    // false = start in the non-signaled state, so WaitOne blocks
    private readonly AutoResetEvent _autoResetEvent = new AutoResetEvent(false);

    public void StartThread()
    {
        // Background threads do not keep the process alive if no signal ever arrives
        new Thread(() =>
        {
            Console.WriteLine("Thread 1 started, waiting for signal...");
            _autoResetEvent.WaitOne(); // blocks until Set() is called
            Console.WriteLine("Thread 1 continues working...");
        }) { IsBackground = true }.Start();

        new Thread(() =>
        {
            Console.WriteLine("Thread 2 started, waiting for signal...");
            _autoResetEvent.WaitOne(); // blocks until Set() is called
            Console.WriteLine("Thread 2 continues working...");
        }) { IsBackground = true }.Start();
    }

    // Each Set() releases exactly one waiting thread
    public void SendSignal() => _autoResetEvent.Set();
}

First, create an AutoResetEvent instance and set its initial value to false, indicating no signal. Then call the StartThread function to start two threads, each of which waits for a signal. After calling the SendSignal function, it internally uses Set() to send a signal, at which point only one waiting thread will be awakened. To wake all waiting threads, use ManualResetEvent.
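
For contrast, a minimal sketch of the ManualResetEvent behavior mentioned above: a single Set releases every waiter. The names gateOpen and resumed, and the choice of three threads, are illustrative.

```csharp
using System;
using System.Threading;

using var gateOpen = new ManualResetEvent(false); // starts non-signaled
int resumed = 0;

var waiters = new Thread[3];
for (int t = 0; t < waiters.Length; t++)
{
    waiters[t] = new Thread(() =>
    {
        gateOpen.WaitOne();                 // blocks until the event is signaled
        Interlocked.Increment(ref resumed);
    });
    waiters[t].Start();
}

gateOpen.Set();                             // stays signaled, so every waiter gets through
foreach (var waiter in waiters) waiter.Join();

Console.WriteLine(resumed);
```

With an AutoResetEvent in the same position, only one thread would get through and the Join calls would never return.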

Can any reference type be locked?#

Locking is a thread synchronization mechanism that ensures only one thread can occupy shared resources during multithreaded access. However, not all objects can be used as locks.

When choosing a lock object, keep the following in mind:

- The lock object must be the same object for every thread that needs to be synchronized.
- In non-static methods, static variables should not be used as lock objects; otherwise every instance of the class contends for one lock.
- Value types cannot be used as lock objects: boxing creates a new object on each use, so threads would never lock the same instance.
- Avoid using strings as lock objects. String literals are interned, so variables assigned the same literal reference the same object, and unrelated code can end up sharing a lock.
- Reduce the visibility of lock objects. Strings are the most widely visible lock objects, followed by the result of typeof(class), since every user of the class sees the same Type object returned by typeof.

By convention, static methods of a class should be thread-safe, while instance methods are not required to be.

Generally, lock objects should not be public fields or properties. Some older collection types in .NET (implementations of System.Collections.ICollection), such as ArrayList, expose a public SyncRoot property so that callers can implement thread-safe collection operations; List<T> provides it only through an explicit ICollection implementation. Most collection operations run in single-threaded scenarios, and thread synchronization is relatively expensive, so the property leaves it to the caller to decide whether thread safety is needed. In multithreaded scenarios, however, it is recommended to use the thread-safe collections in the System.Collections.Concurrent namespace, such as ConcurrentBag and ConcurrentDictionary.
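
The two options can be compared side by side. The sketch below is illustrative only: guarded, guardedLock, and concurrent are hypothetical names, and the bucket counting is an arbitrary workload. It counts 1000 items into 10 buckets from a parallel loop, once with a dedicated lock object around an ordinary Dictionary and once with ConcurrentDictionary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Option 1: an ordinary Dictionary guarded by a dedicated lock object
// that no unrelated code can see or lock on.
var guarded = new Dictionary<int, int>();
object guardedLock = new object();

// Option 2: a collection that is thread-safe by design.
var concurrent = new ConcurrentDictionary<int, int>();

Parallel.For(0, 1000, i =>
{
    int bucket = i % 10;

    lock (guardedLock)
    {
        guarded.TryGetValue(bucket, out int current);
        guarded[bucket] = current + 1;
    }

    // Inserts 1 if the key is new, otherwise applies the update function atomically.
    concurrent.AddOrUpdate(bucket, 1, (_, old) => old + 1);
});

Console.WriteLine($"{guarded[3]} {concurrent[3]}");
```

Both approaches end with 100 items per bucket; the concurrent collection simply moves the synchronization inside the type.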

Thread's IsBackground#

In .NET, a thread runs either in the foreground (the default) or in the background, controlled by its IsBackground property:

- Foreground threads (IsBackground = false, the default): the process stays alive as long as any foreground thread is running and exits once all foreground threads have completed. Critical tasks that must run to completion are typically placed on foreground threads.
- Background threads (IsBackground = true): these do not keep the process alive. When only background threads remain, the application exits immediately without waiting for them to finish. Non-critical, auxiliary tasks such as logging and monitoring are typically set as background threads so that the application can exit faster.
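
A small sketch of the property in use. The infinite sleep is a placeholder for auxiliary work, and the variable names are illustrative; the thread-pool check is included because pool threads are background threads.

```csharp
using System;
using System.Threading;

var worker = new Thread(() => Thread.Sleep(Timeout.Infinite));
bool defaultIsBackground = worker.IsBackground;   // false: threads are foreground by default

// Mark it as a background thread before starting, so it cannot keep the process alive.
worker.IsBackground = true;
worker.Start();

// Thread-pool threads are background threads.
bool poolIsBackground = false;
using var done = new ManualResetEventSlim(false);
ThreadPool.QueueUserWorkItem(_ =>
{
    poolIsBackground = Thread.CurrentThread.IsBackground;
    done.Set();
});
done.Wait();

Console.WriteLine($"{defaultIsBackground} {poolIsBackground}");
```

The program exits normally even though worker never finishes; without the IsBackground assignment it would hang.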

Threads do not start immediately#

Most operating systems, including Windows, are not real-time operating systems. A thread does not begin executing the moment it is started; when it runs is decided by the operating system's scheduling algorithm. Each thread is given a small slice of CPU time in turn, and because the slices are short, threads appear to run simultaneously even when they share a core. The scheduler decides which thread runs next.

Threads are not inherently part of the programming language; their scheduling is a very complex process. Switching between threads requires a certain amount of time and space, and it is not real-time. For example:

for (int i = 0; i < 10; i++)
{
	new Thread(() =>
	{
		Console.WriteLine(i);
	}).Start();
}

Output:
[Figure: console output of the loop above]

From the output, we can see that threads do not start immediately: multiple threads printed the same i value, such as 5. The lambda captures the variable i itself rather than its value at the time the thread is created, so every thread shares the same i. By the time a thread actually begins executing, the loop may already have incremented i, which is why several threads can print the same value.

More generally, threads may run on different CPU cores, and there is a multi-level hierarchy between the CPU and memory: registers, L1 cache, L2 cache, L3 cache, and main memory. Without synchronization, a value modified on one core may not yet be visible to another core, which can then read a stale value.

To get the expected behavior (each thread receiving its own i value), we can move the thread-starting code into a function and pass the current i as an argument. Each call then has its own parameter i, which is a new variable independent of the loop counter, so each thread captures a different variable and prints a different value:

for (int i = 0; i < 10; i++)
{
    StartThread(i);
}

void StartThread(int i)
{
    new Thread(() =>
    {
        Console.WriteLine(i);
    }).Start();
}
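
An alternative that avoids the helper function is to copy the loop counter into a variable declared inside the loop body. The sketch below assumes that approach; the ConcurrentBag named seen is only there so the result can be inspected after the threads finish.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

var seen = new ConcurrentBag<int>();
var threads = new Thread[10];

for (int i = 0; i < 10; i++)
{
    int copy = i;                 // a fresh variable per iteration; the lambda captures this one
    threads[i] = new Thread(() => seen.Add(copy));
    threads[i].Start();
}
foreach (var thread in threads) thread.Join();

// Threads may finish in any order, so sort before printing.
Console.WriteLine(string.Join(",", seen.OrderBy(x => x)));
```

Every value from 0 to 9 appears exactly once, regardless of when each thread is scheduled.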

Thread Priority (ThreadPriority)#

Threads in C# have a priority (ThreadPriority). Every thread we start, including those from the ThreadPool and Task, defaults to Normal. Priority affects how the operating system schedules threads. Windows uses a preemptive scheduling model based on thread priority: among threads that are ready to run, higher-priority threads are scheduled first and receive more CPU time. A thread is ready when it has been created and started and is not blocked or suspended, that is, it is only waiting for the operating system to schedule it.

Generally, it is not recommended to modify thread priorities unless they are very critical threads. High-priority threads should be characterized by short run times and immediately entering a waiting state; otherwise, they may occupy CPU resources for extended periods, leading to various issues.

Canceling Running Threads (Thread Cancel)#

Canceling a running thread involves two considerations:

1. Just as threads do not start immediately, they do not stop immediately. Whatever mechanism is used to ask a thread to stop, the thread first finishes what it cannot interrupt and only then terminates. For example, on .NET Framework, Thread.Abort does not raise ThreadAbortException while the thread is executing unmanaged code; the exception is raised only once execution returns to the CLR, and even then not necessarily at once. (On .NET Core and .NET 5+, Thread.Abort is not supported and throws PlatformNotSupportedException.)
2. Cancellation depends on the thread being able to respond to the stop request. A thread should expose some form of Canceled flag and check it while working, exiting only when it detects that the flag is true.

.NET provides a standard cancellation pattern, Cooperative Cancellation, which is the mechanism described in point 2 above. For example:

var cts = new CancellationTokenSource();
new Thread(() =>
{
    while (true)
    {
        if (cts.IsCancellationRequested)
        {
            Console.WriteLine("Thread canceled");
            break;
        }

        Thread.Sleep(100);
    }
}).Start();

Console.ReadKey();
cts.Cancel();

The main thread notifies the worker thread to exit through the Cancel method of CancellationTokenSource, and the worker thread checks for external Cancel signals at a fixed frequency and exits at the appropriate time. The worker thread itself plays a primary role in ensuring correct termination.

The Token of a CancellationTokenSource has a Register method for registering a callback that is invoked when cts.Cancel() is called:

var cts = new CancellationTokenSource();
cts.Token.Register(() => Console.WriteLine("cts canceled"));

Console.ReadKey();
cts.Cancel();

The cancellation model for ThreadPool and Thread is the same.
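
A sketch of the same cooperative pattern on a thread-pool work item, combined with a Register callback. The 10 ms and 50 ms sleeps and the flag names are assumptions for the example; a real worker would do actual work between checks.

```csharp
using System;
using System.Threading;

using var cts = new CancellationTokenSource();
using var finished = new ManualResetEventSlim(false);
bool callbackRan = false;
bool workerSawCancel = false;

// Invoked when Cancel() is called.
cts.Token.Register(() => callbackRan = true);

CancellationToken token = cts.Token;
ThreadPool.QueueUserWorkItem(_ =>
{
    while (!token.IsCancellationRequested)
    {
        Thread.Sleep(10);           // simulated unit of work between checks
    }
    workerSawCancel = true;
    finished.Set();
});

Thread.Sleep(50);
cts.Cancel();                       // request cancellation
finished.Wait();                    // wait for the worker to notice and exit

Console.WriteLine($"{callbackRan} {workerSawCancel}");
```

Passing the CancellationToken rather than the CancellationTokenSource to the worker keeps the ability to cancel with the caller.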

Controlling the Number of Threads#

The figures shown in Task Manager > Performance > CPU suggest that each process has about 10 threads on average, so a typical program does not start very many threads:
[Figure: Task Manager, Performance > CPU view]

In network programming, opening one thread per socket connection to listen for requests means the number of threads grows with the number of users. Past a certain point, the machine can no longer manage them efficiently. Each thread requires its own memory, and the user-mode address space of a 32-bit process is typically limited to around 2GB - 3GB, so enough threads can exhaust it. In addition, too many threads cause the CPU to spend a significant share of its time switching between them instead of doing useful work. For I/O-intensive workloads such as sockets, asynchronous processing is a better fit.

Creating too many threads can lead to excessive resource consumption, severely impacting performance and even causing system crashes. Moreover, the overhead of thread switching is significant, making it difficult for threads to get enough CPU time, resulting in long wait times for operations within threads.

In actual development, avoid creating too many threads, and use thread pools or asynchronous methods to improve performance and reduce resource consumption. Asynchronous and thread-pool techniques can handle a large number of tasks efficiently while keeping the actual number of worker threads relatively low.

Thread Pool#

The space overhead of threads mainly comes from the following (the sizes are typical for 32-bit Windows):

- Thread Kernel Object: created for each thread; it mainly holds thread context information and occupies about 700 bytes of memory.
- Thread Environment Block: occupies 4KB of memory.
- User Mode Stack, i.e., the thread stack: used to save method parameters, local variables, and return values. Each thread stack reserves 1MB of memory by default. A non-terminating recursive method exhausts it quickly, because every call pushes more parameters and return addresses, and ends in a StackOverflowException.
- Kernel Mode Stack: when kernel-mode functions of the operating system are called, the system copies the function arguments from the user mode stack to the kernel mode stack, which occupies 12KB of memory.

The time overhead of threads comes from:

- Creation: when a thread is created, the system initializes each of the memory areas listed above in turn.
- DLL notifications: Windows calls the DllMain function of every DLL loaded in the process, passing the DLL_THREAD_ATTACH flag (DllMain is called again when the thread terminates, with the DLL_THREAD_DETACH flag).
- Thread context switching: a system can load many processes, and each process contains several threads, but only one thread can execute on a CPU core at any given moment. To make every thread appear to be running, the system continuously switches the "thread context": each thread gets a time slice of a few milliseconds before the CPU moves on to the next thread. A switch can be divided into the following five steps:
  1. Enter kernel mode.
  2. Save context information (mainly CPU register contents) to the kernel object of the currently executing thread.
  3. The system acquires a spinlock, determines the next thread to execute, and then releases the spinlock. If the next thread belongs to a different process, the virtual address space must be switched as well.
  4. Load context information from the kernel object of the thread to be executed.
  5. Exit kernel mode.

Creating and destroying threads has both time and space costs. To manage thread usage, Microsoft provides thread pool technology. The thread pool recycles threads back into the pool after work is completed for use by other tasks, with the creation and destruction of threads determined by the CLR's algorithm. In actual projects, it is recommended to use the thread pool to manage threads. You can use ThreadPool and BackgroundWorker classes to implement this, which is simple and convenient, for example:

using System.ComponentModel;

// ThreadPool
ThreadPool.QueueUserWorkItem(state => Console.WriteLine("from ThreadPool"));

// BackgroundWorker
var bw = new BackgroundWorker();
bw.DoWork += (object? sender, DoWorkEventArgs e) => Console.WriteLine("from BackgroundWorker");
bw.RunWorkerAsync();

Console.ReadKey();

ThreadPool and BackgroundWorker are two different thread-handling technologies with different characteristics and usage scenarios:

- ThreadPool: a mechanism for managing and reusing threads. It keeps a set of threads available and assigns one whenever a work item needs to run. ThreadPool suits the parallel execution of a large number of short-lived tasks, since it avoids the overhead of repeatedly creating and destroying threads. Work items are added with the ThreadPool.QueueUserWorkItem method or the Task.Run method.
- BackgroundWorker: a component that encapsulates asynchronous operations and simplifies running long tasks on a background thread in WinForms and WPF applications. It provides progress-reporting and completion events, which makes communication between the UI thread and the background thread straightforward. It suits long-running tasks that execute in the background and also need to update the UI.

Task#

**[Task](https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.task?view=net-7.0) is a high-level abstraction for asynchronous programming, available since .NET Framework 4**. It can represent an asynchronous operation or a piece of schedulable code, and tasks can be assigned to thread-pool threads for execution. Task supersedes direct use of ThreadPool and BackgroundWorker and provides a richer API for managing work. You can use the Status property together with the IsCanceled, IsCompleted, and IsFaulted properties to determine the state of a task:

// Create a CancellationTokenSource so that the Task can be canceled
var cts = new CancellationTokenSource();
Task.Run(() =>
{
    Console.WriteLine("I am an asynchronous task...");
}, cts.Token).ContinueWith(t =>
{
    if (t.IsCanceled)
    {
        Console.WriteLine("The task was canceled");
    }

    if (t.IsFaulted)
    {
        Console.WriteLine("The task ended with an exception");
    }

    // IsCompleted is also true for canceled and faulted tasks,
    // so check IsCompletedSuccessfully for the success case
    if (t.IsCompletedSuccessfully)
    {
        Console.WriteLine("The task completed successfully");
    }
});

Console.ReadKey();
// Cancel the Task
// cts.Cancel();

ContinueWith handles task completion, returned data, cancellation, and exceptions in one place. The Result property of Task<TResult> retrieves the value produced by the task, blocking the calling thread until the result is available.

Task.Factory.StartNew (or the simpler Task.Run) is typically used to create and start a Task, while Task.Factory.ContinueWhenAll (wait for all) and Task.Factory.ContinueWhenAny (wait for any) operate on the results of multiple Tasks.

To wait synchronously for a Task to finish, call its Wait method.
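
A short sketch of combining several tasks. The squaring workload and the names tasks, sumTask, and squares are illustrative; Task.WhenAll is shown alongside ContinueWhenAll as the awaitable way to do the same thing.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Three independent computations scheduled on the thread pool.
Task<int>[] tasks = Enumerable.Range(1, 3)
    .Select(n => Task.Run(() => n * n))
    .ToArray();

// ContinueWhenAll schedules a continuation once every task has finished.
Task<int> sumTask = Task.Factory.ContinueWhenAll(tasks, done => done.Sum(t => t.Result));

// Result blocks the calling thread until the continuation has produced a value.
int sum = sumTask.Result;

// Task.WhenAll is the awaitable counterpart and returns the results as an array.
int[] squares = await Task.WhenAll(tasks);

Console.WriteLine($"{sum} [{string.Join(",", squares)}]");
```

Inside the continuation, reading t.Result does not block, because every antecedent task has already completed.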

async / await#

Applying the async modifier to a method or lambda expression marks it as asynchronous. An asynchronous method usually returns Task, Task<T>, or ValueTask<T> (void is also permitted, mainly for event handlers). The await operator suspends the current method until the awaited asynchronous operation completes, without blocking the thread. The two keywords almost always appear together.

Synchronous Waiting and Asynchronous Waiting#

Consider the following code. Approximately what will the final output be?

using System.Diagnostics;

var sw = Stopwatch.StartNew();
var task = MyMethod();
Thread.Sleep(4000);
var result = task.Result;
Console.WriteLine(sw.ElapsedMilliseconds);

static async Task<string> MyMethod()
{
    await Task.Delay(5000);
    return "aaa";
}

In this code, the asynchronous method MyMethod is called first. It runs synchronously until it reaches await Task.Delay(5000), at which point it returns an incomplete Task to the caller. No extra thread is started for the delay; it is backed by a timer. Meanwhile, the main thread synchronously sleeps for 4000ms.

After those 4000ms, the main thread reads task.Result, which blocks until the task has a value. Because the delay and the sleep ran concurrently, only about 1000ms remain to be waited.

When MyMethod finally returns "aaa", the elapsed time printed is therefore approximately 5000ms rather than 9000ms.

Once an asynchronous operation has started, it does not block the main thread unless the two interact. The main thread blocks only when it needs the result of the asynchronous method and that method has not yet completed.
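
For comparison, a sketch of the same timing experiment with asynchronous waits throughout (await instead of Thread.Sleep and Result). The delays are shortened to 500 ms and 400 ms for illustration, and MyMethodAsync is a renamed copy of the method above.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static async Task<string> MyMethodAsync()
{
    await Task.Delay(500);   // shortened from 5000 ms to keep the sketch quick
    return "aaa";
}

var sw = Stopwatch.StartNew();
Task<string> task = MyMethodAsync();   // starts the delay and returns at the first await
await Task.Delay(400);                 // asynchronous wait: no thread is blocked meanwhile
string result = await task;            // only the remaining time is left to wait
long elapsed = sw.ElapsedMilliseconds;

Console.WriteLine($"{result} after ~{elapsed} ms");
```

The total is again close to the longer of the two delays, not their sum, and no thread is blocked at any point.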

Parallel Computing (Parallel)#

System.Threading.Tasks.Parallel is a static class for parallel programming. It provides a set of static methods that simplify the coding process for concurrently executing Tasks, mainly offering the Invoke, For, and ForEach functions.

The most commonly used methods are Parallel.For and Parallel.ForEach. These two methods can parallelize iteration operations to execute simultaneously on multiple threads:

Action a = () => Console.WriteLine(DateTime.Now.ToString("HH:mm:ss"));
Parallel.Invoke(Enumerable.Range(1, 5).Select(x => a).ToArray());

Parallel.For(0, 5, i =>
{
    // Loop body logic
    Console.WriteLine("Current iteration: " + i);
});

List<int> numbers = new List<int> { 1, 2, 3, 4, 5 };
Parallel.ForEach(numbers, number =>
{
    // Logic for iterating elements
    Console.WriteLine("Current number: " + number);
});

Console.WriteLine("hi");

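Output:
[Figure: console output of the Parallel examples above]

As the output shows, the iterations run in no particular order. Each Parallel call is itself synchronous: it blocks the calling thread until all of its iterations have finished, which is why "hi" is printed last.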

When using Task, you typically call the Run method to start an asynchronous task and can use the await keyword to wait for the task to complete. This allows for asynchronous task execution and, when needed, waiting for the task to complete before continuing with other operations.

The Parallel class is designed to simplify parallel computation, providing static methods like Parallel.For and Parallel.ForEach for parallel execution of loops and iteration operations. The Parallel class automatically assigns tasks to multiple threads, executing them simultaneously on multi-core CPUs to achieve parallel computing effects.

In summary, Task is used for asynchronous programming and task coordination, while the Parallel class is used to simplify parallel computation; the usage methods and mechanisms of the two differ.

Incorrect Usage of Parallel#

The Parallel loop methods support an initialization step that runs when each participating thread starts, a cleanup step that runs when it finishes, and monitoring of the loop state. These steps run once per thread, not once per iteration, which is easy to get wrong. Consider the following code:

var list = new List<int>() { 1, 2, 3, 4, 5, 6 };
var sum = 0;

Parallel.For(0, list.Count,
() =>
{
    Console.WriteLine($"localInit i:1, ThreadId:{Environment.CurrentManagedThreadId}");

    return 1;
},
(i, state, total) =>
{
    Console.WriteLine($"body i:{i}, total:{total}, ThreadId:{Environment.CurrentManagedThreadId}");
    total += i;
    return total;
},
i =>
{
    Console.WriteLine($"localFinally i:{i}, ThreadId:{Environment.CurrentManagedThreadId}");
    Interlocked.Add(ref sum, i);
});
Console.WriteLine(sum);

First, carefully look at the parameter descriptions for Parallel.For:

/// <summary>
/// Executes a for loop in parallel.
/// </summary>
/// <typeparam name="TLocal">The type of the thread-local data.</typeparam>
/// <param name="fromInclusive">The starting index (inclusive).</param>
/// <param name="toExclusive">The ending index (exclusive).</param>
/// <param name="localInit">A function delegate to return the initial state of local data for each thread.</param>
/// <param name="body">The delegate called for each iteration.</param>
/// <param name="localFinally">The delegate for the final operation on each thread's local state.</param>
/// <returns>A <see cref="System.Threading.Tasks.ParallelLoopResult">ParallelLoopResult</see> structure containing information about which part of the loop has completed.</returns>
/// <remarks>
/// <para>
/// For each value in the range [fromInclusive, toExclusive), the <paramref name="body"/> delegate is called once.
/// It receives the following parameters: the iteration count (an Int32), a <see cref="System.Threading.Tasks.ParallelLoopState">ParallelLoopState</see> instance for early exit from the loop,
/// and some local state that can be shared between iterations executed on the same thread.
/// </para>
/// <para>
/// **The <paramref name="localInit"/> delegate is called once for each thread participating in the loop, returning the initial local state for each thread.**
/// **This initial state will be passed to the first <paramref name="body"/> call on each thread.**
/// **Subsequent body calls will return a possibly modified state value that will be passed to the next body call.**
/// **Finally, the last body call on each thread returns a state value that is passed to the <paramref name="localFinally"/> delegate.**
/// **Each thread's local state will execute the <paramref name="localFinally"/> delegate once to perform the final operation.**
/// </para>
/// </remarks>
public static ParallelLoopResult For<TLocal>(
            int fromInclusive, int toExclusive,
            Func<TLocal> localInit,
            Func<int, ParallelLoopState, TLocal, TLocal> body,
            Action<TLocal> localFinally)

Parallel.For runs the iterations of the loop concurrently, with the loop body executed on thread-pool threads. The three parameters that are hardest to understand are:

- localInit: executed once for each thread that takes part in the loop; it performs initialization and returns the initial state. In other words, the number of times localInit runs equals the number of threads Parallel has used.
- body: the loop body. The initial state returned by localInit is passed to the first body call on each thread, and each body call returns a possibly modified state that is passed to the next body call on the same thread. So if the loop has 6 iterations but uses only one thread, that thread executes body 6 times and localInit only once. The parameters (i, state, total) of body are, in order, the current loop index in the range [0, list.Count), the ParallelLoopState of the loop, and the local state (initially the value returned by localInit).
- localFinally: executed once on each thread after that thread's last body call. Its parameter i is the state returned by that last body call. The number of times localFinally runs therefore also equals the number of threads Parallel has used.

// The sum of [0-list.Count)
0+1+2+3+4+5=15

Each thread executes localInit once, may execute body several times, and executes localFinally once after its last body call. Because localInit returns 1 here, every thread adds an extra 1 to the total, so the final sum is 15 plus the number of threads used:

| Thread Count | Result |
|--------------|--------|
| 1 | 16 |
| 2 | 17 |
| 3 | 18 |
| 4 | 19 |
| 5 | 20 |
| 6 | 21 |
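
If the goal is simply the sum of the indices, a neutral initial value removes the dependence on thread count. A minimal sketch, assuming localInit returns 0 (indexSum is a name chosen for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var list = new List<int> { 1, 2, 3, 4, 5, 6 };
int indexSum = 0;

Parallel.For(0, list.Count,
    () => 0,                                        // neutral starting value for each worker
    (i, state, local) => local + i,                 // accumulate into the worker's own subtotal
    local => Interlocked.Add(ref indexSum, local)); // merge each subtotal exactly once

Console.WriteLine(indexSum);
```

The result is 0+1+2+3+4+5 = 15 no matter how many threads take part, because each subtotal starts at the identity value for addition.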

A second example makes this concrete:

var list = new List<string>() { "aa", "bb", "cc", "dd", "ee", "ff", "gg" };
var str = string.Empty;
var strLock = new object();
Parallel.For(0, list.Count, () => "-", (i, state, total) => total += list[i], s =>
{
    // localFinally can run on several threads at once, so guard the shared string
    lock (strLock)
    {
        str += s;
    }
    Console.WriteLine("end:" + s);
});
Console.WriteLine(str);

[Figure: console output of the example above]
From the results, we can see that "end" was output 4 times, indicating that the 7 (list.Count) iterations were spread over 4 threads. Each thread:

- Executes localInit once when it joins the loop.
- Passes the state returned by localInit only to its first body call; every later body call receives the value returned by the previous body call on that thread.
- Executes localFinally once after its last body call, receiving that call's return value.

Is Parallel Computing Always Faster than Serial Computing?#

Parallel execution has to schedule work across threads, and creating, destroying, and coordinating threads all carry time and space overhead. When the loop body is very short (no time-consuming operations), parallel execution may therefore be slower than serial execution. For example:

using System.Diagnostics;

var sw = Stopwatch.StartNew();
for (int i = 0; i < 2000; i++)
{
    for (int j = 0; j < 10; j++)
    {
        var sum = i + j;
    }
}

Console.WriteLine("Serial loop time: " + sw.ElapsedMilliseconds);

sw.Restart();
Parallel.For(0, 2000, i =>
{
    for (int j = 0; j < 10; j++)
    {
        var sum = i + j;
    }
});
Console.WriteLine("Parallel loop time: " + sw.ElapsedMilliseconds);

However, if the loop body takes longer to execute, for example with the inner loop bound raised as follows, the parallel loop outperforms the serial loop:

for (int j = 0; j < 100000; j++)

Therefore, only consider using parallel computing when the loop body execution time is relatively long.
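
When the body is short but the loop must still run in parallel, one commonly used remedy is to hand each worker a contiguous range of indices instead of a single index, so the per-call overhead is paid once per chunk. The sketch below assumes this range-partitioning approach via Partitioner.Create; it accumulates i + j into a total so that the work has an observable result. Whether it actually beats the serial loop depends on the workload and should be measured.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

long total = 0;

// Partitioner.Create(0, 2000) splits the index range into contiguous chunks.
// The delegate runs once per chunk rather than once per index.
Parallel.ForEach(Partitioner.Create(0, 2000), range =>
{
    long local = 0;
    for (int i = range.Item1; i < range.Item2; i++)
    {
        for (int j = 0; j < 10; j++)
        {
            local += i + j;
        }
    }
    Interlocked.Add(ref total, local);   // one synchronized merge per chunk
});

Console.WriteLine(total);
```

Each chunk keeps its subtotal in a local variable, so shared state is touched only once per chunk.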

Parallel Computing and Locking#

Parallel code runs on multiple threads, so access to shared resources must be synchronized to keep data consistent. A lock is appropriate when a whole block of code has to be synchronized or a shared resource is held for a longer period.

For atomic operations on integer variables, the Interlocked.Add method can be used instead, which costs much less than taking a lock.

var list = new List<int>() { 1, 2, 3, 4, 5, 6 };
int sum = 0;
Parallel.For(0, list.Count, () => 1, (i, state, total) =>
{
    total += i;
    return total;
}, i => Interlocked.Add(ref sum, i));
Console.WriteLine(sum);

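In the above code, the final merge into sum must be atomic. A plain addition to a shared variable is a read-modify-write: the value is loaded, changed, and stored back in separate steps, so two threads can interleave and one of the updates is lost. Interlocked.Add performs the whole update as a single atomic operation. .NET also provides the [volatile](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/volatile) keyword, but volatile only affects how reads and writes of the field are ordered and made visible to other threads. It does not make compound operations such as Count++ atomic, so on its own it does not protect a shared counter: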

var mc = new MyClass();
Parallel.For(0, 100000, i =>
{
    mc.AddCount();
});
Console.WriteLine(mc.Count);

public class MyClass
{
    public volatile int Count;

    public void AddCount()
    {
        Count++;
    }
}

The output will almost always be less than 100000: Count++ is not atomic, so concurrent increments overwrite each other and some are lost. When several threads update a shared resource, protect the access with Interlocked or a [lock](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/statements/lock) statement:

// Modify the above code using either of the following two methods

// Method One
public void AddCount()
{
    Interlocked.Add(ref Count, 1);
    //Count++;
}

// Method Two
Parallel.For(0, 100000, i =>
{
    lock (mc)
    {
        mc.AddCount();
    }
});

However, synchronization has a price. Locks add overhead (CPU time and memory, plus thread switching when threads contend for the lock), and they force the protected code to run one thread at a time. If the entire loop body has to sit inside a lock, parallel execution gains nothing and will usually take longer than a plain serial loop.
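To compare the options side by side, here is a minimal sketch that increments three counters 100000 times each: one unprotected, one with Interlocked.Increment, and one under a lock. The variable names (unsafeCount, interlockedCount, lockedCount, gate) are illustrative; a private gate object is used instead of locking on a shared instance, which is a design choice of this sketch rather than something the original code does.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

const int Iterations = 100_000;

// Unprotected: ++ is a read-modify-write, so concurrent updates can be lost.
int unsafeCount = 0;
Parallel.For(0, Iterations, _ => unsafeCount++);

// Interlocked: each increment is a single atomic operation.
int interlockedCount = 0;
Parallel.For(0, Iterations, _ => Interlocked.Increment(ref interlockedCount));

// lock: only one thread at a time may execute the protected block.
int lockedCount = 0;
var gate = new object();
Parallel.For(0, Iterations, _ =>
{
    lock (gate)
    {
        lockedCount++;
    }
});

Console.WriteLine($"unprotected: {unsafeCount}");
Console.WriteLine($"Interlocked: {interlockedCount}");
Console.WriteLine($"lock:        {lockedCount}");
```

The two protected counters always reach 100000. The unprotected one usually falls short, by an amount that varies between runs and machines.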

PLINQ#

Traditional LINQ executes serially on a single thread, while PLINQ (Parallel LINQ) is the parallel implementation of LINQ. PLINQ's implementation lives mainly in the System.Linq.ParallelEnumerable class. As for its execution mode, the PLINQ introduction on Microsoft Learn explains that PLINQ analyzes the query and then chooses parallel or sequential execution, whichever it expects to be faster. For example:

var list = Enumerable.Range(1, 6);
var query = from i in list
            select i;
foreach (var i in query)
{
    Console.WriteLine(i);
}

Console.WriteLine("----------");
var query2 = from i in list.AsParallel()
             select i;
foreach (var i in query2)
{
    Console.WriteLine(i);
}

As seen, LINQ outputs the elements in source order, while PLINQ outputs them in no particular order, because the elements are processed concurrently.
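When the output order matters, PLINQ's AsOrdered() operator asks the query to preserve source order, at some cost in throughput. The following is a small sketch under that assumption; the array names and the doubling projection are illustrative only.

```csharp
using System;
using System.Linq;

int[] source = Enumerable.Range(1, 1000).ToArray();
int[] expected = source.Select(i => i * 2).ToArray();

// Default PLINQ: the result order is not guaranteed.
int[] unordered = source.AsParallel().Select(i => i * 2).ToArray();

// AsOrdered(): results come back in source order.
int[] ordered = source.AsParallel().AsOrdered().Select(i => i * 2).ToArray();

// The ordered query matches the sequential result element by element.
bool orderedMatches = ordered.SequenceEqual(expected);

// The unordered query holds the same elements, possibly in a different order.
bool sameElements = unordered.OrderBy(x => x).SequenceEqual(expected);

Console.WriteLine($"ordered matches: {orderedMatches}, same elements: {sameElements}");
```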

In actual development, parallel execution is not always faster than serial execution; it is essential to find the best approach based on the usage scenario.

Exception Handling in Parallel Programming#

Consider whether the following code will throw an exception:

MyMethod();

static async Task MyMethod()
{
  await Task.CompletedTask;
  throw new Exception();
}

In fact, the call does not surface an exception. An async method captures any exception thrown in its body and stores it on the Task it returns. Here the caller discards that Task without awaiting or waiting on it, so nothing ever observes the exception and the calling code (the main thread) is unaware of it.
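The behavior can be made visible by keeping the returned Task and awaiting it. This is a minimal sketch: FailAsync is a hypothetical stand-in for MyMethod, and the exception type and message are chosen only for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Calling the method does not throw; the exception is stored on the returned Task.
Task pending = FailAsync();
bool faultedBeforeAwait = pending.IsFaulted;

// Awaiting the Task rethrows the stored exception at the await point.
string caughtMessage = "none";
try
{
    await pending;
}
catch (InvalidOperationException e)
{
    caughtMessage = e.Message;
}

Console.WriteLine($"faulted: {faultedBeforeAwait}, caught: {caughtMessage}");

// Hypothetical stand-in for MyMethod from the example above.
static async Task FailAsync()
{
    await Task.CompletedTask;
    throw new InvalidOperationException("boom");
}
```

Because the awaited Task.CompletedTask is already finished, FailAsync runs to its throw statement synchronously, so the returned Task is already faulted when the call returns.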

Exception Handling in Task#

When the caller interacts with a Task, for example by calling a blocking method such as Wait or WaitAll or by reading the Result property, exceptions thrown inside the Task are rethrown to the caller wrapped in an AggregateException, which is the top-level exception in parallel programming. (WaitAny only reports which task finished and does not rethrow; inspect the finished task afterwards.)

If blocking (synchronous) access is acceptable, call one of the Task's Wait* methods inside try/catch and catch AggregateException. Awaiting the Task also works with try/catch, but await does not block the thread, and it rethrows the original exception rather than the AggregateException wrapper.

If you need to capture Task exceptions in a non-blocking (asynchronous) manner without await, use the Task's ContinueWith or event notifications (which can be cumbersome).
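Both approaches can be sketched in a few lines. The failing work item (parsing a non-numeric string) and the variable names are illustrative assumptions; any exception thrown inside the task would behave the same way.

```csharp
using System;
using System.Threading.Tasks;

// Blocking: Wait() rethrows the task's exception wrapped in an AggregateException.
Task<int> failing = Task.Run(() => int.Parse("not a number"));
string blockingCaught = "none";
try
{
    failing.Wait();
}
catch (AggregateException ae)
{
    blockingCaught = ae.InnerException?.GetType().Name ?? "none";
}

// Non-blocking: the continuation runs only if the antecedent task faulted,
// and reads the exception from the antecedent's Exception property.
Task<string> observer = Task.Run(() => int.Parse("not a number"))
    .ContinueWith(
        t => t.Exception?.InnerException?.GetType().Name ?? "none",
        TaskContinuationOptions.OnlyOnFaulted);
string continuationCaught = await observer;

Console.WriteLine($"Wait caught: {blockingCaught}, ContinueWith saw: {continuationCaught}");
```

In both cases the original FormatException is available as the InnerException of the AggregateException.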

Exception Handling in Parallel#

Compared to Task, exception handling in Parallel is much simpler. Parallel.For and Parallel.ForEach block the calling thread until the loop finishes and rethrow anything thrown inside the loop body as an AggregateException, so an ordinary try/catch around the call on the main thread is enough:

using System.Collections.Concurrent;

// Thread-safe queue
var exs = new ConcurrentQueue<Exception>();
try
{
    Parallel.For(0, 2, i =>
    {
        try
        {
            throw new ArgumentException();
        }
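        catch (Exception e)
        {
            // Record the failure, then rethrow the original exception;
            // Parallel.For wraps whatever the body throws in an AggregateException.
            exs.Enqueue(e);
            throw;
        }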
    });
}
catch (AggregateException e)
{
    foreach (var ex in e.InnerExceptions)
    {
        Console.WriteLine($"Exception type: {ex.GetType()}, Exception source: {ex.Source}, Exception message: {ex.Message}");
    }
}

System.Console.WriteLine(exs.Count);
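A common variation lets every iteration run to completion and reports all failures together afterwards: the body catches and records its own exception instead of rethrowing, and a single AggregateException is thrown once the loop has finished. The sketch below assumes that behavior is wanted; the input array and the rule that even inputs fail are made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var failures = new ConcurrentQueue<Exception>();
int[] inputs = { 1, 2, 3, 4, 5, 6 };
int handled = 0;

try
{
    Parallel.ForEach(inputs, n =>
    {
        try
        {
            // Illustrative failure rule: even inputs are rejected.
            if (n % 2 == 0)
            {
                throw new ArgumentException($"even input {n}");
            }
        }
        catch (Exception e)
        {
            // Record the failure and let the remaining iterations continue.
            failures.Enqueue(e);
        }
    });

    // Report everything that went wrong in one exception after the loop.
    if (!failures.IsEmpty)
    {
        throw new AggregateException(failures);
    }
}
catch (AggregateException ae)
{
    foreach (var inner in ae.InnerExceptions)
    {
        handled++;
        Console.WriteLine($"Exception type: {inner.GetType()}, Exception message: {inner.Message}");
    }
}

Console.WriteLine(handled);
```

Because the body never rethrows, the loop is not cut short by the first failure, and the catch block sees exactly one inner exception per failed input.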
