These are my notes from watching Hartmut Kaiser's keynote talk at the C++14 conference. youtube link

Although there seems to be no textbook definition of a thread, the talk mentions that a thread has four properties:

• A single flow of control
• A program counter marking what's currently being executed
• An associated execution context (stack, register set, static and dynamic memory, local vars)
• A state (initialized, pending, suspended, terminated)

The talk also delves into std::thread and execution agents, but I felt they were only included for completeness' sake, so I just give these bullet points as is:

All references are from N4231 (Torvald Riegel: Terms and definitions related to threads).

• Thread of execution: “single flow of control within a program” (S1.10p1)
• Execution agent: In (S30.2.5.1p1) “an entity such as a thread that may perform work in parallel with other execution agents.”

## Parallelism vs. concurrency

• Concurrency: when two or more tasks can start, run, and complete in overlapping time periods. It doesn't necessarily mean they'll ever both be running at the same instant.
• Parallelism: when tasks literally run at the same time.

• These two points draw out an analogy for me: parallelism is like SIMD/SPMD, concurrency is like MPMD.

• The talk delves into this by saying: parallelism is independent work; concurrency relates to the same global state.

Which feels like a more contrived way of saying things, but it captures the essence on the whole.

Capital letters below are for emphasis, not shouting:

WE SHOULD MAKE CONCURRENCY HIDDEN AWAY UNDER THE HOOD

LIKE, WE DON'T USE GOTO STATEMENTS ANYWHERE, BUT THE COMPILER STILL USES THEM, STASHED AWAY UNDER THE HOOD

WE HAVE TO DO THE SAME WITH THREADS

It seems like parallelism is better. I don't entirely get this yet, but the speaker advocates that parallelism is much, much better than concurrency.

Edward A. Lee's paper (2006), 'The Problem with Threads'

Are you multi-threaded? Only one way to find out. It's like how a device kernel can't launch another device kernel; that luxury is missing here. What if I launch a team of threads and each of those threads launches other threads? ayy lmao, we just over-subscribed the machine.

OpenMP offers a modicum of luxury here: we can set the maximum number of threads we want to spawn, but that setting is super, super local. Every library has to offer its own way to set the number of threads, which, for the minuscule amount of time my brain thought about it, seems like it'll be a pain.

• Can you disable or control the parallelism?

I want to have 0/2/4/8 threads.

OpenMP lets you control this.

This is a problem everywhere: HPX, OpenMP, OpenMPI.

This is in the hands of the problem statement or the algorithm.

• Minor issues:
  - No 'standard' way of 'returning' values from threads: pointer, pointer, pointer.
  - Requires explicit synchronization, because we don't know when our thread is ready.
  - Threads make concurrency explicit.
  - Parallelism > concurrency.
  - THREADS ARE SLOW: 1 thread, 10 ms overhead.

## The 4 Horsemen of the Apocalypse: SLOW

• Starvation: insufficient concurrent work to maintain high utilization of resources
• Latencies: time-distance delay of accessing remote resources and services
• Overhead: work for the management of parallel actions and resources on the critical path that is not necessary in the sequential variant
• Waiting for contention resolution: shared resources are hella busy, so you gotta wait, lassie

## Cue Amdahl's law (strong scaling)

Avoid the serial part like it's the plague.
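Amdahl's law makes this concrete: if a fraction p of the work parallelizes over N workers,

	S(N) = 1 / ((1 - p) + p / N), which tends to 1 / (1 - p) as N → ∞

So even a 5% serial part caps the speedup at 20x, no matter how many cores you throw at it.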

I so want to replicate the graph T - T

Overheads are the most dangerous. Let's do a gedankenexperiment: if we have 10 seconds of work and we vary the number of threads, what speed-up do we get for a fixed per-thread overhead?

	work = 10 seconds = work_per_thread * num_threads

The graph's x-axis is num_threads, from 10 million down to 1. The graph is convex.

_When we are at the left end of the graph, with ~10 million threads, the overhead of maintenance and book-keeping easily overpowers the speedup._

_When we are at the right end of the graph, with ~1 thread, the talk says contention kicks in_, which is actually puzzling, since the performance should be almost identical to sequential execution.


Screen grab from here

## HPX

• A general purpose runtime system for applications of any scale
• A well-defined, new execution model (ParalleX)
• HPX is wow.

## Execution tree

• Writing HPX code becomes like drawing a flowchart, where std::future is the argument and also the result.
• I am not going to post more about it here, but I will actually be using HPX to do a 1D/2D FDTD Maxwell's equations simulation.
• In a later blogpost, lmaaaaooo