Strange thread scheduling

OpenVMS x86 Field Test questions, reports, and feedback.


Strange thread scheduling

Post by dmccorm1 » Sun Apr 30, 2023 9:58 am

In the course of testing how well the Clang/C++17 port works, I wrote a little test program that performs CPU-bound tasks on 10 worker threads. The threads seem to be scheduled in a strange way. I'm raising it here because I don't know whether this is due to QEMU, some issue with OpenVMS's implementation of std::thread, or even some issue in OpenVMS itself.

The physical host has four cores, and my QEMU command line looks like this:

Code: Select all

qemu-system-x86_64 \
    -machine type=q35,accel=kvm \
    -cpu host \
    -smp 4 \
    -m 8G \
    -bios /usr/share/edk2/ovmf/OVMF.inteltdx.fd \
    -serial pty \
    -drive file=./hd1.img,index=0,media=disk,format=raw \
    -nic user,model=e1000,ipv6=off,hostfwd=tcp::2223-10.0.2.3:23

The four virtual cores are correctly picked up by OpenVMS:

Code: Select all

$ show cpu

System:	VMS1, QEMU Standard PC (Q35 + ICH9, 2009)

CPU ownership sets:
   Active           	0-3
   Configure        	0-3

CPU state sets:
   Potential        	0-3
   Autostart        	0-3
   Powered Down     	None
   Not Present      	None
   Hard Excluded    	None
   Failover         	None

However, what I observe when I run my program is that the threads don't appear to be scheduled in parallel on multiple cores. Instead, I see individual threads running for extended periods on a single core before being preempted. Here's some program output (the full program source that produces it is at the bottom of this message):

Code: Select all

...
...
Thread 0xE9280 has done another 10,000,000 iterations.
Thread 0xE9280 has done another 10,000,000 iterations.
Thread 0xE9280 has done another 10,000,000 iterations.
Thread 0xD7280 has done another 10,000,000 iterations.
Thread 0xD7280 has done another 10,000,000 iterations.
Thread 0xD7280 has done another 10,000,000 iterations.
Thread 0x143280 has done another 10,000,000 iterations.
Thread 0x143280 has done another 10,000,000 iterations.
Thread 0x131280 has done another 10,000,000 iterations.
Thread 0x10D280 has done another 10,000,000 iterations.
Thread 0x10D280 has done another 10,000,000 iterations.
Thread 0x179280 has done another 10,000,000 iterations.
Thread 0x179280 has done another 10,000,000 iterations.
Thread 0x167280 has done another 10,000,000 iterations.
Thread 0x167280 has done another 10,000,000 iterations.
Thread 0x11F280 has done another 10,000,000 iterations.
Thread 0x11F280 has done another 10,000,000 iterations.
Thread 0x11F280 has done another 10,000,000 iterations.
...
...
More "evidence" that something is amiss is the CPU load graph for the host. What I would expect to see is all four cores being maxed out for the duration of the program execution. But what I actually see is this:
Screenshot from 2023-04-30 14-48-04.png

Finally, here's the program source. Note that I would have preferred to attach the file to this post, but the .cpp file type isn't permitted:

Code: Select all

#include <atomic>
#include <condition_variable>
#include <future>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// Number of worker threads still running; main() waits for this to reach zero.
std::atomic_size_t g_NumThreadsRunning;
std::condition_variable g_CV;
std::mutex g_Mutex;

// CPU-bound worker: spins for 100 million iterations, printing a progress
// message every 10 million, then signals main() that it has finished.
void WorkerThreadStart()
{
    size_t result = 0;
    for (size_t i = 0; i < 100'000'000; i++)
    {
        if (((i + 1) % 10'000'000) == 0)
        {
            std::cout
                << "Thread 0x"
                << std::this_thread::get_id()
                << " has done another 10,000,000 iterations."
                << std::endl;
        }
        // Arbitrary busy-work to keep the loop CPU-bound.
        if ((i % 17) == 0 && (i % 13) == 0)
        {
            result++;
        }
    }
    std::unique_lock lock{ g_Mutex };
    g_NumThreadsRunning--;
    lock.unlock();
    g_CV.notify_one();
}

int main()
{
    std::vector<std::thread> workerThreads;
    for (size_t i = 0; i < 10; i++)
    {
        g_NumThreadsRunning++;
        workerThreads.emplace_back(WorkerThreadStart);
    }
    // Wait until every worker has signaled completion, then join them all.
    std::unique_lock lock{ g_Mutex };
    g_CV.wait(lock, []{ return g_NumThreadsRunning == 0; });
    for (auto& workerThread : workerThreads)
    {
        workerThread.join();
    }
    return 0;
}


Re: Strange thread scheduling

Post by arne_v » Sun Apr 30, 2023 10:16 am

How do you link?

On Alpha and Itanium you need to link with:

$ LINK/THREADS_ENABLE ...

to use kernel threads.
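
For example, the full form with the keywords spelled out would be something like this (THREADS.OBJ is a placeholder object file name; see the Linker Utility Manual for the exact keyword list):

Code: Select all

$ LINK/THREADS_ENABLE=(MULTIPLE_KERNEL_THREADS,UPCALLS) THREADS.OBJ

As far as I remember, plain /THREADS_ENABLE with no keyword list enables both keywords.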
Arne
arne@vajhoej.dk
VMS user since 1986



Re: Strange thread scheduling

Post by jonesd » Sun Apr 30, 2023 10:46 am

You can also use SET IMAGE/FLAGS=MKTHREADS to set the bit after the link. SHOW IMAGE will show you what the current settings are (and thoughtfully shows you the original link flags too).
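
For example, for an image named THREADS.EXE (a placeholder name; see HELP SET IMAGE for the full qualifier syntax):

Code: Select all

$ SET IMAGE/FLAGS=MKTHREADS THREADS.EXE
$ SHOW IMAGE THREADS.EXE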



Re: Strange thread scheduling

Post by dmccorm1 » Sun Apr 30, 2023 4:02 pm

Thanks! That fixed the problem.


Re: Strange thread scheduling

Post by arne_v » Sun Apr 30, 2023 8:32 pm

VMS was not born with threads, but it has had them for a long time (at least since 1991).

I don't know why enabling kernel threads is not the default.

To me the logic is:
* it should not matter for single-threaded applications
* for multi-threaded applications, I would expect practically everybody to want kernel threads rather than green threads

But there must have been some reason three decades ago.
Arne
arne@vajhoej.dk
VMS user since 1986



Re: Strange thread scheduling

Post by jonesd » Sun Apr 30, 2023 10:23 pm

arne_v wrote:
Sun Apr 30, 2023 8:32 pm
VMS was not born with threads, but it has had them for a long time (at least since 1991).

I don't know why enabling kernel threads is not the default.

DECthreads has been around since 1991, but kernel threads in VMS didn't appear until version 7.0 (and only on Alpha). The 1991 DECthreads was based on an early draft ('draft 4') of the POSIX threads standard, and its API differs from that of the later DECthreads library, which conformed to POSIX 1003.1c.

Threaded code that runs reliably in the draft 4 environment can become very brittle
when exposed to the true concurrency you get with kernel threads, so having
kernel threads off by default might be a reasonable strategy. Once it became off by
default, inertia kept it that way.

Upcalls are another matter. The user-mode scheduler in the draft 4 library behaved
very badly when any thread did a synchronous I/O call. Enabling upcalls by default
reduces the surprise factor a lot more than not having kernel threads by default.



Re: Strange thread scheduling

Post by dmccorm1 » Mon May 01, 2023 4:24 am

So you mention upcalls, which the linker tells me it is enabling for my image. As I understand it, upcalls are a mechanism to improve the performance of green threads: they let the kernel notify the user-mode thread manager that a thread has entered a wait state, so the thread manager can switch execution context to another green thread. When kernel threads are also enabled for an image, does the green thread context switcher in user mode effectively stand down, or is there a sort of hybrid where the process is using both thread models simultaneously?



Re: Strange thread scheduling

Post by jonesd » Mon May 01, 2023 6:02 am

dmccorm1 wrote:
Mon May 01, 2023 4:24 am
...When kernel threads are also enabled for an image, does the green thread context switcher in user mode effectively stand down, or is there a sort of hybrid where the process is using both thread models simultaneously?

I'll refer you to appendix B.12 in the POSIX Threads Library manual:
Under OpenVMS Alpha Version 7.0 and later, the Threads Library implements
a two-level scheduling model. This model is based on the concept of virtual
processors. Virtual processors are implemented as a result of using kernel
thread technology in the OpenVMS Alpha operating system.

The Threads Library schedules threads onto virtual processors similar to the
way that OpenVMS schedules processes onto the processors of a multiprocessing
machine. Thus, to the runtime environment, a scheduled thread is executed on a
virtual processor until it blocks or until it exhausts its timeslice quantum; then
the Threads Library schedules a new thread to run.



Re: Strange thread scheduling

Post by dmccorm1 » Mon May 01, 2023 6:19 am

Thanks for that. I’m finding it hard to navigate the OpenVMS documentation.


Re: Strange thread scheduling

Post by arne_v » Mon May 01, 2023 7:50 am

jonesd wrote:
Sun Apr 30, 2023 10:23 pm
arne_v wrote:
Sun Apr 30, 2023 8:32 pm
VMS was not born with threads, but it has had them for a long time (at least since 1991).

I don't know why enabling kernel threads is not the default.

DECthreads has been around since 1991, but kernel threads in VMS didn't appear until version 7.0 (and only on Alpha). The 1991 DECthreads was based on an early draft ('draft 4') of the POSIX threads standard, and its API differs from that of the later DECthreads library, which conformed to POSIX 1003.1c.

Threaded code that runs reliably in the draft 4 environment can become very brittle
when exposed to the true concurrency you get with kernel threads, so having
kernel threads off by default might be a reasonable strategy. Once it became off by
default, inertia kept it that way.

Ah. Now I remember. Threads in V5.5 (1991) and kernel threads in V7.0 (1995), and not on VAX.

Do you remember what the cause of the "brittleness" was?
A) bugs in the first implementation
B) developers unfamiliar with the concept of threads at the time
C) the fact that Alpha has a very weak memory model, so code that works elsewhere may not work on Alpha
D) all of the above
Arne
arne@vajhoej.dk
VMS user since 1986
