On 3 Jan 2003, at 2:12, Henrik Nordstrom wrote:
> Robert Collins wrote:
>
> > Nope. cond_signal is 1 or more releases.
>
> In all the documentation I have seen, cond_signal signals exactly 1 thread per
> call while there are threads waiting on the condition.
It seems to me you have somewhat too bright expectations of threads.
Thread coding is a style of its own. It insists that all shared data be
accessed only under a mutex lock, and it is designed to scale to tens of
CPUs and thousands of concurrent threads. That easily means hundreds of
thousands of mutex locks and unlocks per second. They CAN'T be heavy
calls; they can't always pass through the scheduler, or any hardware
platform would melt down instantly. Therefore they don't.
Mutex locks are primitives, fundamentals, and they are simple. A mutex
lock is just a memory instruction that all CPUs can see happen. If the
mutex is already locked, the scheduler is called and the thread blocks;
if it is not locked, the scheduler is never involved.
An LWP is given a time quantum and runs to use it up in full, so taking
the CPU away on a simple mutex unlock would be wrong and never happens.
The CPU is taken away when the LWP either blocks or its quantum expires
at the system tick; only then is the scheduler called. An LWP may carry
many threads, and the threads library may implement its own scheduler
on top of it. If the threads are not kernel threads, libthread may well
prefer a thread switch on every pass through it, since those switches
are very fast and give a sense of more parallelism. But when a thread
switch implies a full context switch, it means high overhead; libthread
may still do it, but that is a waste.
Ever wondered about mutex deadlocks: "why the hell can't it detect that
it already owns the bloody mutex?" It would be a simple compare of a
long, but implementations don't do it because it is "time-consuming"...
Everything for raw performance. Do you really think the scheduler is
consulted at every mutex? Mutex unlock is even simpler: clear the lock
and that's it. That's fast. To kickstart any blocked thread, the OS
would have to run the scheduler. Slow.
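A toy sketch of that fast path, just to make it concrete. This is not
any real libthread, only an illustration in C11 atomics; sched_yield()
here stands in for whatever blocking mechanism a real implementation uses.

    #include <sched.h>
    #include <stdatomic.h>

    typedef struct { atomic_int locked; } toy_mutex_t;   /* 0 = free, 1 = held */

    void toy_mutex_lock(toy_mutex_t *m)
    {
        int expected = 0;
        /* fast path: one atomic compare-and-swap that every CPU can see.
         * No system call, no scheduler, nothing heavy. */
        while (!atomic_compare_exchange_weak(&m->locked, &expected, 1)) {
            expected = 0;
            /* slow path: the mutex is held.  Only now would a real library
             * involve the scheduler and block the thread. */
            sched_yield();
        }
    }

    void toy_mutex_unlock(toy_mutex_t *m)
    {
        /* unlock is even simpler: clear the word and return.  Nothing here
         * forces a thread switch; a blocked waiter is found by the
         * scheduler later. */
        atomic_store(&m->locked, 0);
    }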
The library must provide decent progress for every thread. So if there
are many threads in a process, they are made to progress by some
lightweight means, such as round-robin. The library MAY (RFC speak)
make a thread switch at suitable points, like a mutex unlock, if that
costs nothing; that is the intra-CPU/intra-process/LWP case. If it is
implemented to make round-robin progress of threads, it may decide to
switch threads at every such switch-point (or every Nth one). But then
again, it may not. It is a matter of tuning and implementation; in any
case, not our business. God knows.
On an SMP system, unlocking a mutex does nothing to magically kickstart
a thread on another CPU. If the other CPU is busy, it keeps working; if
it is idle, it sits halted until a hardware IRQ wakes it, and only then
does it run the scheduler and discover the thread that was unblocked by
the mutex unlock. At best, a mutex unlock appends the unblocked threads
to the tail of the runnable list. The currently running LWP cannot
kickstart another LWP into running; only the kernel can do that, through
the scheduler. The current LWP can only block on something to let the
kernel reschedule, or perhaps yield the CPU. Again, a matter of the
libthread implementation. For the coder: unpredictable.
Cond_signal is documented to unblock "at least one thread", and there
is a good reason for the "at least" part. A condition variable is
little more than a plain boolean flag, and signalling it is little more
than setting that flag to true. When the kernel scheduler runs, it goes
through all runnable threads and hands out time quanta by priority.
With one CPU, only one thread can run at a time, so the cond can only
unblock exactly one thread. But on SMP with many CPUs, many threads run
concurrently and are all put to "compete" for the same mutex; they spin
on it and eventually only one gets through. That is, IF a mutex is used
at all. cond_signal does not require a mutex, and in that case several
threads can potentially be unblocked on an MP system. Whether it is one
or more depends on many things, including microtiming, and that is why
they say "at least one thread".
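This is also why the textbook usage keeps the real condition in its own
flag and re-checks it in a loop around pthread_cond_wait(): however many
threads a signal happens to wake, only those that find the flag set
proceed. The names below are only illustrative:

    #include <pthread.h>

    pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
    int work_available = 0;              /* the real "condition" lives here */

    void wait_for_work(void)
    {
        pthread_mutex_lock(&mx);
        while (!work_available)          /* not "if": several threads may wake */
            pthread_cond_wait(&cv, &mx);
        work_available = 0;              /* claim the work under the mutex */
        pthread_mutex_unlock(&mx);
    }

    void post_work(void)
    {
        pthread_mutex_lock(&mx);
        work_available = 1;
        pthread_cond_signal(&cv);        /* "at least one" waiter will re-check */
        pthread_mutex_unlock(&mx);
    }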
> > Huh? The state should be (slightly compressed):
> > thread S, W1, W2
> > active B-waitingforsignal B-waitingforsignal
>
> for clarity this should be added:
> mutex_lock() B-waitingforsignal B-waitingforsignal
>
> > cond_signal() B-inpthread_mutex_lock B-waitingforsignal
> > cond_signal() B-inpthread_mutex_lock B-inpthread_mutex_lock
> > mutex_unlock() active B-inpthread_mutex_lock
> > active mutex_unlock() active
> > active active active
You miss a few very crucial steps in that list: the runnable-thread
scheduler and the CPU context/stack/VM switches. The table above is an
assumption, the result of vague documentation meant to give a general
idea while avoiding details. Nowhere will a single serious threading
model tell you that threads are unblocked in exactly that way. On the
contrary, if you look closer, they will tell you that threads run at
unspecified moments and that the only way to force an order is mutex
synchronisation. Only hitting a locked mutex is guaranteed to cause a
thread switch. They say "unspecified" to account for all the blabla I
wrote above and an awful lot of implementation details not mentioned,
which depend on UP/MP, thread scope, mutex scope, number of CPUs,
threads, processes, and hell knows what else. A cond variable and its
mutex are not related in any way as far as the OS is concerned; the
mutex unlock here is no more important than any other mutex unlock
anywhere in the code, and whether a cond_signal was made does not
change its importance.
So if you think that a mutex_unlock after a signal always causes a
thread switch, you are wrong. If you think it never causes a thread
switch, you are wrong. If you think a switch is likely, you are wrong;
if you think it is unlikely, you are wrong. If you want to know exactly,
your only option is to dig into the kernel and libthread sources and
understand one specific implementation. There is obviously a hell of a
lot more to the details than I could ever type here. But the point is
that each thread is an independent entity, and the coder must make no
assumptions about when it makes progress unless he forces that by
algorithm.
All of the above is completely irrelevant for coding, unless you are
stuck with some wrong assumptions. The only assumption I've found to
hold is that _anything_ that CAN happen WILL happen, about 100 times
sooner than you'd expect. For example, take two threads changing
(read-modify-write) a single int once per request without a mutex lock.
The likelihood that they hit it concurrently seems negligible,
especially on a UP system, but you can bet money that such a program
will break in less than a day. That means a thread switch happened at
the exact moment when one thread had read the variable and modified it,
but not yet written it back. Imagine: nanosecond timing.
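A minimal demonstration of that race, in case anyone wants to watch it
bite (my own example, not squid code); expect it to print less than
2000000 most of the time:

    #include <pthread.h>
    #include <stdio.h>

    static int counter;                  /* deliberately NOT protected */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;                   /* read, modify, write: three steps */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* updates are lost whenever a switch (or the other CPU) hits
         * between the read and the write of counter++ */
        printf("counter = %d\n", counter);
        return 0;
    }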
> > Ok, what's the high overhead in the current model? I'll check the code in
> > detail in the weekend.
The current overhead is quite small. I meant more that we should not
introduce any new overhead.
> > Ah. Well, what about:
> > pthread_mutex_lock(&queuemutex)
> > do {
> > schedule_io_request()
> > pthread_cond_signal()
> > } for < requests
> > pthread_mutex_unlock(&queuemutex)
> > pthread_mutex_lock(&queuemutex)
> > pthread_mutex_unlock(&queuemutex)
This will lose all signals but the last one; only one thread will run.
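How many of those signals coalesce is the implementation's business. The
portable way around it is to keep the real state in a counter that the
workers re-check, so it doesn't matter how many wakeups actually arrive.
A rough sketch with invented names, not the actual squid code:

    #include <pthread.h>

    pthread_mutex_t queuemutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  queuecond  = PTHREAD_COND_INITIALIZER;
    int pending;                         /* protected by queuemutex */

    void schedule_requests(int n)
    {
        pthread_mutex_lock(&queuemutex);
        pending += n;                    /* state survives coalesced signals */
        pthread_cond_broadcast(&queuecond);
        pthread_mutex_unlock(&queuemutex);
    }

    void *worker_thread(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&queuemutex);
            while (pending == 0)
                pthread_cond_wait(&queuecond, &queuemutex);
            pending--;                   /* claim one request */
            pthread_mutex_unlock(&queuemutex);
            /* ... perform the io request ... */
        }
        return NULL;
    }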
> By intention the main thread never uses pthread_mutex_lock, only
> pthread_mutex_trylock. This to avoid lock congestion.
Yes, when we schedule per request. But if we append to a local queue
and, after the comm run, append the whole queue to the requestqueue, we
would want a thread to run. Creating a reliable thread switch for a
specific job is quite a brain exercise. It should be possible with two
mutexes alone, but try it, knowing that a mutex unlock may, but also
might not, cause a thread switch...
I thought we might send only one signal to the worker threads if there
is work to do and there are idle threads, and then let the threads
themselves make sure they wake up enough workers. We only make sure
that the first worker reliably gets a chance to run before we proceed
with poll().
Main:
  while comm_loop {
    ...comm_stuff...                // gather io-requests to ioqueue
    if (!ioqueue) continue;
    m_lock(queue);
    append(request, ioqueue);       // send requests to worker threads
    c_signal(queue, run);
    // to make sure the backsignal isn't made before we block,
    // unlock queue later..
    m_lock(main);
    if (idle_threads > 0) {
      m_unlock(queue);
      c_wait(main, allrun);         // maybe timed
    } else {
      m_unlock(queue);
    }
    m_unlock(main);
    // by here at least the 1st iothread has reliably been run
  }
io_thread:
  while () {
    m_lock(queue);
    idle_threads++;
    c_wait(queue, run);
    pop(request);
    idle_threads--;
    if (request && idle_threads > 0) {
      c_signal(queue, run);
    } else {
      m_lock(main);
      c_signal(main, allrun);
      m_unlock(main);
    }
    m_unlock(queue);
    ... do_aio_stuff()
  }
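For what it's worth, here is roughly how that sketch could look as real
pthread calls. This is my own rough translation, not tested code: it
collapses the two mutexes into one and adds predicate flags (hence the
while loops), which sidesteps the backsignal race the second mutex was
guarding against; the request type and all names are invented.

    #include <pthread.h>
    #include <stddef.h>

    typedef struct request { struct request *next; } request_t;

    static pthread_mutex_t queue_mx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  run_cv   = PTHREAD_COND_INITIALIZER;  /* "queue,run"   */
    static pthread_cond_t  taken_cv = PTHREAD_COND_INITIALIZER;  /* "main,allrun" */
    static request_t *queue_head;
    static int idle_threads;
    static int handed_off;          /* set once a worker has dequeued something */

    /* main/comm thread: hand a batch of requests to the workers, then wait
     * until the first worker has reliably run before going back to poll() */
    static void submit_requests(request_t *list)
    {
        pthread_mutex_lock(&queue_mx);
        queue_head = list;               /* simplified: real code would append */
        handed_off = 0;
        pthread_cond_signal(&run_cv);
        while (idle_threads > 0 && !handed_off)
            pthread_cond_wait(&taken_cv, &queue_mx);   /* maybe a timed wait */
        pthread_mutex_unlock(&queue_mx);
    }

    /* worker (io) thread */
    static void *io_thread_fn(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&queue_mx);
            idle_threads++;
            while (queue_head == NULL)
                pthread_cond_wait(&run_cv, &queue_mx);
            idle_threads--;
            request_t *req = queue_head;
            queue_head = req->next;
            if (!handed_off) {
                handed_off = 1;
                pthread_cond_signal(&taken_cv);  /* tell main the hand-off happened */
            }
            if (queue_head != NULL && idle_threads > 0)
                pthread_cond_signal(&run_cv);    /* wake the next idle worker */
            pthread_mutex_unlock(&queue_mx);
            /* ... do_aio_stuff(req) ... */
        }
        return NULL;
    }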
------------------------------------
Andres Kroonmaa
CTO, Microlink Data AS
Tel: 6501 731, Fax: 6501 725
Pärnu mnt. 158, Tallinn
11317 Estonia