On 3 Jan 2003, at 2:12, Henrik Nordstrom wrote:

> Robert Collins wrote:
>
> > Nope. cond_signal is 1 or more releases.
>
> In all documentation I have seen, cond_signal signals exactly 1 thread per
> call while there are threads waiting on the condition.

It seems to me you have somewhat too optimistic expectations of threads.

Thread programming is quite a particular style. It insists that all shared data be accessed only under a mutex lock. It is designed to scale to tens of CPUs and thousands of concurrent threads. That easily means hundreds of thousands of mutex locks and unlocks per second. They CAN'T be heavy calls; they simply cannot always pass through the scheduler, or any hardware platform would melt down instantly. Therefore they don't.

Mutex locks are primitives, fundamentals, and they are simple. A mutex lock is just a memory instruction that all CPUs can see happen. If the mutex is already locked, the scheduler is called and it blocks the thread. If the mutex is not locked, the scheduler is not called at all.

An LWP is given a time quantum, and it runs to use it up in full; taking the CPU away on a simple mutex unlock would therefore be wrong, and it never happens. The CPU is taken away when the LWP either blocks or its time quantum expires at a systick. Only then is the scheduler called.

An LWP may carry many threads, and the thread library may implement its own scheduler inside libthread. If the threads are not kernel threads, libthread may well prefer a thread switch on every pass through it, as these switches are very fast and give a sense of more parallelism. But when a thread switch implies a context switch, it means high overhead; libthread may indeed do that, but that's waste. Ever wondered about mutex deadlocks, "why the hell can't it detect that it owns the bloody mutex itself?" It's just a simple compare of a long, but they don't do it, because it is "time-consuming"... Everything for raw performance. Do you really think the scheduler is consulted at every mutex operation?

Mutex unlock is even simpler: clear the lock and that's it. That's fast. To kickstart any blocked thread, the OS would need to run the scheduler. Slow.

The library must provide decent progress for every thread. So if there are many threads in a process, they are made to progress by some lightweight means, like round-robin. The library MAY (RFC speak) make a thread switch at suitable points, such as a mutex unlock, if it doesn't cost anything. Such a case is an intra-CPU/intra-process/intra-LWP thread switch. If it is implemented to make round-robin progress of threads, it may decide to switch threads at every switch point (or every Nth). But then again, it may not. It is a matter of tuning and implementation. In any case, not our business. God knows.

On an SMP system, unlocking a mutex does nothing to magically kickstart a thread on another CPU. If the other CPU is busy, it keeps working. If it is idle, it sits halted until a hardware IRQ wakes it, and only then does it run the scheduler and discover the thread that was unblocked by the mutex unlock. A mutex unlock may at best append the unblocked threads to the tail of the runnable list.

A currently running LWP cannot kickstart another LWP into running. Only the kernel can do that, through the scheduler. The current LWP can only block on something to let the kernel reschedule, or maybe yield the CPU. Again a matter of libthread implementation. For the coder: unpredictable.

cond_signal is documented to unblock "at least one thread". There is a good reason for the "at least" part. A cond variable is nothing more than a plain boolean flag, and signalling is nothing more than setting it to true. Now, when the kernel scheduler is called, it goes through all runnable threads and gives them time quanta according to their priority.
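To make that concrete, here is a minimal sketch of the usual pattern this implies for the coder. The names (queue_mutex, queue_cond, work_queued, producer, consumer) are mine, purely for illustration, not from the squid sources. The waiter re-checks the flag in a loop precisely because "at least one" thread may be woken and because the wakeup and the mutex hand-off are separate, unordered events:

    #include <pthread.h>

    /* illustrative only -- names are not from the squid code */
    static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  queue_cond  = PTHREAD_COND_INITIALIZER;
    static int work_queued = 0;

    void producer(void)
    {
        pthread_mutex_lock(&queue_mutex);
        work_queued++;
        pthread_cond_signal(&queue_cond);     /* unblocks "at least one" waiter */
        pthread_mutex_unlock(&queue_mutex);   /* may or may not cause a thread switch */
    }

    void consumer(void)
    {
        pthread_mutex_lock(&queue_mutex);
        /* a loop, not an 'if': another woken thread may already have taken
           the work, or the wakeup may be spurious */
        while (work_queued == 0)
            pthread_cond_wait(&queue_cond, &queue_mutex);
        work_queued--;
        pthread_mutex_unlock(&queue_mutex);
        /* ... do the actual work outside the lock ... */
    }

Nothing in this pattern says which thread runs when, only that the shared flag is always consistent.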
If there is one CPU, only one thread at a time can run, and the cond can only unblock exactly one thread. But if there is SMP and many CPUs, then many threads run concurrently, and they are all put to "compete" for the same mutex. They spin on the mutex, and eventually only one will unblock. IF a mutex is used at all: cond_signal does not require a mutex, and in that case potentially several threads are unblocked on an MP system. Whether it is one or more depends on many things, including microtiming; therefore they say "at least one thread".

> > Huh? The state should be (slightly compressed):
> >
> > thread   S                W1                        W2
> >
> >          active           B-waitingforsignal        B-waitingforsignal
>
> for clarity this should be added:
>            mutex_lock()     B-waitingforsignal        B-waitingforsignal
>
> >          cond_signal()    B-inpthread_mutex_lock    B-waitingforsignal
> >          cond_signal()    B-inpthread_mutex_lock    B-inpthread_mutex_lock
> >          mutex_unlock()   active                    B-inpthread_mutex_lock
> >          active           mutex_unlock()            active
> >          active           active                    active

You miss a few very crucial steps in that list: the runnable-thread scheduler and the CPU context/stack/VM switches. The table above is an assumption, the result of vague documentation meant to give a general idea while avoiding details. Nowhere will a single serious threading model tell you that threads are unblocked in exactly the above way. On the contrary, if you look closer, they will tell you that threads run at unspecified moments, and the only way to force an order is to use mutex synchronisation. Only hitting a locked mutex is guaranteed to cause a thread switch. They use "unspecified" to account for all the blabla I wrote above and an awful lot of implementation details not mentioned, which depend on UP/MP, thread scope, mutex scope, number of CPUs, threads, processes, and hell knows what else.

A cond var and its mutex are not related in any way as far as the OS is concerned. The mutex unlock here is not any more important than any other mutex unlock anywhere in the code, and whether a cond_signal was made doesn't change its importance. So if you think that a mutex_unlock after a signal always causes a thread switch, you are wrong. If you think that it never causes a thread switch, you are wrong. If you think it is likely to cause a switch, you are wrong; if you think it is unlikely, you are wrong. If you want to know exactly, your only option is to dig into the kernel and libthread sources and understand one specific implementation.

There is obviously a hell of a lot more to the details than I could ever type here. But the point is that each thread is an independent entity, and the coder must make no assumptions about when it makes progress unless he forces that with the algorithm. All of the above is completely irrelevant for coding, unless you are stuck with some wrong assumptions. The only assumption I've found to hold is that _anything_ that CAN happen WILL happen. About 100 times sooner than you'd expect.

For example, take two threads changing (read-modify-write) a single int once per request without a mutex lock. The likelihood that they hit it concurrently is negligible, especially on a UP system. But you can bet money that this program will crash in less than a day. That means a thread switch happened at the moment one thread had read the var and modified it, but had not yet written it back. Imagine. Nanosecond timing. (There is a small sketch of exactly this race further below.)

> Ok, what's high overhead in the current model? I'll check the code in
> detail in the weekend.

The current overhead is quite small. I meant more that we should not introduce any new overhead.
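The read-modify-write race mentioned above, as a minimal sketch. The names (counter, worker) and the tight loop are mine; the loop just compresses "once per request over a day" into something that misbehaves within seconds. Compile with -pthread:

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;            /* shared, deliberately unprotected */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;                  /* read, modify, write: can be torn by a switch */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* expected 2000000; usually prints less, because increments are
           lost when one thread is interrupted between reading the value
           and writing it back -- the fix is a pthread_mutex_t around the
           increment */
        printf("counter = %ld\n", counter);
        return 0;
    }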
> Ah. Well, what about:
>
>   pthread_mutex_lock(&queuemutex)
>   do {
>      schedule_io_request()
>      pthread_cond_signal()
>   } for < requests >
>   pthread_mutex_unlock(&queuemutex)
>
>   pthread_mutex_lock(&queuemutex)
>   pthread_mutex_unlock(&queuemutex)

This will lose all signals but the last one. Only one thread will run.

> By intention the main thread never uses pthread_mutex_lock, only
> pthread_mutex_trylock. This to avoid lock congestion.

Yes, when we schedule per request. But if we append to a local queue and only after the comm run append that whole queue to the request queue, we'd want a thread to run.

Creating a reliable thread switch for a specific job is quite a brain exercise. It should be possible with two mutexes alone, but try it, knowing that a mutex unlock may, but also might not, cause a thread switch...

I thought we might send only one signal to the worker threads, if there is work to do and there are idle threads, and then let the threads themselves make sure they wake up enough workers. We only make sure that the first worker reliably gets a chance to run before we proceed with poll().

Main:
    while comm_loop {
        ...comm_stuff...            // gather io-requests into ioqueue
        if (!ioqueue)
            continue;
        m_lock(queue);
        append(request, ioqueue)    // send requests to worker threads
        c_signal(queue, run);
        // to make sure the backsignal isn't made before we block,
        // unlock queue later..
        m_lock(main);
        if (idle_threads > 0) {
            m_unlock(queue);
            c_wait(main, allrun);   // maybe timed
        } else
            m_unlock(queue);
        m_unlock(main);
        // by here at least the 1st iothread has reliably been run
    }

io_thread:
    while () {
        m_lock(queue);
        idle_threads++;
        c_wait(queue, run);
        pop(request);
        idle_threads--;
        if (request && idle_threads > 0)
            c_signal(queue, run);
        else {
            m_lock(main);
            c_signal(main, allrun);
            m_unlock(main);
        }
        m_unlock(queue);
        ...
        do_aio_stuff()
    }

------------------------------------
 Andres Kroonmaa
 CTO, Microlink Data AS
 Tel: 6501 731, Fax: 6501 725
 Pärnu mnt. 158, Tallinn 11317
 Estonia