Re: [PATCH] profiler updates

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 11 Jun 2012 16:29:07 -0600

On 06/10/2012 11:46 PM, Amos Jeffries wrote:
> On 11/06/2012 4:47 a.m., Alex Rousskov wrote:
>> On 06/08/2012 07:02 PM, Amos Jeffries wrote:
>>> On 8/06/2012 2:46 a.m., Alex Rousskov wrote:
>>>> On 06/06/2012 07:25 PM, Amos Jeffries wrote:
>>>>>>> +    if (delta < head->best)
>>>>>>> +        head->best = delta;
>>>>>>> +    if (delta > head->worst)
>>>>>>> +        head->worst = delta;
>>>>>>> +    head->summ += delta;
>>>>>>> +    head->count++;
>>>>>>> +    head->overheads += get_tick() - stopped_;

>> The above XProfilerNode::stop() code is not thread-safe because
>> multiple threads may update the same "head" and you are not using atomics for
>> *head members.

> Here is the patch implementing the above.
>
> The __sync_* atomics seem to be okay for a simple boolean equality
> test, an add, or a subtract, but not so much for protecting
> less/greater-than tests or arbitrary multi-step alterations.
>
> Here is the patch implementing the spinlocks on *head before multiple
> read/compare/change operations.

I do not think we should add explicit spinlocks to Squid, especially in
profiling code that is called twice in every profiled scope, and
especially when such locks are avoidable.

The old code quoted above was not thread-safe because *head members
could get corrupted without locks or atomics. Atomics alone address the
corruption issue. No locks are needed for that.
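For illustration, here is roughly what atomics-only updates could look
like with std::atomic. This is a sketch, not Squid code: the TimerStats
struct and member types are assumptions standing in for the real *head
record.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the *head statistics record; the real Squid
// struct differs, but the principle is the same: make each member atomic
// so concurrent updates cannot corrupt it, with no lock at all.
struct TimerStats {
    std::atomic<int64_t> summ{0};
    std::atomic<uint64_t> count{0};
};

// lock-free recording: each fetch_add is a single atomic read-modify-write
void record(TimerStats &head, int64_t delta) {
    head.summ.fetch_add(delta, std::memory_order_relaxed);
    head.count.fetch_add(1, std::memory_order_relaxed);
}
```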

Specifically:

> +    while (std::atomic_flag_test_and_set_explicit(&xprof_globalLock, std::memory_order_acquire))
> +        ; // spin until the global lock is acquired
> +    xprof_ActiveTimers[type] += started_;
> +    xprof_ActiveTimerCount++;
> +    std::atomic_flag_clear_explicit(&xprof_globalLock, std::memory_order_release);

> +    while (std::atomic_flag_test_and_set_explicit(&xprof_globalLock, std::memory_order_acquire))
> +        ; // spin until the global lock is acquired
> +    xprof_ActiveTimers[type_] -= started_;
> +    xprof_ActiveTimerCount[type_]--;
> +    std::atomic_flag_clear_explicit(&xprof_globalLock, std::memory_order_release);

Atomics handle increment/decrement correctly without locks. If you think
we must keep xprof_ActiveTimerCount in sync with
xprof_ActiveTimers[type], then we can store the start times in the
array(s) with one entry per active timer or perhaps even use
thread-local storage. FWIW, I am not sure it is important to keep the
two in sync at all times.
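A sketch of the lock-free counter approach, with one atomic per timer
type (the array size, names, and types here are assumptions mirroring
the patch, not the real declarations):

```cpp
#include <atomic>
#include <cstdint>

enum { XPROF_LAST = 4 }; // illustrative size; the real enum is larger

std::atomic<uint64_t> xprof_ActiveTimers[XPROF_LAST + 1];
std::atomic<uint64_t> xprof_ActiveTimerCount[XPROF_LAST + 1];

// Each fetch_add/fetch_sub is safe on its own; the two arrays may be
// momentarily out of sync, which (as argued above) is acceptable here.
void timerStarted(int type, uint64_t started) {
    xprof_ActiveTimers[type].fetch_add(started, std::memory_order_relaxed);
    xprof_ActiveTimerCount[type].fetch_add(1, std::memory_order_relaxed);
}

void timerStopped(int type, uint64_t started) {
    xprof_ActiveTimers[type].fetch_sub(started, std::memory_order_relaxed);
    xprof_ActiveTimerCount[type].fetch_sub(1, std::memory_order_relaxed);
}
```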

> +    while (std::atomic_flag_test_and_set_explicit(&head->lock, std::memory_order_acquire))
> +        ; // spin until the list head lock is acquired
> +    if (delta < head->best)
> +        head->best = delta;
> +    if (delta > head->worst)
> +        head->worst = delta;
> +    head->summ += delta;
> +    head->count++;
> +    head->overheads += get_tick() - stopped_;
> +    std::atomic_flag_clear_explicit(&head->lock, std::memory_order_release);

I think a good-enough lockless min/max can be implemented using our
Atomic::WordT in this case. Here is a sketch:

    const Time seenBest = best;
    if (delta < seenBest)
        best.swap_if(seenBest, delta);

The risk of recording delta1 and ignoring a better delta2 is both small
and decreasing with the number of deltas considered (as the recorded
minimum becomes smaller and smaller).
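The same idea expressed with plain std::atomic and a single
compare-and-swap attempt (the Time typedef and names here are
assumptions, not Atomic::WordT itself):

```cpp
#include <atomic>
#include <cstdint>

typedef int64_t Time; // assumption: Time is an integral tick count

std::atomic<Time> best{INT64_MAX};

// std::atomic rendering of the swap_if sketch above: one single
// compare-and-swap attempt. If another thread races us, this sample is
// simply dropped, which is the small and shrinking error discussed above.
void noteBest(Time delta) {
    Time seenBest = best.load(std::memory_order_relaxed);
    if (delta < seenBest)
        best.compare_exchange_strong(seenBest, delta,
                                     std::memory_order_relaxed);
}
```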

Similarly, I do not think it is so important to keep head->count and
head->summ in sync that we should spinlock in every profiled context:
There is no loss of data here, and after 100 counts or so (which may
happen after less than a second for a popular scope), virtually any
out-of-sync increment would contribute less than 1% towards the total,
with each subsequent contribution decreasing on average.

> +void
> +XProfilerNode::start()
> +{
> +    if (started_ > 0)
> +        return;
> +
> +    started_ = get_tick();
> +
> +    // kick initialization of timers statistics (just in case)
> +    xprof_InitLib();

I agree with Kinkie that this xprof_InitLib() call may be too expensive
for this context. If you think it is needed, then I recommend checking
the already-initialized xprof_inited guard explicitly here, before
calling that function.

BTW, xprof_inited is not thread-safe, making xprof_InitLib() itself unsafe.
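If we do want such a guard, std::call_once would make it both cheap
after the first call and race-free. A rough sketch with assumed names
(the counter exists only to make the example observable):

```cpp
#include <mutex>

// A sketch, not Squid code: std::call_once gives a race-free
// "already initialized?" check, unlike a plain xprof_inited bool.
static std::once_flag xprofInitFlag;
static int xprofInitCalls = 0; // instrumentation for this example only

static void xprofInitOnce() {
    std::call_once(xprofInitFlag, [] {
        ++xprofInitCalls; // stands in for the real xprof_InitLib() body
    });
}
```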

> +void
> +XProfilerNode::stop()
> +{
> +    if (started_ < 0)
> +        return;
...
> +    // kick initialization of timers statistics (just in case)
> +    xprof_InitLib();

I do not think the stop() method needs to worry about statistics
initialization because the start() method handles that and we cannot get
past the started_ guard without start() being called. If you do not
trust get_tick() to return non-negative numbers, let's wrap get_tick()
instead.
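A minimal sketch of such a wrapper (the name is illustrative; the real
tick source stays whatever get_tick() is today):

```cpp
#include <cstdint>

// Clamp the raw tick so stop() and friends never see a negative time
// and need no extra defensive checks of their own.
static int64_t clampTick(int64_t rawTick) {
    return rawTick < 0 ? 0 : rawTick;
}
```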

> + extern uint64_t *xprof_ActiveTimerCount;
...
> +uint64_t xprof_ActiveTimerCount = 0;
...
> + xprof_ActiveTimerCount = static_cast<uint64_t *>(calloc(XPROF_LAST + 2, sizeof(uint64_t)));
...
> + xprof_ActiveTimerCount++;
...
> + xprof_ActiveTimerCount[type_]--;

The above dual usage of xprof_ActiveTimerCount as an integer and as a
pointer to integer(s) looks buggy. I also failed to find where this
global variable is actually used (besides being updated).

> + /// timer statistics histograms
> extern TimersArray *xprof_Timers;
>
> + /// count of active timers
> + extern uint64_t *xprof_ActiveTimerCount;
> + /// currently running timer accumulations
> + extern uint64_t *xprof_ActiveTimers;
> +

It would be useful to add "indexed by ..." or "ordered by ..." to the
above comments, emphasizing that these are arrays or stacks of things.

If spinlocks remain despite the above objections, I think they should at
least be implemented as a SpinLock class: the constructor (or an
explicit lock() call) locks, the destructor (or an explicit unlock()
call) unlocks, and the underlying lock is passed as a constructor
parameter.
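Something along these lines (a sketch; the class and member names are
illustrative):

```cpp
#include <atomic>

// RAII guard over a caller-supplied std::atomic_flag: construction
// locks, destruction unlocks, with explicit lock()/unlock() available
// for early release or re-acquisition.
class SpinLock {
public:
    explicit SpinLock(std::atomic_flag &flag): flag_(flag), locked_(false) {
        lock();
    }
    ~SpinLock() {
        if (locked_)
            unlock();
    }
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire))
            ; // spin until acquired
        locked_ = true;
    }
    void unlock() {
        flag_.clear(std::memory_order_release);
        locked_ = false;
    }
private:
    std::atomic_flag &flag_;
    bool locked_;
};
```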

Thank you,

Alex.