Re: [squid-users] Antwort: Re: [squid-users] Memory and CPU usage squid-3.1.4

From: Marcus Kool <marcus.kool_at_urlfilterdb.com>
Date: Mon, 05 Jul 2010 21:57:08 -0300

The memory leak is in idnsGrokReply: the caller of
rfc1035MessageUnpack must free the unpacked message with
rfc1035MessageDestroy.

The function idnsGrokReply changed considerably between
3.0.x and 3.1.x.
It is obvious that not all paths call rfc1035MessageDestroy,
but I do not know this code well and feel uncomfortable
making a patch, so I suggest that whoever maintains
this part of the code look at it.
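
For illustration only, the ownership rule looks roughly like the sketch
below. This is not a patch: the signatures are assumed to follow the
general shape of lib/rfc1035.c in 3.1 (rfc1035MessageUnpack allocates the
message, rfc1035MessageDestroy releases it), and handle_dns_reply is an
invented name, not real Squid code.

/* Sketch of the caller-side pattern, assuming the rfc1035.c API shape:
 * whoever unpacks the DNS reply must destroy the message on every
 * exit path, including the error paths. */
#include "rfc1035.h"

static void
handle_dns_reply(const char *buf, size_t sz)
{
    rfc1035_message *message = NULL;
    int n = rfc1035MessageUnpack(buf, sz, &message);

    if (n < 0) {
        /* unpack reported an error; if it still handed back a (partial)
         * message, that memory has to be released as well */
        if (message)
            rfc1035MessageDestroy(&message);
        return;
    }

    /* ... match message->id against a pending query, use the n answers ... */

    /* every later return must also release the message */
    rfc1035MessageDestroy(&message);
}

The point is simply that no return path after rfc1035MessageUnpack may
skip rfc1035MessageDestroy.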

Marcus

Martin.Pichlmaier_at_continental-corporation.com wrote:
> Hello list,
>
> I just wanted to post the valgrind results.
> Unfortunately the memcheck tool needs so much CPU that I could not
> put a high load on the squid; the maximum was only about 5-10 req/s.
>
> # ./squid -v
> Squid Cache: Version 3.1.3
> configure options: '--prefix=/appl' '--localstate=/var'
> '--with-filedescriptors=16384' '--enable-storeio=aufs'
> '--enable-auth=ntlm,basic' '--enable-external-acl-helpers=wbinfo_group'
> '--enable-icap-client' --enable-ltdl-convenience
>
> also recompiled and tried with:
> # squid -v
> Squid Cache: Version 3.1.3
> configure options: '--prefix=/appl' '--localstate=/var'
> '--with-filedescriptors=16384' '--enable-storeio=aufs'
> '--enable-auth=ntlm,basic' '--enable-external-acl-helpers=wbinfo_group'
> '--enable-icap-client' '--with-valgrind-debug' 'CFLAGS=-g -O2'
> --enable-ltdl-convenience
>
> I ran valgrind repeatedly with: "valgrind --leak-check=yes -v squid -N &"
> and found:
>
> ==24141== 3,311,957 bytes in 3,784 blocks are definitely lost in loss record 26 of 27
> ==24141== at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==24141== by 0x5ABAA7: xmalloc (util.c:508)
> ==24141== by 0x5AA35A: rfc1035MessageUnpack (rfc1035.c:433)
> ==24141== by 0x4B15A7: idnsGrokReply(char const*, unsigned long) (dns_internal.cc:939)
> ==24141== by 0x4B22F0: idnsRead(int, void*) (dns_internal.cc:1178)
> ==24141== by 0x4AC154: comm_select (comm_epoll.cc:308)
> ==24141== by 0x5455AC: CommSelectEngine::checkEvents(int) (comm.cc:2682)
> ==24141== by 0x4B712D: EventLoop::checkEngine(AsyncEngine*, bool) (EventLoop.cc:51)
> ==24141== by 0x4B7282: EventLoop::runOnce() (EventLoop.cc:125)
> ==24141== by 0x4B7377: EventLoop::run() (EventLoop.cc:95)
> ==24141== by 0x4FB36C: SquidMain(int, char**) (main.cc:1379)
> ==24141== by 0x4FB975: main (main.cc:1141)
>
> I looked a bit in the source code but didn't really find what could cause
> this.
>
> Sometimes DNS did not seem to lose memory, but I found this instead:
>
> ==29780== 987,870 (987,046 direct, 824 indirect) bytes in 1,321 blocks are definitely lost in loss record 27 of 28
> ==29780== at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==29780== by 0x5ABAA7: xmalloc (util.c:508)
> ==29780== by 0x5ABBAB: xstrdup (util.c:756)
> ==29780== by 0x4B3E15: errorTryLoadText(char const*, char const*, bool) (errorpage.cc:313)
> ==29780== by 0x4B494F: ErrorState::BuildContent() (errorpage.cc:1007)
> ==29780== by 0x4B551D: ErrorState::BuildHttpReply() (errorpage.cc:881)
> ==29780== by 0x4B58E5: errorAppendEntry (errorpage.cc:432)
> ==29780== by 0x51D656: store_client::callback(long, bool) (store_client.cc:164)
> ==29780== by 0x51DA2F: store_client::scheduleMemRead() (store_client.cc:448)
> ==29780== by 0x51E567: storeClientCopy2(StoreEntry*, store_client*) (store_client.cc:331)
> ==29780== by 0x51E8D3: store_client::copy(StoreEntry*, StoreIOBuffer, void (*)(void*, StoreIOBuffer), void*) (store_client.cc:264)
> ==29780== by 0x4A0D0E: clientReplyContext::doGetMoreData() (client_side_reply.cc:1675)
>
> When running valgrind with 3.0.STABLE23 I did not find similar lost
> blocks, only a few KB lost during initialization, but 3.1 loses a few
> KB at that point as well.
>
> I monitored a squid 3.0.STABLE25 and a squid 3.1.3/3.1.4 over a longer
> period and found that both need more memory over time, but 3.0 eventually
> stops growing. 3.1 continues to grow until CPU usage rises to nearly 100%;
> only then does the memory consumption seem to stop.
>
> Does anyone have an idea where the problem could be?
>
>
> Martin
>
>
>
>
> Marcus Kool <marcus.kool_at_urlfilterdb.com> wrote on 17.06.2010 16:15:09:
>
>> Martin,
>>
>> Valgrind is a memory leak detection tool.
>> You need some developer skills to run it.
>>
>> If you have a test environment with low load you may want
>> to give it a try.
>> - download the squid sources
>> - run configure with CFLAGS="-g -O2"
>> - run squid with valgrind
>> - wait
>> - kill squid with a TERM signal and look at the valgrind log file
>>
>> Valgrind uses a lot of memory for its own administration and has
>> a lot of CPU overhead, so reduce cache_mem to a small value like 32MB.
>>
>> Most likely you will see many memory leaks, because
>> Squid does not free everything when it exits. This is normal.
>> You need to look at the repeated leaks, the ones that occur often,
>> and file a bug report for those.
>>
>> Please do not post the whole valgrind output to this list.
>>
>> Marcus
>>
>>
>>
>> Martin.Pichlmaier_at_continental-corporation.com wrote:
>>> Hello,
>>>
>>> I just wanted to report back on the latest tests:
>>>
>>> After the memory cache is filled to 100%, squid (3.1.4 or 3.1.3)
>>> still needs more memory over time when under load, about 1-2 GB a day.
>>> memory_pools off did not change anything; the process size still rises.
>>> The high CPU usage seems to start when the process grows past a certain
>>> size, but I am not sure about that.
>>> Example memory consumption of squid-3.1.4:
>>> from 8.0 GB (4pm) to 8.4 GB (7pm) to 8.5 GB (4am next day) to 9.4 GB (2pm).
>>> At night there is low load on the squid, maybe 20-50 req/s.
>>> 3.1.3 behaves the same, so it does not seem to be related to the
>>> "ctx enter/exit level" topic discussed in the previous mails.
>>>
>>> I am now reverting the proxies back to 3.0.STABLE25 but will keep one
>>> proxy on 3.1.4 for testing.
>>> Probably something in my setup causes squid to consume too much memory.
>>> Amos, do you have another idea what might cause this and where to look,
>>> for example at which debug level? I can do some tests and have the
>>> possibility to slowly put this proxy under load and take it out of the
>>> production environment again afterwards...
>>>
>>> Regards,
>>> Martin
>>>
>>>
>>>
>>> Amos Jeffries <squid3_at_treenet.co.nz> wrote on 15.06.2010 13:31:40:
>>>
>>> <snip>
>>>>> I am now checking with mem pools off on one of the proxies and will
>>>>> report later whether it changes anything.
>>> <snip>
>>>>> 2010/06/15 11:37:06| ctx: enter level 2059: '<lost>'
>>>>> 2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
>>>>> X:}0/06/15 11:37:06| WARNING: suspicious CR characters in HTTP header {
>>>>> 2010/06/15 11:37:06| ctx: exit level 2060
>>>>> 2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
>>>>> X:}0/06/15 11:37:06| WARNING: suspicious CR characters in HTTP header {
>>>>> 2010/06/15 11:37:06| ctx: exit level 2060
>>>>> 2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
>>>>> X:}0/06/15 11:37:06| WARNING: suspicious CR characters in HTTP header {
>>>>> 2010/06/15 11:37:06| ctx: exit level 2060
>>>>> 2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
>>>>> X:}0/06/15 11:37:06| WARNING: suspicious CR characters in HTTP header {
>>>>> 2010/06/15 11:37:06| ctx: exit level 2060
>>>>> 2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
>>>>> X:}0/06/15 11:37:06| WARNING: suspicious CR characters in HTTP header {
>>>>> 2010/06/15 11:40:56| ctx: exit level 2060
>>>>> 2010/06/15 11:40:56| ctx: enter level 2060: '<lost>'
>>>>> 2010/06/15 11:40:56| ctx: enter level 2061: '<lost>'
>>>>>
>>>> Ouch. We've been wondering about these ctx loops. It is not something
>>>> to be terribly worried about, but it can cause some "weird stuff" (yes,
>>>> that is probably the best explanation).
>>>>
>>>> Thanks to your reminder, I've just had another look and found one more
>>>> in header processing. Hopefully that was it.
>>>>
>>>> Amos
>>>> --
>>>> Please be using
>>>> Current Stable Squid 2.7.STABLE9 or 3.1.4
>>>
>>>
>
>
>