Re: stringng-cherrypick r12764

From: Kinkie <gkinkie_at_gmail.com>
Date: Thu, 26 Sep 2013 21:43:28 +0200

)Thanks, and sorry for taking so long.
The mail is long, if you want to skip to the main topic, it's the one
about copyright - it is an open point.

On Mon, Aug 26, 2013 at 4:22 AM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 07/30/2013 03:56 AM, Kinkie wrote:
>> Ok, it's now done, including documentation.
>> lp:~squid/squid/stringng-cherrypick has the most uptodate code.
>
>
> I found a few bugs in stringng-cherrypick r12764. The rest is polishing.
>
>
> The first set of comments is about SBuf::vappendf():
>
>> if (sz >= static_cast<int>(spaceSize())) {
>> // not enough space on the first go, we now know how much we need
>> requiredSpaceEstimate = sz*2; // TODO: tune heuristics
>> space = rawSpace(requiredSpaceEstimate);
>> sz = vsnprintf(space, spaceSize(), fmt, vargs);
>> }
>>
>> if (sz < 0) // output error in either vsnprintf
>> throw TextException("output error in vsnprintf",__FILE__, __LINE__);
>
> Given the complexities involved, I think we should add a check that the
> second vsnprintf() resulted in sz of less than spaceSize(). If it did
> not, we should throw a different exception.

Different as in different type of exception or different text? I've
done the latter for now.

>> // data was appended, update internal state
>> len_ += sz;
>
> IMO, we should eat our own dog food and call forceSize() here instead.
> If you do that, move this code lower, after sz has been adjusted (while
> making other appropriate changes).

I don't think it's a good idea: forceSize requires a higher guarantee
than we have here (only one owner of the MemBlob). Here we can afford
to be in a shared MemBlob (if we were at the tail of the used portion
and there was enough capacity for the append operation).

>> static bool snPrintfTerminatorChecked = false,
>> snPrintfTerminatorCounted = false;
>
> Please declare them separately. This is too clever and implies that they
> are used similarly. They are not.

Ok.

>> static bool snPrintfTerminatorChecked = false,
>
> You forgot to update this variable inside the if-statement.

*blush*

>> int
>> SBuf::compare(const SBuf &S, SBufCaseSensitive isCaseSensitive, size_type n) const
>> {
>> Must(n == npos || n >= 0);
>> if (n != npos) {
>> if (n > length())
>> return compare(S.substr(0,n),isCaseSensitive);
>> return substr(0,n).compare(S.substr(0,n),isCaseSensitive);
>> }
>
> The logic surrounding the two return statements above is puzzling. The
> last return statement is clear -- we only want to compare the first n
> characters. That makes the first return statement an optimization -- it
> avoids calling this->substr(0,n). However, if you think such an
> optimization is a good idea, why not apply it to the S parameter as
> well? In other words, I would expect to see either both optimizations
> applied:
>
> if (n != npos) {
> const SBuf &me = n > length() ? substr(0,n) : *this;
> const SBuf &them = n > S.length() ? S.substr(0,n) : S;
> return me.compare(them, isCaseSensitive, npos);
> }
>
> or none of the two optimizations applied:
>
> if (n != npos)
> return substr(0,n).compare(S.substr(0,n),isCaseSensitive, npos);
>
>
> but not the current mix. Did I miss why only one optimization should be
> applied?

You are right; it's probably rooted in legacy. Removing the optimization.

>> // this differs from how std::string handles 1-length string
>
> Please clarify what "1-length" string is. Did you mean std::string
> behavior when searching for a character? If yes, you can say
> "std::string returns something else in this case", especially if it is
> difficult to succinctly describe what std::string returns exactly.

Removing the comment. IIRC it had something to do with the behaviour
of std::string::find being undefined when npos was passed as
start_pos, and as a result the behavior had changed in different
libstdc++ versions.
SBuf's behavior is consistent, and this is what matters. The worst
drawback is that since we can't rely on std::string, we can't compare
the behaviour in automated unit tests for this case.

>> // std::string allows needle to overhang hay but not start outside
>> if (startPos != npos && startPos > length()) {
>> ++stats.find;
>> return npos;
>> }
>
> If you can move the above after you test for empty needle, and replace "
>> " with " >= ", then you may be able to move the single-char code up
> one if-statement as well. That would would clarify your overall
> find(string) logic:
>
> if empty needle ...
> if 1-char needle ...
> complex cases start here

Current code:
- validate startPos
- if empty needle
- if 1-char needle
- complex search

>> // for npos with char* std::string scans entire hay
>> // this differs from how std::string handles single char from npos
>> if (startPos == npos)
>> return npos;
>
> Missing ++stats.find that all other cases have.

Yes.

>> // on empty hay std::string returns size of hay
>> if (length() == 0)
>> return npos;
>
> Did you mean to say "this differs from std::string ..."? Or does our
> rfind(char) do what std::string does, returning npos on empty hay?

The latter. Changed comment.

>> if (startPos > length())
>> return npos;
>
> This can be broadened to startPos >= length().

Can't. This change would fail the tests:
case1: SBuf("").find("") returns npos instead of 0
    std::string("").find("") returns 0
    category: hay0.find(needle0_at_B,posB)
case6: SBuf("").find("") returns npos instead of 0
    std::string("").find("") returns 0
    category: hay0.find(needle0_at_B,posP)
case6168: SBuf("2").find("", 1) returns npos instead of 1
    std::string("2").find("", 1) returns 1
    category: hay1.find(needle0_at_B,posP)
case13442: SBuf("K2").find("", 2) returns npos instead of 2
    std::string("K2").find("", 2) returns 2
    category: hayN.find(needle0_at_B,posP)
Generated SBuf test cases: 782044
        failed cases: 164
        reported cases: 4

>> debugs(24, 8, "result: \"" << *this << "\"");
>> ++stats.caseChange;
>> return rv;
>
> No, *this in debugging is not the result (rv is). There are at least two
> copies of this bug.

Fixed. Looked if there's more, it doesn't seem so.

>> * \retval false no grow was needed
>> * \retval true had to copy
>> */
>> bool
>> SBuf::cow(SBuf::size_type newsize)
>
> s/grow/growth/
> However, cow() may return true if growth was not needed (but a dedicated
> buffer was needed) AND cow() may return true when it did not have to
> copy anything (because the buffer was empty). You probably want to say
> something like:
>
> * \retval true had to allocate a dedicated buffer
> * \retval false otherwise

Thanks!

> Furthermore, I do not see cow() return value being used. Perhaps we
> should change cow() to return void and avoid all this complexity until
> it is needed?

Sounds sensible. Done.

>> debugs(24, DBG_DATA, "new size: " << newsize);
>> debugs(24, DBG_DATA, "new size:" << newsize);
>> debugs(24, DBG_DATA, "no cow needed");
>
> These do not dump data. Let's use level 8 for them so that we can
> eventually have all DBG_DATA debugs() dedicated to blind data dumping?

Ok.

>> +SBuf &
>> +SBuf::append(const char * S, size_type Ssize)
>> +{
>> + Must (Ssize == npos || Ssize >= 0);
>> +
>> + if (S == NULL)
>> + return *this;
>
> Do you think we should also add a Must() that NULL S has npos or zero Ssize?

It may lead to greater correctness, but is that an error condition
worth terminating over? I'd rather document that in case of S being
NULL, Ssize is ignored. Which led me to notice that the documentation
was stale. Updated in this regard, let me know if you think we should
Must() regardless.

> * Since SBuf::dump() may output a lot of raw data and that data may
> contain non-printable characters, consider using Raw API there.
> Eventually, the code implementing that API will deal with those cases
> better than just writing raw data to the stream.

Tried it. It's actually too high-level IMO. For instance, it cares
about debug levels, which current SBuf::dump doesn't care about. I've
left in the code a commented-out draft working implementation.

>> + alloc+=s.alloc;
>> + live+=s.live;
>> + append+=s.append;
>> + liveBytes+=s.liveBytes;
>
> Please surround "+=" with single spaces for readability and consistency:
> " += ".

Strange, it seems already done in stringng. I'll have to check at
cherrypicking time.

>> class NullSBufException : public TextException
>
> Remove as unused. I hope we no longer have a concept of NULL SBuf.

We don't. Removed.

>> + virtual ~OutOfBoundsException() throw();
>
> Just curious, do you know why other exceptions you added do not require
> this?

It's inheriting from TextException, whose distructor is virtual. The
other exceptions store no extra data, and thus need no additional
destructor action.

>> /* */
>
> Please remove that last line from SBufExceptions.cc.

Done.

>> + * SBuf.cc (C) 2008 Francesco Chemolli <kinkie_at_squid-cache.org>
>
> If you do not mind, please drop the copyright statements (here and in
> other files you are adding). You are not the only person contributing
> this code, and managing per-file copyright claims correctly is
> practically impossible (incomplete copyright information in your files
> is a case in point).

That would apply to the whole Squid source tree (there's plenty of
files which carry copyright notices by individual contributors).
But this is a matter of copyright policy (and possible assignment
thereof) which should probably be tackled at the Foundation level.
Among other questions, once I remove this, what copyright statement
should I put? Squid Software Foundation (therefore assigning
copyright)? Nothing? Wouldn't then be automatically be reassigned to
the only entity named in the COPYRIGHT file, University of California?
It can be argued that the per-file copyright belongs to the main
contributor, while each additional contributor has copyright over
their respective contributions. This would seem to be in line with
some past contributions...

This is too messy to address here, it'd be however nice if there was a
guideline.

>> + /// clear the stream's store
>> + void clearBuf() {
>> + theBuf.clear();
>> + }
>
>> + /// Clear the stream's backing store
>> + SBufStream& clearBuf() {
>> + theBuffer.clearBuf();
>> + return *this;
>> + }
>
> Are these two really needed? AFACT, they are not used (except in a test
> case testing them). I am worried that they do not really work because
> internal SBufStream state may become inconsistent if these are called
> because you are resetting the buffer without resetting all other
> internal stream pointers/offsets.

IMO they are useful, as they allow to reuse a StreamBuf (which is a
complex data structure) in a scope without having to reinstantiate it.
I've added explicit flush() calls at the beginning of these functions.
Probably no as efficient as it could be for clearBuf(), but correct
(and while clearBuf is an useful function, it probably isn't worth to
optimize it now).

>> // inspired to SBufFindTest; to be expanded.
>
> s/ to / by /?

Thanks.

> I suspect you can fix all of the above without another round of review.
> However, please do not forget that you wanted to make (or consider
> making?) size_type unsigned once everybody is happy with the code (but
> before merging it).

Yes, that's the planned next step.
It seems to be working, however there is an annoyance when someone
passes a negative integer, which gets automatically cast into
veryLargeInteger unless compile-time countermeasures are taken against
automatic type conversion (I've used the technique suggested here:
http://stackoverflow.com/questions/175689/can-you-use-keyword-explicit-to-prevent-automatic-conversion-of-method-parameter);
if this gets done, any integer literal must be explicitly qualified as
unsigned.

Currently being tested at lp:~squid/squid/stringng (without the
-cherrypick). If everyone is ok with this approach, I'm implementing
it throughout.

Thanks for your feedback.

-- 
    /kinkie
Received on Thu Sep 26 2013 - 19:43:36 MDT

This archive was generated by hypermail 2.2.0 : Fri Sep 27 2013 - 12:00:11 MDT