Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overloaded operator function name should be returned as one FUNCTION token, not two tokens FUNCTION + <operator> #36

Open
mosherubin opened this issue Jun 30, 2022 · 2 comments

Comments

@mosherubin
Copy link
Collaborator

The Problem

When NsiqCppStyle parses the following code snippet:

Event& Event::operator=(OUT Event& other)
{
}

The internal token list (accessed via the lexer.tokenlist member variable) looks like this:

tokenlist = {list: 19}
 00 = {LexToken} LexToken(ID,'Event',1,1,0, False, None)
 01 = {LexToken} LexToken(AND,'&',1,6,5, False, None)
 02 = {LexToken} LexToken(SPACE,' ',1,7,6, False, None)
 03 = {LexToken} LexToken(ID,'Event',1,8,7, False, None)
 04 = {LexToken} LexToken(DOUBLECOLON,'::',1,13,12, False, None)
 05 = {LexToken} LexToken(FUNCTION,'operator',1,15,14, False, None)
 06 = {LexToken} LexToken(EQUALS,'=',1,23,22, False, None)
 07 = {LexToken} LexToken(LPAREN,'(',1,24,23, False, None)
 08 = {LexToken} LexToken(ID,'OUT',1,25,24, False, None)
 09 = {LexToken} LexToken(SPACE,' ',1,28,27, False, None)
 10 = {LexToken} LexToken(ID,'Event',1,29,28, False, None)
 11 = {LexToken} LexToken(AND,'&',1,34,33, False, None)
 12 = {LexToken} LexToken(SPACE,' ',1,35,34, False, None)
 13 = {LexToken} LexToken(ID,'other',1,36,35, False, None)
 14 = {LexToken} LexToken(RPAREN,')',1,41,40, False, None)
 15 = {LexToken} LexToken(LINEFEED,'\n',1,42,41, False, None)
 16 = {LexToken} LexToken(LBRACE,'{',2,1,42, False, None)
 17 = {LexToken} LexToken(LINEFEED,'\n',2,2,43, False, None)
 18 = {LexToken} LexToken(RBRACE,'}',3,1,44, False, None)

The problem is that NsiqCppStyle returns the overloaded operator function name ("operator=") as two tokens, i.e., a FUNCTION ("operator" at offset 5) and an EQUALS ("=" at offset 6). This is incorrect - the overloaded operator function name should be returned as one FUNCTION token with a value of "operator=":

tokenlist = {list: 18}
 00 = {LexToken} LexToken(ID,'Event',1,1,0, False, None)
 01 = {LexToken} LexToken(AND,'&',1,6,5, False, None)
 02 = {LexToken} LexToken(SPACE,' ',1,7,6, False, None)
 03 = {LexToken} LexToken(ID,'Event',1,8,7, False, None)
 04 = {LexToken} LexToken(DOUBLECOLON,'::',1,13,12, False, None)
 05 = {LexToken} LexToken(FUNCTION,'operator=',1,15,14, False, None)
 06 = {LexToken} LexToken(LPAREN,'(',1,24,23, False, None)
 07 = {LexToken} LexToken(ID,'OUT',1,25,24, False, None)
 08 = {LexToken} LexToken(SPACE,' ',1,28,27, False, None)
 09 = {LexToken} LexToken(ID,'Event',1,29,28, False, None)
 10 = {LexToken} LexToken(SPACE,' ',1,34,33, False, None)
 11 = {LexToken} LexToken(AND,'&',1,35,34, False, None)
 12 = {LexToken} LexToken(ID,'other',1,36,35, False, None)
 13 = {LexToken} LexToken(RPAREN,')',1,41,40, False, None)
 14 = {LexToken} LexToken(LINEFEED,'\n',1,42,41, False, None)
 15 = {LexToken} LexToken(LBRACE,'{',2,1,42, False, None)
 16 = {LexToken} LexToken(LINEFEED,'\n',2,2,43, False, None)
 17 = {LexToken} LexToken(RBRACE,'}',3,1,44, False, None)

This is the correct way to parse the overloaded operator function name - tokenlist[5] shows the FUNCTION token with a value of "operator=".

@mosherubin mosherubin changed the title The entire overloaded operator function name should be returned with type FUNCTION, not FUNCTION + <operator> Overloaded operator function name should be returned as one FUNCTION token, not two tokens FUNCTION + <operator> Jun 30, 2022
@kunaltyagi
Copy link
Owner

🤔 This would make some parses a bit more complicated. Is there any situation where such a thing would help?

Current:

  • FUNCTION + OPERATOR = operator overload
  • FUNCTION = normal function

New:

  • FUNCTION = normal function or operator overload
  • To detect operator overload, remove the operator prefix if it's there

IMO, the current version supports a simpler user implementation. Some example usage supporting your preference would be really nice :)

@mosherubin
Copy link
Collaborator Author

Funny you should ask ;-) The reason for this change, made in my private cloned repo, was because of a proprietary rule I wrote for my company.

My company has a coding standard where every function begins and ends with a call to the internal logging system, tracing the entry and exit in the function. The string passed to the logging system (a) Has a '+' or '-' to denote entry/exit respectively, and (b) following the +/- character must be the full and precise function name. Since all of our functions have a single point of exit, both logging statements are always logged. Here is what a function might look like:

uint32_t Client::GetGrpcRetryInterval(void)
{
    LOG_TRACE("+ Client::GetGrpcRetryInterval");

    Configuration config = this->_configFile->GetConfig();
   
    LOG_TRACE("- Client::GetGrpcRetryInterval return %d", config.grpc_retry_interval());
    return config.grpc_retry_interval();
}

The purpose of the rule was to enforce the coding standard, checking that each function conforms to the standard. The rule flags different coding errors, among them:

  • The function does not have both entry and exit logging statements
  • The function does not have two entry logging statements
  • The function name in the logging statements matches the function definition's full function name exactly

The rule's logic is simple:

  • When the function's FUNCTION token is encountered, save its token.fullName.
  • When a token with (type, value) = ('ID', 'LOG_TRACE') is encountered, check the correctness of the STRING passed to LOG_TRACE with the previously saved full function name.

When I ran the rule on our entire code base, it correctly flagged hundreds of coding errors. However, it incorrectly flagged overloaded operator functions as the LOG_TRACE strings not matching the function definition's full name. Here's an example to illustrate the problem:

Thread& Thread::operator=(Thread &other)
{
    LOG_TRACE("+ Thread::operator=");
    ...
    LOG_TRACE("- Thread::operator=");
    return *this;
}

This produces the following lexer.tokenlist:

tokenlist = {list: 42}
 00 = {LexToken} LexToken(ID,'Thread',1,1,0, False, None)
 01 = {LexToken} LexToken(AND,'&',1,7,6, False, None)
 02 = {LexToken} LexToken(SPACE,' ',1,8,7, False, None)
 03 = {LexToken} LexToken(ID,'Thread',1,9,8, False, None)
 04 = {LexToken} LexToken(DOUBLECOLON,'::',1,15,14, False, None)
 05 = {LexToken} LexToken(FUNCTION,'operator',1,17,16, False, None)
 06 = {LexToken} LexToken(EQUALS,'=',1,25,24, False, None)
 07 = {LexToken} LexToken(LPAREN,'(',1,26,25, False, None)
 08 = {LexToken} LexToken(ID,'Thread',1,27,26, False, None)
 09 = {LexToken} LexToken(SPACE,' ',1,33,32, False, None)
 10 = {LexToken} LexToken(AND,'&',1,34,33, False, None)
 11 = {LexToken} LexToken(ID,'other',1,35,34, False, None)
 12 = {LexToken} LexToken(RPAREN,')',1,40,39, False, None)
 13 = {LexToken} LexToken(LINEFEED,'\n',1,41,40, False, None)
 14 = {LexToken} LexToken(LBRACE,'{',2,1,41, False, None)
 15 = {LexToken} LexToken(LINEFEED,'\n',2,2,42, False, None)
 16 = {LexToken} LexToken(SPACE,'    ',3,1,43, False, None)
 17 = {LexToken} LexToken(ID,'LOG_TRACE',3,5,47, False, None)
 18 = {LexToken} LexToken(LPAREN,'(',3,14,56, False, None)
 19 = {LexToken} LexToken(STRING,'"+ Thread::operator="',3,15,57, False, None)
 20 = {LexToken} LexToken(RPAREN,')',3,36,78, False, None)
 21 = {LexToken} LexToken(SEMI,';',3,37,79, False, None)
 22 = {LexToken} LexToken(LINEFEED,'\n',3,38,80, False, None)
 23 = {LexToken} LexToken(SPACE,'    ',4,1,81, False, None)
 24 = {LexToken} LexToken(ELLIPSIS,'...',4,5,85, False, None)
 25 = {LexToken} LexToken(LINEFEED,'\n',4,8,88, False, None)
 26 = {LexToken} LexToken(SPACE,'    ',5,1,89, False, None)
 27 = {LexToken} LexToken(ID,'LOG_TRACE',5,5,93, False, None)
 28 = {LexToken} LexToken(LPAREN,'(',5,14,102, False, None)
 29 = {LexToken} LexToken(STRING,'"- Thread::operator="',5,15,103, False, None)
 30 = {LexToken} LexToken(RPAREN,')',5,36,124, False, None)
 31 = {LexToken} LexToken(SEMI,';',5,37,125, False, None)
 32 = {LexToken} LexToken(LINEFEED,'\n',5,38,126, False, None)
 33 = {LexToken} LexToken(SPACE,'    ',6,1,127, False, None)
 34 = {LexToken} LexToken(RETURN,'return',6,5,131, False, None)
 35 = {LexToken} LexToken(SPACE,' ',6,11,137, False, None)
 36 = {LexToken} LexToken(TIMES,'*',6,12,138, False, None)
 37 = {LexToken} LexToken(THIS,'this',6,13,139, False, None)
 38 = {LexToken} LexToken(SEMI,';',6,17,143, False, None)
 39 = {LexToken} LexToken(LINEFEED,'\n',6,18,144, False, None)
 40 = {LexToken} LexToken(RBRACE,'}',7,1,145, False, None)
 41 = {LexToken} LexToken(LINEFEED,'\n',7,2,146, False, None)

The problem lies within token[5] above:

lexer.tokenlist[5] = {LexToken} LexToken(FUNCTION,'operator',1,17,16, False, None)
 additional = {str} ''
 column = {int} 17
 context = {Context} <nsiqcppstyle_checker.Context object at 0x05046AB0>
 contextStack = {ContextStack} 
 decl = {bool} False
 filename = {str} 'D:\\Junk\\NsiqCppStyle\\operator-2.cpp'
 fullName = {str} 'operator::'
 inactive = {bool} False
 index = {int} 5
 lexer = {Lexer} <nsiqcppstyle_lexer.Lexer object at 0x05044330>
 lexpos = {int} 16
 line = {str} 'Thread& Thread::operator=(Thread &other)'
 lineno = {int} 1
 pp = {NoneType} None
 type = {str} 'FUNCTION'
 value = {str} 'operator'

Although the value ("operator") is correct, the token.fullName value found is "operator::", which is wrong on several counts:

  • Since this is the FUNCTION token, I would expect token.fullName to be "Thread::operator=". If it were correctly "Thread::operator=", the STRING checks are straightforward and consistent for overloaded- and non-overloaded functions alike.
  • The colon (":") following "operator" is incorrect.

IMHO, an overloaded operator function is just like any other function name. We should not consider the name to be the string "operator" followed by an equal ("=") character, but rather a function whose name just happens to be "operator=".

I think the opposing question should be posed: in what case would a rule writer want the FUNCTION token.fullName to only correspond to "operator", leaving the work of concatenating the operator etc. to the rule writer. I had no need to split the overloaded operator function name - I wanted to know the full name. Should the rule writer want to know the precise operator, he can easily take a slice of the function name, or look at the following token type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants