Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to traverse the ast tree? #21

Open
dandelion915 opened this issue Jun 14, 2023 · 11 comments
Open

How to traverse the ast tree? #21

dandelion915 opened this issue Jun 14, 2023 · 11 comments

Comments

@dandelion915
Copy link

Say I have two source code file1.cpp and file2.cpp written in C++, I want to know the difference code between them and locate this difference code to the function. I kind of get that this cppparser is able to get ast of the source code, but i'm still confused how to traverse the ast tree and compare two ast trees from different source code? Could anyone help me with this?

@salehjg
Copy link

salehjg commented Jun 15, 2023

Hi there, I believe you have to write your own code for that.
Furthermore, you can use my fork's doxygen branch to build the documentation and check out the inheritance diagram to get a better understanding of the AST's data structure.

inherit_graph_0

Also, check out the examples.

@dandelion915
Copy link
Author

Thanks for your wonderful reply! The example is very clear how the code is structed. However, do you have any idea on how to destructualize the code? What i'm saying is for a certain line of code within a function, I want to collect the all of the components, e.g, int a = b + c, then i need to collect int,a,=,b,+,c. It confuses me how to use this parser to traverse the data structure. I'd be very appreciated if you can give me any hints, a short example code might be the best :)

@salehjg
Copy link

salehjg commented Jun 19, 2023

@dandelion915
Sure thing mate.

As I understand it, you want to get familiar with how AST expresses a piece of code.
If so, I highly recommend checking out this example file from my fork.

It goes through manually creating for-loops, arrays, functions, pointers, expressions (chained or single), ... . It should provide you with the initial insight into the AST structure needed to write your own code.

So for a basic int a = b+c;, you have a CppVar instance as in below:

auto* myVar = new CppVar(
      new CppVarType("int"),
      CppVarDecl(
             "a", 
              new CppExpr(new CppExprAtom("b"), CppOperator::kPlus, new CppExprAtom("c"));, 
              AssignType::kUsingEqual)
    );

Now, as you can see, b and c are expressed as CppExprAtom instances. All expressions whether they are singular or chained, are expressed using nested CppExpr instances. Variables can have their initial value assignments in CppVar using CppVarDecl.
There are much more details about the AST structure that cannot be covered here. Basically, you need a good debugging environment like MS VS or CLion to inspect the AST of a piece of code generated by CppParser and then work your way up from there.

{ .... } is called a block and is represented by a CppCompound of type CppCompoundType::kBlock. Its children are stored in CppCompund::members_. CppCompund::add_member() could be used to add them one by one.

Also, have a look at cppast.

@dandelion915
Copy link
Author

dandelion915 commented Jun 21, 2023

Thank for your speedy reply. I have seen your wonderful example.But what i'm saying is a reverse process of your example. To be specific, let's say i have an input hello-world.cpp file like this:

#include <iostream>

int add(int a, int b)
{
  int c = a + b;
  return c;
}


int main()
{
  std::cout << "Hello World\n";
  int res = add(1, 2);
  return 0;
}

The purpose is to get the function body with in add() and main() using cppparser:

CppParser parser;
 CppCompoundPtr ast = parser.parseFile(std::string
 ("hello-world.cpp"));
 const auto& members = ast->members();
 CppIncludeEPtr hashInclude = members[0];

 CppFunctionEPtr addFunc = members[1];

 const auto& mainBodyMembers = addFunc->defn()->members();

 CppVarEPtr abPlus = mainBodyMembers[0];
 std::cout << abPlus->name() << std::endl; //c

 AssignType atype = abPlus->assignType();
 std::cout << (atype == AssignType::kUsingEqual) << std::endl; // =,kUsingEqual

 const auto& avalue = abPlus->assignValue();
 std::cout << (avalue->oper_ == CppOperator::kPlus) << std::endl; // +,kPlus
 std::cout << *(avalue->expr1_.expr->expr1_.atom) << std::endl; //a
 std::cout << *(avalue->expr2_.expr->expr1_.atom) << std::endl; //b

 CppExprEPtr returnC = mainBodyMembers[1];
 std::cout <<  addFunc->retType_->baseType() << std::endl; //return int 
 std::cout << *(returnC->expr1_.atom) << std::endl; //c

CppFunctionEPtr mainFunc = members[2];
 const auto& mainBodyMembers2 = mainFunc->defn()->members();
 CppExprEPtr coutHelloWorld = mainBodyMembers2[0];
 CppExprAtom expr1 = coutHelloWorld->expr1_;
 CppExprAtom expr2 = coutHelloWorld->expr2_;
 std::cout << *(expr1.expr->expr1_.atom) << std::endl; //std::cout
 std::cout << *(expr2.expr->expr1_.atom) << std::endl; //"Hello World\n"

 CppVarEPtr callAdd = mainBodyMembers2[1];
 std::cout << callAdd->name() << std::endl; //res
 AssignType atype2 = callAdd->assignType();
 std::cout << (atype2 == AssignType::kUsingEqual) << std::endl; // =
 const auto& avalue2 = callAdd->assignValue();
 std::cout << (avalue2->oper_ == CppOperator::kFunctionCall) << std::endl; // kFunctionCall
 std::cout << *(avalue2->expr1_.atom) << std::endl; //add
 const auto& expr3 = avalue2->expr2_.expr;
 std::cout << *(expr3->expr1_.expr->expr1_.atom) << std::endl; //1
 std::cout << (expr3->oper_ == CppOperator::kComma) << std::endl; //, kComma
 std::cout << *(expr3->expr2_.expr->expr1_.atom) << std::endl; 2

You see, I did debug's hardwork to traverse the data structure from root to the leaf node to get the function body(such as "a","b","Hello World\n"). It confuses me how to use this parser to automatically traverse the data structure of different phrases. Or is there any solution to stop parsing below the function level and just save all the function body as strings? I'd be very appreciated if you can give me any hints, a short example code might be the best :)

@salehjg
Copy link

salehjg commented Jun 21, 2023

I have been skimming through the codes for the past few days and I believe no such functionality has been implemented yet. There are some related codes in the *-info-accessor.h headers but they are not what you want.
CppAST has visitor.cpp. For CppParser we can use CppWriter as a starting point for our visitor with some call back funtions.
I am planning to use CppParser in my own projects and eventually I am going to have to implement visitors and AST matchers as well, but i am not sure about the timeline though. @dandelion915

@salehjg
Copy link

salehjg commented Jun 21, 2023

Hi @satya-das, are you accepting PRs?

@dandelion915
Copy link
Author

dandelion915 commented Jun 23, 2023

Thanks for your reply. I have figured out the exposure of function body by a different thought. First using cppparser to get all the function name and return type name, then use string match algorithm to find the beginning of the function, then math the { }. I dont know if this can be any use of you, but I'll provide this code anyway :)

CppCompoundPtr ast = parser.parseFile(filename);
    const auto& members = ast->members();
    for (const auto& member : members) {
        if (member->objType_ == CppObjType::kFunction) {
            CppFunctionEPtr func = member;
            const string& funcName = func->name_;
            const string& retType = func->retType_->baseType();
            cout << funcName << endl;
            string functionBody = findFunctionBody(filename, funcName,retType);
string findFunctionBody(const string& filename, const string& functionName, const string& retType) {
    ifstream file(filename);
    string line;

    string functionBody;
    bool foundFunction = false;
    bool insideFunction = false;
    int openBrackets = 0;
    int count = 0;
    int found_count = -2;
    while (getline(file, line)) {
        count++;
        if (!foundFunction && line.find(functionName) != string::npos && line.find(retType) != string::npos) {
            foundFunction = true;
            insideFunction = true;
            found_count = count;
        }

        if (count == found_count + 1) {
            found_count = count;
            functionBody += line;

            for (char c : line) {
                if (c == '{') {
                    openBrackets++;
                }
                else if (c == '}') {
                    openBrackets--;
                }
            }

            if (openBrackets == 0) {
                break;
            }
        }
    }

    return functionBody;

But this temporary string-match based solution has varies problems. Like comments may also contain function name or {}, so I'm still looking for parsers to get the function body. For producers for cppparser, maybe only adding tostring function to the project can lead to my solution, but sadly there isn't one.

@dandelion915
Copy link
Author

By the way, are you familiar with Clang? I searched and found Clang has such abilities, but there's no tutorial document to get start...

@salehjg
Copy link

salehjg commented Jun 23, 2023

@dandelion915 Oh wonderful. Thank you.
Right now I am working on the visitor pattern to implement a generic visitor with the matching capability.
Clang is quite capable. Unfortunately, there is a catch to it :) It does not expose the entire AST with stable API which is a big "no" for me.
Also, the data structure of the clang AST is way more complex compared to cppparser, but I guess it's not a good idea to compare these two.
Anyways, in case you need the Doxygen docs, you can access them at this link now.

@salehjg
Copy link

salehjg commented Jun 25, 2023

Dear @dandelion915,
Please have a look at my fork's visitor and AST matcher classes. Everything seems to be working.

CppVisitorMatcher matcher({CppObjType::kExpression, CppObjType::kHashInclude});
ast->accept(&matcher);

This example matches only the expressions and hashInclude nodes. Here, ast is the root node.
I have not merged it to my fork's main branch (called my-master) yet, but the code is ready.

@dandelion915
Copy link
Author

Hi @salehjg !
I have another question that I think you must know the answer.
I want to use parser.parseStream to parse string/stream based code, but somehow this doesn't work.
I wonder which part could possibly go wrong. I'd be very appreciated if u could give me some hint.

int main() {
   
    std::string input = string("int add(int a, int b);");
    CppParser parser;
    CppCompoundPtr ast = parser.parseStream(&input[0], input.size());
    const auto& members = ast->members();

	return 0;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants