Formatted html – html_tree (pt.1)

There is a good, if somewhat basic, tutorial available at tutorialspoint here: https://www.tutorialspoint.com/cplusplus/cpp_web_programming.htm  If you know nothing about using CGI, then this is a good starting point.  While cgicc works pretty well, debugging can be a bit of a nightmare unless you format your code to make it readable.

Please note:  This is reference material on how this class was created and its internals.  If you simply want to download it and see examples of how to use it, then please click here.

Out of the box you have two options, neither of which I found satisfactory.

The first is to use escape characters; lots of escape characters; just about everywhere.  This is quite error prone, and producing clean, nested, node-based html using this method can be quite a task.

Here is an example of the sort of string that you may have to produce:

//**************************************************************//
// GET TIME HISTORY RECORDS                                     //
//**************************************************************//
std::string job_times::build_html_job_times_table() {

    lookup_records2_psql jobTimes(m_Conn, "ic_machine_job_times");
    get_pg_time_history(jobTimes);

    string table (
                  "<div  class=\"w3-container\">\n"
                      "\t<br>\n"
                      "\t<table class=\"w3-table-all w3-border\" id=\"icdefault\">\n"
                      "\t<tr><th class=\"w3-blue-gray\"><h3>Total Time:</h3></th><td id=\"totalcell\" class=\"w3-border\">"+ job_time_validate::get_total() +"</td></tr>\n"
                      "\t<tr><th colspan=\"2\">Job Time Log:</th></tr>\n"
                  );


    for(int i =0; i != jobTimes.get_nrows(); ++i) {
            table.append(
                        "<tr>\n"
                            "\t<th class=\"w3-green\" id=\"icdefault\" width=\"80%\">Start:</th> <th class=\"w3-black\" id=\"icdefault\">Total:</th>\n"
                        "</tr>\n"
                        "<tr>\n"
                            "\t<td id=\"icdefault\">" + jobTimes.get_field_value("start_time", i) + "</td>\n"
                            "\t<td rowspan=\"3\" id=\"totalcell\" class=\"w3-border\">" + jobTimes.get_field_value("session_total_time", i) + "</td>\n"
                        "</tr>\n"
                        "<tr>\n"
                            "\t<th class=\"w3-red\" id=\"icdefault\">Finish:</th>\n"
                        "</tr>\n"
                        "<tr>\n"
                            "\t<td id=\"icdefault\">" + jobTimes.get_field_value("finish_time", i) + "</td>\n"
                        "</tr>\n"
                        );
    }

    table.append(
                    "\t</table>\n"
                    "\t<br>\n"
                "</div>\n"
                "<hr>\n"
                );

    return(table);


}

As you can see, while this is still better than just outputting a continual stream, it is not ideal, so I decided to look at the alternatives.

The first alternative that I looked at was to use raw strings, e.g.:

//**************************************************************//
// HTML STYLES                                                  //
//**************************************************************//
string job_times::html_input_time() {

    string headerText;

    if(in_progress())
        headerText = "Finish Time:";
    else {
        headerText = "Start Time:";
    }


return(R"***(
<div class="w3-container" id="times" style="margin-top:90px">
    <h4>
        )***" + headerText + R"***(
    </h4>
    <input data-clocklet id="time_widget">
    <br>
</div>
<div class="w3-container">
    <hr>
    <button class="w3-button w3-teal" onclick="post_input_time()">Save</button>
</div>

)***");

}

While I found this to be slightly less error prone than using escape sequences everywhere, it was still not ideal, especially when handling variable data.

Rather helpfully, the code widget here has made the same sort of mess of the formatting (colours) as my IDE.  I also feel that this code is too alien to be jammed in the middle of a load of C++ code, and you may very well come a cropper when trying to maintain it years down the line.

Unhappy with my lot, I felt that it would be worthwhile to write a class to resolve this problem and, to be honest, while it may not be perfect, it works pretty well for a couple of hundred lines of code.

There are essentially two parts to this: building the structure and parsing it into a valid html string.  I decided to put the structure in a class that I will call html_tree, but left the parsing in a namespace, as a class was a little overkill.

One of the things that I struggled to get my head around, and even now am having some difficulty in explaining, is the necessity to have a vector of instances of the class within itself.  To be honest, it is one of those conundrums where the code speaks for itself better than any explanation:

#include <vector>
#include <string>

class html_tree {

    public:
        //CONSTRUCTOR
        html_tree(
                std::string     htmlTag,
                std::string     &htmlPage,
                html_tree       *previousBranch = nullptr
                );

        //DECLARATIONS
        std::vector<html_tree*>      childNodes;
        html_tree                    *previousBranch;

        //DESTRUCTOR
        ~html_tree();

};

You will note that the nested instances are held as pointers.  The reason for this is that (on my build box in any event) I got a segmentation fault after nesting a certain number of nodes (around 12 if I remember correctly) and so found it necessary to build them on the heap.  PS. I do not like C++'s automatic (smart) pointers as they just seem to create as many problems as they solve, but I see no reason why you could not use them if that is your greatest desire!

Each instance of this class will contain a single node along with its content.  Next we need a series of methods to add content and create nested nodes:

ADDING A NEW NODE: In order to create a new node, we are going to create a new_node() method.  Each call to this method will create a nested instance of the class and return a pointer to that instance so that it can be accessed.

The “htmlTag” argument is whatever you wish to place between “<” and “>” in the declaration, so if you wanted to declare a node called <body> you would simply call the method as follows:

html_tree *newNode = previousNode->new_node("body");
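
As a rough illustration, new_node() could be implemented along the following lines.  This is only a sketch that assumes the constructor signature from the header above; the class name html_tree_sketch, the tag() accessor and the deleted copy operations are assumptions for the example and are not taken from the original class.

```cpp
#include <string>
#include <utility>
#include <vector>

// Reduced, hypothetical version of html_tree: only what new_node() needs.
class html_tree_sketch {
public:
    html_tree_sketch(std::string htmlTag,
                     std::string &htmlPage,
                     html_tree_sketch *previousBranch = nullptr)
        : previousBranch(previousBranch),
          m_htmlTag(std::move(htmlTag)),
          m_htmlPage(htmlPage) {}

    // The node owns raw pointers, so copying it would lead to a double delete.
    html_tree_sketch(const html_tree_sketch &) = delete;
    html_tree_sketch &operator=(const html_tree_sketch &) = delete;

    // Destroy the children before this node goes away.
    ~html_tree_sketch() {
        for (html_tree_sketch *child : childNodes)
            delete child;
    }

    // Create a nested node on the heap, remember it, and hand back a pointer.
    html_tree_sketch *new_node(std::string htmlTag) {
        html_tree_sketch *child =
            new html_tree_sketch(std::move(htmlTag), m_htmlPage, this);
        childNodes.push_back(child);
        return child;
    }

    const std::string &tag() const { return m_htmlTag; }

    std::vector<html_tree_sketch *> childNodes;
    html_tree_sketch *previousBranch;

private:
    std::string m_htmlTag;
    std::string &m_htmlPage; // every node refers to the same output string
};
```

Because each child receives "this" as its previousBranch, the tree can be walked in both directions, and because all nodes share one reference to the page string, any node can append its own markup to it.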

SETTING THE NODE CONTENT:  Now that we have a node, we may want to add some content.

//**************************************************************//
//  SET NODE CONTENT                                            //
//**************************************************************//
void html_tree::set_node_content(string content, bool lineBreak) { 

    m_tagContent.push_back( {content, lineBreak} );

}

You will note the lineBreak flag in the function declaration.  This simply tells the class whether we want a <br> after this piece of content; as the open_node() method further down shows, the <br> is left out after the last piece of content in a node.  lineBreak defaults to “true” if the function is called without that argument.

There are several other methods in this class that are used for parsing, but these are the only methods that we use for building the nodes.
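
The header declaration for this method is not shown here, so the following is only a sketch of how the default argument and the stored content might fit together.  The shape of TAG_CONTENT is inferred from the s.text and s.lineBreak uses in open_node(); the class name content_sketch and the content() accessor are hypothetical.

```cpp
#include <string>
#include <utility>
#include <vector>

// Assumed shape of TAG_CONTENT, inferred from how open_node() reads it.
struct TAG_CONTENT {
    std::string text;
    bool lineBreak;
};

// Hypothetical holder standing in for the content-related part of html_tree.
class content_sketch {
public:
    // lineBreak defaults to true when the caller omits it.
    void set_node_content(std::string content, bool lineBreak = true) {
        m_tagContent.push_back({std::move(content), lineBreak});
    }

    const std::vector<TAG_CONTENT> &content() const { return m_tagContent; }

private:
    std::vector<TAG_CONTENT> m_tagContent;
};
```

Calling set_node_content("First line") stores the text with lineBreak set to true, while set_node_content("Second line", false) stores it without a trailing <br> request.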

 

HTML NODE TYPES:  By default, when we build our html page our function will nest each child node within a parent, but there are certain exceptions.  The exceptions that we need to handle are void tags and block tags.  Void tags are tags such as <br> that do not have a closing tag.  Block tags, as the term is used here, are tags where we do not want any physical line breaks between the opening and closing tags.  Another feature of block tags is that they do not contain nested nodes.

In order to handle these tags we maintain a couple of string vectors (probably should be a list really) in the header file; these can be changed easily at any point in the future if we have missed something or the language changes.  At the moment we have:

        //Void Tags (should not be closed)
        const std::vector<std::string> voidTags {
            "area",
            "base",
            "br",
            "col",
            "command",
            "embed",
            "hr",
            "img",
            "input",
            "keygen",
            "link",
            "meta",
            "param",
            "source",
            "track",
            "wbr"
        };

        //Blocks (That should not have new lines)
        const std::vector<std::string> blockTags {
            "textarea",
            "p",
            "h1",
            "h2",
            "h3",
            "h4",
            "h5",
            "h6",
            "title",
            "a",
            "button",
            "label"
        };

We can now create a simple function to check whether the tag is in one of the lists:

//**************************************************************//
// IS THE TAG IN THE GIVEN LIST (VOID OR BLOCK)                 //
//**************************************************************//
bool html_tree::is_special_tag(string tag, const vector<string> &table) {

    for(auto &it : table) {
        if(tag == it) return(true);    // stop at the first match
    }

    return(false);
}
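
Since the text above suggests that a vector may not be the ideal container, here is a small sketch of the same check using std::unordered_set, which avoids walking the whole list on every lookup.  The names voidTagSet and is_special_tag_sketch are hypothetical, and only a handful of tags are listed for brevity.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical alternative container; only a few void tags are listed here.
const std::unordered_set<std::string> voidTagSet{"br", "hr", "img", "input", "meta"};

// True if the tag name is present in the given set.
bool is_special_tag_sketch(const std::string &tag,
                           const std::unordered_set<std::string> &table) {
    return table.count(tag) != 0;
}
```

For lists this short the difference in speed is unlikely to matter, so this is more a question of expressing intent than of performance.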

And another to determine if the node has any children:

//**************************************************************//
// IS THIS A LEAF NODE                                          //
//**************************************************************//
bool html_tree::is_leaf_node() {

    return(childNodes.empty());

}

Finally we need another method to see if the node has been closed:

//**************************************************************//
// IS NODE CLOSED                                               //
//**************************************************************//
html_tree::status html_tree::closed() { return(m_nodestatus); }

All that is required now is a couple of methods to open and close the nodes:

OPENING THE NODE:  This method opens the node and formats its contents accordingly:

//**************************************************************//
// OPEN NODE                                                    //
//**************************************************************//
void html_tree::open_node(unsigned tabs) {

    if(tabs)
        m_htmlPage.append(tabs, '\t');

    string tag(m_htmlTag.substr(0, m_htmlTag.find_first_of(" ")));
    bool blockTag(is_special_tag(tag, blockTags));

    m_htmlPage.append("<" + m_htmlTag + ">");

    if(!blockTag)
        m_htmlPage.append("\n");

    if(!m_tagContent.empty()) {
        for(TAG_CONTENT &s : m_tagContent) {

            if(blockTag) {
                m_htmlPage.append(s.text);
                if((s.lineBreak) && (&s != &m_tagContent.back()))
                    m_htmlPage.append("<br>");
            }
            else {
                m_htmlPage.append(tabs +1, '\t');
                m_htmlPage.append(s.text);
                if((s.lineBreak) && (&s != &m_tagContent.back()))
                    m_htmlPage.append("<br>");

                m_htmlPage.append("\n");
            }
        }
    }
    m_nodestatus = status::OPEN;

}

You will note that the only parameter is tabs; this is the number of indents to prepend to the line.

In order to determine the name of the node, the method looks for the first space within the declaration; anything that precedes it is the node name.

All that this method does is wrap the tag declaration in opening and closing brackets (“<” and “>”) and format any content depending on the type of node declared.
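
The name extraction can be tried in isolation.  The helper below is a hypothetical free function (tag_name is not part of the original class) that applies the same substr/find_first_of combination used in open_node() and close_node().

```cpp
#include <string>

// Hypothetical helper: everything before the first space is the tag name.
std::string tag_name(const std::string &declaration) {
    // find_first_of returns npos when there is no space, and
    // substr(0, npos) then yields the whole string.
    return declaration.substr(0, declaration.find_first_of(' '));
}
```

So a declaration such as div class="w3-container" yields div, while a bare br is returned unchanged.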

 

CLOSING THE NODE: When you use this class you do not have to handle the closing of nodes yourself; this is done automatically through the close_node() method:

//**************************************************************//
// CLOSE NODE                                                   //
//**************************************************************//
void html_tree::close_node(unsigned tabs) {

    string tag(m_htmlTag.substr(0, m_htmlTag.find_first_of(" ")));

    if(!is_special_tag(tag, voidTags)) {

        if(!is_special_tag(tag, blockTags)) {
            if(tabs)
                m_htmlPage.append(tabs, '\t');
        }

        m_htmlPage.append("</" + tag + ">\n");

    }

    m_nodestatus = status::CLOSED;

}

That is basically all there is to the html_tree class.  In order to use it effectively to produce reasonably formatted html, you will need my pretty_html functions to turn the parent node into a formatted html page.
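
The pretty_html functions are not covered in this part, so the following is only a guess at the general shape of such a traversal, not the actual implementation.  It uses a deliberately simplified, hypothetical node type (render_node) and ignores void and block tags entirely; the point is just to show how opening a node, recursing into its children one indent deeper, and then closing it produces nested output.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified node: a tag, optional text, and child nodes.
struct render_node {
    std::string tag;
    std::string text;
    std::vector<std::unique_ptr<render_node>> children;
};

// Depth-first walk: open the node, render its children one level deeper,
// then close it. Void and block tags are not handled in this sketch.
void render(const render_node &node, std::string &page, unsigned tabs = 0) {
    page.append(tabs, '\t');
    page.append("<" + node.tag + ">\n");

    if (!node.text.empty()) {
        page.append(tabs + 1, '\t');
        page.append(node.text + "\n");
    }

    for (const auto &child : node.children)
        render(*child, page, tabs + 1);

    page.append(tabs, '\t');
    page.append("</" + node.tag + ">\n");
}
```

The tabs parameter plays the same role as in open_node() and close_node(): each level of recursion passes tabs + 1 down to its children, so the indentation follows the depth of the tree.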

THE DESTRUCTOR:  As we have dynamically created our child nodes, when we have finished with this node, before destroying it, we need to destroy its children:

//----------------------------//
// DESTRUCTOR		      //
//----------------------------//
html_tree::~html_tree() {
    for(html_tree *it : childNodes) {
        delete it;
    }
}
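
For comparison, the same ownership can be sketched with std::unique_ptr, in which case no hand-written destructor is needed because the vector releases the children when the parent is destroyed.  This is an alternative design, not how html_tree is written; node_sketch and its members are hypothetical names.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for html_tree, reduced to the ownership structure.
struct node_sketch {
    std::string tag;
    node_sketch *parent = nullptr;                      // non-owning back pointer
    std::vector<std::unique_ptr<node_sketch>> children; // owning: freed automatically

    explicit node_sketch(std::string t, node_sketch *p = nullptr)
        : tag(std::move(t)), parent(p) {}

    // Create a child on the heap and return a non-owning pointer to it.
    node_sketch *new_node(std::string t) {
        children.push_back(std::make_unique<node_sketch>(std::move(t), this));
        return children.back().get();
    }
};
```

The children still live on the heap, so the nesting depth is not limited by the size of the parent object, and the pointers returned by new_node() remain valid for as long as the parent node exists.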