
Sql tokenizer #36

Open
jobala wants to merge 4 commits into main from sql-tokenizer

Conversation


@jobala jobala commented Mar 14, 2026

Overview

Breaks down a SQL string into a sequence of tokens.


Copilot AI left a comment


Pull request overview

Adds a new lib/sql module that tokenizes a SQL string into a sequence of tokens, and wires it into the build/test system.

Changes:

  • Introduces SqlTokenizer with basic identifier/keyword/symbol tokenization.
  • Adds TokenType, Token, and literal/symbol classification helpers.
  • Adds a new GTest (sql_test) and integrates the new library/target into CMake.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
CMakeLists.txt Adds lib/sql to the top-level build.
lib/sql/CMakeLists.txt Defines the new sql library target.
lib/sql/token.h Introduces token types, token struct, and helper functions for classification/type mapping.
lib/sql/tokenizer.h Declares the SqlTokenizer interface.
lib/sql/tokenizer.cpp Implements SQL tokenization logic.
test/CMakeLists.txt Adds and registers the sql_test target.
test/sql_test.cpp Adds initial tokenizer unit tests.


Comment on lines +3 to +31
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <set>
#include <string>
#include <unordered_map>

enum class TokenType : std::uint8_t {
// keywords
SELECT,
FROM,

// literals
LONG,
DOUBLE,
STRING,
IDENTIFIER,

// symbols
STAR,
COMMA,
};

namespace Type
{

inline TokenType from_string(std::string token)
{
std::ranges::transform(token, token.begin(),
[](unsigned char letter) { return static_cast<char>(std::toupper(letter)); });
Comment on lines +51 to +54
static const std::set<unsigned char> symbols{'*', ','};
return symbols.contains(letter);
}

Comment on lines +28 to +35
auto offset = skip_whitespace(offset_);
if (offset >= (int)sql_.length())
{
return std::nullopt;
}

if (Literal::is_identifier_start(sql_[offset]))
{
Comment on lines +61 to +66
auto SqlTokenizer::skip_whitespace(int start_offset) -> int
{
auto end_offset = start_offset;

while (end_offset < (int)sql_.size() && std::isspace(static_cast<unsigned char>(sql_[end_offset])))
{
end_offset += 1;
}
Comment on lines +61 to +68
struct Token
{
std::string text_;
TokenType type_;
int end_offset_;

Token(const std::string &text, TokenType type, int end_offset) : text_(text), type_(type), end_offset_(end_offset) {}
};
Comment on lines +76 to +80
if (offset_ < (int)sql_.size() && '`' == sql_[offset_])
{
auto end_offset = get_offset_until_terminated_char('`', start_offset);
auto text = sql_.substr(start_offset, end_offset - start_offset);
return {text, TokenType::IDENTIFIER, end_offset + 1};
}

ASSERT_EQ(6, tokens.size());
for (int i = 0; i < (int)tokens.size(); i++)
{
std::cout << tokens[i].text_ << "\n";
Comment on lines +50 to +55
throw std::runtime_error("Not Implemented");
}

if (Literal::is_char_start(sql_[offset]))
{
throw std::runtime_error("Not Implemented");
Comment on lines +74 to +81
auto SqlTokenizer::scan_identifier(int start_offset) -> Token
{
if (offset_ < (int)sql_.size() && '`' == sql_[offset_])
{
auto end_offset = get_offset_until_terminated_char('`', start_offset);
auto text = sql_.substr(start_offset, end_offset - start_offset);
return {text, TokenType::IDENTIFIER, end_offset + 1};
}
