Parsing is not our main topic, but we need it. Let’s create a parser that works; it doesn’t have to follow best practices, since parsing is not one of our fundamental topics.
We will be using the sqlparse library to help us. Here we will be describing the gist of how it works.
Exposed Pydantic interfaces
The next step after the parser (for now the engine, later the compiler) will need a structured representation of the query. We will use Pydantic for that.
Remember that for now we are only supporting 3 very basic variations of statements:
CREATE TABLE, supporting INT and VARCHAR
INSERT INTO
SELECT * FROM table
Here is our exported representation:
from enum import Enum

from pydantic import BaseModel


class SomeSQLType(Enum):
    INT = "INT"
    VARCHAR = "VARCHAR"


class SomeColumnDefinition(BaseModel):
    name: str
    type: SomeSQLType
    length: int | None = None


class SomeSQLStatementBase(BaseModel):
    pass


class SomeCreateTable(SomeSQLStatementBase):
    columns: list[SomeColumnDefinition]
    name: str


class SomeInsertInto(SomeSQLStatementBase):
    table_name: str
    column_names: list[str]
    values: list


class SomeSelect(SomeSQLStatementBase):
    table_name: str


SomeSQLStatement = SomeCreateTable | SomeInsertInto | SomeSelect
There isn’t much to say here, other than that we prefix all structures with Some to avoid confusion with the sqlparse library.
The last line references all the statements we support. It is, in a sense, the entry point.
Example usage
Here is one example that covers all the cases we support:
from some import engine, parse

query1 = "CREATE TABLE users (id INT, name VARCHAR(100))"
query2 = "INSERT INTO users (id, name) VALUES (1, 'Jane Doe')"
query3 = "SELECT * FROM users"

if __name__ == "__main__":
    for query in (query1, query2, query3):
        parsed_statement = parse.parse(query)
        print(f"Parsed statement:\n{type(parsed_statement)}")
        print(parsed_statement)
Here is the output:
There are three parsed statements, one for each query. Have a cursory look at the data structure generated for each case.
Some implementation details
This is not a Substack about parsing. It’s an interesting topic, really it is. But for another season.
That being said, I am interested in your comments about the interface that is provided to the next piece of functionality (here the next piece is the executor).
I will take any comments into consideration, both to redo the basic example and for future versions of Some.
Final thoughts
You can find the code for the parser in the basic branch.