A C SQLite extension providing a suite of functions to parse and extract components from URL strings directly within your SQL queries. This extension leverages the furi library for URL parsing.
- Extract various components of a URL:
- Scheme (e.g.,
http,https) - Authority (e.g.,
user:pass@example.com:8080) - Userinfo (e.g.,
user:pass) - Host (e.g.,
example.com) - Port (e.g.,
8080) - Path (e.g.,
/path/to/resource) - Query (e.g.,
key1=value1&key2=value2) - Fragment (e.g.,
section1)
- Scheme (e.g.,
- SQLite (version 3.x.x with extension loading enabled)
- SQLite Development Headers (e.g.,
sqlite3.h,sqlite3ext.h) - C Compiler (e.g., GCC, Clang)
TODO: Cmake instructons per OS
Once you have compiled libsqlite_urlparser.so (or .dylib/.dll), you can load it into SQLite:
-
Using the SQLite CLI:
sqlite> .load ./libsqlite_urlparser.dll sqlite> SELECT url_host('https://www.example.com/path'); www.example.com
-
Programmatically (e.g., in C, Python, etc.): Most SQLite client libraries provide a way to load extensions.
- C API:
sqlite3_load_extension(db, "./libsqlite_urlparser.so", 0, 0); - Python (
sqlite3module):import sqlite3 conn = sqlite3.connect(':memory:') conn.enable_load_extension(True) conn.load_extension('./library.so') # Adjust path and extension name conn.enable_load_extension(False) # Optional: disable after loading cursor = conn.cursor() for row in cursor.execute("SELECT url_scheme('ftp://files.server.net/data.zip')"): print(row[0]) # ftp conn.close()
- C API:
After loading the extension, the following SQL functions become available:
| Function | Description | Example |
|---|---|---|
url_scheme |
Extracts the scheme from a URL. | SELECT url_scheme('https://ex.com'); -> https |
url_authority |
Extracts the authority part. | SELECT url_authority('https://u:p@ex.com:80'); -> u:p@ex.com:80 |
url_userinfo |
Extracts user information from the authority. | SELECT url_userinfo('https://u:p@ex.com'); -> u:p |
url_host |
Extracts the host from the authority. | SELECT url_host('https://ex.com:8080/p'); -> ex.com |
url_port |
Extracts the port from the authority. | SELECT url_port('https://ex.com:8080'); -> 8080 |
url_path |
Extracts the path component. | SELECT url_path('https://ex.com/a/b?q'); -> /a/b |
url_query |
Extracts the query string (without '?'). | SELECT url_query('https://ex.com/p?k=v#f'); -> k=v |
url_fragment |
Extracts the fragment (without '#'). | SELECT url_fragment('https://ex.com/p#sec'); -> sec |
Example Queries:
-- Load the extension (if not already loaded)
-- .load ./library.so
-- Basic usage
SELECT url_scheme('http://example.com/path?query=value#fragment');
-- Result: http
SELECT url_host('http://user:password@example.com:8080/path');
-- Result: example.com
SELECT url_path('http://example.com/some/resource.html');
-- Result: /some/resource.html
SELECT url_query('http://example.com/search?q=sqlite&lang=en');
-- Result: q=sqlite&lang=en
SELECT url_fragment('http://example.com/page#section-2');
-- Result: section-2
-- Using with table data
CREATE TABLE weblogs (id INTEGER PRIMARY KEY, referrer_url TEXT);
INSERT INTO weblogs (referrer_url) VALUES
('https://www.google.com/search?q=sqlite+extensions'),
('http://internal.app/dashboard#overview'),
(NULL),
('https://blog.example.org/article/123');
SELECT
id,
referrer_url,
url_host(referrer_url) AS referring_host,
url_path(referrer_url) AS referring_path
FROM weblogs;