Writing a Custom SQLite Function (in C) - Part 1
Feb 02, 2023
A powerful feature of SQLite is the ability to call application-defined functions from within SQL. This functionality leverages the fact that SQLite is embedded within the application and thus has far greater capacity for integration than an external database.
Your language's SQLite library probably offers some facility for registering a function with SQLite. But in most cases, these are heavily abstracted, hiding all internal SQLite details and, in some cases, limiting what you're able to accomplish.
In this short series, we're going to add a user defined function to SQLite in C. We'll go through the process slowly and in a way which, I hope, a C novice can follow.
Setup
It's pretty easy to get setup. The first thing we'll do is download and unzip the complete SQLite source code. Head over to the SQLite download page and search for "Alternative Source Code Formats". Download the source code. I'll be using https://www.sqlite.org/2022/sqlite-src-3400100.zip, but there's a good chance the latest version will work fine. Next we'll run ./configure
and make
. The entire sequence looks like:
wget https://www.sqlite.org/2022/sqlite-src-3400100.zip
unzip sqlite-src-3400100.zip
cd sqlite-src-3400100/
./configure
make
In addition to other things, this will generate what's known as the "amalgamation" source code, which is the entire SQLite source code in a single file (sqlite3.c) with a single header (sqlite3.h). If you have any problems building this from source, you can grab the pre-generated amalgamation from the same download page, just search for "sqlite-amalgamation" (it's usually the first (pre-release) or second (stable release) link).
We'll copy sqlite3.c
and sqlite3.h
into a new empty working directory. For me, that's ~/code/play/sqlite
. To this, we'll add a file, main.c
. In the end, you should have a directory with 3 files: sqlite3.h
, sqlite3.c
and main.c
. You'll only touch main.c
, open it now.
Opening a Connection
The first thing we'll do is open a new SQLite database and then close it:
#include <stdio.h>
#include <stdlib.h>
#include "sqlite3.h"
int main(int argc, char ** argv) {
sqlite3 *db;
int flags = SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE;
int rc = sqlite3_open_v2("db.sqlite", &db, flags, NULL);
if (rc != SQLITE_OK) {
printf("Failed to open: %s\n", sqlite3_errmsg(db));
sqlite3_close_v2(db);
exit(1);
}
sqlite3_close_v2(db);
exit(0);
}
If you go through the above code, it's hopefully understandable. We're using 3 SQLite functions. sqlite3_open_v2
opens a connection to our database and, because we're specifying the SQLITE_OPEN_CREATE
flag, will create the database if necessary. sqlite3_close_v2
closes the connection. sqlite3_errmsg
returns the last error message that happened on the provided connection.
If you're new (or like me rusty) with C, there's one thing about the above that must be understood: return values. A typical pattern in C, and certainly in SQLite, is for functions to return a return code. If we need to "return" multiple values, such as sqlite3_open_v2
which needs to "return" a return code (hence rc
) and a connection, we pass a reference to our connection and let the function initialize it.
Those familiar with Go (or other modern languages) will probably consider this pattern inferior to returning multiple values (and I'd agree!). But even in Go, passing "output" parameters is quite common: think json.Unmarshal
which takes a reference to the data to populate, or the Scan
methods of sql.Row
and sql.Rows
which take references to empty values to populate.
We can compile and run the above:
cc sqlite3.c main.c -o sqlite_udf
chmod a+x sqlite_udf
./sqlite_udf
This should create an empty db.sqlite
file in the same directory. If you want, you can check the error handling by changing the path, "db.sqlite"
, to something that doesn't exist, like /invalid/fail
, re-compiling and running it again.
Refactoring
Before we start writing a custom function, let's clean things up a little to streamline the rest of our effort. Let's create a helper to print errors, and let's add another function, run
, so that we don't have to worry about closing our connection in every error case:
#include <stdio.h>
#include <stdlib.h>
#include "sqlite3.h"
static int run(sqlite3*);
static void print_error(const char*, sqlite3*);
int main(int argc, char ** argv) {
sqlite3 *db;
int flags = SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE;
int rc = sqlite3_open_v2("db.sqlite", &db, flags, NULL);
if (rc != SQLITE_OK) {
print_error("sqlite_open: %s", db);
exit(1);
}
int status = run(db);
sqlite3_close_v2(db);
exit(status);
}
static void print_error(const char *format, sqlite3 *db) {
printf(format, sqlite3_errmsg(db));
printf("\n");
}
static void run(sqlite3 *db) {
}
Note that in C, we usually explicitly declare our functions before we can use them. Hence we provide the function headers for run
and print_error
. An alternative approach is to simply define the entire function before we use it. So we could put print_error
and run
and any other function that we'll create before they are needed.
Increment User Defined Function
We're going to do this slowly and incrementally. We'll eventually write a slightly fancier function, but for now, we're keeping it simple. Inside of our new run
function, we're going to use the sqlite3_create_function
function to register a custom function with SQLite:
static int run(sqlite3 *db) {
int rc = sqlite3_create_function(db, "increment", 1, SQLITE_UTF8, NULL, &increment, NULL, NULL);
if (rc != SQLITE_OK) {
print_error("create_function: %s", db);
return 2;
}
return 0;
}
We're registering our (not-yet-created) increment
function, that takes 1
argument and naming it "increment". The eventual result will be that we'll be able to do: select increment(value_or_column)
. The first argument to sqlite3_create_function
is the database connection (sqlite3*
). Functions are registered on the connection, not the database. If we open multiple connections and want our user defined function available in all of them, we need to call sqlite3_create_function
on each connection.
Now the skeleton of our increment
function:
static void increment(sqlite3_context*, int, sqlite3_value**);
static void increment(sqlite3_context *context, int argc, sqlite3_value **argv) {
}
The sqlite3_context*
is used to store results that will be passed back to SQLite and can be used for a few other things (like getting the underlying db connection (of type sqlite3*
). argc
is the number of arguments that were passed into our function, and argv
are the actual arguments. When we called sqlite3_create_function
we specified that our function needed 1
argument. SQLite will enforce this so, in our case, argc
will always be equal to 1
. But we could have specified -1
to indicate that the number of argument is variable, in which case we'd use argc
to figure out the number of arguments that were passed to our function.
Parameters to our function are given as sqlite3_value*
. To increment, we'll need an integer. We can use the sqlite3_value_int
function to turn an sqlite3_value*
into an int
. As you can probably guess, there are functions for all of the SQLite supported types, such as sqlite3_value_double
, sqlite3_value_text
and so on:
static void increment(sqlite3_context *context, int argc, sqlite3_value **argv) {
int value = sqlite3_value_int(argv[0]) + 1;
}
We can then use the sqlite3_result_int
to store the result into the context. Just like sqlite3_value_*
, there are also sqlite3_result_*
for each supported type. Thus, the entire increment
function is:
static void increment(sqlite3_context *context, int argc, sqlite3_value **argv) {
int value = sqlite3_value_int(argv[0]) + 1;
sqlite3_result_int(context, value);
}
Testing It (By Executing A Statement)
Our increment
function will work, but it'd be nice to see it in action. To do so, we need to execute a statement, such as select increment(?1)
and read the result. This is, as you can probably guess, the most important thing we can do in SQLite. So while it isn't directly related to writing a custom function, it's worth learning.
To execute a statement, any statement, we must do multiple things. First we have to prepare the statement. Next we optionally bind values to the statement. Then we step through the results in a loop, until the statement execution is done. Finally, we must cleanup after ourselves. This prepare -> bind -> step loop -> cleanup is a common pattern in SQLite (for example, it's the same thing that must be done when PostgreSQL executes a statement).
Starting from the beginning, we must prepare a statement:
static int run(sqlite3 *db) {
sqlite3_stmt *stmt;
rc = sqlite3_prepare_v2(db, "select increment(9000)", -1, &stmt, 0);
if (rc != SQLITE_OK) {
print_error("statement prepare: %s", db);
return 2;
}
return 0;
}
The third parameter is the length of the SQL statement to execute. -1
tells SQLite to find the end of the string (zero-terminated). This is a common pattern, where we can either specify the length of the string or let SQLite infer it. Being able to pass in an explicit length allows us to use larger buffers and tell SQLite to only care about the first N bytes (think something like an Go slice; but unsafe since we could always specify a length greater than our buffer).
sqlite3_prepare_v2
only prepares a single statement. The last parameter is an output parameter that, after execution, would point to the remainder of our input. In this case, we only have a single statement and thus don't expect any remainder.
After the function executes, we expect rc == SQLITE_OK
and stmt
to be initialized. At this point, we need to bind our values. We could make our life easier and simply test select increment(9000)
, but to demonstrate binding, we'll do select increment(?1)
. Thus we need to bind one value:
if (sqlite3_bind_int64(stmt, 1, 9000) != SQLITE_OK) {
print_error("bind: %s", db);
return 2;
}
There are sqlite3_bind_*
versions for all of the SQLite supported types. The 2nd parameter is the bind index. Bind indexes start at 1.
We're now ready to step through the result. Normally, we'd step in a loop, waiting for the step function to return SQLITE_DONE
. But we'll cheat a little for our specific use-case and rely on the fact that we only expect a single row:
int status;
rc = sqlite3_step(stmt);
if (rc == SQLITE_ROW) {
printf("increment(%d) == %d", 9000, sqlite3_column_int(stmt, 0));
status = 0;
} else if (rc == SQLITE_DONE) {
printf("Unexpected empty result");
status = 3;
} else {
print_error("step: %s", db);
status = 2;
}
sqlite3_finalize(stmt);
return status;
Again, the sqlite3_step
function is what we use to iterate through each row of our result. Here we cheat a little as we know to expect a single row (thus, don't iterate). The sqlite3_column_int
function reads an int
from the current row as the specified index (here our indexes start at 0). There are various sqlite3_column_*
functions to read different types of values from.
Finally, we have to call sqlite3_finalize
on the prepared statement to clean up any resources associated with it. In the next part we'll look more closely at resource allocation and lifetime as these are critical and complicated topics we need to understand. Our simple example only deals with an integer, but what if we were dealing with a heap-allocated value (like a string)? Who would be responsible for freeing that memory?
Complete Code
Our completed code looks like:
#include <stdio.h>
#include <stdlib.h>
#include "sqlite3.h"
static int run(sqlite3*);
static void print_error(const char*, sqlite3*);
static void increment(sqlite3_context*, int, sqlite3_value**);
int main(int argc, char ** argv) {
sqlite3 *db;
int flags = SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE;
int rc = sqlite3_open_v2("db.sqlite", &db, flags, NULL);
if (rc != SQLITE_OK) {
print_error("sqlite_open: %s", db);
exit(1);
}
int status = run(db);
sqlite3_close_v2(db);
exit(status);
}
static void print_error(const char *format, sqlite3 *db) {
printf(format, sqlite3_errmsg(db));
printf("\n");
}
static int run(sqlite3 *db) {
int rc = sqlite3_create_function(db, "increment", 1, SQLITE_UTF8, NULL, &increment, NULL, NULL);
if (rc != SQLITE_OK) {
print_error("create_function: %s", db);
return 2;
}
sqlite3_stmt *stmt;
rc = sqlite3_prepare_v2(db, "select increment(?1)", -1, &stmt, 0);
if (rc != SQLITE_OK) {
print_error("statement prepare: %s", db);
return 2;
}
if (sqlite3_bind_int64(stmt, 1, 9000) != SQLITE_OK) {
print_error("bind: %s", db);
return 2;
}
int status;
rc = sqlite3_step(stmt);
if (rc == SQLITE_ROW) {
printf("increment(%d) == %d", 9000, sqlite3_column_int(stmt, 0));
status = 0;
} else if (rc == SQLITE_DONE) {
printf("Unexpected empty result");
status = 3;
} else {
print_error("step: %s", db);
status = 2;
}
sqlite3_finalize(stmt);
return status;
}
static void increment(sqlite3_context *context, int argc, sqlite3_value **argv) {
int value = sqlite3_value_int(argv[0]) + 1;
sqlite3_result_int(context, value);
}
Follow Up Exercise
There are two additional enhancements you might be interested in exploring on your own. The first is expanding our increment function to take a 2nd (possibly optional) parameter to indicate how much to increment our value by (defaulting to 1). The second is to add type strictness. As-is, the return value for increment('teg')
will be 1
, as sqlite3_value_int
will return 0
when called on a non-integer value (or a value that cannot be coerced to an integer (e.g. passing the a string argument of '9002' would return 9003)). We could use the sqlite3_value_type
function on argv[0]
to determine the type and handle non-integer values. In such cases, instead of using the sqlite3_result_int
to store an integer value in our context, we could use sqlite3_result_error
to store an error.