SQLite 3.0.7

by Clay Dowling

D. Richard Hipp's SQLite database engine has earned a well-respected place in the toolbox of many programmers. Its small size and simple distribution make it a natural choice for standalone and embedded applications. Wide support by many programming languages, including PHP, has made SQLite popular for Web applications that need persistent data storage but don't need the kind of multi-user scaling capabilities provided by server-based solutions.

Unicode Support

Version 3 of SQL adds support for UTF-8 data. If your content doesn't happen to be in English, this addition obviously is of great benefit to you. According to the SQLite author, version 3 also has internal support for UTF-16 data. If you need that feature at the moment, however, you need to implement it yourself or make arrangements with the author to have that functionality built in.

Data Typing

One of the great peculiarities of the SQLite database has been its typeless storage--any data of any size can be stuffed into any field. That feature hasn't changed in version 3, but a new feature called manifest typing has been added. With manifest typing, the database engine attempts to make the data stored in a field be of the same general type as the declared type of the field. The engine still stores any data in any field, but if the type of the data matches the declared type of the field, the database engine then stores the data in a binary format that takes up less space.

Manifest typing means that a programmer can query the database for the type of an individual field in a result set with the sqlite3_column_type() function call. The types that are stored are integer, float, text, blob and null.

Version 3 also adds support for BLOBS. This means, for example, that it would be perfectly reasonable to store photos of the desired items or the family members themselves in my family wish list database.

Application Interface

The most apparent change that most previous SQLite users notice is the prefix changes. SQLite 3 functions all begin with sqlite3_ instead of sqlite_, as in older versions.

Old methods of data access still are available, including sqlite3_exec() and sqlite3_get_table(). For the uninitiated, sqlite3_exec takes a callback function as one of its parameters and calls this function for each row in the data set. The sqlite3_get_table function returns the entire results set in a single-character array.

The preferred way to interact with the database in SQLite 3 is the prepared statement interface. Advanced users of SQLite 2.8.x might recognize prepared statement routines as an extension of the experimental callback free vm interface to SQLite.

Prepared Statements

ODBC programmers have been able to use prepared statements for a long time. In addition to extending the older virtual machine interface to SQLite, prepared statements offer a number of convenience features for programmers. For those who haven't seen a prepared statement, an example might be:


     SELECT id, name, email FROM user WHERE name = ?

The question mark represents a wildcard, a place where real data is bound before the query actually runs. The query is prepared once, and for each execution, actual data is bound to the wildcards. This eliminates the need for careful, paranoid parsing of user-supplied data for killer characters and potential exploits.

Prepared queries also offer some savings in terms of speed. Internally, a database must convert a statement to a set of parameters that define the actual steps to be carried out in the search. This can represent some computational expense, especially if a single query is executed repetitively. Removing the compilation step for all but the first invocation of the query saves processor time. This saved times adds up in applications such as data conversions or modifications, where hundreds or thousands of identical queries may need to be run.

The best way to understand the new API is to see it in use. Here, I show some examples from an e-commerce wish list application on which I'm currently working. The particular code shown comes from a command-line utility for generating shopping lists. The complete code for the example can be downloaded (see Resources).

The first change to note is the changed connection command.


sqlite3* db;

if (sqlite3_open("wishlist.db", &db))
  error_handler(db);

As with other SQLite functions, sqlite3_open now returns an integer result code rather than a connection handle. The second parameter then becomes a pointer to a pointer to an sqlite3 structure. Any result other than SQLITE_OK (0) causes control to flow to my error handler. That function shows another of the API changes:


void error_handler(sqlite3* db) {

  fprintf(stderr, "Error: %s\n", sqlite3_errmsg(db));
  exit(EXIT_FAILURE);

}

The error string function now returns a pointer rather than asking for a pointer to a character pointer as a parameter. Although it's not in keeping with the rest of the library, it's rather pointless to return an integer error code from this function, so there was no benefit to putting the programmer through pointer gymnastics.

In my program, I left the statement processing, binding and execution in the main function and moved the data gathering out to a second function.


#define ITEM_SQL "SELECT * FROM item WHERE user_id=?"

struct item;

struct item* item_new();
void item_delete(struct item*);
struct item* item_populate(sqlite3_stmt*, struct item*);

int* get_user_ids();
void print_wishlist(struct item*);

int main() {

  sqlite3_stmt* itemqry;
  struct item* wishlist;
  int* user_id;
  int i;

  user_id = get_user_ids(); /* Magic function */

  /* Prepare the statement for use, many times over */
  if (sqlite3_prepare(db, ITEM_SQL, strlen(ITEM_SQL), &itemqry, NULL))
    error_handler(db);
  
  for(i=0; user_id[i]; i++) {
    /* Data binding */
    if (sqlite3_bind_int(itemqry, 1, user_id[i]))
      error_handler(db);
    
    /* Actual data gathering */
    while((rc = sqlite3_step(itemqry)) == SQLITE_ROW) 
      wishlist = item_populate(itemqry, wishlist);

    /* Use the data we've collected */
    print_wishlist(wishlist);
    
    /* Get ready for the next user_id*/
    sqlite3_reset(itemqry);
  }

  /* Final cleanup */
  sqlite3_finalize(itemqry);
  return EXIT_SUCCESS;
}

The code has been commented lightly to make it a little more clear what is happening. For sqlite function, we start with the statement preparation. Notice that it is outside of any loops; once we've done it, we don't need to do it again. The itemqry pointer is the handle that we're going to use for all of our other steps.

Inside of the for loop, we need to rebind the single wildcard in the statement each time through the loop. There's no recompilation, which saves code and processor time.

In the execution step we're responsible for looping over all of the data to collect it. The sqlite3_step function handles that for us and returns SQLITE_ROW whenever a call results in actual data. A more robust program would look for the other possible values, which include SQLITE_BUSY and SQLITE_DONE. In practical terms, anything other than SQLITE_ROW means that our data gathering process is done.

Every time I get a row of data, I'm shipping off my prepared statement to a data handler function, item_populate();. This function is responsible for extracting the data from the row, populating an item structure and appending it to our linked list of items.

Once I have my data, I do whatever processing is necessary, which in this case is a printing function.

Finally, I call sqlite3_reset() on my prepared statement to free up any consumed by the query. It's worth noting that sqlite3_reset() doesn't change or remove the wildcard values from the prepared statement. If you step through the query again, you get the same result set.

As a last step, here I show you how data is extracted from a result set in SQLite 3. The new retrieval features are much more sophisticated than the string parsing necessary with previous versions.


struct item {

  int id;
  char* description;
  char externalkey[41];
  struct item* next;

};

struct item* item_populate(sqlite3_stmt* s, struct item* top) {

  struct item* cur;
  struct item* i;
  int size;
  const char* hold;

  i = item_new();
  i->id = sqlite3_column_int(s, 0);
  hold = sqlite3_column_text(s, 1);
  if (hold)
    strncpy(i->externalkey, hold, 40);
  size = sqlite3_column_bytes(s, 2);
  hold = sqlite3_column_text(s, 2);
  if (hold) {
    i->description = (char*)calloc(1, size + 1);
    strncpy(i->description, hold, size);
  }

  /* Linked list handling code */
  if (top == NULL) return i;
  for(cur = top; cur->next ; cur = cur->next);
  cur->next = i;

  return top;

}

The sqlite3_column family of functions extract the data from an individual column. Columns can be indexed numerically only, so it's a good idea to know the order in which your columns will be returning. If you don't know, you can inquire about the name of a specific column with sqlite3_column_name().

Some of my columns are allowed to have null values, and I need to take precautions to address that. That's why, rather than copying the results of sqlite3_column_text() directly to the externalkey element of my item structure, I assign it to a temporary pointer. This way, I can check to make sure it isn't a NULL before copying a fixed length of the string into my array. If you haven't worked with SQLite before, it's important to remember that the declared size of a string in the database has no bearing on the size of the string stored or returned. The strn series of string functions are vital to dealing with this data.

Finally, we have the code for handling the last column of our data, which is a potentially large text object. sqlite3_column_bytes() tells me how large the object is, which lets me allocate sufficient space for copying the data.

There's plenty more to the API, but hopefully these few examples have shown you enough to get you going. Features I didn't address include the ability to define your own collating sequences, implement a database permissions system and define your own database functions.

Improved Concurrency

Concurrency is the ability for multiple processes to access the same data set at the same time. Although disk-based databases in general aren't the first choice for multiple-user applications, it is entirely possible to use them with multiple processes accessing the same database. SQLite uses a coarse-grained locking system, requiring exclusive access to the database file when it is writing but allowing multiple processes to read from the database.

If a steady stream of processes is reading data, it was possible under the old version of SQLite for those processes to block a writing process from getting exclusive access. Under version 3, a process trying to write to the database can signal its intent and prevent new processes from starting their reading until the writing process has had its turn.

In simpler turns, look at the database as a street intersection. If there isn't much traffic, everything flows smoothly and rapidly even with only a two-way stop sign to regulate traffic. Now imagine trying to use that same two-way stop to regulate traffic flow in downtown Chicago at 5pm on a Friday. If you've been unfortunate enough to be in that crowd, you know that a stop sign isn't going to suffice. It takes a traffic cop to keep traffic moving at all.

What this means for you is if you're using SQLite for a multithreaded or multi-user application, you should spend less time waiting for locks.

Final Notes

I already was sold on SQLite for a wide variety of applications. The new features make the database easier to use and suitable for a wider array of applications. I expect that we'll be seeing SQLite turning up in more and more places.

Load Disqus comments