SQLAlchemy
Although it sometimes might seem as if relational databases have gone the way of the dinosaur, making way for non-relational (NoSQL) databases, such as MongoDB and Cassandra, a very large number of systems still depend on a relational database. And, although there is no requirement that a relational database use SQL as its query language, it's a rare database product that does not do so.
The good news is that SQL is relatively easy to work with, particularly when the queries are straightforward. It's fast and easy to create tables, insert data into them, update that data and write queries that retrieve some or all parts of the data. SQL also makes it fairly easy to combine ("join") information from multiple tables, letting you normalize the data, while keeping speed and flexibility at a maximum.
SQL might not be difficult to work with on its own, but you rarely work with it in a vacuum. Usually, your SQL statements reside within a program you have written. The SQL is kept as a text string within the application and is then sent, via a network socket, to the server.
There are several problems with this. First, it means you have to mix two different languages within the same program. Inside your Web application, which you've worked hard to write, and which you try to ensure is maintainable, you have code in a totally separate language, inside strings, which you cannot test or maintain directly.
Even if the SQL queries weren't written inside strings, you still would be faced with the fact that the majority of your Web application is written in one language, but your data-manipulation routines are written in another language. A Web application contains, no matter how you slice it, components in HTML, CSS and JavaScript, as well as whatever server-side language you're using. Adding SQL to this can only complicate things further.
Even if SQL and a typical server-side language were on equal footing in terms of the syntax of a Web application, there's a fundamental mismatch between the ways in which they handle data. SQL operates with rows and columns within tables; everything in a relational database has to fit into this table-centric view of the world. By contrast, modern programming languages have a rich variety of data structures and typically are object-oriented to some degree or another.
Libraries that bridge the gap between procedural code and SQL are known as object-relational mappers, or ORMs. ORMs typically represent database records as instances of a particular class. In order to represent 50 records, you would need 50 instances of a class, with the state of each instance reflecting the names, types and values of the columns in that record.
There are two basic paradigms for passing data between the object-oriented data structures and the database, both of which were described by Martin Fowler. In the first paradigm, known as Active Record, each instance is tied directly to a row in the database, and the class itself (as well as each object) is responsible for ensuring that the data is saved to the database. In other words, Active Record requires that you create a single class, and that it handles both sides of the object-relational divide. The Active Record class in Ruby on Rails is (not surprisingly) an implementation of this paradigm and provides a great deal of power and flexibility.
A second paradigm is known as Data Mapper, and it requires the use of three different object classes: a class that represents the data itself at the object level, a class that represents the database table and a "mapper" object that acts as a go-between, ensuring that the object and relational parts of the system are appropriately synchronized.
An excellent and popular example of the Data Mapper paradigm can be found in the SQLAlchemy project. SQLAlchemy has been around for a number of years already, and makes it possible to work with relational databases flexibly from within your Python program, without having to write any SQL.
In this article, I take a look at SQLAlchemy, exploring a number of its options and features, and considering how it can be used in Web and other applications.
Connecting to a DatabaseInstalling SQLAlchemy should be straightforward to anyone who has installed Python packages before. You can get SQLAlchemy from PyPi, the Python Package Index, either by downloading it from http://pypi.python.org or by using the easy_install or pip programs to retrieve and install it. I was able to install it with:
pip install sqlalchemy
You might need to install SQLAlchemy as root. Or, you can install it into a virtual machine, using the popular virtualenv package for Python, which gives you nonroot control and permissions over a Python environment.
You also will need to install a driver for the database you intend to use. My favorite relational database is PostgreSQL, and I use the psycopg Python driver, also available on PyPi and (by extension) via pip.
I should note that although I know SQLAlchemy works with Python 3, much of the work I do nowadays is still in Python 2, mostly because that's what my clients are using. My examples, thus, also will be in Python 2, although I believe they will work in Python 3 with little or no change.
Let's assume you have a database table, People:
CREATE TABLE People (id SERIAL PRIMARY KEY,
first_name TEXT,
last_name TEXT,
email TEXT,
birthday DATE);
Let's also add some initial records:
INSERT INTO People (first_name, last_name, email, birthday)
VALUES ('Reuven', 'Lerner', 'reuven@lerner.co.il', '1970-jul-14'),
('Foo', 'Bar', 'foobar@example.com', '1970-jan-1');
In order to access this table using SQLAlchemy's ORM, you first need to create a database session object, which itself must be created using an "engine". Each database driver has its own style of URL. In the case of PostgreSQL accessed via the psycopg2 driver, you would use something like this:
dburl = 'postgresql+psycopg2://reuven:reuven@localhost/atf'
This URL indicates not only the database and driver type, but also my user name and password ("reuven" in both cases), the hostname (localhost) and the name of the database I'll be accessing ("atf"). If the database is not available at the default PostgreSQL port of 5432, you can specify that as well in the URL.
You then tell Python to create a new engine based on this URL:
from sqlalchemy import create_engine
engine = create_engine(dburl)
Now that you have the engine defined, you can create a session based on this engine. Doing so requires two steps: first you create a new, custom Session class for this engine, and then you create an instance of the Session class that you will use to access the database:
from sqlalchemy.orm import sessionmaker # import sessionmaker class
Session = sessionmaker(bind=engine) # make custom session type
session = Session() # make instance of session
You're now connected to the database! But, that's not quite enough. If you want to map your database table to one or more Python objects, you need to define a class. You do this by defining a normal Python class, with a few subtle changes:
-
The class must inherit from Base, a class returned from the declarative_base function provided by SQLAlchemy.
-
The database columns must be defined as class attributes, as instances of the SQLAlchemy-provided Column class.
-
You connect the class with your database table by defining the
__tablename__
class-level attribute.
For example, the following Python class provides a mapping to the People database table:
from sqlalchemy import Column, Integer, String, DateTime
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Person(Base):
__tablename__ = 'people'
id = Column(Integer, primary_key=True)
first_name = Column(String)
last_name = Column(String)
email = Column(String)
birthday = Column(DateTime)
def __init__(self, firstname, lastname, email, birthday):
self.first_name = firstname
self.last_name = lastname
self.email = email
It might not be obvious at first glance, but this class implements the Data Mapper design pattern. The class attributes that you have defined describe the columns in the database table and can contain a great deal of detail, including indexes, uniqueness requirements and even custom integrity constraints, such as those provided by PostgreSQL. The class itself is a standard Python class.
But behind the scenes, because this class inherits from Base, you get a
number of other attributes, including __mapper__
, which indicates how
your Python class will be mapped to the database table. You can see
this in an interactive Python shell by asking to see the printed
representation of Person.__mapper__
:
Person.__mapper__
<Mapper at 0x10af5ef90; Person>
You now have a session that connects to the database and a table in the database that has been described in Python. You now can execute a query against your table:
for p in session.query(Person):
print p.first_name
That gives the following:
Reuven
Foo
In other words, session.query
is executing a query against the
database, without you having to specify the SQL. You also can restrict
the records you'll get, by chaining the
filter_by
method to your
query:
for p in session.query(Person).filter_by(id=1):
print p.first_name
That gives the following:
Reuven
Note that the filter_by
method is not acting on the results of
session.query
. Rather, it is changing the SQL that
eventually is sent
to the database. You can see this by assigning printing to the query
object without executing it or putting it in an iteration context:
print session.query(Person).filter_by(id=1)
SELECT people.id AS people_id,
people.first_name AS people_first_name,
people.last_name AS people_last_name,
people.email AS people_email,
people.birthday AS people_birthday
FROM people
WHERE people.id = :id_1
You also can see from this query that SQLAlchemy binds parameters to variables inside your query, rather than directly placing your values. Not only does this allow you to re-run queries later with different variable values, but it reduces the possibility that you will suffer from an SQL injection attack, which still is surprisingly common.
You also can order the results:
for person in session.query(Person).order_by('first_name'):
print person.first_name
Foo
Reuven
And, you also can do all of the basic "CRUD" activities that you would expect. For example, you can create a new instance of your Person class and then save it to the database:
p = Person('newfirst', 'newlast', 'new@example.com', '1-jan-2012');
session.add(p)
session.commit()
Notice how I can handle multiple inserts (or other actions) inside
a single transaction by only issuing
session.commit()
after adding
several objects. Similarly, I can update the object and the
corresponding row in the database:
p.first_name = '!!!'
session.add(p)
session.commit()
I also can delete the object:
session.delete(p)
session.commit()
Relationships
If SQLAlchemy could only do this, it still would be a nice library, simplifying your queries. But the real power of SQLAlchemy occurs when you define relationships between tables. For example, let's assume that I have an Appointments table, indicating when I'm meeting with various people:
CREATE TABLE Appointments (
id SERIAL PRIMARY KEY,
person_id INTEGER NOT NULL REFERENCES People,
meeting_at TIMESTAMP NOT NULL,
notes TEXT
);
Let's also add some appointments:
INSERT INTO Appointments (person_id, meeting_at, notes)
VALUES (2, '1-jan-2013', 'New Year meeting'),
(2, '1-feb-2013', 'Monthly update');
Now I need to create a Python class that represents appointments:
class Appointment(Base):
__tablename__ = 'appointments'
id = Column(Integer, primary_key=True)
person_id = Column(Integer)
meeting_at = Column(DateTime)
notes = Column(String)
Now, this class will work just fine. However, there's no relationship, according to Python, between the Person class and the Appointment class. To make this work, you'll need to change each of these table definitions. In the case of Appointment, you'll need to indicate that the person_id column doesn't just contain an integer, but that it is a foreign key that points to the "id" column on the People table:
person_id = Column(Integer, ForeignKey('people.id'))
On the Person table, you'll need to add a line to the class attributes, after describing all of the columns:
appointments = relationship("Appointment", backref="person")
Thanks to these two lines, you get an "appointments" attribute on your Person model. But thanks to the "backref" parameter, you also get a "person" reference on the appointment. This means you can do something like this:
for a in session.query(Appointment):
print a.person
for p in session.query(Person):
print p.appointments
Note that the assumption is that you'll have multiple appointments per person, representing a one-to-many relationship.
Let's say, however, that you want to have a many-to-many relationship
between people and appointments, such that you can meet with more than
one person at a time, and you can have more than one appointment with a
particular person. In order to do that, you need to modify your
database table and code somewhat, adding a third (association) table.
SQLAlchemy makes it easy to do that. Although I don't have space to show
it here, the basic idea is that you create the third table, and you
use the relationship()
function to indicate that there is a secondary
relationship between the class and the join table.
SQLAlchemy is packed with features. In addition to the introductory examples I showed here, it handles everything from joins to connection pooling, to dynamically calculated column values, to creating Python classes based on an existing database table. There is no doubt that it's a powerful system, one that I expect to use in some of the Python projects on which I work.
That said, I found SQLAlchemy to be a bit overwhelming for the newcomer. Perhaps it's because I have long used the Active Record model in Ruby, which has minimal configuration and syntax, but I found the syntax for SQLAlchemy to be a bit overly verbose. Then again, Python has long preferred things be explicit, and there's no doubt that SQLAlchemy provides a clear and explicit ORM, without much magic and with obvious ramifications for every function call and parameter.
The other thing that might throw off newcomers to SQLAlchemy is that the documentation is complete, but not particularly friendly. Once you start to use the system, I expect that you (like me) will be able to understand the documentation and make good use of it. But I found that even the tutorial documents were a bit formal, trying to tell you too much before moving ahead with actual code. Hopefully, this article can help some more people become interested in SQLAlchemy.
In conclusion, SQLAlchemy is a great Python module, one that deserves its sterling reputation and broad popularity. If you're interested in working with databases from Python programs, you definitely should take a look at SQLAlchemy.
The home page for SQLAlchemy is http://sqlalchemy.org, and the documentation is at http://docs.sqlalchemy.org. The Python language is at http://python.org. I suggest that you read through the introductory section and then the ORM documentation, rather than look at the document sequentially.
There are a number of on-line tutorials for SQLAlchemy. Two that I enjoyed, which are freely available to the public, are https://www.youtube.com/watch?v=399c-ycBvo4 and https://www.youtube.com/watch?v=PKAdehPHOMo.
Finally, Rick Copeland's book, Essential SQLAlchemy, published by O'Reilly in 2008, is a good introduction, particularly if you look at the ORM section. The rest is a bit dry and technical, even if the examples are well written. This book is not completely up to date, and there are several items in it that reflect the fact that it was published several years ago. Nevertheless, having an additional reference can be quite handy and can provide examples for certain features that aren't otherwise obvious.