Creating, Destroying, and Viewing Databases
Before you can do anything else with a PostgreSQL database, you must first create it. Before going further, it helps to see where a database fits into the overall scheme of PostgreSQL. Figure 3.1 shows the relationships between clusters, databases, and tables.
Figure 3.1 Clusters, databases, and tables.
At the highest level of the PostgreSQL storage hierarchy is the cluster. A cluster is a collection of databases. Each cluster exists within a single directory tree, and the entire cluster is serviced by a single postmaster. A cluster is not named; there is no way to refer to a cluster within PostgreSQL, other than by contacting the postmaster servicing that cluster. The $PGDATA environment variable should point to the root of the cluster's directory tree.
Three system tables are shared between all databases in a cluster: pg_group (the list of user groups), pg_database (the list of databases within the cluster), and pg_shadow (the list of valid users).
Each cluster contains one or more databases. Every database has a name that must follow the naming rules described in the previous section. Database names must be unique within a cluster. A database is a collection of tables, data types, functions, operators, views, indexes, and so on.
Starting with release 7.3, there is a new level in the PostgreSQL hierarchy: the schema. Figure 3.2 shows the 7.3 hierarchy.
Figure 3.2 Clusters, databases, schemas and tables.
A schema is a named collection of tables (as well as functions, data types, and operators). The schema name must be unique within a database. With the addition of the schema, table names, function names, index names, type names, and operators must be unique within the schema. Prior to release 7.3, these objects had to be unique within the database. A schema exists primarily to provide a naming context. You can refer to an object in any schema within a single database by prefixing the object name with the schema name and a period (schema-name.object-name). For example, if you have a schema named bruce, you can create a table within that schema and then query it as follows:
CREATE TABLE bruce.ratings ( ... );
SELECT * FROM bruce.ratings;
Each connection has a schema search path. If the object that you are referring to is found on the search path, you can omit the schema name. However, because table names are no longer required to be unique within a database, you may find that there are two tables with the same name within your search path (or a table may not be in your search path at all). In those circumstances, you can include the schema name to remove any ambiguity.
To view the schema search path, use the command SHOW SEARCH_PATH:
movies=# SHOW SEARCH_PATH;
 search_path
--------------
 $user,public
(1 row)
The default search path, shown here, is $user,public. The $user part equates to your PostgreSQL user name. For example, if I connect to psql as user bruce, my search path is bruce,public. If a schema named bruce does not exist, PostgreSQL will just ignore that part of the search path and move on to the schema named public. To change the search path, use SET SEARCH_PATH TO:
movies=# SET SEARCH_PATH TO 'bruce','sheila','public';
SET
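The following sketch shows how the search path decides which table an unqualified name refers to. It assumes that you are allowed to create schemas in the current database; the schema names bruce and sheila and the single-column ratings tables are only illustrative.

```sql
-- Illustrative setup: two schemas that each contain a table named ratings.
CREATE SCHEMA bruce;
CREATE SCHEMA sheila;
CREATE TABLE bruce.ratings  ( rating INTEGER );
CREATE TABLE sheila.ratings ( rating INTEGER );

-- Search bruce first, then sheila, then public.
SET SEARCH_PATH TO bruce, sheila, public;

-- Unqualified name: the first schema on the path that contains
-- a table named ratings is used, which is bruce here.
SELECT * FROM ratings;

-- Qualified name: the search path is not consulted.
SELECT * FROM sheila.ratings;
```

If you reorder the path so that sheila comes first, the unqualified SELECT reads sheila.ratings instead.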
New schemas are created with the CREATE SCHEMA command and destroyed with the DROP SCHEMA command:
movies=# CREATE SCHEMA bruce;
CREATE SCHEMA
movies=# CREATE TABLE bruces_table( pkey INTEGER );
CREATE TABLE
movies=# \d
            List of relations
      Name      | Schema | Type  | Owner
----------------+--------+-------+-------
 bruces_table   | bruce  | table | bruce
 tapes          | public | table | bruce
(2 rows)
movies=# DROP SCHEMA bruce;
ERROR:  Cannot drop schema bruce because other objects depend on it
        Use DROP ... CASCADE to drop the dependent objects too
movies=# DROP SCHEMA bruce CASCADE;
NOTICE:  Drop cascades to table bruces_table
DROP SCHEMA
Notice that you cannot drop a schema that is not empty unless you include the CASCADE clause.
Schemas are very useful. At many sites, you may need to keep a "development" system and a "production" system. You might consider keeping both systems in the same database, but in separate schemas.
Another (particularly clever) use of schemas is to separate financial data by year. For example, you might want to keep one year's worth of data per schema. The table names (invoices, sales, and so on) remain the same across all schemas, but the schema name reflects the year to which the data applies. You could then refer to data for 2001 as FY2001.invoices, FY2001.sales, and so on. The data for 2002 would be stored in FY2002.invoices, FY2002.sales, and so on. This is a difficult problem to solve without schemas because PostgreSQL does not support cross-database access. In other words, if you are connected to database movies, you can't access tables stored in another database. Starting with PostgreSQL 7.3, you can keep all your data in a single database and use schemas to partition the data.
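Here is a minimal sketch of the one-schema-per-fiscal-year layout. The schema and table names come from the example in the text; the column definitions (invoice_id, amount) are assumptions made only for illustration.

```sql
-- One schema per fiscal year. Unquoted names are folded to
-- lowercase, so FY2001 is stored as fy2001.
CREATE SCHEMA FY2001;
CREATE SCHEMA FY2002;

-- The same table name in each schema (columns are hypothetical).
CREATE TABLE FY2001.invoices ( invoice_id INTEGER, amount NUMERIC(9,2) );
CREATE TABLE FY2002.invoices ( invoice_id INTEGER, amount NUMERIC(9,2) );

-- Because both schemas live in one database, a single query
-- can combine data from both years.
SELECT CAST('FY2001' AS TEXT) AS fiscal_year, SUM(amount) AS total
  FROM FY2001.invoices
UNION ALL
SELECT CAST('FY2002' AS TEXT), SUM(amount)
  FROM FY2002.invoices;
```

Had each year been kept in its own database, the final query could not be written, because a connection can see only the database it is connected to.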
Creating New Databases
Now let's see how to create a new database and how to remove an existing one.
The syntax for the CREATE DATABASE command is
CREATE DATABASE database-name
  [ WITH [ OWNER [=] {username|DEFAULT} ]
         [ TEMPLATE [=] {template-name|DEFAULT} ]
         [ ENCODING [=] {encoding|DEFAULT} ] ]
         [ LOCATION [=] {'path'|DEFAULT} ]
As I mentioned earlier, the database-name must follow the PostgreSQL naming rules described in the previous section and must be unique within the cluster.
If you don't include the OWNER=username clause or you specify OWNER=DEFAULT, you become the owner of the database. If you are a PostgreSQL superuser, you can create a database that will be owned by another user using the OWNER=username clause. If you are not a PostgreSQL superuser, you can still create a database if you have the CREATEDB privilege, but you cannot assign ownership to another user. Chapter 19, "General PostgreSQL Administration," describes the process of defining user privileges.
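As a quick sketch of the ownership rules, the statements below create one database owned by another user and one owned by the current user. The database names (rentals, scratch) and the user sheila are hypothetical, and the first statement assumes that you are connected as a PostgreSQL superuser.

```sql
-- Superuser only: create a database that belongs to another user.
-- sheila must already exist as a PostgreSQL user.
CREATE DATABASE rentals WITH OWNER = sheila;

-- No OWNER clause: the user who runs the command becomes the owner.
-- This form needs only the CREATEDB privilege.
CREATE DATABASE scratch;

-- OWNER = DEFAULT behaves the same way as omitting the clause.
CREATE DATABASE scratch2 WITH OWNER = DEFAULT;
```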
The TEMPLATE=template-name clause specifies a template database. A template defines a starting point for a database. If you don't include TEMPLATE=template-name or you specify TEMPLATE=DEFAULT, the database named template1 is copied to the new database. All tables, views, data types, functions, and operators defined in the template database are duplicated into the new database.
If you add objects (usually functions, operators, and data types) to the template1 database, those objects will be propagated to any new databases that you create based on template1. For example, if you have a set of functions that are required by your application, you can define the functions in the template1 database and all new databases will automatically include those functions. You can also trim down a template database if you want to reduce the size of new databases. For example, you might decide to remove the geometric data types (and the functions and operators that support those types) if you know that you won't need them.
If you want to create an as-distributed database, you can use template0 as your template database. The template0 database is the starting point for template1 and contains only the standard objects included in a PostgreSQL distribution. You should not make changes to the template0 database, but you can use the template1 database to provide a site-specific set of default objects.
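The difference between the two standard templates can be sketched as follows. The new database names (pristine, sitecopy) are hypothetical; template0 and template1 are the templates described above.

```sql
-- Start from the as-distributed template: the new database contains
-- only the standard objects, with no site-specific additions.
CREATE DATABASE pristine WITH TEMPLATE = template0;

-- Start from template1: the new database also receives any objects
-- that have been added to template1 at this site.
-- Omitting the TEMPLATE clause gives the same result.
CREATE DATABASE sitecopy WITH TEMPLATE = template1;
```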
You can use the ENCODING=character-set clause to choose an encoding for the string values in the new database. An encoding determines how the bytes that make up a string are interpreted as characters. For example, specifying ENCODING=SQL_ASCII tells PostgreSQL that characters are stored in ASCII format, whereas ENCODING=ISO-8859-8 requests ECMA-121 Latin/Hebrew encoding. When you create a database, all characters stored in that database are encoded in a single format. When a client retrieves data, the client/server protocol automatically converts between the database encoding and the encoding being used by the client. Chapter 20, "Internationalization/Localization," discusses encoding schemes in more detail.
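The sketch below creates a database with an explicit encoding and then lists the encoding of every database in the cluster. The database name plain_ascii is hypothetical. Writing the encoding name as a quoted string and starting from template0 are assumptions that I believe are the safest choices across releases; some releases may place additional restrictions on which encodings you can choose.

```sql
-- Create a database whose strings are stored as SQL_ASCII.
CREATE DATABASE plain_ascii WITH TEMPLATE = template0 ENCODING = 'SQL_ASCII';

-- pg_database stores the encoding as a number;
-- pg_encoding_to_char() translates it to a name.
SELECT datname, pg_encoding_to_char(encoding) AS encoding
  FROM pg_database
 ORDER BY datname;
```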
The last option for the CREATE DATABASE command is the LOCATION=path clause. In most cases, you will never have to use the LOCATION option, which is good because it's a little strange.
If you do need to use an alternate location, you will probably want to specify the location by using an environment variable. The environment variable must be known to the postmaster process at the time the postmaster is started, and it should contain an absolute pathname.
The LOCATION=path clause can be confusing. The path might be specified in three forms:
The path contains a /, but does not begin with a / (this specifies a relative path)
The path begins with a / (this specifies an absolute path)
The path does not include a /
Relative locations are not allowed by PostgreSQL, so the first form is invalid.
Absolute paths are allowed only if you defined the C/C++ preprocessor symbol "ALLOW_ABSOLUTE_DBPATHS" at the time you compiled your copy of PostgreSQL. If you are using a prebuilt version of PostgreSQL, the chances are pretty high that this symbol was not defined and therefore absolute paths are not allowed.
So, the only form that you can rely on in a standard distribution is the last: a path that does not include any "/" characters. At first glance, this may look like a relative path that is only one level deep, but that's not how PostgreSQL sees it. In the third form, the path must be the name of an environment variable. As I mentioned earlier, the environment variable must be known to the postmaster process at the time the postmaster is started, and it should contain an absolute pathname. Let's look at an example:
$ export PG_ALTERNATE=/bigdrive/pgdata
$ initlocation PG_ALTERNATE
$ pg_ctl restart -l /tmp/pg.log -D $PGDATA
...
$ psql -q -d movies
movies=# CREATE DATABASE bigdb WITH LOCATION='PG_ALTERNATE';
...
First, I've defined (and exported) an environment variable named PG_ALTERNATE. I've defined PG_ALTERNATE to have a value of /bigdrive/pgdata; that's where I want my new database to reside. After the environment variable has been defined, I need to initialize the directory structure; the initlocation script will take care of that for me. Now I have to restart the postmaster so that it can see the PG_ALTERNATE variable. Finally, I can start psql (or some other client) and execute the CREATE DATABASE command specifying the PG_ALTERNATE environment variable.
This all sounds a bit convoluted, and it is. The PostgreSQL developers consider it a security risk to allow users to create databases in arbitrary locations. Because the postmaster must be started by a PostgreSQL administrator, only an administrator can choose where databases can be created. So, to summarize the process:
Create a new environment variable and set it to the path where you want new databases to reside.
Initialize the new directory using the initlocation application.
Stop and restart the postmaster.
Now, you can use the environment variable with the LOCATION=path clause.
createdb
The CREATE DATABASE command creates a new database from within a PostgreSQL client application (such as psql). You can also create a new database from the operating system command line. The createdb command is a shell script that invokes psql and executes the CREATE DATABASE command for you. For more information about createdb, see the PostgreSQL Reference Manual or invoke createdb with the --help flag:
$ createdb --help
createdb creates a PostgreSQL database.

Usage:
  createdb [options] dbname [description]

Options:
  -D, --location=PATH       Alternative place to store the database
  -T, --template=TEMPLATE   Template database to copy
  -E, --encoding=ENCODING   Multibyte encoding for the database
  -h, --host=HOSTNAME       Database server host
  -p, --port=PORT           Database server port
  -U, --username=USERNAME   Username to connect as
  -W, --password            Prompt for password
  -e, --echo                Show the query being sent to the backend
  -q, --quiet               Don't write any messages

By default, a database with the same name as the current user is created.

Report bugs to <pgsql-bugs@postgresql.org>.
Dropping a Database
Getting rid of an old database is easy. The DROP DATABASE command will delete all of the data in a database and remove the database from the cluster.
For example:
movies=# CREATE DATABASE redshirt;
CREATE DATABASE
movies=# DROP DATABASE redshirt;
DROP DATABASE
There are no options to the DROP DATABASE command; you simply include the name of the database that you want to remove. There are a few restrictions. First, you must own the database that you are trying to drop, or you must be a PostgreSQL superuser. Next, you cannot drop a database from within a transaction block; you cannot roll back a DROP DATABASE command. Finally, the database must not be in use, even by you. This means that before you can drop a database, you must connect to a different database (template1 is a good candidate).
An alternative to the DROP DATABASE command is the dropdb shell script. dropdb is simply a wrapper around the DROP DATABASE command; see the PostgreSQL Reference Manual for more information about dropdb.
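The restrictions above can be sketched as a short psql session. It assumes that the redshirt database from the earlier example still exists, that you own it (or are a superuser), and that nobody else is connected to it. Note that \c is a psql meta-command, not an SQL statement.

```sql
-- Leave the database that is about to be dropped; a database
-- cannot be dropped while anyone, including you, is connected to it.
\c template1

-- DROP DATABASE is rejected inside a transaction block
-- because it cannot be rolled back.
BEGIN;
DROP DATABASE redshirt;   -- fails with an error
ROLLBACK;

-- Outside a transaction block the command is accepted.
DROP DATABASE redshirt;
```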
Viewing Databases
Using psql, there are two ways to view the list of databases. First, you can ask psql to simply display the list of databases and then exit. The -l option does this for you:
$ psql -l
     List of databases
   Name    |     Owner
-----------+---------------
 template0 | postgres
 template1 | postgres
 movies    | bruce
(3 rows)

$
From within psql, you can use the \l or \l+ meta-commands to display the databases within a cluster:
movies=# \l+
                   List of databases
   Name    |     Owner     |        Description
-----------+---------------+---------------------------
 template0 | postgres      |
 template1 | postgres      | Default template database
 movies    | bruce         | Virtual Video database
(3 rows)
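If you are not using psql, you can get a similar list by querying the shared pg_database table mentioned earlier. This is only a sketch; it relies on the datname and datdba columns and on the built-in pg_get_userbyid() function to translate the owner's numeric ID into a user name.

```sql
-- List every database in the cluster along with its owner.
SELECT datname                 AS name,
       pg_get_userbyid(datdba) AS owner
  FROM pg_database
 ORDER BY datname;
```

Because pg_database is shared by all databases in the cluster, the result is the same no matter which database you are connected to.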