Code Answer: Naming of ID columns in database tables

I was wondering peoples opinions on the naming of ID columns in database tables.

If I have a table called Invoices with a primary key of an identity column I would call that column InvoiceID so that I would not conflict with other tables and it's obvious what it is.

Where I am workind current they have called all ID columns ID.

So they would do the following:

Select  
    i.ID 
,   il.ID 
From
    Invoices i
    Left Join InvoiceLines il
        on i.ID = il.InvoiceID

Now, I see a few problems here:
1. You would need to alias the columns on the select
2. ID = InvoiceID does not fit in my brain
3. If you did not alias the tables and referred to InvoiceID is it obvious what table it is on?

What are other peoples thoughts on the topic?

From stackoverflow Arry

We use InvoiceID, not ID. It makes queries more readable -- when you see ID alone it could mean anything, especially when you alias the table to i.

From Jason Cohen
I do hate the plain id name. I strongly prefer to always use the invoice_id or a variant thereof. I always know which table is the authoritative table for the id when I need to, but this confuses me
```
SELECT * from Invoice inv, InvoiceLine inv_l where 
inv_l.InvoiceID = inv.ID 
SELECT * from Invoice inv, InvoiceLine inv_l where 
inv_l.ID = inv.InvoiceLineID 
SELECT * from Invoice inv, InvoiceLine inv_l where 
inv_l.ID = inv.InvoiceID 
SELECT * from Invoice inv, InvoiceLine inv_l where 
inv_l.InvoiceLineID = inv.ID
```
What's worst of all is the mix you mention, totally confusing. I've had to work with a database where almost always it was foo_id except in one of the most used ids. That was total hell.

Kevin : I have read the word "invoice" too many times in this post. It looks funny now

From Vinko Vrsalovic
For the sake of simplicity most people name the column on the table ID. If it has a foreign key reference on another table, then they explicity call it InvoiceID (to use your example) in the case of joins, you are aliasing the table anyway so the explicit inv.ID is still simpler than inv.InvoiceID

From Mike Brown
I definitely agree with including the table name in the ID field name, for exactly the reasons you give. Generally, this is the only field where I would include the table name.

From DOK
You could use the following naming convention. It has its flaws but it solves your particular problems.
1. Use short (3-4 characters) nicknames for the table names, i.e. Invoice - inv, InvoiceLines - invl
2. Name the columns in the table using those nicknames, i.e. inv_id, invl_id
3. For the reference columns use invl_inv_id for the names.
this way you could say
```
SELECT * FROM Invoice LEFT JOIN InvoiceLines ON inv_id = invl_inv_id
```
James Curran : ick! I'd vote against using short nicknames for tables (or any other object). With nicknames, you'll never know exactly what the short name is. Remember, there are many way to spell it wrong; there's only one way to spell it right.

Kevin : James, I disagree. If you have a short name that isn't descriptive and you can't remember what it is, then you've picked the wrong name or don't understand the naming convention that someone else chose.

kristian : I've worked on databases structured like this and, once you get past the ugliness, it's a pretty good system.

David B : Nicknames, maybe. Underscores? Just say no.

Justice : Use aliases to achieve the same effect. select * from Invoice inv left join InvoiceLines invl on inv.ID = invl.InvoiceID

Adrian : NO no no no. alias the table with a abrievation in the query. But the table name should be a full.

mP : Why are so many programmers lazy and seem the answer to everything is to type the least possible just because its too hard to type a bit more.

From Ilya Kochetov
It's not really important, you are likely to run into simalar problems in all naming conventions.

But it is important to be consistent so you don't have to look at the table definitions every time you write a query.

From Nir
For the column name in the database, I'd use "InvoiceID".

If I copy the fields into a unnamed struct via LINQ, I may name it "ID" there, if it's the only ID in the structure.

If the column is NOT going to be used in a foreign key, so that it's only used to uniquely identify a row for edit editing or deletion, I'll name it "PK".

From James Curran
I always prefered ID to TableName + ID for the id column and then TableName + ID for a foreign key. That way all tables have a the same name for the id field and there isn't a redundant description. This seems simpler to me because all the tables have the same primary key field name.

As far as joining tables and not knowing which Id field belongs to which table, in my opinion the query should be written to handle this situation. Where I work, we always prefece the fields we use in a statement with the table/table alias.

From Kevin
I always try to make id columns descriptive of what they are an ID for usually tablename_ID for a primary key. Then where possible I always use the same id name anywhere that it is a foreign key. The only exceptions are where you might have two fields in a subordinate table that relate to different records in the main table. Then I would add another qualifier to the field to indicate the difference, so if in my Meeting table I have the sales rep and the speaker and both their ids would come from the person table, I would have Rep_Person_ID and Speaker_Person_ID. Under no circumstances would I ever use just ID. That is way too confusing when you are trying to maintain old code.

The legacy database I work with did not do this and it's a pain to write queries because you are always trying to remember what the field is called in each table as there are at least 7 or 8 differnt names that relate to the id in the main table. And at least 5 differnt versions of another common id field. Naming consistency is important for ease of maintenance.

From HLGEM
If you give each key a unique name, e.g. "invoices.invoice_id" instead of "invoices.id", then you can use the "natural join" and "using" operators with no worries. E.g.
```
SELECT * FROM invoices NATURAL JOIN invoice_lines
SELECT * FROM invoices JOIN invoice_lines USING (invoice_id)
```
instead of
```
SELECT * from invoices JOIN invoice_lines
    ON invoices.id = invoice_lines.invoice_id
```
SQL is verbose enough without making it more verbose.

Arry : Do you know if SQL Server supports natural join?

Steven Huwig : I don't think that it does. According to http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=153679 it appears that the syntax is slated to be added in some version after SQL Server 2005. I know it works in PostgreSQL and in Oracle.

dland : heh, that sounds like the sound of real-life experience :)

Steven Huwig : I would only use natural join for ad hoc queries.

Arry : SQL Server 2008 still does not have natural join.

From Steven Huwig
I think you can use anything for the "ID" as long as you're consistent. Including the table name is important to. I would suggest using a modeling tool like Erwin to enforce the naming conventions and standards so when writing queries it's easy to understand the relationships that may exist between tables.

What I mean by the first statement is, instead of ID you can use something else like 'recno'. So then this table would have a PK of invoice_recno and so on.

Cheers, Ben

From $pageUsers[$post.author].name
Theres been a nerd fight about this very thing in my company of late. The advent of LINQ has made the redundant tablename+ID pattern even more obviously silly in my eyes. I think most reasonable people will say that if you're hand writing your SQL in such a manner as that you have to specify table names to differentiate FKs then it's not only a savings on typing but it adds clarity to your SQL to use just the ID in that you can clearly see which is the PK and which is the FK.

ie. LEFT JOIN Customers ON Employee.ID = Customer.EmployeeID

tells me not only that the two are linked but which is the PK and which is the FK

whereas in the old style you're forced to either look or hope that they were named well.

From Echostorm
My vote is for InvoiceID for the table ID. I also use the same naming convention when it's used as a foreign key and use intelligent alias names in the queries.
```
 Select Invoice.InvoiceID, Lines.InvoiceLine, Customer.OrgName
 From Invoices Invoice
 Join InvoiceLines Lines on Lines.InvoiceID = Invoice.InvoiceID
 Join Customers Customer on Customer.CustomerID = Invoice.CustomerID
```
Sure, it's longer than some other examples. But smile. This is for posterity and someday, some poor junior coder is going to have to alter your masterpiece. In this example there is no ambiguity and as additional tables get added to the query, you'll be grateful for the verbosity.

From Rob Allen
Coming at this from the perspective of a formal data dictionary, I would name the data element invoice_ID. Generally, a data element name will be unique in the data dictionary and ideally will have the same name throughout, though sometimes additional qualifying terms may be required based on context e.g. the data element named employee_ID could be used twice in the org chart and therefore qualified as supervisor_employee_ID and subordinate_employee_ID respectively.

Obviously, naming conventions are subjective and a matter of style. I've find ISO/IEC 11179 guidelines to be a useful starting point.

For the DBMS, I see tables as collections of entites (except those that only ever contain one row e.g. cofig table, table of constants, etc) e.g. the table where my employee_ID is the key would be named Personnel. So straight away the TableNameID convention doesn't work for me.

I've seen the TableName.ID=PK TableNameID=FK style used on large data models and have to say I find it slightly confusing: I much prefer an identifier's name be the same throughout i.e. does not change name based on which table it happens to appear in. Something to note is the aforementioned style seems to be used in the shops which add an IDENTITY (auto-increment) column to every table while shunning natural and compound keys in foreign keys. Those shops tend not to have formal data dictionaries nor build from data models. Again, this is merely a question of style and one to which I don't personally subscribe. So ultimately, it's not for me.

All that said, I can see a case for sometimes dropping the qualifier from the column name when the table's name provides a context for doing so e.g. the element named employee_last_name may become simply last_name in the Personnel table. The rationale here is that the domain is 'people's last names' and is more likely to be UNIONed with last_name columns from other tables rather than be used as a foreign key in another table, but then again... I might just change my mind, sometimes you can never tell. That's the thing: data modelling is part art, part science.

From onedaywhen
I prefer DomainName || 'ID'. (i.e. DomainName + ID)

DomainName is often, but not always, the same as TableName.

The problem with ID all by itself is that it doesn't scale upwards. Once you have about 200 tables, each with a first column named ID, the data begins to look all alike. If you always qualify ID with the table name, that helps a little, but not that much.

DomainName & ID can be used to name foreign keys as well as primary keys. When foriegn keys are named after the column that they reference, that can be of mnemonic assistance. Formally, tying the name of a foreign key to the key it references is not necessary, since the referential integrity constrain will establish the reference. But it's awfully handy when it comes to reading queries and updates.

Occasionally, DomainName || 'ID' can't be used, because there would be two columns in the same table with the same name. Example: Employees.EmployeeID and Employees.SupervisorID. In those cases, I use RoleName || 'ID', as in the example.

Last but not least, I use natural keys rather than synthetic keys when possible. There are situations where natural keys are unavailable or untrustworthy, but there are plenty of situations where the natural key is the right choice. In those cases, I let the natural key take on the name it would naturally have. This name often doesn't even have the letters, 'ID' in it. Example: OrderNo where No is an abbreviation for "Number".

From Walter Mitty
For each table I choose a tree letter shorthand(e.g. Employees => Emp)

That way a numeric autonumber primary key becomes nkEmp.

It is short, unique in the entire database and I know exactly its properties at a glance.

I keep the same names in SQL and all languages I use (mostly C#, Javascript, VB6).

From pkario
See the Interakt site's naming conventions for a well thought out system of naming tables and columns. The method makes use of a suffix for each table (_prd for a product table, or _ctg for a category table) and appends that to each column in a given table. So the identity column for the products table would be id_prd and is therefore unique in the database.

They go one step further to help with understanding the foreign keys: The foreign key in the product table that refers to the category table would be idctg_prd so that it is obvious to which table it belong (_prd suffix) and to which table it refers (category).

Advantages are that there is no ambiguity with the identity columns in different tables, and that you can tell at a glance which columns a query is referring to by the column names.

From flamingLogos
What I do to keep things consistent for myself (where a table has a single column primary key used as the ID) is to name the primary key of the table Table_pk. Anywhere I have a foreign key pointing to that tables primary key, I call the column PrimaryKeyTable_fk. That way I know that if I have a Customer_pk in my Customer table and a Customer_fk in my Order table, I know that the Order table is referring to an entry in the Customer table.

To me, this makes sense especially for joins where I think it reads easier.
```
SELECT * 
FROM Customer AS c
    INNER JOIN Order AS c ON c.Customer_pk = o.Customer_fk
```
From JettGeek
FWIW, our new standard (which changes, uh, I mean "evolves", with every new project) is:
- Lower case database field names
- Uppercase table names
- Use underscores to separate words in the field name - convert these to Pascal case in code.
- pk_ prefix means primary key
- _id suffix means an integer, auto-increment ID
- fk_ prefix means foreign key (no suffix necessary)
- _VW suffix for views
- is_ prefix for booleans
So, a table named NAMES might have the fields pk_name_id, first_name, last_name, is_alive, and fk_company and a view called LIVING_CUSTOMERS_VW, defined like:
```
SELECT first_name, last_name
FROM CONTACT.NAMES
WHERE (is_alive = 'True')
```
As others have said, though, just about any scheme will work as long as it is consistent and doesn't unnecessarily obfuscate your meanings.

From CMPalmer

Code Answer

Saturday, February 12, 2011

Naming of ID columns in database tables

0 comments:

Post a Comment

Blog Archive