Monday, April 11, 2011

Storing GPS locations in a database varchar field

I'd be grateful for any advice that anyone has regarding:

How you you effectively store a gps (or any floating point number) in a varchar field that can be indexed.


Background:

We develop a content management system that can effectively store files of any type together with a collection of metadata. This file/metadata is stored as follows:

file_table              metadata_table
----------              --------------
file_id         ->      file_id (number)
file_name               metadata_id (number)
file_location           metadata_value (varchar)
...etc

I've been asked to provide support for geo-tagging files (ie. storing gps coordinates as metadata). Additionally, we'd also like to support files that have multiple geo-tags.

Now as far as I see I have a few options:

1) Store latitude and longitude within the same metadata_value varchar (eg. '52.4343242,-1.32324').

How would I query against this string? Is there anything clever I can do with sql that will allow me to query against "components" of a string? Could I store the coordinate as an xml string - would this help? How can this be effectively indexed?

2) Store latitude and longitude as separate rows in the *metadata_table*.

This solution fixes the problem of supporting easier querying (at the expense of complexity and unwieldiness, especially when I'll be storing multiple geo-tags per file), however I'm still faced with the problem of indexing.

I can convert the varchars to floating point when querying, however I'm not sure whether this will ignore the index I have on *metadata_table.metadata_value* and perform table-scans instead.

3) Create dedicated floating point fields to store gps data.

This is the least desirable option since it goes against the grain of the design to add database fields for a specific metadata. Not all files will store gps data.

Any help or advise appreciated.

From stackoverflow
  • To option 1, I can say: Use Gps eXchange Format (GPX). It is the standard way to save GPS points. There are options to mark waypoints, tracks and point of interests.
    Nontheless, it's not easy to query.

  • Could this be any help: http://postgis.refractions.net

  • Although you've tagged this with Oracle, I figured this would be useful for anyone using MySQL: use the spatial extensions to store location data.

  • You can use Oracle locator. The free subset of Oracle Spatial to do all kind of different geographical manipulations and indexing of spatial data: http://www.oracle.com/technology/products/spatial/index.html

    With the use of column type mdsys.sdo_geometry you can store points, clouds of points, lines, polygons and 3D things in the database.

  • EDIT: see comments for where this falls short.

    To answer your base question, ignoring any of the reasoning behind it, you could used function-based indexes. If you go with your option #2, this should be straight-forward.

    If you stick with #1, you'll just have to add some instr/substr voodoo; for example:

    select 
        to_number(
          substr(
              '52.4343242,-1.32324'
            , 1
            , instr( '52.4343242,-1.32324', ',' ) - 1
          )
        ) as lattitude
      , to_number(
          substr(
              '52.4343242,-1.32324'
            , instr( '52.4343242,-1.32324', ',' ) + 1
          )
        ) as longitude
    from dual;
    

    So you'd do something like:

    create index lat_long_idx on metadata_table ( 
        to_number(
          substr(
              metadata_value
            , 1
            , instr( metadata_value, ',' ) - 1
          )
        )
      , to_number(
          substr(
              metadata_value
            , instr( metadata_value, ',' ) + 1
          )
        )
    );
    
    tuinstoel : to_number(instr... will produce errors when there is no latitude and longitude stored in the metadata_value. Remember that this field is used in a generic way so it should be able to store different data too. I think it is almost impossible to index latitude and longitude when stored generically.
    Alkini : Good point. It also probably means that just doing the substr part of the index isn't valid either. Perhaps creating a materialized view of just the lat/long data and indexing that is an option.
  • Using dedicated floating point fields or columns of type mdsys.sdo_geometry are the best way to store this data. If a file doesn't have GPS data those fields will be empty but why should that be a problem? If a file could have more than one point associated use a detail table.

    Options 1 and 2 are a 'generic' solution. Generic database solutions are slow because they are more difficult to index and collecting statistics becomes harder, so life becomes more difficult for the query optimizer.

    Also reporting for collecting management information with tools like Cognos (business intelligence) over a generic solution is harder for your users.

    Store dates in a date field, numbers in a number field and geographical information in a geographical field (mdsys.sdo_geometry).

    Here it is explained why storing a date like '20031603' in a number field slows things down: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:77598210939534 .

  • Generally, if I'm having a one-size-fits-all table (and I'm not arguing they're not useful), I tend to allow a range of datatypes for storage, and enfore the types. E.g.

    CREATE TABLE MetaDataType (
      MetaDataID int IDENTITY(1,1) not null,
      MetaDataType varchar(10) not null,
      constraint PK_MetaDataType PRIMARY KEY (MetaDataID),
      constraint UQ_MetaDataType_TypeCheck UNIQUE (MetaDataID,MetaDataType),
      constraint CK_MetaDataType CHECK (MetaDataType in ('INT','CHAR','FLOAT'))
    )
    

    And then the meta data table would look like:

    CREATE TABLE MetaData (
      FileID int not null,
      MetaDataID int not null,
      MetaDataType varchar(10) not null,
      IntValue int null,
      CharValue varchar(max) null,
      FloatValue float null,
      constraint PK_MetaData PRIMARY KEY (FileID,MetaDataID),
      constraint FK_MetaData_Files FORIEGN KEY (FileID) references /* File table */,
      constraint FK_MetaData_Types FOREIGN KEY (MetaDataID,MetaDataType) references MetaDataTypes (MetaDataID,MetaDataType),
      constraint CK_MetaData_ValidTypes ((MetaDataType = 'INT' or IntValue is null) and (MetaDataType = 'CHAR' or CharValue is null) and (MetaDataType = 'FLOAT' or FloatValue is null))
    )
    

    The whole point being that 1) You store for each Meta data item the expected type, and 2) You enforce that in the MetaData table.

    Damien_The_Unbeliever : Forgot to say, the above design allows nulls to be stored. If you want to prevent that, add an extra CHECK constraint to ensure at least one of the *Value columns is not null

0 comments:

Post a Comment