Thursday, April 28, 2011

R + ggplot: plotting irregular time series

I have data at a number of days since an event. This data is sampled irregularly - my time points are like 0, 5, 6, 10, 104 days. I don't have specific date-time information - i.e. I have no idea when in real life the event I'm studying occurred.

I'd like to plot, using ggplot, my time series. I can use, say

p <- ggplot(data,aes(x=time,y=expression))
p <- p + geom_point()

but of course my x-axis values are plotted evenly spaced next to each other, so that the distance between t=10 and t=104 is the same as the distance between t=5 and t=6. So I can make something up like

start <- ISOdate(2001, 1, 1, tz = "")
data$time <- start + data$time*60*60*24  # 60*60*24 seconds per day

which almost works, but now the ticks on my x-axis are horribly inaccurate date times. I could re-format them maybe? But I can't see any way to make the format "days from start". And by now I've been googling around for quite a while, with the nagging feeling that I'm missing something seriously obvious. Am I?

From Stack Overflow
  • Not sure if this is what you're looking for (see this related question). You can reformat the axis and deal with irregularity by using the scale_x functions. For instance:

    p <- qplot(1:3, 1:3, geom='line')
    p + scale_x_continuous("", breaks=1:3,
            labels = format(as.Date(c("2010-06-03", "2010-06-04", "2010-06-07"))))
    

    Incidentally, here's a function that I created for plotting multivariate zoo objects:

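    # Requires the zoo, reshape2 and ggplot2 packages
    library(zoo); library(reshape2); library(ggplot2)

    qplot.zoo <- function(x) {
      if (!inherits(x, "zoo")) stop("x must be a zoo object")
      x.df <- data.frame(dates=index(x), coredata(x))
      # Long format: one row per (date, series) pair
      x.df <- melt(x.df, id.vars="dates", variable.name="series")
      ggplot(x.df, aes(x=dates, y=value, group=series, colour=series)) +
        geom_line() + theme(legend.position = "none")
    }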
    
    Mike Dewar : Thanks! though it turned out that raw stupidity on my part was to blame. What's a zoo object?
    Shane : @Mike: `zoo` is probably the most popular irregularly-spaced time series class in R: http://cran.r-project.org/web/packages/zoo/index.html
    Mike Dewar : @Shane awesome. I haven't gone anywhere near R's time-series classes yet. I'm working on gene expression time series at the moment, which have maybe 6 or 7 time points at best! No need for dedicated classes (outside of Bioconductor's ExpressionSets) yet!
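
For readers who, like Mike, have not met zoo before, here is a minimal sketch of an irregular series built with it. Only the time points (0, 5, 6, 10, 104) come from the question; the expression values are made up for illustration, and the column names in the final data frame are assumptions chosen to match the question's plot call.

```r
library(zoo)

# Days since the event (from the question) and hypothetical measurements
days <- c(0, 5, 6, 10, 104)
values <- c(1.2, 2.3, 2.1, 3.8, 0.9)

# A zoo object is a set of observations ordered by an arbitrary index,
# which does not need to be regularly spaced
z <- zoo(values, order.by = days)

index(z)        # the time index
coredata(z)     # the observations
diff(index(z))  # gaps between successive samples

# Back to a plain data frame, e.g. for ggplot
df <- data.frame(time = index(z), expression = coredata(z))
```

With only a handful of time points, a plain data frame with a numeric time column does the same job, which is the route the rest of this thread takes.
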
  • Sounds like your time variable is a factor or maybe a character vector, not a numeric value! If you do data$time <- as.numeric(data$time) it may well solve your problem.

    ggplot is pretty good at using the right sort of scale for the right sort of data. (Sadly, data import routines in R generally are less smart...)

    JoFrhwld : that should be `as.numeric(as.character(data$time))`, or `as.numeric(levels(data$time))[data$time]`. The help pages say the second is a little faster.
    Harlan : right, if it's a factor. If it's a character vector already, which could give a similar result, then you don't need the inner conversion.
    Mike Dewar : I'm sure that time starts off as a regular ol' numeric vector, then once I've added it to `start` it's a datetime vector or something.
    Mike Dewar : Crap. You're right! It's a bloomin factor! I've spent far too long learning the `as.numeric(levels(data$time))[data$time]` idiom (and bleating about it on twitter) and just plain completely forgot to actually apply it to my freaking data. Thanks Harlan!
    Shane : @Mike: you know that you can check what an object is with the `class()` function?
    Jared : I had a similar problem and converted time to numeric, but once I did that the plot showed something like 415 instead of 6:45 or whatever. Any way to make the axis convert the values back to time/date?
    Mike Dewar : @Shane - yes thanks! This is just pure idiocy on my part!
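
The diagnosis and fix discussed above can be sketched roughly as follows. This is only an illustration: the data frame is invented (the time points are the ones from the question, the expression values are made up), and the time column is deliberately built as a factor to stand in for whatever the import step produced.

```r
library(ggplot2)

# Hypothetical data: time stored as a factor, as in the question
data <- data.frame(
  time = factor(c("0", "5", "6", "10", "104")),
  expression = c(1.2, 2.3, 2.1, 3.8, 0.9)
)

class(data$time)  # check what the column actually is, as Shane suggests

# as.numeric() applied directly to a factor returns the internal level
# codes, not the numbers the labels spell out
codes <- as.numeric(data$time)

# Converting through the level labels recovers the real values
data$time <- as.numeric(levels(data$time))[data$time]

# With a numeric x, ggplot uses a continuous scale, so the spacing between
# points reflects the actual number of days
p <- ggplot(data, aes(x = time, y = expression)) +
  geom_point() +
  scale_x_continuous("days from start")
```

Printing `p` draws the plot. No fake start date is needed, since the axis is simply the number of days.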
