Wednesday, March 16, 2011

RoR: FasterCSV to hash

I'm really struggling with grasping how to effectively use FasterCSV to accomplish what I want.

I have a CSV file; say:

ID,day,site
test,tuesday,cnn.com
bozo,friday,fark.com
god,monday,xkcd.com
test,saturday,whatever.com

I want to go through this file and end up with a hash that has a counter for how many times each value in the first column occurred. So:

{"test" => 2, "bozo" => 1, "god" => 1}

I need to be able to do this without prior knowledge of the values in the first column.

?

From stackoverflow
  • I don't have the code in front of me, but I believe row.to_hash does that (where row is the FasterCSV::Row of the current record)

    row.headers should give you an array of the headers, incidentally. Check the docs for more: http://fastercsv.rubyforge.org/classes/FasterCSV/Row.html

    neezer : But wouldn't that merely translate all rows to hashes? That's not what I want: I want the hash to have counters for unique occurrences of row[0]. Any other thoughts?
  • Hmm, would this do?

    File.open("file.csv").readlines[1..-1].inject({}) {|acc,line| word = line.split(/,/).first; acc[word] ||= 0; acc[word] += 1; acc}
    

    [1..-1] skips the header line with the column names.

    Then, for each line, it takes the first field, sets its count in the accumulator to 0 if it is not there yet, increments it, and returns the accumulator.

    Eli : Trying to parse a CSV file by doing `split(/,/)` is the path to a world of hurt. There's a reason why the FasterCSV gem is more than one line.
    mat : Hum, yes, of course, replace the "File.open("file.csv").readlines[1..-1]" by the correct way of reading lines from FasterCSV, and "line.split(/,/).first" by the correct way of getting the first field :-)
  • Easy:

    h = Hash.new(0)
    FasterCSV.read("file.csv")[1..-1].each {|row| h[row[0]] += 1}
    

    Works the same with CSV.read, as well.

    mat : Any reason not to use inject ?
    glenn mcdonald : Mostly a question of taste, I think, but inject is also slower, which sometimes matters.
    glenn mcdonald : OK, I just ran a quick test on a 9000-line CSV file I had handy, using the four combinations of CSV/FasterCSV and each/inject. Timings: FasterCSV+each: 1.01s; FasterCSV+inject: 1.18s; CSV+each: 3.32s; CSV+inject: 3.34s.
    dasil003 : Or you could even do FasterCSV.foreach to shorten it a bit.
  • I'd use foreach, and treat nils with respect, or else I'd risk a NoMethodError from calling + on nil:

    counter = {}
    FasterCSV.foreach("path_to_your_csv_file", :headers => :first_row) do |row|
      key = row[0]
      counter[key] = counter[key].nil? ? 1 : counter[key] + 1
    end
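
The foreach approach and the Hash.new(0) idea from the answers above can be combined, which may make the explicit nil check unnecessary. The following is only a sketch: it assumes Ruby 1.9 or later, where the standard CSV library is the former FasterCSV (on 1.8, FasterCSV.foreach should accept the same arguments), and it writes the question's sample rows to a temporary file purely so that it can be run as-is. The temporary file name is made up for illustration.

```ruby
require 'csv'       # Ruby 1.9+; on 1.8, require 'fastercsv' and use FasterCSV
require 'tempfile'

# Write the sample rows from the question to a temporary file
# (assumption: a real script would point at an existing CSV file instead).
file = Tempfile.new('sample_csv')
file.write("ID,day,site\ntest,tuesday,cnn.com\nbozo,friday,fark.com\n" \
           "god,monday,xkcd.com\ntest,saturday,whatever.com\n")
file.close

# Hash.new(0) hands back 0 for keys it has not seen, so += 1 never hits nil.
counter = Hash.new(0)
CSV.foreach(file.path, :headers => :first_row) do |row|
  counter[row['ID']] += 1   # look the column up by its header name
end

file.unlink   # remove the temporary file
p counter
```

Looking the column up as row['ID'] rather than row[0] is a matter of taste; it keeps working if the columns are reordered, but it does depend on the header being spelled exactly that way.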
    

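Eli's warning about split(/,/) is easiest to see with a quoted field that contains a comma. The sketch below uses made-up sample data (the "doe, jane" row is not from the question) and the standard CSV library of Ruby 1.9 or later; it compares the keys a naive split produces with the counts a real CSV parser gives.

```ruby
require 'csv'  # Ruby 1.9+; on 1.8, require 'fastercsv' and use FasterCSV

# Hypothetical sample data: the second record has a comma inside a quoted field.
data = [
  'ID,day,site',
  'test,tuesday,cnn.com',
  '"doe, jane",friday,fark.com',
  'test,saturday,whatever.com'
].join("\n")

# Naive approach: split each line on commas and take the first piece.
naive_keys = data.split("\n")[1..-1].map { |line| line.split(/,/).first }

# CSV-aware approach: let the parser handle quoting, then count column 0.
counts = Hash.new(0)
CSV.parse(data, :headers => true).each { |row| counts[row[0]] += 1 }

p naive_keys  # the quoted name is cut at its comma
p counts      # the quoted name survives as a single key
```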