I'm really struggling with grasping how to effectively use FasterCSV to accomplish what I want.
I have a CSV file; say:
ID,day,site
test,tuesday,cnn.com
bozo,friday,fark.com
god,monday,xkcd.com
test,saturday,whatever.com
I want to go through this file and end up with a hash that counts how many times each value in the first column occurs. So:
{"test" => 2, "bozo" => 1, "god" => 1}
I need to be able to do this without prior knowledge of the values in the first column.
-
I don't have the code in front of me, but I believe
row.to_hash
does that (where row is the FasterCSV::Row of the current record). Incidentally, row.headers should give you an array of the headers. Check the docs for more: http://fastercsv.rubyforge.org/classes/FasterCSV/Row.html
neezer : But wouldn't that merely translate all rows to hashes? That's not what I want: I want the hash to have counters for unique occurrences of row[0]. Any other thoughts?
-
Hmm, would this do?
File.open("file.csv").readlines[1..-1].inject({}) {|acc,line| word = line.split(/,/).first; acc[word] ||= 0; acc[word] += 1; acc}
The [1..-1] is there because we don't want the header line with the column names.
Then, for each line: take the first field, set its entry in the accumulator to 0 if it does not exist yet, increment it, and return the accumulator.
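The same inject shape can be kept while letting a CSV parser supply the fields. The following is only a sketch: it assumes Ruby 1.9 or later, where the standard csv library is the former FasterCSV code, and the sample data (including the quoted ID) is hypothetical. It also shows what splitting on commas does to a quoted field.

```ruby
require "csv"

# Hypothetical sample: the question's data, with one ID quoted
# because it contains a comma.
data = <<~EOS
  ID,day,site
  test,tuesday,cnn.com
  "bozo, the clown",friday,fark.com
  god,monday,xkcd.com
  test,saturday,whatever.com
EOS

# Naive approach: split each line on commas and take the first piece.
# The quoted ID is cut in half at its embedded comma.
naive_keys = data.lines[1..-1].map { |line| line.split(/,/).first }

# Same inject shape, but the parser handles quoting.
# headers: true consumes the first line as column names,
# so no [1..-1] slice is needed.
counts = CSV.parse(data, headers: true).inject(Hash.new(0)) do |acc, row|
  acc[row[0]] += 1
  acc
end
```

Here naive_keys ends up containing a broken key that starts with a stray quote, while counts keeps the quoted ID as a single key.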
Eli : Trying to parse a CSV file by doing `split(/,/)` is the path to a world of hurt. There's a reason why the FasterCSV gem is more than one line.
mat : Hmm, yes, of course: replace the "File.open("file.csv").readlines[1..-1]" with the correct way of reading lines from FasterCSV, and "line.split(/,/).first" with the correct way of getting the first field :-)
-
Easy:
h = Hash.new(0)
FasterCSV.read("file.csv")[1..-1].each {|row| h[row[0]] += 1}
Works the same with CSV.read, as well.
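For anyone who wants to try this end to end, here is a small sketch using CSV.read on the sample data from the question. The temporary file and its name are only there to make the example self-contained; it assumes a Ruby version where `require "csv"` is available.

```ruby
require "csv"
require "tempfile"

# Write the question's sample data to a temporary file for the demo.
file = Tempfile.new(["sample", ".csv"])
file.write(<<~EOS)
  ID,day,site
  test,tuesday,cnn.com
  bozo,friday,fark.com
  god,monday,xkcd.com
  test,saturday,whatever.com
EOS
file.close

# Hash.new(0) returns 0 for missing keys, so += 1 works on first sight of a key.
h = Hash.new(0)

# CSV.read returns an array of rows; [1..-1] drops the header row.
CSV.read(file.path)[1..-1].each { |row| h[row[0]] += 1 }

file.unlink
```

After this runs, h maps "test" to 2 and "bozo" and "god" to 1 each, without the code knowing the first-column values in advance.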
mat : Any reason not to use inject?
glenn mcdonald : Mostly a question of taste, I think, but inject is also slower, which sometimes matters.
glenn mcdonald : OK, I just ran a quick test on a 9000-line CSV file I had handy, using the four combinations of CSV/FasterCSV and each/inject. Timings:
FasterCSV+each: 1.01s
FasterCSV+inject: 1.18s
CSV+each: 3.32s
CSV+inject: 3.34s
dasil003 : Or you could even do FasterCSV.foreach to shorten it a bit.
-
I'd use foreach and treat nils with respect, or else I'd risk an "undefined method `+' for nil" error...
counter = {}
FasterCSV.foreach("path_to_your_csv_file", :headers => :first_row) do |row|
  key = row[0]
  counter[key] = counter[key].nil? ? 1 : counter[key] + 1
end
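To make the nil concern concrete, here is a sketch in plain Ruby with no CSV involved. The keys array is just the first column of the question's sample, typed in by hand. It shows the error a plain hash produces and two ways of supplying a default instead of checking for nil.

```ruby
# First-column values from the question's sample data.
keys = %w[test bozo god test]

# A plain hash returns nil for a missing key, and nil has no + method.
plain = {}
error = begin
  plain["test"] + 1
rescue NoMethodError => e
  e.class
end

# Option 1: fetch with an explicit default of 0.
with_fetch = {}
keys.each { |k| with_fetch[k] = with_fetch.fetch(k, 0) + 1 }

# Option 2: give the hash itself a default value of 0.
with_default = Hash.new(0)
keys.each { |k| with_default[k] += 1 }
```

Both options produce the same counts as the explicit nil check; which one reads best is mostly a matter of taste.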