ruby - Rails: normalize CSV file data
I'm trying to import a TSV (tab-separated data) file into a database, but it's not formatted properly. The price and count columns are separated by a space rather than a tab (with the exception of the header line), so both values are placed under the price key, which moves the rest of the data into the wrong key/value pairs.
TSV file (columns are tab-separated, except that price and count are separated by a space in the data rows):

    purchaser name	item description	price	count	merchant address	merchant name
    alice bob	$10 off $20 of food	10.0 2	987 fake st	bob's pizza
    example name	$30 of awesome $10	10.0 5	456 unreal rd	tom's awesome shop
    name 3	$20 sneakers $5	5.0 1	123 fake st	sneaker store emporium
    john williams	$20 sneakers $5	5.0 4	123 fake st	sneaker store emporium
In /models/purchase.rb:

    require 'csv' # CSV is in the standard library but is not loaded by default

    class Purchase < ActiveRecord::Base
      # validates :item_price, :numericality => { :greater_than_or_equal_to => 0 }

      def self.import(file)
        CSV.foreach(file.path,
                    :headers => true,
                    :header_converters => lambda { |h| h.downcase.gsub(' ', '_') },
                    :col_sep => "\t") do |row|
          # debugger
          purchase_hash = row.to_hash
          Purchase.create!(purchase_hash)
        end
      end
    end
If I import the file, uncomment the debugger line in the model, and type row, it returns:

    #<CSV::Row "purchaser_name":"alice bob" "item_description":"$10 off $20 of food" "price":"10.0 2" "count":" 987 fake st" "merchant_address":" bob's pizza" "merchant_name":nil>
row.inspect returns:

    "#<CSV::Row \"purchaser_name\":\"alice bob\" \"item_description\":\"$10 off $20 of food\" \"price\":\"10.0 2\" \"count\":\" 987 fake st\" \"merchant_address\":\" bob's pizza\" \"merchant_name\":nil>"
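As you can see, price (10.0) and count (2) have been squished into the same value because they are not tab-separated in the file. Every later value is therefore shifted one column to the left, and merchant_name ends up nil.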
db/schema.rb:

    ActiveRecord::Schema.define(version: 20160601205154) do
      create_table "purchases", force: :cascade do |t|
        t.string   "purchaser_name"
        t.string   "item_description"
        t.string   "price"
        t.string   "count"
        t.string   "merchant_address"
        t.string   "merchant_name"
        t.datetime "created_at", null: false
        t.datetime "updated_at", null: false
      end
    end
I originally had price as a decimal datatype and count as an integer, but I switched them to string to try to find a solution. I can change them back if needed (and would prefer to change them back if possible).
The solution is twofold. First, define a converter that splits the field into two parts (and converts them to numbers in the process) during parsing:

    converter_split_price_count = lambda do |value, info|
      # Leave every field except "price" untouched
      next value unless info.header == "price"
      price, count = value.split
      [ price.to_f, count.to_i ]
    end
This turns the price field into an array, e.g. "10.0 2" becomes [10.0, 2].
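To check the converter in isolation, here is a minimal sketch that parses a single data row. The sample string is an assumption modeled on the first row of the question's file (with real tabs everywhere except between price and count), and the names sample, split_price_count, table, and first_row are made up for this illustration:

```ruby
require "csv"

# Hypothetical one-row sample: tabs separate every column except price and count.
sample = "purchaser name\titem description\tprice\tcount\tmerchant address\tmerchant name\n" \
         "alice bob\t$10 off $20 of food\t10.0 2\t987 fake st\tbob's pizza\n"

# Same logic as the converter above: only the "price" field is split.
split_price_count = lambda do |value, info|
  next value unless info.header == "price"
  price, count = value.split
  [price.to_f, count.to_i]
end

table = CSV.parse(sample,
                  headers: true,
                  col_sep: "\t",
                  header_converters: ->(h) { h.downcase.tr(" ", "_") },
                  converters: split_price_count)

first_row = table[0]
p first_row["price"]          # the split and converted price/count pair
p first_row["merchant_name"]  # still misplaced at this stage
```

At this point the price key holds both numbers, and the remaining columns are still shifted by one, so a second step is needed to realign them.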
Second, define a method that, after parsing, fixes the misplaced values and returns the correct hash:

    def row_to_hash_fixing_price_count(row)
      # Flatten the [price, count] pair into the row, then re-pair with the headers
      row.headers.zip(row.fields.flatten).to_h
    end
The above flattens the price/count array into its parent array (the rest of the row) and zips the result with the headers array. Since there are now more fields than headers, the nil at the end is dropped.
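The realignment can be seen with plain arrays, without any CSV parsing. This is only an illustrative sketch: the headers and fields values below are typed in by hand to mimic what one parsed row would contain after the price field has been split.

```ruby
# Hand-written stand-ins for row.headers and row.fields (assumed values).
headers = %w[purchaser_name item_description price count merchant_address merchant_name]
fields  = ["alice bob", "$10 off $20 of food", [10.0, 2], "987 fake st", "bob's pizza", nil]

# flatten yields 7 values; zip is driven by the 6 headers, so the trailing nil is ignored.
fixed = headers.zip(fields.flatten).to_h

p fixed
```

Note that flatten would also expand any other array-valued field, so this relies on price being the only field that a converter turns into an array.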
You'll use them like this:

    csv_opts = {
      headers: true,
      col_sep: "\t",
      header_converters: ->(h) { h.downcase.tr(" ", "_") },
      converters: converter_split_price_count
    }

    # data is the TSV content (a String or IO).
    # The double splat passes the hash as keyword arguments, which Ruby 3 requires.
    data_out = CSV.new(data, **csv_opts).map do |row|
      row_to_hash_fixing_price_count(row)
    end
    # => [ { "purchaser_name" => "alice bob",
    #        "item_description" => "$10 off $20 of food",
    #        "price" => 10.0,
    #        "count" => 2,
    #        "merchant_address" => "987 fake st",
    #        "merchant_name" => "bob's pizza"
    #      },
    #      # ...
    #    ]
You can see it in action here: http://ideone.com/08wtpt
P.S. Consider creating the records in bulk instead of one at a time. Given the above, you can call Purchase.create!(data_out), since create! accepts an array of hashes.