ruby - Rails normalize csv file data -


i'm trying import tsv (tab separated data) file database it's not formatted properly. columns price , count separated space (with exception of header line) , values both placed price key, moving data wrong key value pairs.

tsv file:

purchaser name  item description    price   count   merchant address    merchant name alice bob   $10 off $20 of food 10.0 2   987 fake st     bob's pizza example name    $30 of awesome $10  10.0 5   456 unreal rd   tom's awesome shop name 3  $20 sneakers $5 5.0    1     123 fake st     sneaker store emporium john williams   $20 sneakers $5 5.0    4     123 fake st     sneaker store emporium  

in /models/purchase.rb:

class purchase < activerecord::base   # validates :item_price, :numericality => { :greater_than_or_equal_to => 0 }    def self.import(file)     csv.foreach(file.path, :headers => true,                        :header_converters => lambda { |h| h.downcase.gsub(' ', '_')},                        :col_sep => "\t"                        ) |row|                       # debugger                       purchase_hash = row.to_hash       purchase.create!(purchase_hash)     end   end end 

if import file , comment in debugger in model , type row returns:

#<csv::row "purchaser_name":"alice bob" "item_description":"$10 off $20 of food" "price":"10.0 2" "count":" 987 fake st" "merchant_address":" bob's pizza" "merchant_name":nil>

row.inspect returns:

"#<csv::row \"purchaser_name\":\"alice bob\" \"item_description\":\"$10 off $20 of food\" \"price\":\"10.0 2\" \"count\":\" 987 fake st\" \"merchant_address\":\" bob's pizza\" \"merchant_name\":nil>"

as can see price (10.0) , count (2) have been squished same value because not tab separated in file.

db/schema.rb:

activerecord::schema.define(version: 20160601205154)    create_table "purchases", force: :cascade |t|     t.string   "purchaser_name"     t.string   "item_description"     t.string   "price"     t.string   "count"     t.string   "merchant_address"     t.string   "merchant_name"     t.datetime "created_at",       null: false     t.datetime "updated_at",       null: false   end  end 

i had price decimal datatype , count integer switched them string try find solution. can change them if (and prefer change them if possible)

the solution twofold. first, define converter split field 2 parts (and convert numbers in process) during parsing:

converter_split_price_count = lambda |value, info|   next value unless info.header == "price"   price, count = value.split   [ price.to_f, count.to_i ] end 

this turns price field array, e.g. "10.0 2" becomes [10.0, 2].

second, define method that, after parsing, fix misplaced value , return correct hash:

def row_to_hash_fixing_price_count(row)   row.headers.zip(row.fields.flatten).to_h end 

the above flattens price/count array parent array (the rest of row) , zips headers array. since there more fields headers nil @ end dropped.

you'll use them this:

csv_opts = {   headers: true,   col_sep: "\t",   header_converters: ->(h) { h.downcase.tr(" ", "_") },   converters: converter_split_price_count }  data_out = csv.new(data, csv_opts).map |row|   row_to_hash_fixing_price_count(row) end # => [ { "purchaser_name" => "alice bob", #        "item_description" => "$10 off $20 of food", #        "price" => 10.0, #        "count" => 2, #        "merchant_address" => "987 fake st", #        "merchant_name" => "bob's pizza" #      }, #      # ... #    ] 

you can see in action here: http://ideone.com/08wtpt

p.s. consider creating records in bulk instead of 1 @ time. given above purchase.create!(data_out) since create! accepts array of hashes.


Popular posts from this blog

php - How should I create my API for mobile applications (Needs Authentication) -

python 3.x - PyQt5 - Signal : pyqtSignal no method connect -

5 Reasons to Blog Anonymously (and 5 Reasons Not To)