Parse currency into numbers in Python
I learnt from "Format numbers as currency in Python" that the Python module Babel provides babel.numbers.format_currency to format numbers as currency. For instance,
    from babel.numbers import format_currency

    s = format_currency(123456.789, 'USD', locale='en_US')  # u'$123,456.79'
    s = format_currency(123456.789, 'EUR', locale='fr_FR')  # u'123\xa0456,79\xa0\u20ac'
How about the reverse, from currency to numbers, such as $123,456,789.00 --> 123456789? Babel provides babel.numbers.parse_number to parse local numbers, but I didn't find anything like parse_currency. So, what is the ideal way to parse local currency into numbers?
I went through "Python: removing characters except digits from string".
    # Way 1 (Python 2 only: string.maketrans was removed in Python 3)
    import string
    all_chars = string.maketrans('', '')
    nodigs = all_chars.translate(all_chars, string.digits)
    s = '$123,456.79'
    n = s.translate(all_chars, nodigs)  # 12345679, lost `.`

    # Way 2
    import re
    n = re.sub(r"\D", "", s)            # 12345679
Neither takes care of the decimal separator (.).

Then I tried removing all non-numeric characters, except the ., from the string (refer here),
    import re

    # Way 1:
    s = '$123,456.79'
    n = re.sub(r"[^0-9.]", "", s)   # 123456.79

    # Way 2:
    non_decimal = re.compile(r'[^\d.]+')
    s = '$123,456.79'
    n = non_decimal.sub('', s)      # 123456.79
This does process the decimal separator (.).

But the above solutions don't work when it comes to, for instance,
    from babel.numbers import format_currency

    s = format_currency(123456.789, 'EUR', locale='fr_FR')  # u'123\xa0456,79\xa0\u20ac'
    new_s = s.encode('utf-8')  # 123 456,79 €
As you can see, the format of the currency varies. What is the ideal way to parse currency into numbers in a general way?
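To make the failure concrete, here is a small standard-library sketch showing that the digit-and-dot regex silently yields the wrong magnitude. The string literal is simply the fr_FR result of the Babel call above, typed in by hand.

```python
import re

# The fr_FR output shown above: no-break spaces as group separator,
# a comma as decimal mark, and a trailing euro sign.
s = "123\xa0456,79\xa0\u20ac"

non_decimal = re.compile(r"[^\d.]+")
n = non_decimal.sub("", s)
print(n)         # 12345679
print(float(n))  # 12345679.0, not 123456.79
```

The comma is stripped along with everything else, so the result is off by a factor of 100 and no error is raised.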
Using Babel
The Babel documentation notes that number parsing is not fully implemented yet, but they have done a lot of work to get currency info into the library. You can use get_currency_name() and get_currency_symbol() to get the currency details, and the other get_... functions to get the normal number details (decimal point, minus sign, etc.).
Using that information you can strip the currency details (name, sign) and the groupings (e.g. the , in the US) from a currency string. Then you change the decimal details into the ones used by the C locale (- for minus, and . for the decimal point).
This results in the following code (I added an object to keep some of the data, which may come in handy in further processing):
    import re, os
    from babel import numbers as n
    from babel.core import default_locale

    class AmountInfo(object):
        def __init__(self, name, symbol, value):
            self.name = name
            self.symbol = symbol
            self.value = value

    def parse_currency(value, cur):
        # Number and currency details for the default locale
        decp = n.get_decimal_symbol()
        plus = n.get_plus_sign_symbol()
        minus = n.get_minus_sign_symbol()
        group = n.get_group_symbol()
        name = n.get_currency_name(cur)
        symbol = n.get_currency_symbol(cur)
        remove = [plus, name, symbol, group]
        for token in remove:
            # remove the pieces of information that shall be obvious
            value = re.sub(re.escape(token), '', value)
        # change the minus sign to a LOCALE=C minus
        value = re.sub(re.escape(minus), '-', value)
        # and change the decimal mark to a LOCALE=C decimal point
        value = re.sub(re.escape(decp), '.', value)
        # just in case, remove extraneous spaces
        value = re.sub(r'\s+', '', value)
        return AmountInfo(name, symbol, value)

    #cur_loc = os.environ['LC_ALL']
    cur_loc = default_locale()
    print('locale:', cur_loc)
    test = [ (n.format_currency(123456.789, 'USD', locale=cur_loc), 'USD')
           , (n.format_currency(-123456.78, 'PLN', locale=cur_loc), 'PLN')
           , (n.format_currency(123456.789, 'PLN', locale=cur_loc), 'PLN')
           , (n.format_currency(123456.789, 'IDR', locale=cur_loc), 'IDR')
           , (n.format_currency(123456.789, 'JPY', locale=cur_loc), 'JPY')
           , (n.format_currency(-123456.78, 'JPY', locale=cur_loc), 'JPY')
           , (n.format_currency(123456.789, 'CNY', locale=cur_loc), 'CNY')
           , (n.format_currency(-123456.78, 'CNY', locale=cur_loc), 'CNY')
           ]

    for v, c in test:
        print('As currency :', c, ':', v.encode('utf-8'))
        info = parse_currency(v, c)
        print('As value    :', c, ':', info.value)
        print('Extra info  :', info.name.encode('utf-8')
                             , info.symbol.encode('utf-8'))
The output looks promising (in the en_US locale):
    $ export LC_ALL=en_US
    $ ./cur.py
    locale: en_US
    As currency : USD : b'$123,456.79'
    As value    : USD : 123456.79
    Extra info  : b'US Dollar' b'$'
    As currency : PLN : b'-z\xc5\x82123,456.78'
    As value    : PLN : -123456.78
    Extra info  : b'Polish Zloty' b'z\xc5\x82'
    As currency : PLN : b'z\xc5\x82123,456.79'
    As value    : PLN : 123456.79
    Extra info  : b'Polish Zloty' b'z\xc5\x82'
    As currency : IDR : b'Rp123,457'
    As value    : IDR : 123457
    Extra info  : b'Indonesian Rupiah' b'Rp'
    As currency : JPY : b'\xc2\xa5123,457'
    As value    : JPY : 123457
    Extra info  : b'Japanese Yen' b'\xc2\xa5'
    As currency : JPY : b'-\xc2\xa5123,457'
    As value    : JPY : -123457
    Extra info  : b'Japanese Yen' b'\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123,456.79'
    As value    : CNY : 123456.79
    Extra info  : b'Chinese Yuan' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123,456.78'
    As value    : CNY : -123456.78
    Extra info  : b'Chinese Yuan' b'CN\xc2\xa5'
And it still works in different locales (Brazil is notable for using the comma as a decimal mark):
    $ export LC_ALL=pt_BR
    $ ./cur.py
    locale: pt_BR
    As currency : USD : b'US$123.456,79'
    As value    : USD : 123456.79
    Extra info  : b'D\xc3\xb3lar americano' b'US$'
    As currency : PLN : b'-PLN123.456,78'
    As value    : PLN : -123456.78
    Extra info  : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : PLN : b'PLN123.456,79'
    As value    : PLN : 123456.79
    Extra info  : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : IDR : b'IDR123.457'
    As value    : IDR : 123457
    Extra info  : b'Rupia indon\xc3\xa9sia' b'IDR'
    As currency : JPY : b'JP\xc2\xa5123.457'
    As value    : JPY : 123457
    Extra info  : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : JPY : b'-JP\xc2\xa5123.457'
    As value    : JPY : -123457
    Extra info  : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123.456,79'
    As value    : CNY : 123456.79
    Extra info  : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123.456,78'
    As value    : CNY : -123456.78
    Extra info  : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
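Note that the parsed value is still a string at this point. As a minimal, Babel-free sketch of the same strip-and-normalise steps, the function below takes the symbol and separators as explicit arguments (in a real program they would come from Babel's lookup functions) and converts the result to decimal.Decimal. The name to_decimal and its parameters are hypothetical, chosen only for illustration.

```python
from decimal import Decimal


def to_decimal(text, symbol, group, decimal_mark, minus="-"):
    """Hypothetical helper: strip currency details, then normalise to C-locale form."""
    # Remove the pieces that carry no numeric information.
    for token in (symbol, group):
        text = text.replace(token, "")
    # Map the locale's minus sign and decimal mark to '-' and '.'.
    text = text.replace(minus, "-").replace(decimal_mark, ".")
    # Drop any remaining whitespace, including no-break spaces.
    text = "".join(text.split())
    return Decimal(text)


print(to_decimal("$123,456.79", "$", ",", "."))             # 123456.79
print(to_decimal("-PLN123.456,78", "PLN", ".", ","))        # -123456.78
print(to_decimal("123\xa0456,79\xa0\u20ac", "\u20ac", "\xa0", ","))  # 123456.79
```

The grouping symbol has to be removed before the decimal mark is replaced; otherwise, in a locale such as pt_BR, the . group separators would be indistinguishable from the new decimal point. Decimal is used rather than float so that monetary amounts are not subject to binary rounding.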
It is worth pointing out that Babel has some encoding problems. That is because the locale files (in locale-data) use different encodings themselves. If you're working with currencies you're familiar with, that should not be a problem. But if you try unfamiliar currencies you might run into problems (I just learned that Poland uses iso-8859-2, not iso-8859-1).
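The Polish case can be checked with the standard library alone. The sketch below (it does not touch Babel) shows that the złoty symbol has no representation in iso-8859-1 but does in iso-8859-2 and in UTF-8, which is where the \xc5\x82 bytes in the output above come from.

```python
symbol = "z\u0142"  # 'zł', the Polish złoty symbol

# UTF-8 can encode any character; this matches the bytes in the output above.
print(symbol.encode("utf-8"))       # b'z\xc5\x82'

# iso-8859-2 (Latin-2) covers Central European letters such as 'ł'.
print(symbol.encode("iso-8859-2"))  # b'z\xb3'

# iso-8859-1 (Latin-1) has no 'ł', so encoding fails.
try:
    symbol.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)                    # False
```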