Parse currency into numbers in Python


I learned from "Format numbers as currency in Python" that the Python module Babel provides babel.numbers.format_currency to format numbers as currency. For instance,

    from babel.numbers import format_currency

    s = format_currency(123456.789, 'USD', locale='en_US')  # u'$123,456.79'
    s = format_currency(123456.789, 'EUR', locale='fr_FR')  # u'123\xa0456,79\xa0\u20ac'

How do I do the reverse, from currency to numbers, such as $123,456,789.00 --> 123456789? Babel provides babel.numbers.parse_number to parse local numbers, but I didn't find anything like parse_currency. So, what is the ideal way to parse local currency into numbers?


I went through "Python: removing characters except digits from string".

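    # way 1 (Python 2 idiom: relies on string.maketrans)
    import string
    all_chars = string.maketrans('', '')
    nodigs = all_chars.translate(all_chars, string.digits)

    s = '$123,456.79'
    n = s.translate(all_chars, nodigs)    # 12345679, lost `.`

    # way 2 (\D matches every non-digit character)
    import re
    n = re.sub(r"\D", "", s)              # 12345679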

It doesn't take care of the decimal separator (.).


So, remove all non-numeric characters, except ., from the string (refer here),

    import re

    # way 1: keep only digits and the dot
    s = '$123,456.79'
    n = re.sub(r"[^0-9.]", "", s)   # 123456.79

    # way 2: the same idea with a precompiled pattern
    non_decimal = re.compile(r'[^\d.]+')
    s = '$123,456.79'
    n = non_decimal.sub('', s)      # 123456.79

This does preserve the decimal separator (.).


But the above solutions don't work when it comes to, for instance,

    from babel.numbers import format_currency

    s = format_currency(123456.789, 'EUR', locale='fr_FR')  # u'123\xa0456,79\xa0\u20ac'
    new_s = s.encode('utf-8')  # 123 456,79 €

As you can see, the format of currency varies. What is the ideal way to parse currency into numbers in a general way?
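To make the failure concrete, here is a small sketch that runs the regex from above on both strings. The French string is written out literally, copied from the format_currency output shown in the comment above, so the example does not need Babel installed.

```python
import re

non_decimal = re.compile(r'[^\d.]+')

us = '$123,456.79'
fr = '123\xa0456,79\xa0\u20ac'   # the fr_FR result shown above

print(non_decimal.sub('', us))   # 123456.79
print(non_decimal.sub('', fr))   # 12345679 (the decimal comma is dropped)
```

The pattern treats . as the only possible decimal mark, so any locale that uses a comma silently loses its fraction.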

Using Babel

The Babel documentation notes that number parsing is not fully implemented yet, but they have done a lot of work to get currency info into the library. You can use get_currency_name() and get_currency_symbol() to get currency details, and the other get_... functions to get the normal number details (decimal point, minus sign, etc.).

Using that information you can strip the currency details (name, sign) and the grouping symbols (e.g. , in the US) from a currency string. Then you change the remaining number details into the ones used by the C locale (- for minus, and . for the decimal point).
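The steps just described can be sketched with the standard library alone. This is only an illustration, not Babel's API: normalize_amount and all of its parameters are hypothetical names, and the symbols are passed in by hand here, whereas in the approach above they would come from Babel's get_... functions.

```python
import re

def normalize_amount(text, group, decimal, minus='-', plus='+', tokens=()):
    """Return `text` as a plain C-locale number string, e.g. '-1234.56'.

    Hypothetical helper: every symbol is supplied by the caller.
    """
    # 1. Drop pieces that carry no numeric information
    #    (plus sign, grouping symbol, currency name or symbol).
    for token in (plus, group, *tokens):
        if token:
            text = text.replace(token, '')
    # 2. Map the locale's minus sign and decimal mark to '-' and '.'.
    text = text.replace(minus, '-').replace(decimal, '.')
    # 3. Remove any remaining whitespace, including non-breaking spaces.
    return re.sub(r'\s+', '', text)

print(normalize_amount('US$123.456,79', group='.', decimal=',', tokens=('US$',)))
print(normalize_amount('123\xa0456,79\xa0\u20ac', group='\xa0', decimal=',',
                       tokens=('\u20ac',)))
```

The grouping symbol has to be removed before the decimal mark is rewritten; otherwise a locale that groups with . would end up with several decimal points.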

This results in the following code (I added an object to keep some of the data, which may come in handy in further processing):

    import re, os
    from babel import numbers as n
    from babel.core import default_locale

    class AmountInfo(object):
        def __init__(self, name, symbol, value):
            self.name = name
            self.symbol = symbol
            self.value = value

    def parse_currency(value, cur):
        # the get_... calls below use the default locale
        decp = n.get_decimal_symbol()
        plus = n.get_plus_sign_symbol()
        minus = n.get_minus_sign_symbol()
        group = n.get_group_symbol()
        name = n.get_currency_name(cur)
        symbol = n.get_currency_symbol(cur)
        remove = [plus, name, symbol, group]
        for token in remove:
            # remove the pieces of information that shall be obvious
            value = re.sub(re.escape(token), '', value)
        # change the minus sign to a LOCALE=C minus
        value = re.sub(re.escape(minus), '-', value)
        # and change the decimal mark to a LOCALE=C decimal point
        value = re.sub(re.escape(decp), '.', value)
        # just in case, remove extraneous spaces
        value = re.sub(r'\s+', '', value)
        return AmountInfo(name, symbol, value)

    #cur_loc = os.environ['LC_ALL']
    cur_loc = default_locale()
    print('locale:', cur_loc)
    test = [ (n.format_currency(123456.789, 'USD', locale=cur_loc), 'USD')
           , (n.format_currency(-123456.78, 'PLN', locale=cur_loc), 'PLN')
           , (n.format_currency(123456.789, 'PLN', locale=cur_loc), 'PLN')
           , (n.format_currency(123456.789, 'IDR', locale=cur_loc), 'IDR')
           , (n.format_currency(123456.789, 'JPY', locale=cur_loc), 'JPY')
           , (n.format_currency(-123456.78, 'JPY', locale=cur_loc), 'JPY')
           , (n.format_currency(123456.789, 'CNY', locale=cur_loc), 'CNY')
           , (n.format_currency(-123456.78, 'CNY', locale=cur_loc), 'CNY')
           ]

    for v, c in test:
        print('As currency :', c, ':', v.encode('utf-8'))
        info = parse_currency(v, c)
        print('As value    :', c, ':', info.value)
        print('Extra info  :', info.name.encode('utf-8')
                             , info.symbol.encode('utf-8'))

The output looks promising (in the en_US locale):

    $ export LC_ALL=en_US
    $ ./cur.py
    locale: en_US
    As currency : USD : b'$123,456.79'
    As value    : USD : 123456.79
    Extra info  : b'US Dollar' b'$'
    As currency : PLN : b'-z\xc5\x82123,456.78'
    As value    : PLN : -123456.78
    Extra info  : b'Polish Zloty' b'z\xc5\x82'
    As currency : PLN : b'z\xc5\x82123,456.79'
    As value    : PLN : 123456.79
    Extra info  : b'Polish Zloty' b'z\xc5\x82'
    As currency : IDR : b'Rp123,457'
    As value    : IDR : 123457
    Extra info  : b'Indonesian Rupiah' b'Rp'
    As currency : JPY : b'\xc2\xa5123,457'
    As value    : JPY : 123457
    Extra info  : b'Japanese Yen' b'\xc2\xa5'
    As currency : JPY : b'-\xc2\xa5123,457'
    As value    : JPY : -123457
    Extra info  : b'Japanese Yen' b'\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123,456.79'
    As value    : CNY : 123456.79
    Extra info  : b'Chinese Yuan' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123,456.78'
    As value    : CNY : -123456.78
    Extra info  : b'Chinese Yuan' b'CN\xc2\xa5'

And it still works in different locales (Brazil is notable for using the comma as a decimal mark):

    $ export LC_ALL=pt_BR
    $ ./cur.py
    locale: pt_BR
    As currency : USD : b'US$123.456,79'
    As value    : USD : 123456.79
    Extra info  : b'D\xc3\xb3lar americano' b'US$'
    As currency : PLN : b'-PLN123.456,78'
    As value    : PLN : -123456.78
    Extra info  : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : PLN : b'PLN123.456,79'
    As value    : PLN : 123456.79
    Extra info  : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : IDR : b'IDR123.457'
    As value    : IDR : 123457
    Extra info  : b'Rupia indon\xc3\xa9sia' b'IDR'
    As currency : JPY : b'JP\xc2\xa5123.457'
    As value    : JPY : 123457
    Extra info  : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : JPY : b'-JP\xc2\xa5123.457'
    As value    : JPY : -123457
    Extra info  : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123.456,79'
    As value    : CNY : 123456.79
    Extra info  : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123.456,78'
    As value    : CNY : -123456.78
    Extra info  : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
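Note that the parsed value in the output above is still a string. One possible next step, sketched here with the standard library's decimal module (a choice made for this example, not something the code above does), is to turn it into an exact decimal number rather than a binary float:

```python
from decimal import Decimal

value = '-123456.78'              # a string like info.value in the output above
amount = Decimal(value)

print(amount)                     # -123456.78
print(amount + Decimal('0.01'))   # -123456.77
```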

It is worth pointing out that Babel has some encoding problems. That is because the locale files (in locale-data) use different encodings themselves. If you're working with currencies you're familiar with, that should not be a problem. But if you try unfamiliar currencies you might run into problems (I just learned that Poland uses iso-8859-2, not iso-8859-1).
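The iso-8859-2 remark can be checked with plain Python, without Babel. A small sketch using the złoty symbol from the output above:

```python
zloty = 'z\u0142'                    # 'zł', the PLN symbol seen in the en_US output

print(zloty.encode('utf-8'))         # b'z\xc5\x82'
print(zloty.encode('iso-8859-2'))    # b'z\xb3'

try:
    zloty.encode('iso-8859-1')
except UnicodeEncodeError as exc:
    # iso-8859-1 has no code point for the letter ł
    print('not representable in iso-8859-1:', exc.reason)
```

Working with Unicode strings throughout and encoding only at the output boundary, as the test script does with UTF-8, sidesteps the question of which legacy encoding a given currency needs.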

