python - Correcting to the correct URL -


I have written a simple script to access JSON, where the keywords needed are used in the URL.

Below is the script I have written:

import urllib2
import json

f1 = open('catlist.text', 'r')
f2 = open('sublist.text', 'w')
lines = f1.read().splitlines()

for line in lines:
    url = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=' + line + '&cmlimit=100'
    json_obj = urllib2.urlopen(url)
    data = json.load(json_obj)
    for item in data['query']:
        for i in data['query']['categorymembers']:
            print i['title']
            print '-----------------------------------------'
            f2.write((i['title']).encode('utf8') + "\n")

In the script, the program first reads catlist, which provides the list of keywords used in the URL.

Here is a sample of what catlist.text contains.

category:branches of geography
category:geography by place
category:geography awards and competitions
category:geography conferences
category:geography education
category:environmental studies
category:exploration
category:geocodes
category:geographers
category:geographical zones
category:geopolitical corridors
category:history of geography
category:land systems
category:landscape
category:geography-related lists
category:lists of countries by geography
category:navigation
category:geography organizations
category:places
category:geographical regions
category:surveying
category:geographical technology
category:geography terminology
category:works about geography
category:geographic images
category:geography stubs

My program takes the keywords and places them in the URL.

However, I am not able to get the result. I have checked the code by printing the URL:

import urllib2
import json

f1 = open('catlist.text', 'r')
f2 = open('sublist2.text', 'w')
lines = f1.read().splitlines()

for line in lines:
    url = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=' + line + '&cmlimit=100'
    json_obj = urllib2.urlopen(url)
    data = json.load(json_obj)
    f2.write(url + '\n')

The result in sublist2 is as follows:

https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:branches of geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography by place&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography awards and competitions&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography conferences&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography education&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:environmental studies&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:exploration&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geocodes&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geographers&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geographical zones&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geopolitical corridors&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:history of geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:land systems&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:landscape&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography-related lists&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:lists of countries by geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:navigation&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography organizations&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:places&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geographical regions&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:surveying&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geographical technology&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography terminology&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:works about geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geographic images&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography stubs&cmlimit=100

It shows that the keywords are placed in the URL correctly.

But when I run the full code, I am not able to get the correct result.

One thing I notice is that when I paste a link into the address bar, for example:

https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:branches of geography&cmlimit=100 

it gives the correct result, because the address bar auto-corrects it to:

https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:branches%20of%20geography&cmlimit=100 

I believe that if %20 is added in place of the empty spaces between the words in "category:branches of geography", the script will be able to get the correct JSON items.

Problem: I am not sure how to modify the statement in the above code to replace the blank spaces contained in catlist with %20.

Please forgive me for the bad formatting and the long post; I am still trying to learn Python.

Thank you for helping me.

Edit:

Thank you, Tim. Your solution works:

 url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+urllib2.quote(line)+'&cmlimit=100' 

It is able to print the correct result:

https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3abranches%20of%20geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageography%20by%20place&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageography%20awards%20and%20competitions&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageography%20conferences&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageography%20education&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3aenvironmental%20studies&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3aexploration&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageocodes&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageographers&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageographical%20zones&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageopolitical%20corridors&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ahistory%20of%20geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3aland%20systems&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3alandscape&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageography-related%20lists&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3alists%20of%20countries%20by%20geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3anavigation&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageography%20organizations&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3aplaces&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageographical%20regions&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3asurveying&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageographical%20technology&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageography%20terminology&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3aworks%20about%20geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageographic%20images&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3ageography%20stubs&cmlimit=100

Use urllib.quote() to replace special characters in the URL:

Python 2:

import urllib

line = 'category:branches of geography'
url = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=' + urllib.quote(line) + '&cmlimit=100'

https://docs.python.org/2/library/urllib.html#urllib.quote

Python 3:

import urllib.parse

line = 'category:branches of geography'
url = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=' + urllib.parse.quote(line) + '&cmlimit=100'

https://docs.python.org/3.5/library/urllib.parse.html#urllib.parse.quote
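
As a fuller Python 3 illustration, the whole query string can be built with urllib.parse.urlencode instead of concatenating strings by hand. This is only a sketch under some assumptions: the helper names (build_url, extract_titles, fetch_titles, write_members) and the User-Agent value are made up for this example, and the response is assumed to have the query/categorymembers layout that the question's script reads.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def build_url(title, limit=100):
    """Return the categorymembers query URL for one category title."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": title,
        "cmlimit": limit,
    }
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would emit "+".
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return API + "?" + query


def extract_titles(data):
    """Pull the member titles out of an already decoded API response."""
    return [member["title"] for member in data["query"]["categorymembers"]]


def fetch_titles(title):
    """Request one category and return its member titles (needs network access)."""
    # The User-Agent string is a hypothetical placeholder, not a required value.
    request = urllib.request.Request(
        build_url(title), headers={"User-Agent": "catlist-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return extract_titles(json.load(response))


def write_members(in_path="catlist.text", out_path="sublist.text"):
    """Read one category per line and write every member title to out_path."""
    with open(in_path, encoding="utf-8") as f1, open(out_path, "w", encoding="utf-8") as f2:
        for line in f1.read().splitlines():
            if not line.strip():
                continue  # skip blank lines in the category list
            for member in fetch_titles(line.strip()):
                f2.write(member + "\n")
```

Calling write_members() with the file names from the question would then reproduce the original script's behaviour. Passing quote_via=urllib.parse.quote keeps the %20 form shown above; with the default, spaces would be written as "+", which is also a valid encoding inside a query string.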

