Python - Correcting to the correct URL
I have written a simple script to access JSON data for keywords that need to be used in a URL.
Below is the script I have written:
import urllib2
import json

f1 = open('catlist.text', 'r')
f2 = open('sublist.text', 'w')
lines = f1.read().splitlines()
for line in lines:
    url = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=' + line + '&cmlimit=100'
    json_obj = urllib2.urlopen(url)
    data = json.load(json_obj)
    for item in data['query']:
        for i in data['query']['categorymembers']:
            print i['title']
            print '-----------------------------------------'
            f2.write((i['title']).encode('utf8') + "\n")
In this script, the program first reads catlist.text, which provides the list of keywords to be used in the URL.
Here is a sample of what catlist.text contains:
category:branches of geography
category:geography by place
category:geography awards and competitions
category:geography conferences
category:geography education
category:environmental studies
category:exploration
category:geocodes
category:geographers
category:geographical zones
category:geopolitical corridors
category:history of geography
category:land systems
category:landscape
category:geography-related lists
category:lists of countries by geography
category:navigation
category:geography organizations
category:places
category:geographical regions
category:surveying
category:geographical technology
category:geography terminology
category:works about geography
category:geographic images
category:geography stubs
My program reads these keywords and places them in the URL.
However, I am not able to get any result. I have checked the code by printing the URL:
import urllib2
import json

f1 = open('catlist.text', 'r')
f2 = open('sublist2.text', 'w')
lines = f1.read().splitlines()
for line in lines:
    url = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=' + line + '&cmlimit=100'
    json_obj = urllib2.urlopen(url)
    data = json.load(json_obj)
    f2.write(url + '\n')
The result in sublist2.text is as follows:
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:branches of geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography by place&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography awards and competitions&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography conferences&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography education&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:environmental studies&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:exploration&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geocodes&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geographers&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geographical zones&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geopolitical corridors&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:history of geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:land systems&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:landscape&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography-related lists&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:lists of countries by geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:navigation&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography organizations&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:places&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geographical regions&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:surveying&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geographical technology&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography terminology&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:works about geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geographic images&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:geography stubs&cmlimit=100
It shows that the keywords are placed in the URLs correctly.
But when I run the full code, I am not able to get the correct result.
One thing I noticed: when I place the link in the browser's address bar, for example:
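To see what the script actually sends, here is a minimal Python 3 sketch of the same plain string concatenation, assuming one keyword per line as in catlist.text. The names `BASE` and `build_naive_url` are illustrative only and do not appear in the original script.

```python
# Python 3 sketch of the plain concatenation used in the script above.
BASE = ('https://en.wikipedia.org/w/api.php?action=query&format=json'
        '&list=categorymembers&cmtitle=')

def build_naive_url(line):
    # The keyword is pasted in unchanged, spaces and all.
    return BASE + line + '&cmlimit=100'

url = build_naive_url('category:branches of geography')
print(url)
# The URL still contains literal spaces, which are not valid in a URL.
print(' ' in url)
```

The printed URL looks right to a human reader, which is why printing it did not reveal the problem: the spaces are only an issue once the string is sent as an HTTP request.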
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:branches of geography&cmlimit=100
it gives the correct result, because the address bar automatically corrects it to:
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category:branches%20of%20geography&cmlimit=100
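The browser is percent-encoding the keyword. A minimal Python 3 sketch using the standard library's `urllib.parse.quote` reproduces the address-bar URL exactly. Passing `safe=':'` is a choice made here only to leave the colon unencoded, as the browser did; it is an assumption for this comparison, not something the API requires.

```python
from urllib.parse import quote

BASE = ('https://en.wikipedia.org/w/api.php?action=query&format=json'
        '&list=categorymembers&cmtitle=')

line = 'category:branches of geography'
# quote() percent-encodes characters that are not allowed in a URL;
# a space becomes %20. safe=':' keeps the colon as the browser did.
url = BASE + quote(line, safe=':') + '&cmlimit=100'
print(url)
```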
I believe that if %20 is added in place of each space between the words in "category:branches of geography", the script will be able to get the correct JSON items.
Problem: I am not sure how to modify the statement in the above code to replace the blank spaces contained in catlist.text with %20.
Please forgive me for the bad formatting and the long post; I am still trying to learn Python.
Thank you for helping me.
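The most literal way to do what is asked here is `str.replace`. The Python 3 sketch below compares it with `urllib.parse.quote`. The second keyword, containing `&`, is hypothetical and is not in catlist.text; it only illustrates why replacing spaces alone is fragile, since an unencoded `&` would end the `cmtitle` parameter early.

```python
from urllib.parse import quote

keywords = [
    'category:branches of geography',
    'category:science & technology',  # hypothetical keyword containing '&'
]

for line in keywords:
    # Replaces spaces only; every other character passes through untouched.
    replaced = line.replace(' ', '%20')
    # Percent-encodes spaces and reserved characters such as ':' and '&'.
    quoted = quote(line)
    print(replaced)
    print(quoted)
```

For the keywords in the sample catlist.text, both approaches produce working URLs; `quote` is the safer general choice because it also covers characters other than spaces.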
Edit:
Thank you, Tim. Your solution works:
url ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle='+urllib2.quote(line)+'&cmlimit=100'
It is able to print the correct result:
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Abranches%20of%20geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageography%20by%20place&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageography%20awards%20and%20competitions&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageography%20conferences&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageography%20education&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Aenvironmental%20studies&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Aexploration&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageocodes&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageographers&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageographical%20zones&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageopolitical%20corridors&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ahistory%20of%20geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Aland%20systems&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Alandscape&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageography-related%20lists&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Alists%20of%20countries%20by%20geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Anavigation&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageography%20organizations&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Aplaces&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageographical%20regions&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Asurveying&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageographical%20technology&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageography%20terminology&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Aworks%20about%20geography&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageographic%20images&cmlimit=100
https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=category%3Ageography%20stubs&cmlimit=100
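In these URLs the colon after "category" is percent-encoded as well. That is because `quote` leaves only `/` unencoded by default (`safe='/'`). A short Python 3 check with `urllib.parse.unquote` suggests why this is harmless: the encoded and unencoded spellings decode to the same keyword, so the server should receive the same `cmtitle` value either way.

```python
from urllib.parse import quote, unquote

line = 'category:branches of geography'

default = quote(line)               # default safe='/' also encodes ':'
keep_colon = quote(line, safe=':')  # leaves ':' as-is

print(default)     # category%3Abranches%20of%20geography
print(keep_colon)  # category:branches%20of%20geography

# Both spellings decode back to the original keyword.
print(unquote(default) == unquote(keep_colon) == line)  # True
```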
Use urllib.quote() to replace special characters in the URL:
Python 2:
import urllib

line = 'category:branches of geography'
url = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=' + urllib.quote(line) + '&cmlimit=100'
https://docs.python.org/2/library/urllib.html#urllib.quote
Python 3:
import urllib.parse

line = 'category:branches of geography'
url = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=' + urllib.parse.quote(line) + '&cmlimit=100'
https://docs.python.org/3.5/library/urllib.parse.html#urllib.parse.quote
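Putting it together, here is a hedged Python 3 sketch of the corrected script. Instead of quoting only `cmtitle`, it lets `urllib.parse.urlencode` encode every query parameter; `quote_via=quote` (available since Python 3.5) is chosen so spaces become `%20` as above, whereas the default `quote_plus` would write them as `+`. It assumes catlist.text holds one category per line, writes to sublist.text as in the question, and relies on the response layout the original script already reads (`data['query']['categorymembers']`, each entry with a `title`). The helper names `category_members_url`, `member_titles` and `main` are hypothetical. Like the original, it fetches at most 100 members per category.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API = 'https://en.wikipedia.org/w/api.php'

def category_members_url(title, limit=100):
    # Same query as in the question, with every value percent-encoded.
    params = {'action': 'query', 'format': 'json',
              'list': 'categorymembers', 'cmtitle': title, 'cmlimit': limit}
    return API + '?' + urlencode(params, quote_via=quote)

def member_titles(data):
    # Pull the page titles out of a decoded API response.
    return [member['title'] for member in data['query']['categorymembers']]

def main():
    # Text mode with explicit UTF-8 replaces the manual .encode('utf8').
    with open('catlist.text', encoding='utf-8') as f1, \
            open('sublist.text', 'w', encoding='utf-8') as f2:
        for line in f1.read().splitlines():
            if not line.strip():
                continue  # skip blank lines
            with urlopen(category_members_url(line)) as response:
                data = json.load(response)
            for title in member_titles(data):
                print(title)
                print('-----------------------------------------')
                f2.write(title + '\n')
```

To run it, call `main()` from the directory that contains catlist.text; it needs network access. Keeping URL building and title extraction in separate functions makes both easy to check without contacting the API.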