R - Parse HTML only if the HTTP status response is 200


I have a dataframe urls with a list of URLs I want to crawl in order to obtain the variable pagename defined in each page's source code. For this purpose I use the following code:

# crawl page names
for (n in 1:length(urls$url)) {
  if (domain(urls$url[n]) == "www.domain.com") {
    con <- file(as.character(urls$url[n]), encoding = "UTF-8")
    doc <- readLines(con)
    close(con)
    rownumber <- grep('s.pagename', doc)
    datalines <- grep(pagenamepattern, doc[rownumber], value = TRUE)
    gg <- gregexpr(pagenamepattern, datalines)
    matches <- mapply(getexpr, datalines, gg)
    matches <- gsub(" ", "", matches[1], fixed = TRUE)
    result <- gsub(pagenamepattern, '\\1', matches)
    names(result) <- NULL
    urls$pagename[n] <- stri_unescape_unicode(result[1])
  } else {
    urls$pagename[n] <- NA
  }
}

The condition if (domain(urls$url[n]) == "www.domain.com") uses the domain function from the urltools package, so that I only crawl URLs on the specific domain where I know the pagename variable is defined.
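For reference, a quick illustration of what urltools::domain returns (the URL here is just a placeholder, matching the domain used in the question):

```r
library(urltools)

# domain() extracts the hostname part of a URL, including any "www." prefix
domain("https://www.domain.com/some/page?x=1")
# "www.domain.com"
```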

However, the code is interrupted if a parsed page's HTTP status response returns a 4xx client error or a 5xx server error.

I would like to add a second if so that the crawl only runs when the HTTP status response of con is 200 (OK). Any idea how to do this, or which package or functions to use?
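One possible approach, sketched here with the httr package (not used in the original code): fetch the page with GET(), check status_code(), and only parse the body when it is 200. Wrapping the request in tryCatch also keeps the loop running on DNS failures or timeouts, not just on 4xx/5xx responses. The helper name safe_read and the placeholder extraction step are assumptions for illustration.

```r
library(httr)
library(urltools)

# Fetch a URL and return its body as lines, or NULL when the request
# fails or the HTTP status is not 200. Splitting on "\n" keeps the
# result compatible with the grep()-based logic in the question.
safe_read <- function(url) {
  resp <- tryCatch(GET(url), error = function(e) NULL)
  if (is.null(resp) || status_code(resp) != 200) return(NULL)
  strsplit(content(resp, as = "text", encoding = "UTF-8"), "\n")[[1]]
}

for (n in 1:length(urls$url)) {
  urls$pagename[n] <- NA  # default when skipped or on error
  if (domain(urls$url[n]) == "www.domain.com") {
    doc <- safe_read(as.character(urls$url[n]))
    if (!is.null(doc)) {
      # ... existing s.pagename extraction on `doc` goes here ...
    }
  }
}
```

An alternative is to issue a cheaper HEAD() request first and only GET() the page when the status is 200, which avoids downloading error pages entirely at the cost of one extra round trip per URL.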

