R - Parse HTML only if HTTP status response is 200
I have a data frame urls containing a list of URLs that I want to crawl in order to obtain the variable pagename defined in each page's source code. For this purpose I use the following code:
# crawl page names
for (n in 1:length(urls$url)) {
  if (domain(urls$url[n]) == "www.domain.com") {
    con <- file(as.character(urls$url[n]), encoding = "UTF-8")
    doc <- readLines(con)
    close(con)
    rownumber <- grep('s.pagename', doc)
    datalines <- grep(pagenamepattern, doc[rownumber], value = TRUE)
    gg <- gregexpr(pagenamepattern, datalines)
    matches <- mapply(getexpr, datalines, gg)
    matches <- gsub(" ", "", matches[1], fixed = TRUE)
    result <- gsub(pagenamepattern, '\\1', matches)
    names(result) <- NULL
    urls$pagename[n] <- stri_unescape_unicode(result[1])
  } else {
    urls$pagename[n] <- NA
  }
}
The condition

if (domain(urls$url[n]) == "www.domain.com")

uses the function domain() included in the urltools package, and lets me crawl only the URLs of the specific domain where I know the pagename variable is defined.
However, the code is interrupted if a parsed page's HTTP status response returns a 4xx client error or a 5xx server error.
I would like to add a second if condition so that the crawl is only performed if the HTTP status response of the connection is 200 (OK). Does anyone have an idea how to do this, or which package or functions to use?
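One possible approach I have been considering (a sketch, not tested against my real URLs) is to check the status with the httr package before reading the page. The helper name url_is_ok below is my own, not from any package:

```r
# Sketch: check the HTTP status with httr before reading the page.
# HEAD() fetches only the response headers, so no page body is
# downloaded just for the check.
library(httr)

# Returns TRUE only if the request succeeds with status 200 (OK).
# tryCatch() also guards against connection-level errors
# (DNS failure, timeout), returning FALSE instead of stopping the loop.
url_is_ok <- function(url) {
  tryCatch(
    status_code(HEAD(url)) == 200,
    error = function(e) FALSE
  )
}

# Inside the loop this could be combined with the existing condition:
# if (domain(urls$url[n]) == "www.domain.com" && url_is_ok(urls$url[n])) { ... }
```

Would this be a reasonable way to do it, or is there a more idiomatic function for this?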