R - Parse HTML only if HTTP status response is 200


I have a data frame urls containing a list of URLs that I want to crawl in order to obtain the variable pageName defined in each page's source code. For this purpose I use the following code:

    # Crawl page names
    for (n in 1:length(urls$url)) {
      if (domain(urls$url[n]) == "www.domain.com") {
        # Open the connection explicitly so it can be closed afterwards
        con <- file(as.character(urls$url[n]), encoding = "UTF-8")
        doc <- readLines(con)
        close(con)
        # Find the line on which s.pageName is defined
        rowNumber <- grep("s.pageName", doc)
        # pageNamePattern and getexpr are defined elsewhere in the script
        dataLines <- grep(pageNamePattern, doc[rowNumber], value = TRUE)
        gg <- gregexpr(pageNamePattern, dataLines)
        matches <- mapply(getexpr, dataLines, gg)
        matches <- gsub(" ", "", matches[1], fixed = TRUE)
        result <- gsub(pageNamePattern, "\\1", matches)
        names(result) <- NULL
        urls$pageName[n] <- stri_unescape_unicode(result[1])
      } else {
        urls$pageName[n] <- NA
      }
    }

The condition if (domain(urls$url[n]) == "www.domain.com") uses the domain function from the urltools package; it lets me crawl only URLs on the specific domain where I know the pageName variable is defined.
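For illustration, a minimal sketch of what urltools::domain returns (the example URL is made up):

    library(urltools)

    # Extracts the host part of the URL: returns "www.domain.com"
    domain("http://www.domain.com/some/page.html")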

However, the code is interrupted whenever a parsed page's HTTP status response returns a 4xx client error or a 5xx server error.

I would like to add a second if so that the crawl only runs when the HTTP status response of con is 200 (OK). Does anyone have an idea how to do this, or which package or functions to use?
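For reference, one possible direction (an assumption on my part; the question does not name a package) is to fetch the page with the httr package, check status_code, and only parse on 200:

    library(httr)
    library(stringi)

    url <- as.character(urls$url[n])
    # Fetch the page; tryCatch guards against connection-level failures
    resp <- tryCatch(GET(url), error = function(e) NULL)

    if (!is.null(resp) && status_code(resp) == 200) {
      # Split the response body into lines, mirroring what readLines produced
      doc <- strsplit(content(resp, as = "text", encoding = "UTF-8"), "\n")[[1]]
      # ... continue with the grep/gsub extraction from the loop above
    } else {
      # Non-200 response (or failed request): record NA and move on
      urls$pageName[n] <- NA
    }

This replaces the file()/readLines() connection with an HTTP request whose status can be inspected before any parsing happens.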

