r - Scraping dataTable gets only header
I'm trying to scrape salary data from the Feds Data Center. There are 1537 entries to read. I thought I'd gotten the table XPath from Chrome's Inspect. However, my code is only returning the header. I'd love to know what I'm doing wrong.
    library(rvest)

    url1 = 'http://www.fedsdatacenter.com/federal-pay-rates/index.php?n=&l=&a=consumer+financial+protection+bureau&o=&y=2016'

    read_html(url1) %>%
      html_nodes(xpath="//*[@id=\"example\"]") %>%
      html_table()
I get only the (lonely) header:
    [[1]]
    [1] name       grade      pay plan   salary     bonus      agency     location
    [8] occupation fy
    <0 rows> (or 0-length row.names)
My desired result is a data frame or data.table with the 1537 entries.
Edit: here's the relevant info from Chrome's Inspect: the header is in thead, and the data is in tbody tr.
The site does not expressly forbid scraping the data. Their Terms of Use are generic and taken from the main http://www.fedsmith.com/terms-of-use/ site (so it appears to be boilerplate). They aren't doing anything with the source free data that adds value. I agree that you should use the source data at http://www.opm.gov/data/index.aspx?tag=fedscope vs rely on this site being around.
But… it doesn't require using RSelenium.
    library(httr)
    library(jsonlite)

    # Request the first page (100 rows) from the endpoint the page itself calls
    res <- GET("http://www.fedsdatacenter.com/federal-pay-rates/output.php?n=&a=&l=&o=&y=&sEcho=2&iColumns=9&sColumns=&iDisplayStart=0&iDisplayLength=100&mDataProp_0=0&mDataProp_1=1&mDataProp_2=2&mDataProp_3=3&mDataProp_4=4&mDataProp_5=5&mDataProp_6=6&mDataProp_7=7&mDataProp_8=8&iSortingCols=1&iSortCol_0=0&sSortDir_0=asc&bSortable_0=true&bSortable_1=true&bSortable_2=true&bSortable_3=true&bSortable_4=true&bSortable_5=true&bSortable_6=true&bSortable_7=true&bSortable_8=true&_=1464831540857")

    # Parse the JSON body into an R list
    dat <- fromJSON(content(res, as="text"))
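The same request can be written with the parameters as a named list passed to `GET()` through its `query` argument, which makes the filters easier to read and change. This is only a sketch: `build_query()` and `fetch_page()` are hypothetical helpers, the parameter names follow the legacy DataTables server-side naming used in the request above, the trailing `_` timestamp is left out on the assumption that it is only a cache-buster, and it is an assumption that `output.php` accepts the same `a` (agency) and `y` (year) values that appear in the question's `index.php` URL.

```r
# Hypothetical helper: build the query list for one page of results.
build_query <- function(start = 0, length = 100, agency = "", year = "") {
  n_cols <- 9
  cols <- seq_len(n_cols) - 1                 # column indices 0..8
  q <- list(
    n = "", a = agency, l = "", o = "", y = year,  # the site's own filters
    sEcho = 2,
    iColumns = n_cols,
    sColumns = "",
    iDisplayStart = as.integer(start),        # integer avoids "1e+05"-style text
    iDisplayLength = as.integer(length)
  )
  q[paste0("mDataProp_", cols)] <- as.list(cols)
  q$iSortingCols <- 1
  q$iSortCol_0 <- 0
  q$sSortDir_0 <- "asc"
  q[paste0("bSortable_", cols)] <- "true"
  q
}

# Hypothetical helper: fetch and parse one page (needs httr and jsonlite).
fetch_page <- function(...) {
  res <- httr::GET(
    "http://www.fedsdatacenter.com/federal-pay-rates/output.php",
    query = build_query(...)
  )
  httr::stop_for_status(res)
  jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"))
}

# Example call (assumes the endpoint honours the agency and year filters):
# dat <- fetch_page(agency = "consumer financial protection bureau", year = "2016")
```

Keeping the query in a list means only `start` has to change from one page to the next.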
It makes an XHR request for the data, and it's paged. In the event it's not obvious, you can increment iDisplayStart by 100 to page through the results. I made this using my curlconverter package. The dat variable has an iTotalDisplayRecords component that tells you the total.
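The paging described above could be sketched roughly as follows. `fetch_all()` is a hypothetical helper, and the `fetch_page(start, length)` argument is a hypothetical function that performs one request and returns the parsed JSON for that page. It is an assumption that the rows sit in an `aaData` component (the legacy DataTables default) and that `fromJSON()` simplifies them to a character matrix; check `str(dat)` before relying on either.

```r
# Hypothetical helper: walk through every page and stack the rows.
# `fetch_page(start, length)` must return a list with iTotalDisplayRecords
# and aaData (assumed component name), like `dat` in the answer.
fetch_all <- function(fetch_page, page_size = 100, pause = 1) {
  first <- fetch_page(0, page_size)
  total <- as.integer(first$iTotalDisplayRecords)

  # Offsets 0, 100, 200, ... up to the last page that still has rows
  starts <- seq(0, max(total - 1, 0), by = page_size)

  pages <- vector("list", length(starts))
  pages[[1]] <- first$aaData
  for (i in seq_along(starts)[-1]) {
    Sys.sleep(pause)  # small delay between requests to be polite to the server
    pages[[i]] <- fetch_page(starts[i], page_size)$aaData
  }

  # Bind the per-page matrices into one data frame
  as.data.frame(do.call(rbind, pages), stringsAsFactors = FALSE)
}

# Example call, with my_fetch_page standing in for your own request function:
# salaries <- fetch_all(my_fetch_page)
```

Passing the request function in as an argument keeps the loop independent of how the URL is built, and lets you try it against a fake page function before hitting the site.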
The entirety of the browser Developer Tools is your friend, and you can avoid the clunkiness & slowness & flakiness of browser instrumentation.