xml - How to recognize data format - scraping in R

May 15, 2011

I am trying to use R to get data from an open data source in the Netherlands. The source is the URL shown in the code below.

When I open it in a browser (at least Chrome), I am presented with XML code. I thought I could use the RCurl package to fetch it, and then use XPath to extract the specific nodes I seek.

However, when I try to parse it, I run into problems. It does not seem to be straight XML; it has JSON in it.

How can I extract the information from this data source? I am not looking for a full solution, just guidance in the right direction.

If I try:

library(RCurl)  # provides getURL
library(XML)    # provides htmlParse and getNodeSet
url <- "http://www.kiesbeter.nl/open-data/api/care/careproviders/?apikey=18a2b2b0-d232-4f48-8d10-5fc10ff04b17"
html <- getURL(url)
doc <- htmlParse(html, asText = TRUE)

It seems that doc is still in JSON format. I cannot seem to use getNodeSet(doc, "//careproviders"). However, if I use fromJSON first, I get the data in an awkward list format.

So my question is: how can I treat the data so that I can get information out of the dataset (e.g. the care providers)? And how do I recognize which format the data is in?

Use

html <- getURL(url, httpheader = c(Accept = "text/xml"))

which makes curl ask for XML through the Accept request header.
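With XML coming back, the XPath step from the question could be sketched roughly as follows. This is only an illustration: `fetch_nodes` is a hypothetical helper name, it assumes the RCurl and XML packages are installed and that the endpoint still responds, and the `//careproviders` path is the asker's own guess at the element name, not something confirmed by the service.

```r
# Hypothetical helper: request XML explicitly, parse it, and run an XPath query.
# Assumes RCurl and XML are installed; the default XPath is taken from the
# question and may not match the actual element names in the response.
fetch_nodes <- function(url, xpath = "//careproviders") {
  # Ask for XML via the Accept header instead of taking the JSON default
  body <- RCurl::getURL(url, httpheader = c(Accept = "text/xml"))
  # Parse the response text as XML (xmlParse rather than htmlParse)
  doc <- XML::xmlParse(body, asText = TRUE)
  # Return the list of nodes matching the XPath expression
  XML::getNodeSet(doc, xpath)
}
```

Calling `fetch_nodes(url)` with the URL from the question would then return a list of matching nodes, or an empty list if the element name is different.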

A little clarification. The service provides both XML and JSON data formats, with JSON as the default. The browser sends text/xml (among others) in the Accept header of its request, so the service returns XML. curl (by default) doesn't send it, so the service returns JSON, the default type.
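As for recognizing which format came back, one rough heuristic is to look at the first non-whitespace character of the response body. The sketch below uses only base R; `guess_format` is a hypothetical helper, not part of RCurl or XML, and the sample strings are made up for illustration. It is a quick check, not a validator.

```r
# Hypothetical helper: guess the format of a response body from its first
# non-whitespace character. XML starts with "<"; JSON objects and arrays
# start with "{" or "[". Anything else is reported as "unknown".
guess_format <- function(body) {
  trimmed <- sub("^[[:space:]]+", "", body)  # drop leading whitespace
  first <- substr(trimmed, 1, 1)             # "" for an empty body
  if (first == "<") {
    "xml"
  } else if (first %in% c("{", "[")) {
    "json"
  } else {
    "unknown"
  }
}

# Made-up sample bodies, for illustration only
guess_format('<?xml version="1.0"?><root/>')
guess_format('  {"key": "value"}')
```

Applied to the result of getURL(url), this would tell you whether to hand the text to an XML parser or to fromJSON.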
