Chapter 3 Data Transformation
Since the dataset is stored in a tidy .csv file, we read it with the read.csv function and store the result in the data frame job. Later, when we need specific attributes of the job postings dataset, we can extract the columns with $ or with the pipe operator %>%. The dataset contains some duplicated records, so we removed them first.
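The loading and de-duplication steps can be sketched as follows. This is a minimal illustration in base R, not the project's actual code: the sample rows, the column names (job_id, agency, work_location) and the temporary file are assumptions standing in for the real job postings file.

```r
# Hypothetical sample standing in for the job postings file.
# All column names and values here are illustrative assumptions.
sample_rows <- data.frame(
  job_id = c(1, 2, 2, 3),
  agency = c("DOT", "DEP", "DEP", "NYPD"),
  work_location = c("55 Water St", "59-17 Junction Blvd",
                    "59-17 Junction Blvd", "1 Police Plaza"),
  stringsAsFactors = FALSE
)

# Write the sample to a temporary .csv so that read.csv has something to read.
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_rows, csv_path, row.names = FALSE)

# Read the file into the data frame `job`.
job <- read.csv(csv_path, stringsAsFactors = FALSE)

# Extract a single column with `$`.
agencies <- job$agency

# Keep only the first occurrence of each fully duplicated row.
# (With dplyr loaded, `job %>% distinct()` would do the same thing.)
job_unique <- job[!duplicated(job), ]

nrow(job)         # rows before de-duplication
nrow(job_unique)  # rows after de-duplication
```

Here duplicated() compares entire rows, so two postings count as duplicates only when every column matches. If duplicates should instead be judged on a key such as a posting ID, the same pattern applies to that column alone.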
To plot an interactive map, we need coordinates, which our dataset does not contain. We therefore used the Google geocoding API through the geocode() function. Once we had retrieved the longitudes and latitudes for the addresses of the job locations, we were able to plot the points accurately on the map of NYC. The Google API could not find some locations and returned NA for them, so we had to drop the rows with NA values in longitude or latitude.
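The NA-dropping step might look like the sketch below. It assumes the geocoding result has been attached to the data as two columns named lon and lat. The addresses and coordinate values are made-up placeholders, and the commented-out calls assume that geocode() comes from the ggmap package and that a Google API key is available. They are left as comments so the example runs offline.

```r
# Hypothetical geocoding result. `lon` and `lat` are assumed column names,
# and the coordinates are illustrative placeholders, not real lookups.
# NA marks an address the geocoder could not resolve.
job_geo <- data.frame(
  work_location = c("55 Water St", "Unknown Annex", "1 Police Plaza"),
  lon = c(-74.009, NA, -74.002),
  lat = c(40.703, NA, 40.712),
  stringsAsFactors = FALSE
)

# Assuming ggmap and a registered key, the coordinates might be obtained with:
#   ggmap::register_google(key = "<your key>")
#   coords <- ggmap::geocode(job_geo$work_location)

# TRUE for rows where both longitude and latitude are present.
located <- complete.cases(job_geo[, c("lon", "lat")])

# Keep only the rows that can be placed on the map.
job_mapped <- job_geo[located, ]

sum(!located)  # number of rows dropped because geocoding failed
```

Restricting complete.cases() to the two coordinate columns keeps rows that have NA in unrelated attributes, which still can be plotted.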