Chapter 5 Results

5.1 Job Count by Category

Because a single job can belong to multiple categories and we want the total number of jobs in each category, we extract all the categories attached to each job, separate them, and create a new data frame called popular_category, which stores the count for each job category. To visualize the number of job postings across categories, we then draw a horizontal bar chart, sorted in descending order, based on this new data frame.
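
The counting step described above can be sketched as follows. This is a minimal illustration in Python, not the R code used for this report. The sample postings, the ";" delimiter, and the helper name count_categories are assumptions made for the example.

```python
from collections import Counter

# Hypothetical sample: each posting lists one or more categories.
# The ";" delimiter is an assumption made for illustration only.
postings = [
    "Engineering; Maintenance",
    "Engineering",
    "Health; Social Services",
    "Engineering; Health",
]


def count_categories(rows, sep=";"):
    """Count postings per category, crediting a posting to every category it lists."""
    counts = Counter()
    for row in rows:
        # Split on the delimiter, trim whitespace, and drop empty or repeated entries.
        categories = {part.strip() for part in row.split(sep) if part.strip()}
        counts.update(categories)
    return counts


popular_category = count_categories(postings)

# List categories from most to least frequent, as in a descending bar chart.
for category, n in popular_category.most_common():
    print(f"{category}: {n}")
```

Because a posting is credited to every category it lists, the counts can sum to more than the number of postings.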

From the graphs below, we can tell that Architecture and Engineering have the most job postings, while Procurement Policy and Social Services have the fewest.

5.2 Distributions of Salaries by Payroll Types

We also want to study how salaries are distributed across payroll types. Our data set has three payroll types (Annual, Daily, and Hourly), so we draw three histograms, one per type. For each job, we take the mean of Salary Range From and Salary Range To as its salary, which is plotted on the x-axis.
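
The salary value used for the histograms can be sketched as below. This is an illustrative Python sketch rather than the report's R code. The sample rows and the helper name midpoint_salaries are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (payroll type, Salary Range From, Salary Range To).
rows = [
    ("Annual", 50000.0, 70000.0),
    ("Annual", 60000.0, 90000.0),
    ("Hourly", 15.0, 25.0),
    ("Daily", 200.0, 300.0),
]


def midpoint_salaries(records):
    """Group the midpoint of each salary range by payroll type."""
    grouped = defaultdict(list)
    for payroll_type, low, high in records:
        # The mean of the two range endpoints is the value on the x-axis.
        grouped[payroll_type].append((low + high) / 2)
    return dict(grouped)


salaries = midpoint_salaries(rows)

# One list of values per payroll type, i.e. one histogram per type.
for payroll_type, values in salaries.items():
    print(payroll_type, len(values), mean(values))
```

Keeping the three payroll types separate matters because annual, daily, and hourly amounts are on different scales and would not be comparable in a single histogram.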

From the three plots above, we make the following observations:
1. Most jobs state their salaries annually. Some jobs are paid hourly, and only a few are paid daily.
2. Annual salaries have an approximately right-skewed distribution, which means that most jobs do not have relatively high salaries.
3. Daily salaries show no specific pattern. Some jobs have relatively low daily salaries, while others have much higher ones.
4. Most hourly salaries are relatively low, but some jobs still have relatively high hourly salaries.

We also look further into the data for more detail on the salary distribution. For instance, among hourly paid jobs, Stationary Engineer and City Medical Specialist have extremely high hourly salaries, while College Aide has a low hourly salary.

5.3 Distribution of Salaries by Categories

This boxplot gives a general idea of the salary distribution for different kinds of jobs, ordered from the highest to the lowest mean salary. For instance, jobs in Building Operations & Maintenance generally have lower salaries than those in Information Technology & Telecommunications. We can also see a general pattern: the higher the annual salary, the wider the salary range.

5.4 Job Postings Count Trend In One Year

The plot above shows how the number of postings in popular categories changes from January to December. Most job postings are posted between August and December, which makes sense because most recruiting happens during the fall. Some job categories also have demand in other seasons. For example, Public Safety, Inspections, and Enforcement also have some recruiting demand in May. Other job categories, such as Maintenance, Architecture, and Engineering, have stable recruiting demand throughout the year.

5.5 Word Clouds for Text Information

5.5.1 How we get started

We also want to study the minimum qualification requirements and preferred skills for the available jobs in NYC. We want to find out whether there are any patterns in these two columns and whether we can extract any useful information from them. To illustrate our findings graphically, we decide to use Word Clouds to show the most frequent words in these texts.

So what are Word Clouds? Word Clouds are visual representations of text data. They are useful for quickly perceiving the most prominent terms, which makes them widely used in media and well understood by the public. A Word Cloud is a collection of words depicted in different sizes. The bigger and bolder a word appears, the more frequently it occurs in the given text and the more important it is.

To extract meaningful vocabulary from the text descriptions, we use the text mining package tm in R. This package is based on ideas from Natural Language Processing (NLP). It has methods that can transform all words to lowercase, remove words that are uninformative in English, such as “a” and “the”, and strip whitespace and punctuation.

After these manipulations on the text data, we can create a new data frame of word frequencies. We can also sort it by frequency and find out the most frequent words under minimum qualification requirements and preferred skills for all jobs or for any particular category of jobs that we are interested in.
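
The cleaning and counting steps above can be sketched as follows. This is a minimal Python illustration of the same idea, not the tm-based R code used for this report. The stop-word list, the sample texts, and the helper name word_frequencies are assumptions made for the example.

```python
import string
from collections import Counter

# A tiny illustrative stop-word list; a real stop-word list is much longer.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "in", "from", "with", "at"}


def word_frequencies(texts, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation, drop stop words, and count the remaining words."""
    # Deleting punctuation outright (rather than replacing it with a space)
    # turns "full-time" into "fulltime".
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for text in texts:
        cleaned = text.lower().translate(strip_punct)
        # split() with no arguments also collapses repeated whitespace.
        counts.update(w for w in cleaned.split() if w not in stop_words)
    return counts


# Hypothetical qualification texts, not taken from the data set.
texts = [
    "A baccalaureate degree from an accredited college.",
    "Four years of full-time experience;  a degree or equivalent experience.",
]

freq = word_frequencies(texts)

# Print the most frequent words first, as in a sorted frequency table.
for word, n in freq.most_common(3):
    print(word, n)
```

Restricting texts to the rows of one job category before calling the function gives the per-category frequencies in the same way.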

5.5.2 Results

Because of a problem with wordcloud2, where only one Word Cloud graph appears after knitting to Bookdown or HTML, we save all our graphs to four separate html files that are rendered automatically every time they are opened in a browser. Here is the link to those files in my GitHub repo: https://github.com/ju-chengyou/5702_Final_Word_Cloud.

Here, we will show the Word Cloud of the most frequent words in Minimum Qual Requirements among all jobs in our dataset.

5.5.2.1 Minimum Qual Requirements @ All Jobs

Minimum Qual Requirements in All Jobs Word Frequency
Word Frequency
experience 13182
years 6907
college 4875
accredited 4828
education 4567
satisfactory 4550
described 4382
fulltime 4064
degree 3931
equivalent 3471
assignment 2860
least 2780
school 2717
four 2535
engineering 2327
candidates 2242
professional 2222
level 2165
baccalaureate 2141
high 2083

5.5.2.2 Preferred Skills @ All Jobs

Preferred Skills in All Jobs Word Frequency
Word Frequency
experience 4322
skills 3960
ability 3227
knowledge 1844
strong 1764
work 1541
management 1528
excellent 1509
communication 1494
written 1097
microsoft 1045
years 968
preferred 912
excel 880
project 830
working 829
new 750
verbal 699
data 693
organizational 669

5.5.2.3 Minimum Qual Requirements @ Tech Jobs

Minimum Qual Requirements in Technology Related Jobs Word Frequency
Word Frequency
experience 1827
computer 864
college 766
years 743
accredited 720
education 697
satisfactory 689
equivalent 628
described 575
fulltime 561
data 509
school 498
degree 472
four 457
programming 442
systems 429
least 386
months 383
high 356
related 352

5.5.2.4 Preferred Skills @ Tech Jobs

Preferred Skills in Technology Related Jobs Word Frequency
Word Frequency
experience 1030
skills 511
ability 460
knowledge 327
management 307
strong 295
years 270
security 227
excellent 206
development 202
working 189
communication 186
project 180
work 177
microsoft 172
systems 161
technical 156
data 154
sql 149
following 134

5.5.3 Observations

We can make plenty of observations from the four Word Clouds. For instance, for both Minimum Qual Requirements and Preferred Skills, experience is the most frequent word in all four graphs, which makes sense, since previous work experience is indeed very important for applicants.

Also, when comparing all jobs with technology jobs, we notice that tech jobs prefer to hire employees with technology-related skills, since words like computer and programming appear a lot in these texts. Even words for specific skills, such as sql, appear in our most frequent word list.

Meanwhile, in all four graphs, words like skills, knowledge, management, and communication appear many times. This makes sense, since all employers want to hire people who have solid skills and are good at communication and cooperation.

Finally, we find that the minimum requirements graphs for all jobs and for tech jobs share almost the same set of frequent words, which we believe is because minimum requirements are similar for all kinds of jobs.

5.6 More Studies on Tech Jobs

From the bar plot, we can see that 79% of the technology job postings are full-time jobs and only 5% are part-time. The remainder do not specify full-time or part-time.

From this bar plot, we can see that almost all (94%) technology jobs are paid annually, 5% are paid hourly, and only 1% are paid daily.

To take a closer look, we examine the job count with respect to Civil Service Title. From the plot above, we can see that for tech jobs, Computer System Manager is the most frequent title. It occurs almost 50 times, which is almost double the count of the second most frequent title. The least frequent civil service titles include Supervising Computer Service and Staff Analyst.

Here we look at the relationship between job count and location. From the bar plot above, we can see that most tech jobs in our data set are located at 255 Greenwich Street, 2 Metro Tech, and 355 Adam Street. After searching for these locations on a map, we see that the first 10 locations in our plot are clustered. Therefore, we tend to believe that tech jobs are location-sensitive. In other words, tech jobs are concentrated within a certain area. More details can be found in an interactive map later in the book.

This Cleveland dot plot shows the relationship between average annual salary and civil service title. The salary ranges from less than 4000 dollars to almost 12000 dollars. We can also see that the annual salary of Administrative Business Promot is a lot higher than that of any other title.