Below are miscellaneous resources (mostly for R) that you may find useful.
You should do the following as soon as possible to be sure you are ready to use R.
Download and install R for your operating system from
www.r-project.org. If you already have R installed, be sure you have the
latest version, which is R version 4.4.2 (2024-10-31), nicknamed
“Pile of Leaves.” You can see the version when starting R/RStudio, or by
typing the command version$version.string in the console. It should show
R version 4.4.2 (2024-10-31). For this course you must use this version
of R (or later if a newer version is released during the semester).
Download and install the free, open-source desktop version of RStudio from www.rstudio.com. Be sure you download “RStudio Desktop” and not RStudio Desktop Pro, RStudio Server, or RStudio Workbench. If you already have RStudio installed, be sure you have the latest version (i.e., 2024.12.0.467 or later).
Run RStudio to verify that everything is working. Note that R runs “under” RStudio (the latter is what is sometimes called an “integrated development environment” or IDE). Tinker around with RStudio/R if you’ve not used it before. Explore some of the options in “Global Options…” under the “Tools” drop-down menu to suit your tastes. Many of these options should not be changed if you don’t understand fully what you are doing, but most of the options under Code, Appearance, and Pane Layout are largely cosmetic and are safe to explore.
Try installing a package from the Comprehensive R Archive Network
(CRAN) repository by typing the command install.packages("remotes") at
the > prompt in the console window of RStudio to install the remotes
package. This package is used to install my trtools package (see the
next step). Also install the package tidyverse with
install.packages("tidyverse"). This will install several packages that
we will be using throughout the semester for data manipulation and
plotting. You can also try updating any packages you already have
installed or that came with R by typing the command update.packages()
at the prompt in the console window. Note that update.packages() will
only update packages that are on the CRAN repository. Installing or
updating packages from CRAN can also be done via the “Tools” menu in
RStudio.
Install my trtools package by typing the command
remotes::install_github("trobinj/trtools") at the prompt in the console
window of RStudio, assuming you have already installed the package
remotes as described in the previous step. Installing trtools requires a
different command because it is hosted on GitHub rather than on CRAN.
Most of the packages we will be using are on CRAN and can be installed
using install.packages, but a few are hosted elsewhere and require
different commands for installation.
If you run into problems with any of the steps above let me know.
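The checks in the steps above can also be run as a short script. The following is only a sketch: the helper missing_packages is a name made up here for illustration (it is not part of R or any package), and the install commands are wrapped in interactive() so that nothing is downloaded unless you run the script yourself from the console.

```r
# Minimum R version for the course, taken from the first step above.
required_r <- "4.4.2"

# getRversion() returns the running R version in a form that can be compared.
r_ok <- getRversion() >= required_r
message("Running R ", as.character(getRversion()),
        if (r_ok) " (OK)" else " (please update)")

# Packages named in the steps above. trtools is on GitHub, the others on CRAN.
course_pkgs <- c("remotes", "tidyverse", "trtools")

# Hypothetical helper: return the packages in pkgs that are not installed.
# system.file() returns "" when a package cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

to_install <- missing_packages(course_pkgs)

# Only install when run interactively, so that sourcing this file
# non-interactively never triggers a download.
if (interactive() && length(to_install) > 0) {
  cran_pkgs <- setdiff(to_install, "trtools")
  if (length(cran_pkgs) > 0) install.packages(cran_pkgs)
  if ("trtools" %in% to_install) remotes::install_github("trobinj/trtools")
}
```

If to_install is empty after running this, all three packages were found in your library.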
The following are a few interesting and sometimes useful R packages. We will be using several of these in this course.
anytime is useful for converting a variety of variable types into dates/times. See the vignette for some examples.
colorspace provides tools to select and manipulate individual colors and color palettes for graphics and plots.
colourpicker is useful when trying to find the name or HEX value for a color for a plot. It adds an add-in to RStudio that helps identify different colors.
cowplot extends the capabilities of ggplot in several ways, but what I
find particularly useful is the plot_grid function for combining and
aligning several plots into one. See the vignettes that come with the
package for some examples.
dplyr is very helpful for manipulating data and computing basic descriptive statistics. It is part of the “tidyverse” of R packages.
emmeans is designed to estimate “marginal means” for linear and
generalized linear models. It is a very useful package for making
specific inferences based on a linear or generalized linear (mixed)
model. Its functionality is similar to the contrast function in the
trtools package.
forcats includes some useful functions for working with factors. It is part of the “tidyverse” of R packages.
gganimate lets you produce animated plots with ggplot that can be
exported as GIFs.
lubridate is very useful when working with time or dates. It is part of the “tidyverse” of R packages.
marginaleffects can be used to estimate “marginal effects” and other
quantities. It shares some of the same capabilities as the contrast and
margeff functions from trtools, and functions from the emmeans package.
rmarkdown lets you create documents that combine text, R code, and the results of running that code. Almost all of the documents I create for my classes, including this web page, are created using rmarkdown.
Rcpp facilitates interfacing C++ code with R. It allows for more
computationally efficient code, but it also extends C/C++ with useful
classes and functions. This is highly recommended for C++ programmers
(or anyone who would like to learn C++) and for anyone who is
interested in using C++ from R for computationally intensive work. There
are also several packages that allow you to easily install and access
various C++ libraries. Some examples are RcppArmadillo (my favorite) for
using the Armadillo C++ linear algebra library with Rcpp, RcppEigen for
using the Eigen numerical library, RcppGSL for using the GNU Scientific
Library (GSL), RcppEnsmallen for using the Ensmallen optimization
library, and roptim for an interface to the C functions underlying the
optim function in R. For an introduction to using the Rcpp package I
would recommend starting with the book chapter in Advanced R on Rcpp,
the GitHub “book” Rcpp for Everyone, and the blog entry Introduction to
Rcpp.
tesseract lets you use the tesseract optical character recognition (OCR) software from R. I have found this useful for reading data into R from a scanned document from a book or article, although it can be a little tricky to calibrate.
tidyr includes functions for “reshaping” data in various ways, such as between “long form” (with one observation per row) and “wide form” (with multiple observations per row). It is part of the “tidyverse” of R packages.
tidyverse is actually a collection of packages for manipulating and
plotting data, including dplyr, forcats, lubridate, tidyr, and ggplot2.
You can install all of these packages at once by installing the
tidyverse package, and you can load all of these packages at once with
library(tidyverse).
trtools is a package I originally created for teaching, but now I and
others also use it for research. It contains some data sets that I use
in classes as well as several utility functions to facilitate certain
kinds of tasks that are not (in my opinion) as easily done with other
packages or functions. It is not available on CRAN so it cannot be
installed using install.packages. To install it use
remotes::install_github("trobinj/trtools") (assuming you already have
the remotes package installed). Note that this package is a work in
progress.
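To make the dplyr and tidyr entries above a little more concrete, here is a minimal sketch of summarizing and reshaping a small data set. The data frame is made up for illustration, and the example assumes dplyr and tidyr are installed (both come with the tidyverse package).

```r
library(dplyr)
library(tidyr)

# Made-up data in "long form": one observation (score) per row.
long <- data.frame(
  subject = c(1, 1, 2, 2, 3, 3),
  time    = c("pre", "post", "pre", "post", "pre", "post"),
  score   = c(10, 14, 12, 15, 9, 13)
)

# dplyr: compute the mean score and number of observations at each time.
means <- long |>
  group_by(time) |>
  summarize(mean_score = mean(score), n = n())

# tidyr: reshape to "wide form", with one row per subject and
# one column of scores for each time.
wide <- long |>
  pivot_wider(names_from = time, values_from = score)

# tidyr: reshape back to long form.
long_again <- wide |>
  pivot_longer(cols = c(pre, post), names_to = "time", values_to = "score")

print(means)
print(wide)
```

Here wide has one row for each of the three subjects, and long_again has the same six observations as long.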
There are several packages that add new color palettes for use with the ggplot2 package. Several are based on themes, such as the wesanderson package for color palettes used in Wes Anderson films, the ghibli package for palettes inspired by films produced by Studio Ghibli, and the gameofthrones package for palettes inspired by Game of Thrones.
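As a rough sketch of how one of these palette packages can be used with ggplot2, the example below applies a wesanderson palette to a bar chart of the built-in mtcars data. It assumes that ggplot2 and wesanderson are both installed. The choice of the "GrandBudapest1" palette and of the plot itself is arbitrary and only for illustration.

```r
library(ggplot2)
library(wesanderson)

# A palette from the wesanderson package (a vector of HEX color values).
pal <- wes_palette("GrandBudapest1")

# Bar chart of the number of cars by number of cylinders, with the
# fill colors taken from the palette via scale_fill_manual().
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar() +
  scale_fill_manual(values = pal) +
  labs(x = "Cylinders", y = "Count", fill = "Cylinders")

print(p)
```

Other palette packages provide their own functions and scales, so check the documentation of the package you choose.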
The following are some free books about using R that you might find useful. Many other free books and resources are available online.
Advanced R covers some of the more advanced aspects of the R programming language.
Big Book of R is a collection of links to books on R, many of which are online and free.
Data Visualization for Social Science is a very nice book on visualization using R with the ggplot2 package and some supporting packages. And despite the title, this book would be useful for applications outside the social sciences.
R Packages is an introduction to creating your own R package. Making an R package is a useful way to organize your own R code and to disseminate that work to others.
R for Data Science is an introduction to R. It also features using some capabilities of the tidyverse packages (e.g., dplyr, tidyr, and ggplot2).
R Programming for Data Science is an introduction to R.