Instructor: Daniel Carter
Week of April 22

Description
This module presents an introduction to data infrastructures. Sometimes, the hardest part of doing research or teaching isn’t the specific skills involved; it’s putting together an infrastructure so you can actually use (or teach) those skills. Anyone who’s tried to set up a robust environment for working with Python knows what a pain it can be to figure out dependencies, software versions and virtual environments.

Objectives

  • Understand the goals of creating a data infrastructure for research and teaching
  • Set up and use RStudio Cloud, an integrated development environment (IDE) for R programming
  • Optional: Comprehend the elements of R programming

The videos in this module reference projects hosted on RStudio Cloud. You should each have received an invitation by email to join the module workspace; if you have problems, please email Daniel (dcarter [at] txstate.edu). The overview project provides a written summary of most of the content covered in the videos and is also available as a PDF that models the principles covered here.

In the first video, we cover the goals of a data infrastructure, focusing on the concepts of literate programming and reproducible research. The second video covers the R programming ecosystem and introduces you to RStudio Cloud, which we’ll use in the third video to look at examples related to research and teaching. An optional fourth video gives a very brief introduction to programming in R.






Discussion and Assignment
After viewing the first three (short) videos, please write a Slack post on #datainfrastructures reflecting on your personal needs (research, teaching or both) related to data infrastructures. You might discuss your current infrastructure, past successes or failures and/or plans for the future.


Further Reading

  • Adam Rule, Aurélien Tabard, and James D. Hollan. 2018. Exploration and Explanation in Computational Notebooks. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18). ACM, New York, NY, USA, Paper 32, 12 pages. DOI: https://doi.org/10.1145/3173574.3173606
  • Çetinkaya-Rundel, Mine, and Colin Rundel. “Infrastructure and tools for teaching computing throughout the statistical curriculum.” The American Statistician 72.1 (2018): 58-65.
  • Grolemund, G., & Wickham, H. R for Data Science. Retrieved from http://r4ds.had.co.nz/
  • Hicks, S. C., & Irizarry, R. A. (2018). A guide to teaching data science. The American Statistician, 72(4), 382-391.
  • Horton, N. J., Baumer, B. S., & Wickham, H. (2014). Teaching precursors to data science in introductory and second courses in statistics. arXiv preprint arXiv:1401.3269.
  • Knuth, D. E. (1984). Literate programming. The Computer Journal, 27(2), 97-111.
  • Shum, S., & Cook, C. (1994, March). Using literate programming to teach good programming practices. In ACM SIGCSE Bulletin (Vol. 26, No. 1, pp. 66-70). ACM.
  • Stander, J., & Dalla Valle, L. (2017). On Enthusing Students About Big Data and Social Media Visualization and Analysis Using R, RStudio, and RMarkdown. Journal of Statistics Education, 25(2), 60-67.