class: inverse, middle # Teach Me How to Google<br> <br> <span style = 'font-size: 130%;'>Sam Csik | Data Training Coordinator</span> National Center for Ecological Analysis & Synthesis<br> <br> <span style = 'font-size: 130%;'>Master of Environmental Data Science Program</span> Bren School of Environmental Science & Management<br> .footnote[Slides & source code available on [
GitHub
](https://github.com/UCSB-MEDS/teach-me-how-to-google)] --- ### But first, a note on ChatGPT -- <!-- <center><span style = 'font-size:120%; font-weight: bold; color: #05859B;'>What is it?</span></center> --> <span style = 'font-size: 90%;'>
Angle Right
**What is it?** [ChatGPT](https://chat.openai.com/auth/login) is an artificial intelligence (AI) chat bot developed by [OpenAI](https://openai.com/) that uses natural language processing (NLP) to generate responses to user-generated prompts.</span> <br> <img src="media/chatgpt-logo.png" width="50%" style="display: block; margin: auto;" /> <br> -- <span style = 'font-size: 90%;'>
Angle Right
**Why is everyone so excited about it?** ChatGPT can be a powerful tool to help increase productivity and efficiency, but it's important that you understand its limitations and use it responsibly and ethically. This is beyond the scope of this talk, but I encourage you to read [ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope](https://doi.org/10.1016/j.iotcps.2023.04.003), by Partha Pratim Ray, or revisit Ruth Oliver's talk, [Generative AI: What is it and how do we use it responsibly?](https://docs.google.com/presentation/d/1XxPoHLhUl2r7t0pAu8Hub_5NmY6bY6sH4_NfOa_cw3I/present?slide=id.g2ef98dcd5ad_0_51).</span> --- ## Why should I learn how to troubleshoot when I could just use ChatGPT? <span style = 'font-size: 100%;'>Many reasons! But here are just a few:</span> .footnote[ <span style = 'font-size: 80%;'><sup>1</sup>[ChatGPT & Education Slide Deck](https://figshare.edgehill.ac.uk/articles/presentation/ChatGPT_Education_Slide_Deck/21901629/1)</span> ] -- - <span style = 'font-size: 80%;'>ChatGPT is trained on a massive corpus of data extending only to 2021 (as of Fall 2023) -- meaning **it has biases**<sup>1</sup></span> -- - <span style = 'font-size: 80%;'>**ChatGPT can make stuff up** (see [this story](https://apnews.com/article/artificial-intelligence-chatgpt-fake-case-lawyers-d6ae9fa79d0542db9e1455397aef381c) about lawyers who submitted fake legal research created by ChatGPT)</span> -- - <span style = 'font-size: 80%;'>Similarly, ChatGPT has been called a **"stochastic parrot"**, meaning it can generate convincing responses, but ultimately does not understand what it is processing ([Bender et al. 2021](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922))</span> -- - <span style = 'font-size: 80%;'>One of the more challenging aspects of both troubleshooting and crafting effective ChatGPT prompts is **having the vocabulary to describe what it is you're looking for** -- the more documentation,
vignettes, and content you read, the easier this will become!</span> -- - <span style = 'font-size: 80%;'>**Programming/developing is more than just coding.** Reproducibility, intuitive organization, coming up with creative solutions for complex tasks, etc. all require contextual understanding and critical thinking. ChatGPT isn't a replacement for this.</span> --- ### Welcome to data science, where questions are aplenty! .pull-left[
Angle Right
You will become increasingly comfortable with **not immediately knowing** the answers to all your coding problems. It's all part of the job. <br>
Angle Right
If it's not already, **Google** will become one of your best friends. <br>
Angle Right
Googling is hard, and it is a skill that requires **practice.** But you **can** and **will** get better at it over time. ] .pull-right[ <img src="media/questions.gif" alt="A gif of Seth Meyers sitting at the desk of his show, Late Night, and saying 'I have a lot of questions.'" width="120%" /> ] .pull-right[ .center[ <span style = 'font-size: 85%;'>-Me, every time I open RStudio</span> ] ] --- ### It doesn't mean you won't still feel like this at times: .center2[ <img src="media/ron_swanson.gif" alt="A gif of Ron Swanson, from the comedy TV series, Parks and Recreation, getting angry at his computer then taking it outside to toss in a dumpster." width="100%" height="100%" /> .center[ <span style = 'font-size: 85%;'>-Me still, about half the times I open RStudio</span> ] ] --- ### But the goal is to be a bit more at peace with that feeling...and have the confidence that you can find your way <br> <br> .center2[ <img src="media/how_much_i_know.jpeg" alt="A cartoon drawing of an x- and y-axis plot, where the x-axis represents 'time' and the y-axis represents 'how much I know about R'. A round cartoon creature is moving through time (i.e. to the right across the x-axis). At time point one, when this creature is first beginning to learn R, it knows very little and its facial expression suggests that it feels very intimidated. Over time, it experiences some highs and some lows. At time point eight (which is the furthest point to the right along the x-axis) this creature again feels like it knows very little about R, but instead of seeming intimidated as it did at time point one, it appears to be excited, suggesting a new sense of confidence." width="100%" /> .center[ <span style = 'font-size: 85%;'>Artwork by [@allison_horst](https://twitter.com/allison_horst?lang=en)</span> ] ] --- ### I typically find myself turning to Google for one of two reasons: <br> <span style = 'font-size: 120%;'>
Triangle Exclamation
I got an error and need help fixing it</span> <br> <span style = 'font-size: 120%;'>
Circle Question
I know what I want my code to do, but I have no idea how to actually pull it off</span> <br> -- <span style = 'font-size: 120%;'>
Face Flushed
Sometimes, it's both of these things happening at the same time</span> --- class: inverse, middle, center ##
Triangle Exclamation
I got an error and need help fixing it --- ### We've all been here before: <img src="media/alligator.png" alt="Comic panels of an alligator trying to debug some code. First panel: A confident looking alligator gets an error message. Second panel: a few minutes later, the error remains and the alligator is looking carefully at their code. Third panel: 10 minutes after that, the error remains and the alligator is giving a frustrated 'RAAAR' while desperately typing. Fourth panel: The error remains, and the alligator looks exhausted and exasperated, and a thought bubble reads 'maybe it's a bug.' Fifth panel: A friendly flamingo comes over to take a look, and reads aloud from the problematic code a spelling error: 'L-E-N-G-H-T.' Only the tail of the alligator is visible as it stomp stomp stomps out of the panel roaring." width="100%" /> .center[ <span style = 'font-size: 75%;'>Artwork by [@allison_horst](https://twitter.com/allison_horst?lang=en)</span> ] --- ### Pause, exhale, narrow down your potential Google search <br> -- <span style = 'font-size: 120%;'>
Power Off
Restart R</span> -- <br> <span style = 'font-size: 120%;'>
Lightbulb
Check the easy stuff</span> -- <br> <span style = 'font-size: 120%;'>
Triangle Exclamation
Read that error message!</span> -- <br> <span style = 'font-size: 120%;'>
Magnifying Glass
Try to isolate the problem</span> -- <br> <span style = 'font-size: 120%;'>
File Lines
Double-check the documentation</span> -- <br> <span style = 'font-size: 120%;'>
Earlybirds
Talk about it out loud</span> --- ### <span style = 'font-size: 120%;'>
Power Off
Restart R</span> .center[ >"Restart R often, especially when things get weird...We install and update packages from R, which is a little bit like working on your airplane engine while you're flying." .right[ <span style = 'font-size: 60%;'>-Jenny Bryan, in her 2020 RSTUDIO::CONF keynote, [Object of type ‘closure’ is not subsettable](https://www.rstudio.com/resources/rstudioconf-2020/object-of-type-closure-is-not-subsettable/)</span> ] ] <br> -- .center[ Similarly, going to sleep and trying again tomorrow is a legitimate (and often impactful) strategy -- think of it as restarting your own internal computer (i.e. your brain). ] <blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">I love waking up and jumping back into solving a bug and immediately solving it with a fresh mind. Sleep is my favorite coding tool.</p>— Kelly Vaughn (@kvlly) <a href="https://twitter.com/kvlly/status/1385573317277532162?ref_src=twsrc%5Etfw">April 23, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> ??? Common reason why things get funky in a way that makes it difficult to debug and understand. The good thing is that you don't have to. Quit, restart! --- ### <span style = 'font-size: 120%;'>
Lightbulb
Check the easy stuff</span> <br> .center[ <img src="media/debug_bingo.png" width="70%" /> ] .center[ <span style = 'font-size: 75%;'>Source: This [tweet](https://twitter.com/cogscimom/status/1354508785365078016?ref_src=twsrc%5Etfw) by [@cogscimom](https://twitter.com/cogscimom).</span> ] --- ### <span style = 'font-size: 120%;'>
Triangle Exclamation
Read that error message!</span>

**Helpful:**

```r
library(tidyverse) # a collection of data wrangling & visualization packages
library(palmerpenguins) # contains the 'penguins' data set

# print out the first three rows of the penguins data frame
head(penguins, 3)
```

```
## # A tibble: 3 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## # ℹ 2 more variables: sex <fct>, year <int>
```

--

```r
# what unique values are in the species column of the penguins data frame?
unique(penguins$species)
```

```
## [1] Adelie    Gentoo    Chinstrap
## Levels: Adelie Chinstrap Gentoo
```

--

```r
# filter for just "Gentoo" penguins
gentoo <- penguins %>%
  filter(species = "Gentoo")
```

```
## Error in `filter()`:
## ! We detected a named input.
## ℹ This usually means that you've used `=` instead of `==`.
## ℹ Did you mean `species == "Gentoo"`?
```

???

Revisit A.Horst's lab materials from EDS 221: https://allisonhorst.github.io/EDS_221_programming-essentials/interactive_sessions/day_9_interactive.html#2_Basics_-_troubleshooting_practice__tips

---

### <span style = 'font-size: 120%;'>
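Read that error message!</span>

<span style = 'font-size: 80%;'>Acting on the hint in that message is usually the quickest fix. A minimal sketch, assuming base R's built-in `iris` data and `subset()` as stand-ins for `penguins` and `filter()` so that it runs without installing any packages: `==` tests for equality, whereas `=` assigns a value or names an argument.</span>

```r
# `==` compares every value to "setosa" and returns TRUE/FALSE for each row
is_setosa <- iris$Species == "setosa"

# keep only the rows where that comparison is TRUE
setosa <- subset(iris, Species == "setosa")

# confirm that the filtered data contains a single species
unique(as.character(setosa$Species))
nrow(setosa)
```

---

### <span style = 'font-size: 120%;'>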
Triangle Exclamation
Read that error message!</span>

**Somewhat less helpful:**

```r
# create data object, 'dat'
dat <- data.frame(x = 1, y = 2)
dat
```

```
##   x y
## 1 1 2
```

.footnote[
<span style = 'font-size: 80%;'>Example from Jenny Bryan's 2020 RSTUDIO::CONF keynote, [Object of type ‘closure’ is not subsettable](https://www.rstudio.com/resources/rstudioconf-2020/object-of-type-closure-is-not-subsettable/).</span>
]

--

```r
# extract column 'x' from your data object
df$x
```

```
## Error in df$x: object of type 'closure' is not subsettable
```

--

<span style = 'font-size: 65%;'>If you feel incredibly frustrated by this error, well, welcome to the club! Jenny Bryan writes:</span>

> <span style = 'font-size: 65%;'>Your first “object of type ‘closure’ is not subsettable” error message is a big milestone for an R user. Congratulations, if there was any lingering doubt, you now know that you are officially programming!</span>

<span style = 'font-size: 65%;'>This error often arises when you attempt to subset a function (i.e. treat a function in a way that it shouldn't be). Here, we forgot that we called our object `dat`, and not `df`. `df()` also happens to be a function that gives you the density of the 'F' distribution, and we are attempting to subset a column (`x`) from it.</span>

???

Revisit A.Horst's lab materials from EDS 221: https://allisonhorst.github.io/EDS_221_programming-essentials/interactive_sessions/day_9_interactive.html#2_Basics_-_troubleshooting_practice__tips

---

### <span style = 'font-size: 120%;'>
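Read that error message!</span>

<span style = 'font-size: 80%;'>When an error mentions a 'closure', it can help to check what a name actually refers to. A rough sketch in base R, reusing the `dat` object from the previous slide: `class()` shows that `df` is a function rather than a data frame, and `tryCatch()` captures the error text so that you can read it (or search for it) without halting your script.</span>

```r
# create data object, 'dat'
dat <- data.frame(x = 1, y = 2)

# what is `df`? we never created it, so R finds the function stats::df()
class(df)
is.function(df)

# `dat` is the data frame we meant to use
class(dat)
dat$x

# capture the error message as text instead of stopping the script
msg <- tryCatch(df$x, error = function(e) conditionMessage(e))
msg
```

---

### <span style = 'font-size: 120%;'>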
Triangle Exclamation
Read that error message!</span> Check out some of these additional resources for common R error messages and strategies for fixing them: * [Common R Error Messages](https://www.programmingr.com/r-error-messages/), on [ProgrammingR](https://www.programmingr.com/) * [Common R Programming Errors Faced by Beginners](https://www.r-bloggers.com/2016/06/common-r-programming-errors-faced-by-beginners/), on [R-bloggers](https://www.r-bloggers.com/) * [R Error Message Cheat Sheet](http://varianceexplained.org/courses/errors/), by David Robinson on his blog, [Variance Explained](http://varianceexplained.org/) * [How to: Interpret Common Errors in R](https://warin.ca/posts/rcourse-howto-interpretcommonerrors/), by [Thierry Warin](https://warin.ca/) .center[ <img src="media/blame_code.png" alt="A friendly monster has slipped on a banana peel, and says 'I know it was you, code. It breaks my heart.' Meanwhile, a little character labeled 'CODE' looks on indignantly, pointing to evil characters labeled 'mismanaged files,' 'navigating your computer', and 'typing' hiding behind a bush holding a bunch of bananas. The point being: often folks blame code for data science problems that are often caused by other underlying issues." width="55%" /> .center[ <span style = 'font-size: 75%;'>Artwork by [@allison_horst](https://twitter.com/allison_horst?lang=en)</span> ] ] --- ### <span style = 'font-size: 120%;'>
Magnifying Glass
Try to isolate the problem</span>

<span style = 'font-size: 80%;'>It can feel overwhelming to figure out where an error or issue is occurring in a large chunk of text. For example, let's say we're wrangling some data and want to reorder values in a column from highest to lowest:</span>

```r
# load libraries ----
library(dplyr)
library(palmerpenguins)

# wrangle data ----
penguins_new <- penguins |>
  select(species, sex, bill_length_mm) |>
  filter(species == "Adelie") |>
  reorder(bill_length_mm)
```

```
## Error in eval(expr, envir, enclos): object 'bill_length_mm' not found
```

--

<span style = 'font-size: 80%;'>**Running all of our piped-together wrangling code at once can make it difficult to identify which line(s) of code is responsible for this error** (and you might imagine situations where you have much longer and more complex code than the short example presented here).</span>

--

<span style = 'font-size: 80%;'>**Instead, run line-by-line to isolate where the problem is occurring** so that you may start investigating from there.</span>

---

### <span style = 'font-size: 120%;'>
Magnifying Glass
Try to isolate the problem</span>

<span style = 'font-size: 80%;'>Run line-by-line until you hit the error (**Tip:** comment/uncomment lines of code using the keyboard shortcut, `command`/`control` + `shift` + `C`):</span>

```r
penguins_new <- penguins |>
  select(species, sex, bill_length_mm) # |>
  # filter(species == "Adelie") |>
  # reorder(bill_length_mm)
```

<span style = 'font-size: 80%;'>Works!</span>

--

```r
penguins_new <- penguins |>
  select(species, sex, bill_length_mm) |>
  filter(species == "Adelie") # |>
  # reorder(bill_length_mm)
```

<span style = 'font-size: 80%;'>Works!</span>

--

```r
penguins_new <- penguins |>
  select(species, sex, bill_length_mm) |>
  filter(species == "Adelie") |>
  reorder(bill_length_mm)
```

```
## Error in eval(expr, envir, enclos): object 'bill_length_mm' not found
```

<span style = 'font-size: 80%;'>Doesn't work...let's look into what `reorder()` is / does...</span>

---

### <span style = 'font-size: 120%;'>
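Try to isolate the problem</span>

<span style = 'font-size: 80%;'>One quick way to start investigating an unfamiliar function is to ask R where it lives. A small sketch using base R only: `environmentName(environment(f))` reports the namespace in which a function was defined, which tells you whose documentation to read.</span>

```r
# which package does `reorder()` come from?
environmentName(environment(reorder))

# compare with a function that ships with base R
environmentName(environment(sort))
```

<span style = 'font-size: 80%;'>If the name that comes back isn't the package you expected, you've likely grabbed a different function than the one you intended.</span>

---

### <span style = 'font-size: 120%;'>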
Magnifying Glass
Try to isolate the problem</span>

<span style = 'font-size: 75%;'>Searching for `reorder()` (either by looking up documentation -- more on that in a moment -- or Googling it) will reveal that **it's not actually a `{dplyr}` (or tidyverse) function at all**; it ships with base R's `{stats}` package.</span>

--

<span style = 'font-size: 75%;'>**It's easy to confuse or mistake function names, particularly as you're just starting to learn a language or new packages.** For example, a similarly named tidyverse function, `forcats::fct_reorder()`, is used to reorder *factor* levels. In this case, however, we're looking to reorder *numeric values* in the `bill_length_mm` column.</span>

--

<span style = 'font-size: 75%;'>A Google search ("[R tidyverse reorder values high to low](https://www.google.com/search?q=R+tidyverse+reorder+values+high+to+low&sca_esv=558984878&ei=KlPkZOzoK8TFkPIPqf2ikAI&ved=0ahUKEwjspZqwz--AAxXEIkQIHam-CCIQ4dUDCBA&uact=5&oq=R+tidyverse+reorder+values+high+to+low&gs_lp=Egxnd3Mtd2l6LXNlcnAiJlIgdGlkeXZlcnNlIHJlb3JkZXIgdmFsdWVzIGhpZ2ggdG8gbG93MgUQIRigATIFECEYoAEyBRAhGKsCSJ4fUPECWJ4dcAF4AJABAJgBqwGgAdAOqgEENC4xMrgBA8gBAPgBAcICChAAGEcY1gQYsAPCAgUQABiiBOIDBBgAIEGIBgGQBgg&sclient=gws-wiz-serp)") leads us to the [`{dplyr}` documentation](https://dplyr.tidyverse.org/reference/arrange.html) for the `arrange()` function, which also allows us to sort values in descending order when coupled with `desc()`:</span>

```r
penguins_new <- penguins |>
  select(species, sex, bill_length_mm) |>
  filter(species == "Adelie") |>
  arrange(desc(bill_length_mm))

head(penguins_new)
```

```
## # A tibble: 6 × 3
##   species sex   bill_length_mm
##   <fct>   <fct>          <dbl>
## 1 Adelie  male            46
## 2 Adelie  male            45.8
## 3 Adelie  male            45.6
## 4 Adelie  male            44.1
## 5 Adelie  male            44.1
## 6 Adelie  male            43.2
```

---

### <span style = 'font-size: 120%;'>
File Lines
Double-check the documentation</span> <span style = 'font-size: 75%;'>While documentation can sometimes be tricky to read, it provides critical info for understanding how to use a package or function. **Check out the documentation for any loaded package or function by either typing `?function_name` or `help(function_name)` in your console.** If you search for a function from a package that is not currently loaded, R will prompt you to type `??function_name`.</span> -- <span style = 'font-size: 75%;'>Documentation may include lots of different pieces of information. One example, `ggplot()`:</span> .center[ <img src="media/ggplot_help.png" width="70%" /> ] --- ### <span style = 'font-size: 120%;'>
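Double-check the documentation</span>

<span style = 'font-size: 75%;'>The help tools mentioned on the previous slide can be sketched as follows. This is only an illustration using functions that ship with base R (`mean()` and `weighted.mean()` are arbitrary stand-ins for whatever function you are stuck on):</span>

```r
# open the help page for a function in a loaded package (two equivalent forms)
?mean
help("mean")

# search the help pages of all installed packages by keyword
# (`??keyword` is shorthand for help.search("keyword"))
help.search("weighted mean")

# check a function's arguments and default values without leaving the console
args(weighted.mean)

# run the code from the Examples section of a help page
example("weighted.mean")
```

---

### <span style = 'font-size: 120%;'>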
File Lines
Double-check the documentation</span> <span style = 'font-size: 75%;'>What's included in documentation varies by function and package. Some other important components include:</span> <span style = 'font-size: 65%;'>
Angle Right
**Usage:** shows the various arguments you need to specify. Some are necessary, some are optional. Default values are shown here (see `ggplot` example on [previous slide](https://samanthacsik.github.io/teach-me-how-to-google/slides/WaterData_2022-11-08.html#34))</span> <span style = 'font-size: 65%;'>
Angle Right
**Arguments:** briefly describes what each argument is/takes as a value (See `ggplot` example on [previous slide](https://samanthacsik.github.io/teach-me-how-to-google/slides/WaterData_2022-11-08.html#34))</span> <span style = 'font-size: 65%;'>
Angle Right
**Value:** describes the output of a function</span> <span style = 'font-size: 65%;'>
Angle Right
**See Also:** related functions</span> <span style = 'font-size: 65%;'>
Angle Right
**Examples:** code snippets to show you how a function is used in practice</span> -- <span style = 'font-size: 75%;'>Some packages will also have **vignettes**, which provide more detailed descriptions of how functions within a package should be used. Googling `package_name vignette` will usually get you there (e.g. check out the [`dplyr` vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html)). Alternatively, package documentation on [CRAN](https://cran.r-project.org/)<sup>1</sup> will typically include any vignettes that are associated with that package (e.g. find all `dplyr` vignettes [here](https://cran.r-project.org/web/packages/dplyr/index.html)).</span> <span style = 'font-size: 60%;'><sup>1</sup>[CRAN](https://cran.r-project.org/) stands for the **C**omprehensive **R** **A**rchive **N**etwork and is R's central software repository -- in other words, it's where published R packages (including previous versions of published R packages) live. When you run `install.packages("package_name")`, R (on your computer) reaches out to CRAN to find and install the latest version of that package.</span> --- ### <span style = 'font-size: 120%;'>
Earlybirds
Talk about it *out loud*</span> .pull-left[ This is often referred to as **rubber duck debugging**, and it goes something like this: > <span style = 'font-size: 55%;'>*1. Beg, borrow, steal, buy, fabricate or otherwise obtain a rubber duck (bathtub variety).*</span> > <span style = 'font-size: 55%;'>*2. Place rubber duck on desk and inform it you are just going to go over some code with it, if that’s all right.*</span> > <span style = 'font-size: 55%;'>*3. Explain to the duck what your code is supposed to do, and then go into detail and explain your code **line by line**.*</span> > <span style = 'font-size: 55%;'>*4. At some point you will tell the duck what you are doing next and then realise that that is not in fact what you are actually doing. The duck will sit there serenely, happy in the knowledge that it has helped you on your way.*</span> .right[ <span style = 'font-size: 50%;'>- [rubberduckdebugging.com](https://rubberduckdebugging.com/) with original credit to Andy from lists.ethernal.org</span> ] ] .pull-right[ <img src="media/duck.jpeg" width="100%" /> .center[ <span style = 'font-size: 75%;'>Source: [Wikipedia](https://en.wikipedia.org/wiki/Rubber_duck_debugging)</span> ] ] ??? I'll also note that I tend to combine this 'rubber ducky debugging' method with actually running my code line-by-line. --- ### Still haven't figured it out? Enter Google. <span style = 'font-size: 100%;'>
Google Logo
General Googling tips:</span> - <span style = 'font-size: 75%;'>[r] + error message</span> - <span style = 'font-size: 75%;'>error message + function or package name</span> - <span style = 'font-size: 75%;'>sometimes even just the error message alone will suffice</span> -- <span style = 'font-size: 100%;'>
GitHub Square
Explore package source code on GitHub</span> - <span style = 'font-size: 75%;'>this may include looking at any issues or pull requests associated with that package to check if there are known problems (e.g. check out `xaringan` on [GitHub](https://github.com/yihui/xaringan))</span> - <span style = 'font-size: 75%;'>check to see if there are wikis available for a particular repository (e.g. I was able to figure out how to wrap code output on `xaringan` slides thanks to one of their [wikis](https://github.com/yihui/xaringan/wiki/Word-wrapping-of-code-output))</span> -- <span style = 'font-size: 100%;'>
Calendar
Check the date of online solutions</span> - <span style = 'font-size: 75%;'>past solutions can become outdated quickly -- target recent posts and responses</span> -- <span style = 'font-size: 100%;'>
Plus
Read multiple search results</span> - <span style = 'font-size: 75%;'>sometimes one explanation will make sense in a way that another explanation will not</span> -- <span style = 'font-size: 100%;'>
User Secret
Isolate the relevant part(s) of an answer</span> - <span style = 'font-size: 75%;'>not every part of an online response will be relevant to *your* problem -- take care when copying/pasting entire "solutions"</span> ??? James Frew posted in the eds-223 channel on Monday, October 4, 2021: - Reminder: anytime you get an unusually cryptic error message, it doesn't hurt to just cut-n-paste a reasonably-specific chunk of the message into Google. - For example, one of you contacted me about an error message whose last line was: `function 'Rcpp_precious_remove' not provided by package 'Rcpp'` - That looked reasonably-specific (i.e., likely to retrieve something useful, and unlikely to retrieve cat memes), so I Googled it: https://www.google.com/search?q=function+%27Rcpp_precious_remove%27+not+provided+by+package+%27Rcpp%27 - First hit was: https://stackoverflow.com/questions/68416435/rcpp-package-doesnt-include-rcpp-precious-remove which solved the problem. --- ##
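Turn the error message into a search query

<span style = 'font-size: 80%;'>The "[r] + error message" tip can be sketched in a few lines of base R. Treat this as an illustration rather than a required workflow: `log("ten")` is just a hypothetical stand-in for whatever code is failing, and the URL follows the same `google.com/search?q=` pattern as the link in the notes above.</span>

```r
# run the failing code and keep the error message as text
msg <- tryCatch(
  log("ten"),
  error = function(e) conditionMessage(e)
)

# combine the [r] tag with the message to build a search string
query <- paste("[r]", msg)
query

# URL-encode the query if you want a link you can share with a collaborator
search_url <- paste0(
  "https://www.google.com/search?q=",
  URLencode(query, reserved = TRUE)
)
search_url
```

---

##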
Stack Overflow
Refine your search on Stack Overflow <span style = 'font-size: 90%;'>You're probably going to end up on Stack Overflow *often*. Increase your chances of finding a helpful answer by visiting their [How do I search?](https://stackoverflow.com/help/searching) page, which details all search operators.</span> <span style = 'font-size: 90%;'>You will also find a handy "cheat sheet" that suggests some of the more common search operators when you click into the search bar:</span> <br> .center[ <img src="media/stackoverflow.png" width="65%" /> ] <br> .center[ Check out the difference between searching for [[r] “dplyr filter” answers:3](https://stackoverflow.com/search?q=%5Br%5D+%22dplyr+filter%22+answers%3A3) versus [r filter](https://stackoverflow.com/search?q=r+filter). ] --- ## Still stumped? Ask the community! <span style = 'font-size: 75%;'>Create a `reprex` i.e. a **minimal reproducible example**. Strip away the cruft and present only what is required to reproduce your issue. Most of the time, this will help *you* solve your own problem. And if you still can't figure it out, you've helped the community help you.</span> <span style = 'font-size: 75%;'>Check out [FAQ: How to do a minimal reproducible example (reprex) for beginners](https://community.rstudio.com/t/faq-how-to-do-a-minimal-reproducible-example-reprex-for-beginners/23061) and [Reprex do's and don'ts](https://reprex.tidyverse.org/articles/reprex-dos-and-donts.html) to get started on creating your first `reprex`.</span> .center[ <img src="media/reprex.jpeg" width="65%" /> .center[ <span style = 'font-size: 75%;'>Artwork by [@allison_horst](https://twitter.com/allison_horst?lang=en)</span> ] ] --- class: inverse, middle, center ##
Circle Question
I know what I want my code to do, but I have no idea how to actually pull it off --- ## Knowing *what* to search for is half the battle .pull-left[ <blockquote class="twitter-tweet"><p lang="en" dir="ltr">The early hard part is just knowing the vocabulary. True beginners would have no reason to know what a join is. They might describe it as “looking up values from another table” which may not produce helpful results. So reading through a text book or intro class can give vocab</p>— JD Long (@CMastication) <a href="https://twitter.com/CMastication/status/1437827181476995072?ref_src=twsrc%5Etfw">September 14, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> ] .pull-right[ <blockquote class="twitter-tweet"><p lang="en" dir="ltr">Maybe we need to just tell learners “read the book for both the concepts AND the vocabulary to ask questions” because I bet they don’t realize this at all.</p>— JD Long (@CMastication) <a href="https://twitter.com/CMastication/status/1437833818518528003?ref_src=twsrc%5Etfw">September 14, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> ] --- ## A curated **vocab list** of R terminology may help <span style = 'font-size: 80%;'>
Angle Right
Hadley Wickham lists functions that he believes constitute a good working vocabulary for R fluency in his book, [Advanced R (first edition)](http://adv-r.had.co.nz/Vocabulary.html), and Joachim Schork (the voice behind the blog, [Statistics Globe](https://statisticsglobe.com/)) curates a long list of R commands and functions in his post, [R Functions List (+ Examples) | All Basic Commands of the R Programming Language](https://statisticsglobe.com/r-functions-list/).</span> - <span style = 'font-size: 75%;'>**Keeping a running vocab list for yourself** (that's organized in a way that makes most sense to *you*) can be massively helpful, even if just for the beginning stages of your R journey. It won't be long until you start realizing how much you actually know:</span> .center[ <img src="media/know_stuff.png" alt="A cartoon monster student in a backpack looks on at stunning gems labeled 'Touchstones of Intuition', which have the text 'You know some stuff' written on sequential crystals. " width="55%" /> .center[ <span style = 'font-size: 85%;'>Artwork by [@allison_horst](https://twitter.com/allison_horst?lang=en)</span> ] ] --- ## Use **search operators** to get more specific results .pull-left[ <img src="media/how_to_nytimes.png" width="97%" /> ] .pull-right[ <img src="media/how_to_velocity.png" width="90%" /> ] .center[ <span style = 'font-size: 65%;'>Source: ‘Infographic: Get More Out Of Google.’ HackCollege. HackCollege, 23 Nov. 2011. Web.
30 July 2015.</span> ] **For example, compare the following two Google queries:** **1.** <span style = 'font-family: "Fira Mono";'>[r dplyr join](https://www.google.com/search?q=r+dplyr+join&oq=R+dplyr+join&aqs=chrome.0.0i512l6j69i60.2431j0j7&sourceid=chrome&ie=UTF-8)</span> **2.** <span style = 'font-family: "Fira Mono";'>[site:stackoverflow.com R "dplyr join"](https://www.google.com/search?q=site%3Astackoverflow.com+R+%22dplyr+join%22&oq=site%3Astackoverflow.com+R+%22dplyr+join%22&aqs=chrome..69i57j69i58.71866j0j7&sourceid=chrome&ie=UTF-8)</span> --- ### It's also okay just to **copy code**, but take time to understand it (and give attribution as appropriate) .pull-left[ <blockquote class="twitter-tweet"><p lang="en" dir="ltr">Jokes are only the beginning.<br><br>The Key is available now, and your programming will never be the same.<a href="https://t.co/KAjJ1DKenf">https://t.co/KAjJ1DKenf</a> <a href="https://t.co/gDiYvBPSqp">https://t.co/gDiYvBPSqp</a> <a href="https://t.co/bhrBJN8N1Y">pic.twitter.com/bhrBJN8N1Y</a></p>— Cassidy (@cassidoo) <a href="https://twitter.com/cassidoo/status/1442936000578154498?ref_src=twsrc%5Etfw">September 28, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> ] .pull-right[ <blockquote class="twitter-tweet"><p lang="en" dir="ltr">It started as an April Fools prank, but the joke is officially over. The Key is now available for purchase worldwide from <a href="https://twitter.com/drop?ref_src=twsrc%5Etfw">@drop</a> with proceeds going to <a href="https://twitter.com/digundiv?ref_src=twsrc%5Etfw">@digundiv</a>. 
Check it out: <a href="https://t.co/rLQVdVn4ms">https://t.co/rLQVdVn4ms</a> <a href="https://t.co/rsO63pNd1B">pic.twitter.com/rsO63pNd1B</a></p>— Stack Overflow (@StackOverflow) <a href="https://twitter.com/StackOverflow/status/1442898486685364224?ref_src=twsrc%5Etfw">September 28, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> ] ??? Once you find a search result, it’s ok in the beginning to copy and paste without fully understanding what the code does. Over time, as you get better at coding, the code that other people post will make more sense. --- class: inverse, middle, center ##
Bug
Take proactive measures to make *your* code easier to debug --- ## Help future you *and* others by writing thoughtful code -- <br> <span style = 'font-size: 120%;'>
Code
Plan your approach by writing pseudocode</span> -- <br> <span style = 'font-size: 120%;'>
Brain
Use clear object/variable names</span> -- <br> <span style = 'font-size: 120%;'>
Check
Build in checks</span> --- ### <span style = 'font-size: 120%;'>
Code
Plan your approach by writing pseudocode</span>

<span style = 'font-size: 90%;'>[Pseudocode](https://en.wikipedia.org/wiki/Pseudocode) is a step-by-step written "outline" of your code that you can then transcribe into a programming language (in our case, likely R). It is written in plain language (e.g. English), but uses the structural conventions of a programming language.</span>

```r
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  pseudocode  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

new_df <- original_data %>%
  filter for gentoo penguins %>%
  calculate bill length to depth ratio and add as new column

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  actual code  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

penguins_new <- penguins %>%
  filter(species == "Gentoo") %>%
  mutate(bill_LD_ratio = bill_length_mm/bill_depth_mm)
```

???

While pseudocode is often used for planning out more complex algorithms, practicing with simpler examples may help you to avoid making mistakes -- pseudocode forces you to acknowledge what each line of your script is meant to do, before you even write any code.

---

### <span style = 'font-size: 120%;'>
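Plan your approach by writing pseudocode</span>

<span style = 'font-size: 80%;'>One way to practice is to write your pseudocode as comments first, then translate each comment into code one line at a time. A minimal sketch, assuming the built-in `mtcars` data and base R only (the `hp_wt_ratio` column name is made up for this illustration):</span>

```r
# step 1: write the plan as comments (pseudocode)
# - start with the built-in `mtcars` data
# - keep only cars with 4 cylinders
# - calculate a horsepower-to-weight ratio and add it as a new column

# step 2: translate each comment into code
cars_4cyl <- subset(mtcars, cyl == 4)                  # keep only 4-cylinder cars
cars_4cyl$hp_wt_ratio <- cars_4cyl$hp / cars_4cyl$wt   # add the ratio as a new column

# step 3: check that the result matches the plan
head(cars_4cyl[, c("cyl", "hp", "wt", "hp_wt_ratio")], 3)
```

---

### <span style = 'font-size: 120%;'>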
Use clear object/variable names</span>

<span style = 'font-size: 80%;'>It can get difficult to keep track of what's what when you start working with multiple datasets and creating lots of different objects. Be sure to name things so that collaborators (including future you!) can more easily understand what's going on. For example:</span>

```r
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  Not so good  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# load csv files
data1 <- read_csv("data/y2020_waterData.csv")
data2 <- read_csv("data/y2021_waterData.csv")

# combine data using the `rbind()` function (assume both csv files have the same structure)
data3 <- rbind(data1, data2)

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  Good!  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# load csv files
waterData2020 <- read_csv("data/y2020_waterData.csv")
waterData2021 <- read_csv("data/y2021_waterData.csv")

# combine data using the `rbind()` function (assume both csv files have the same structure)
waterData2020_2021 <- rbind(waterData2020, waterData2021)
```

<span style = 'font-size: 80%;'>You don't need to create excessively long variable names, but being clear is important -- don't sacrifice clarity for shorter names (plus RStudio's autocomplete functionality makes retyping longer variable names a breeze).</span>

???

### <span style = 'font-size: 120%;'>
Use explaining variables</span> Code comments are great, but variable and function names that explain what they are/do are also important. .center[ > <span style = 'font-size: 80%;'>Because with code comments we end up with two parallel sources of truth - what the code is doing, and what the comment says. Over time these two sources of truth have a tendency to diverge.</span> .right[ <span style = 'font-size: 60%;'>- Pete Hodgson, in his post, [Explaining Variable](https://blog.thepete.net/blog/2021/06/24/explaining-variable/)</span> ] ] <blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">A common suggestion I give during code review is to add an Explaining Variable. I couldn't find a good explanation of the pattern to link to, so I wrote one 🙃<a href="https://t.co/WRigu74gpu">https://t.co/WRigu74gpu</a></p>— Pete Hodgson (@ph1) <a href="https://twitter.com/ph1/status/1412125686974717971?ref_src=twsrc%5Etfw">July 5, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> --- ### <span style = 'font-size: 120%;'>
Build in checks</span>

<span style = 'font-size: 70%;'>The function below converts a temperature from Celsius to Fahrenheit, or vice versa. It takes three arguments:</span>

- <span style = 'font-size: 70%;'>`from`: a string, either "fahrenheit" or "celsius"</span>
- <span style = 'font-size: 70%;'>`to`: a string, either "fahrenheit" or "celsius"</span>
- <span style = 'font-size: 70%;'>`temp`: a double, the temperature value to convert</span>

<span style = 'font-size: 70%;'>As currently written, the `from` and `to` arguments only accept the string values "fahrenheit" or "celsius". What happens if the user accidentally misspells one of them? We can make that easier to troubleshoot by building in an informative error message.</span>

```r
convert_temp <- function(from, to, temp) {

  if(from == "celsius" && to == "fahrenheit") {
    message("Converting from Celsius to Fahrenheit")
    converted_temp <- temp*1.8 + 32
  } else if(from == "fahrenheit" && to == "celsius") {
    message("Converting from Fahrenheit to Celsius")
    converted_temp <- (temp - 32)/1.8
  # if inputs don't match either of the first two conditionals, stop with an informative error
* } else {
*   stop("The inputs provided (from = '", from, "' & to = '", to,"') are not as expected. Double check spelling.")
* }

  return(converted_temp)
}
```

---

### <span style = 'font-size: 120%;'>
Build in checks</span>

<span style = 'font-size: 90%;'>Let's convert 90°F to °C:</span>

```r
# try it out
convert_temp(from = "farenheit", to = "celsius", temp = 90)
```

```
## Error in convert_temp(from = "farenheit", to = "celsius", temp = 90): The inputs provided (from = 'farenheit' & to = 'celsius') are not as expected. Double check spelling.
```

--

<span style = 'font-size: 90%;'>Bummer, our code broke. But informative error message to the rescue! It looks like we misspelled a value passed to the `from` and/or `to` argument. Let's fix it and try again:</span>

--

```r
# oops, misspelled "fahrenheit" - try again
convert_temp(from = "fahrenheit", to = "celsius", temp = 90)
```

```
## [1] 32.22222
```

---

### Google is now forever a part of your coding life...

.center[
<span style = 'font-size: 80%;'>...but with practice, you'll find better solutions faster. Keep at it!</span>
]

.center[
<img src="media/rake.png" alt="A meme with two stacked panels. The top panel depicts a stick figure taking one step forward only to step directly on a rake, which flips up to smack the stick figure in the face. The bottom panel depicts a man 'skateboarding' down a set of outdoor stairs on a rake, performing tricks along the way. When he reaches the bottom of the stairs, he accidentally steps on the rake, which flips up to smack him in the face." width="42%" height="42%" />

.center[
<span style = 'font-size: 75%;'>- Twitter, source unknown</span>
]
]

---
class: center, middle

## Get Googling

<img src="media/google.gif" width="60%" />

Slides created via the R packages:

[**xaringan**](https://github.com/yihui/xaringan)<br>
[**gadenbuie/xaringanthemer**](https://github.com/gadenbuie/xaringanthemer)

<span style = 'font-size: 75%;'>*If you see mistakes or want to suggest changes, please create an [issue](https://github.com/UCSB-MEDS/teach-me-how-to-google/issues) on the source repository.*</span>

---
class: inverse, middle, center

##
Practice problems

Copy and paste the following coding challenges into an R script or Rmd file. Each problem may contain more than one error or piece of missing code for you to identify and fix (be mindful: you might not even get an error message). More than one solution may exist for each problem -- I'd love to hear about the different ways people approach these, so please be prepared to share the steps you took (Googling, looking at documentation, reading error messages, etc.) with the class!

*Remember:* running code line-by-line is one of the best ways to isolate errors.

---

## Problem 1

**Goal:** You have two data sets, `a` and `b`. Each contains the same attributes (i.e. columns) -- `month`, `season`, and `daylight_savings`. Identify which records (i.e. rows of data) in data table `a` are not contained in data table `b`.

```r
a <- data.frame(month = c("October", "November", "December", "January", "February"),
                season = c("Fall", "Fall", "Winter", "Winter", "Winter"),
                daylight_savings = c("yes", "partly", "no", "no", "no"))

b <- data.frame(month = c("October", "December", "February"),
                season = c("Fall", "Winter", "Winter"),
                daylight_savings = c("yes", "no", "no"))
```

---

## Problem 1 (a solution)

**Goal:** You have two data sets, `a` and `b`. Each contains the same attributes (i.e. columns) -- `month`, `season`, and `daylight_savings`. Identify which records (i.e. rows of data) in data table `a` are not contained in data table `b`.
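
The solution that follows relies on `anti_join()` from dplyr. As a rough base-R sketch of the same idea, each row can be collapsed into a single text key and compared with `%in%`. This assumes both tables share identical column names and that no value contains the separator character; `row_key()` and `rows_only_in_a` are hypothetical names introduced here for illustration, not part of the original slides.

```r
a <- data.frame(month = c("October", "November", "December", "January", "February"),
                season = c("Fall", "Fall", "Winter", "Winter", "Winter"),
                daylight_savings = c("yes", "partly", "no", "no", "no"))

b <- data.frame(month = c("October", "December", "February"),
                season = c("Fall", "Winter", "Winter"),
                daylight_savings = c("yes", "no", "no"))

# hypothetical helper: paste every column of a row into one string key
# (assumes no value contains the "\r" separator)
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

# keep the rows of `a` whose key does not appear among the keys of `b`
rows_only_in_a <- a[!(row_key(a) %in% row_key(b)), ]
rows_only_in_a
```

Unlike `anti_join()`, this sketch keeps the original row names of `a`, so the printed row labels will differ from the dplyr output.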
```r
library(dplyr) # provides anti_join()

a <- data.frame(month = c("October", "November", "December", "January", "February"),
                season = c("Fall", "Fall", "Winter", "Winter", "Winter"),
                daylight_savings = c("yes", "partly", "no", "no", "no"))

b <- data.frame(month = c("October", "December", "February"),
                season = c("Fall", "Winter", "Winter"),
                daylight_savings = c("yes", "no", "no"))

*difference <- anti_join(a, b)
difference
```

```
##      month season daylight_savings
## 1 November   Fall           partly
## 2  January Winter               no
```

---

## Problem 2

**Goal:** Modify the following code such that it sorts any **blue** animals from the `animals` vector into the `blue_animals` vector.

```r
##~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  ~ vector of animals  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~

animals <- c("blue tang", "red panda", "Blue jay", "green sea turtle", "blue morpho butterfly", "Blue iguana", "Red squirrel")

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  ~ initialize empty vector to store blue-colored animals  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

blue_animals <- c()

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  ~ sort blue animals into their own list  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

for (i in 1:length(animals)){

  current_animal <- animals[i]

  if(isTRUE(str_detect(current_animal, "blue"))){
    message("The '", current_animal, "' is a blue animal")
    blue_animals <- current_animal
  }
}
```

---

## Problem 2 (a solution)

**Goal:** Modify the following code such that it sorts any **blue** animals from the `animals` vector into the `blue_animals` vector.
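
The loop-based solution that follows keeps the structure of the problem code. As a minimal vectorized sketch in base R (an alternative, not the approach used in these slides), `grepl()` with `ignore.case = TRUE` can do the case-insensitive matching and the filtering in one step:

```r
animals <- c("blue tang", "red panda", "Blue jay", "green sea turtle",
             "blue morpho butterfly", "Blue iguana", "Red squirrel")

# grepl() returns one TRUE/FALSE per element of `animals`;
# ignore.case = TRUE lets the pattern match both "blue" and "Blue"
blue_animals <- animals[grepl("blue", animals, ignore.case = TRUE)]
blue_animals
```

Because the whole vector is tested at once, there is no empty vector to initialize and nothing to append inside a loop.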
```r
library(stringr) # provides str_detect()

##~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  ~ vector of animals  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~

animals <- c("blue tang", "red panda", "Blue jay", "green sea turtle", "blue morpho butterfly", "Blue iguana", "Red squirrel")

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  ~ initialize empty vector to store blue-colored animals  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

blue_animals <- c()

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  ~ sort blue animals into their own list  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

for (i in 1:length(animals)){

  current_animal <- animals[i]

* if(isTRUE(str_detect(current_animal, "(?i)blue"))){
    message("The '", current_animal, "' is a blue animal")
*   blue_animals <- append(blue_animals, current_animal)
  }
}
```

---

## Problem 3 (slide 1/2)

**Goal:** Modify the following code (on slide 2/2) to recreate the plot below:

.center[
<img src="media/penguin_plot.png" width="75%" />
]

---

## Problem 3 (slide 2/2)

```r
library(tidyverse)
library(palmerpenguins)

ggplot(data = penguins, aes(x = species, y = body_mass_g), shape = sex) +
  geom_point(alpha = 0.2, position = position_jitterdodge(dodge.width = 0.8)) +
  # means & standard devs
  stat_summary(mapping = aes(color = species),
               fun = "mean", geom = "point", size = 4) +
  stat_summary(mapping = aes(color = species),
               fun = "mean", geom = "errorbar", size = 1, width = 0.2,
               fun.max = function(x) mean(x) + sd(x),
               fun.min = function(x) mean(x) - sd(x)) +
  # change colors/shapes
  scale_color_manual(values = c("#FF8C02", "#A93FF1", "#148F90"), name = "Species") +
  scale_shape_manual(values = c(15, 16), name = "Sex") +
  # add nicer axis labels + title + caption
  labs(x = "Penguin Species", y = "Body Mass (g)",
       title = "Body mass of female vs. male adelie, chinstrap, and gentoo penguins",
       subtitle = "Colored points represent mean body mass (± SD)",
       caption = "Data Source: Dr. Kristen Gorman, LTER Palmer Station") +
  theme_classic() +
  theme(
    plot.title = element_rect(hjust = 0, size = 14),
    axis.text = element_text(color = "black", size = 10),
    axis.title = element_rect(color = "black", size = 14),
    plot.caption = element_text(size = 7, hjust = 1, color = "gray", face = "italic"),
    panel.border = element_rect(color = "black", size = 0.7, fill = NA))
```

---

## Problem 3 (a solution)

```r
library(tidyverse)
library(palmerpenguins)

*ggplot(data = na.omit(penguins), aes(x = species, y = body_mass_g, shape = sex)) +
  geom_point(alpha = 0.2, position = position_jitterdodge(dodge.width = 0.8)) +
  # means & standard devs
  stat_summary(mapping = aes(color = species),
               fun = "mean", geom = "point", size = 4,
*              position = position_dodge(width = 0.8)) +
  stat_summary(mapping = aes(color = species),
               fun = "mean", geom = "errorbar", size = 1, width = 0.2,
               fun.max = function(x) mean(x) + sd(x),
               fun.min = function(x) mean(x) - sd(x),
*              position = position_dodge(width = 0.8)) +
  # change colors/shapes
  scale_color_manual(values = c("#FF8C02", "#A93FF1", "#148F90"), name = "Species") +
  scale_shape_manual(values = c(15, 16), name = "Sex") +
  # add nicer axis labels + title + caption
  labs(x = "Penguin Species", y = "Body Mass (g)",
       title = "Body mass of female vs. male adelie, chinstrap, and gentoo penguins",
       subtitle = "Colored points represent mean body mass (± SD)",
       caption = "Data Source: Dr. Kristen Gorman, LTER Palmer Station") +
  theme_classic() +
  theme(
*   plot.title = element_text(hjust = 0, size = 14),
    axis.text = element_text(color = "black", size = 10),
*   axis.title = element_text(color = "black", size = 14),
    plot.caption = element_text(size = 7, hjust = 1, color = "gray", face = "italic"),
    panel.border = element_rect(color = "black", size = 0.7, fill = NA))
```
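
To sanity-check what the two `stat_summary()` layers draw, the same mean ± SD can be computed by hand. The following is a minimal base-R sketch on a small made-up data frame; the values and the names `toy` and `summary_stats` are illustrative assumptions, not the penguin data.

```r
# made-up values for illustration only (not the palmerpenguins data)
toy <- data.frame(
  species     = c("A", "A", "A", "B", "B", "B"),
  body_mass_g = c(3000, 3500, 4000, 5000, 5200, 5400)
)

# per-group mean and standard deviation, mirroring the `fun`,
# `fun.min`, and `fun.max` arguments passed to stat_summary()
group_mean <- tapply(toy$body_mass_g, toy$species, mean)
group_sd   <- tapply(toy$body_mass_g, toy$species, sd)

summary_stats <- data.frame(
  species = names(group_mean),
  mean    = as.numeric(group_mean),
  lower   = as.numeric(group_mean - group_sd), # bottom of the error bar
  upper   = as.numeric(group_mean + group_sd)  # top of the error bar
)
summary_stats
```

If the points and error bars on a plot disagree with a table like this, that is a hint that a grouping aesthetic or a `position` argument is not doing what you expect.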