Image source: Wikipedia
Teach Me How to Google
The case for debugging & search skills in the age of AI + tips on how to do so effectively
Published: October 11, 2021
Last updated: Jul 28, 2025
Sam Shanny-Csik |
Lecturer & Data Training Coordinator
Master of Environmental Data Science |
Bren School of Environmental Science & Management
Slides & source code available on GitHub
Let’s address the elephant in the room . . .
Image source: Wikipedia
We argue, YES!
Evidence suggests that overreliance on ChatGPT can erode critical thinking skills
Group 1: could use ChatGPT
Group 2: could use Google
Group 3: only their brains!
After 3 essays, everyone was asked to re-write one of their previous essays, but Group 1 could no longer use ChatGPT, while Group 2 & Group 3 could now use ChatGPT
“The LLM undeniably reduced the friction involved in answering participants’ questions compared to the Search Engine. However, this convenience came at a cognitive cost, diminishing users’ inclination to critically evaluate the LLM’s output or “opinions” (probabilistic answers based on the training datasets). This highlights a concerning evolution of the ‘echo chamber’ effect: rather than disappearing, it has adapted to shape user exposure through algorithmically curated content. What is ranked as “top” is ultimately influenced by the priorities of the LLM’s shareholders.”
Other motivating findings
“Moreover, while GenAI can improve worker efficiency, it can inhibit critical engagement with work and can potentially lead to long-term overreliance on the tool and diminished skill for independent problem-solving. Higher confidence in GenAI’s ability to perform a task is related to less critical thinking effort. When using GenAI tools, the effort invested in critical thinking shifts from information gathering to information verification; from problem-solving to AI response integration; and from task execution to task stewardship. Knowledge workers face new challenges in critical thinking as they incorporate GenAI into their knowledge workflows.”
Other motivating findings
“Students who substitute some of their learning activities with LLMs (e.g., by generating solutions to exercises) increase the volume of topics they can learn about but decrease their understanding of each topic. Students who complement their learning activities with LLMs (e.g., by asking for explanations) do not increase topic volume but do increase their understanding. We also observe that LLMs widen the gap between students with low and high prior knowledge.”
And one final good analogy
“Using LLMs requires a good baseline knowledge of R [or other languages] to actually be useful. A good analogy for this is with recipes. ChatGPT is really confident at spitting out plausible-looking recipes. A few months ago, for fun, I asked it to give me a cookie recipe. I got back something with flour, eggs, sugar, and all other standard-looking ingredients, but it also said to include 3/4 cup of baking powder. That’s wild and obviously wrong, but I only knew that because I’ve made cookies before.”
From Andrew Heiss’s course guidelines on AI use.
GenAI in the MEDS calendar
Term | Incorporation of GenAI |
---|---|
SUMMER | Establish context: student use is discouraged |
FALL | Critical interrogation: instructors demonstrate examples of use and discuss pros / cons |
WINTER | Guided use: workshops early in quarter; instructors model use |
SPRING | Supported use: instructors model use |
You’re here because you want to learn! ChatGPT (and related tools) will certainly become a part of your workflow, but in this early stage of MEDS, we want you to focus on core competencies and critical thinking skills, including an understanding of how to properly use tools, design workflows, write and organize code, and troubleshoot problems.
To do that most effectively, you need to commit to active learning processes and approaches.
Welcome to data science, where questions are aplenty!
-Me, every time I sit down to program
It doesn’t mean you won’t still feel like this at times:
-Me still, about half the times I sit down to program
But the goal is to be a bit more at peace with that feeling…and have the confidence that you can find your way
Artwork by Allison Horst
I typically find myself turning to Google because:
I got an error and need help fixing it
I know what I want my code to do, but I have no idea how to actually pull it off
Sometimes, it’s both of these things happening at the same time
We’ve all been here before:
Artwork by Allison Horst
Pause, exhale, narrow down your potential Google search
Restart R
Check the easy stuff
Read that error message
Try to isolate the problem
Double-check the documentation
Talk about it out loud
Restart R
“Restart R often, especially when things get weird…We install and update packages from R, which is a little bit like working on your airplane engine while you’re flying.”
-Jenny Bryan, in her 2020 rstudio::conf keynote, Object of type ‘closure’ is not subsettable
Similarly, going to sleep and trying again tomorrow is a legitimate (and often impactful) strategy – think of it as restarting your own internal computer (i.e. your brain).
Check the easy stuff
Source: This tweet by @cogscimom
Read that error message
# load packages ----
library(tidyverse) # a collection of data wrangling & visualization packages
library(palmerpenguins) # contains the 'penguins' data set
# print out the first three rows of the penguins data frame ----
head(penguins, 3)
# A tibble: 3 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
# ℹ 2 more variables: sex <fct>, year <int>
Read that error message
# create a new data frame with just rows (observations) containing "Gentoo" penguins ----
gentoo <- penguins |>
  filter(species = "Gentoo")
Error in `filter()`:
! We detected a named input.
ℹ This usually means that you've used `=` instead of `==`.
ℹ Did you mean `species == "Gentoo"`?
Returns a helpful error message with a potential fix!
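Applying the fix suggested by the error message might look like the sketch below. It assumes the {dplyr} and {palmerpenguins} packages are installed, and swaps `=` for `==` exactly as the message proposes.

```r
library(dplyr)
library(palmerpenguins)

# `==` tests for equality; a single `=` is read as naming an argument
gentoo <- penguins |>
  filter(species == "Gentoo")

# check which species remain in the filtered data frame
unique(gentoo$species)
```

Checking the result after applying a fix is a good habit: code that runs without an error has not necessarily done what you intended.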
Read that error message
“Your first “object of type ‘closure’ is not subsettable” error message is a big milestone for an R user. Congratulations, if there was any lingering doubt, you now know that you are officially programming!”
-Jenny Bryan, in her 2020 rstudio::conf keynote, Object of type ‘closure’ is not subsettable
This error often arises when you attempt to subset a function, i.e. treat a function in a way that it shouldn’t be (a “closure” is a type of function in R). Here, we forgot that we called our object dat, and not df. df() also happens to be a function that gives you the density of the F distribution, and we are attempting to subset (i.e. extract) a column (x) from it.
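A minimal sketch of how this mistake can come about, using a small made-up data frame (the values in `dat` are hypothetical). The error is captured with `tryCatch()` so the script keeps running; the exact wording of the message may differ between R versions.

```r
# we name our data frame `dat` ...
dat <- data.frame(x = c(1, 2, 3))

# ... but then reach for `df`, which R resolves to the stats::df() function
result <- tryCatch(df$x, error = function(e) conditionMessage(e))
result

# referring to the object by its actual name works as intended
dat$x
```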
Read that error message
Error messages provide helpful context and information, even if they seem unhelpful on the surface!
You’ll become more familiar with common error messages the more time you spend coding, but it can be helpful to explore some resources for deciphering the big ones:
Artwork by Allison Horst
Try to isolate the problem
It can be overwhelming to figure out where an error or issue is occurring in a large chunk of code. A small example:
# load libraries ----
library(dplyr)
library(palmerpenguins)
# wrangle data ----
penguins_new <- penguins |>
  select(species, sex, bill_length_mm) |>
  filter(species == "Adelie") |>
  reorder(bill_length_mm)
Error: object 'bill_length_mm' not found
Running all lines together can make it difficult to tell which line(s) is responsible for this error (and imagine dealing with much longer, more complex code chunks!).
Instead, run line-by-line to isolate where the problem is occurring so that you can begin investigating from there.
Try to isolate the problem
Run line-by-line until you hit the error:
Works!
Works!
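One way to run a pipeline step-by-step is to save each stage as its own object, as in the sketch below. It assumes {dplyr} and {palmerpenguins} are installed; the names `step1`, `step2`, and `step3` are made up for illustration. In RStudio you can get the same effect by highlighting and running progressively more of the pipeline.

```r
library(dplyr)
library(palmerpenguins)

# step 1: run select() on its own
step1 <- penguins |>
  select(species, sex, bill_length_mm)

# step 2: add filter()
step2 <- step1 |>
  filter(species == "Adelie")

# step 3: add the final line; try() lets the script continue past an error
step3 <- try(step2 |> reorder(bill_length_mm), silent = TRUE)

# TRUE here means the final line is the one to investigate
inherits(step3, "try-error")
```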
Try to isolate the problem
Searching for reorder() (either by looking up documentation – more on that in a moment – or Googling it) reveals that it’s not actually a {dplyr} function for sorting rows; stats::reorder() reorders the levels of a factor instead.
Googling “R tidyverse reorder values high to low” leads us to the {dplyr} documentation for the arrange() function, which allows us to sort values in descending order when coupled with desc():
penguins_new <- penguins |>
  select(species, sex, bill_length_mm) |>
  filter(species == "Adelie") |>
  arrange(desc(bill_length_mm))
head(penguins_new, 4)
# A tibble: 4 × 3
species sex bill_length_mm
<fct> <fct> <dbl>
1 Adelie male 46
2 Adelie male 45.8
3 Adelie male 45.6
4 Adelie male 44.1
It can be easy to confuse or mistake function names, particularly as you’re just starting to learn a language or new packages (e.g. forcats::fct_reorder() is used to reorder factor levels, but here, we’re looking to reorder numeric values in the bill_length_mm column).
Double-check the documentation
Documentation provides critical info for understanding how to correctly use a package or function
Pull up documentation for a loaded function by typing ?function_name in your console (e.g. ?filter).
Open up RStudio and practice pulling up the documentation for filter()
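A short sketch of two equivalent ways to open a help page, assuming {dplyr} is installed. Naming the package explicitly avoids ambiguity, since base R also ships a different `filter()` in the {stats} package.

```r
# open the help page for filter() from the {dplyr} package
?dplyr::filter

# the same request, written with help()
help("filter", package = "dplyr")
```

In RStudio, either call opens the page in the Help pane.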
Double-check the documentation
With the person(s) next to you, explore the documentation and consider the following (and be prepared to share out):
What is the filter() function used for? Where did you locate this information?
What do the .data and ... arguments do? How easy or difficult of a time did you have understanding the descriptions?
[…] filter() function works?
Double-check the documentation
Vignettes are long-form guides / tutorials for R packages. These can offer helpful (and often less jargony) examples and explanations for how to use various functions. Check for a vignette by typing vignette("package_name") in your console. E.g. if we want to learn more about how to use filter(), which comes from the {dplyr} package, we can run vignette("dplyr").
Not all packages will have a vignette. Vignettes do exist for some Python libraries as well.
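As a sketch (assuming {dplyr} is installed), you can first list the vignettes a package ships with and then open one by name. The `interactive()` guard is an addition here so that the snippet also runs cleanly in a non-interactive script.

```r
# list the vignettes that ship with {dplyr}
available <- vignette(package = "dplyr")
available$results[, c("Item", "Title")]

# open the introductory vignette (only when working interactively)
if (interactive()) {
  vignette("dplyr", package = "dplyr")
}
```

Listing first is useful because a package's vignettes are not always named after the package itself.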
Talk about it out loud
This is often referred to as rubber duck debugging, and it goes something like this:
1. Beg, borrow, steal, buy, fabricate or otherwise obtain a rubber duck (bathtub variety).
2. Place rubber duck on desk and inform it you are just going to go over some code with it, if that’s all right.
3. Explain to the duck what your code is supposed to do, and then go into detail and explain your code line by line.
4. At some point you will tell the duck what you are doing next and then realise that that is not in fact what you are actually doing. The duck will sit there serenely, happy in the knowledge that it has helped you on your way.
-rubberduckdebugging.com with original credit to Andy from lists.ethernal.org
Image source: Wikipedia
Talk about it out loud
But how is rubber duck debugging different than using ChatGPT?
Talking through your problem out loud forces you to systematically think through your logic step-by-step. This process often helps to identify previously overlooked details, errors, or logical inconsistencies.
In contrast, tools like ChatGPT tend to deliver complete (and sometimes incorrect) answers or solutions, without requiring you to engage in the same level of critical thinking.
Talk about it out loud
With careful prompting, ChatGPT can act as a rubber ducky (though we encourage talking to one another for now!):
“through carefully crafted prompts and easily accessible platforms, rubber duck LLMs can assist learners with specific questions while also situating those questions alongside larger computer science concepts and computational thinking practices.”
An example ChatGPT prompt:
“I am getting an error in my R code. Instead of giving me the answer, can you help me logically think through my code line-by-line so that I might identify the problem on my own? Here is my code:
[code here]
My error message is: [error message here]”
Check out this example conversation using a code example from earlier in the slides!
Still stumped? Create a reprex!
Create a reprex
i.e. a minimal reproducible example. Strip away the cruft and keep only what is required to reproduce your issue. More often than not, this will help you solve your own problem. And if you still haven’t figured it out, you can bring your reprex to a friend, colleague, or online community – making it easier for others to help you should always be the goal!
Artwork by Allison Horst
Check out FAQ: How to do a minimal reproducible example (reprex) for beginners and Reprex do’s and don’ts to get started on creating your first reprex.
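A sketch of what a minimal reproducible script could look like, assuming {dplyr} is installed. The `toy` data frame and its values are made up for illustration: a few inline rows stand in for your full data set, and only the one misbehaving step is kept.

```r
# everything needed to reproduce the issue lives in this one script
library(dplyr)

# a tiny inline data set (hypothetical values) instead of your full data
toy <- tibble(
  species = c("Adelie", "Gentoo", "Adelie"),
  bill_length_mm = c(39.1, 46.1, 40.3)
)

# the single step that misbehaves; try() records the error and moves on
outcome <- try(filter(toy, species = "Adelie"), silent = TRUE)

# print the captured error so it travels with the example
cat(outcome)
```

If you have the {reprex} package installed, you can copy a script like this to your clipboard and call reprex::reprex() to render the code together with its output, ready to paste into a forum post or GitHub issue.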
Still haven’t figured it out after lots of debugging? Enter Google.
General Googling tips:
Check the date of online solutions:
Read multiple search results:
Isolate the relevant part(s) of an answer:
Use search operators to get more specific results
For example, compare these two Google queries: r dplyr join vs. site:stackoverflow.com R “dplyr join”
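If you want to launch a search like the second query straight from R, a sketch using base R might look like this. The Google search URL pattern is an assumption about how the site accepts queries, and `query` and `search_url` are illustrative names.

```r
# restrict results to Stack Overflow and match the exact phrase "dplyr join"
query <- 'site:stackoverflow.com R "dplyr join"'

# percent-encode the query so it is safe to place in a URL
search_url <- paste0(
  "https://www.google.com/search?q=",
  utils::URLencode(query, reserved = TRUE)
)
search_url

# open the search in your default browser (only when working interactively)
if (interactive()) {
  utils::browseURL(search_url)
}
```

The `site:` operator limits results to one domain, and the quotation marks ask for the exact phrase rather than pages that mention "dplyr" and "join" separately.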
So where on Google should I actually look?
There are many excellent resources online, but the following are great places to start:
Q&A forums
GitHub
Documentation
Personal Blogs
Q&A forums
Stack Overflow has historically¹ been the question-and-answer website for programmers. Refine your query using search operators, and look for accepted answers with many upvotes:
I return to this particular Stack Overflow post on undoing a git commit and this post on checking out a remote git branch all the time (probably at least 1x/month!)
Q&A forums
Stack Overflow initially banned ChatGPT-written responses, but now partners with a number of AI chatbot developers, including OpenAI – this means that tools like ChatGPT are trained on Stack Overflow content (among other sources).
¹ Stack Overflow has seen a sharp decline in user activity since ChatGPT was released, as users increasingly turn to GenAI tools.
The implications of this decline remain to be seen, though research suggests that replacing human content creation with AI-generated responses may make it more difficult to train future AI models. This change also represents a shift of knowledge sharing from the public to private domains (del Rio-Chanona et al. 2024).
Q&A forums
Does this mean I should stop using community-based Q&A forums? No! They offer a unique and complementary value to AI chatbots, particularly when developing your core competencies and critical thinking skills, including:
AI is trained on past knowledge; forums offer living knowledge
Also check out:
{tidyverse}, RStudio and more)
GitHub
The source code for most (if not all) of your favorite data science software (e.g. packages / libraries) lives on GitHub. Check for known issues or fixes underway by exploring open issues and pull requests. Some repositories may also include additional documentation or examples using wikis (e.g. see the {xaringan} wiki).
Open issues for {ggplot2}
Open pull requests for {ggplot2}
Documentation
We already saw how to access documentation directly from RStudio, but looking at the official documentation online may reveal additional helpful articles, FAQs, and resources. Documentation sites built with pkgdown are particularly approachable and easy to navigate (they are all structured the same way). For example, the {ggplot2} pkgdown documentation:
Google package-name documentation to find the online version of a package’s documentation.
Personal blogs
I love a good blog post – and the data science field is full of incredible bloggers who make complex computational tasks more approachable.
A still-very-relevant tweet by David Robinson
Blogs are an excellent place to find tutorials interwoven with engaging narratives (at least more engaging than documentation) and less jargon (or jargon that’s explained). They often include example code, outputs, and other visuals.
You’ll be writing your own blog posts as part of course assignments throughout the year!
Personal blogs
How do I find folks to follow? Here are some bloggers that I’ve personally enjoyed, but you might also consider joining Bluesky (a Twitter replacement that’s gaining popularity with the data science community) and following some relevant “starter packs” (you can find recommendations on the EDS 240 course website):
What else should I be doing?
Closing thoughts
When you take the time to troubleshoot your own code, either independently or with help from Google, you’re doing more than just fixing errors. You are actively developing the core skills that make great data scientists:
By building these foundational technical and problem-solving skills now, you’re setting yourself up for long-term success.
Get Googling!
If you see mistakes or want to suggest changes, please create an issue on the source repository.