EDS 296: Data Science Portfolios
GitHub Tools
November 8th, 2024
Demonstrate by doing
GitHub provides many cool project management features that facilitate organization, collaboration, coding, and building workflows. By using these tools (while working solo and while working with others), you can demonstrate your technical and programming proficiencies.
Art by Allison Horst
Track ideas & TODOs using GitHub issues
Add a new issue from a repo’s Issues tab
“GitHub Issues are items you can create in a repository to plan, discuss and track work. Issues are simple to create and flexible to suit a variety of scenarios. You can use issues to track work, give or receive feedback, collaborate on ideas or tasks, and efficiently communicate with others.”
A few helpful features:
Learn more by exploring GitHub Docs
Explore some real issues on the very real ggplot2 repository
When should I use issues?
Issues are a useful and valuable tool for tracking TODOs, jotting down ideas, recording bugs, etc., regardless of whether you’re working alone or with collaborators.
Like most things, it’s great to put some care and thought into writing an issue (especially when collaborating or contributing thoughts to a public project e.g. ggplot2), but I’d also argue that a hasty issue on a personal project can still go a long way in helping you remember a helpful resource, or that idea that popped into your mind at a time you couldn’t devote much attention to it.
Organize and prioritize issues (and pull requests) using GitHub projects
Create a project from your profile’s Projects tab
“Projects are an adaptable collection of items that stay up-to-date with GitHub data. Your projects can track issues, pull requests, and ideas that you note down. You can add custom fields and create views for specific purposes.”
A few helpful features:
Learn more by exploring GitHub Docs
An example project for my personal projects.
When should I use projects?
If you use issues, projects may offer an additional helpful way to organize your tasks.
Code is oftentimes spread across multiple repositories (capstones & GPs are excellent examples of this!) – projects can be particularly helpful for tracking TODOs and progress across them.
Projects are not required – you can decide if they are a helpful tool for you / your team.
Collaborate with teams across shared projects (repos) using GitHub organizations
Create an organization from your GitHub profile
“Organizations are shared accounts where businesses and open-source projects can collaborate across many projects at once, with sophisticated security and administrative features.”
Either click the Create new (“+”) button, or click your profile image (top right corner) > Your organizations > New organization. Choose the “free” option.
A few helpful features:
Learn more by exploring GitHub Docs. Also note the importance of assigning multiple organization owners.
The NCEAS Organization is home to over 100 repositories, and many members, teams, and projects
When should I use organizations?
GitHub organizations are extremely helpful when collaborating with teams of people within a company / group. Benefits include:
See this community discussion for more information.
Host reports, documents, websites, etc. with GitHub Pages
GitHub Pages can be enabled for any repo
You can host one website or rendered HTML document from any public GitHub repository.
Hosting additional websites via GHP:
- Only one site per account can use the bare github.io suffix, e.g. <username>.github.io – all other URLs will be structured as <username>.github.io/<repoName>
Hosting a rendered HTML document via GHP:
- The file must be named index.html and live in your repo’s root directory; no other configurations necessary
- Update index.html, then push to GitHub to update the deployment
The materials for creating your personal website using Quarto (above) are one example of a published document. Similarly, the slides for customizing your Quarto website are also published using GitHub Pages.
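The URL pattern described above (a <username>.github.io site versus <username>.github.io/<repoName> for every other repo) can be sketched as a small shell function. The function name pages_url and the username octocat are hypothetical, chosen only for illustration.

```shell
# Sketch: print the GitHub Pages URL for a given username and repo name.
# pages_url is a hypothetical helper, not part of any GitHub tool.
pages_url() {
  gh_user="$1"
  gh_repo="$2"

  if [ "$gh_repo" = "$gh_user.github.io" ]; then
    # The specially named repo is served at the bare github.io address.
    printf 'https://%s.github.io\n' "$gh_user"
  else
    # Every other repo is served under a path named after the repo.
    printf 'https://%s.github.io/%s\n' "$gh_user" "$gh_repo"
  fi
}

# Example calls (octocat is a made-up username):
#   pages_url octocat octocat.github.io   # prints https://octocat.github.io
#   pages_url octocat mysite              # prints https://octocat.github.io/mysite
```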
When should I use GitHub Pages?
You can use GitHub Pages to host projects and resources that you want to share publicly with others (e.g. colleagues, clients, potential employers, etc.).
GitHub Pages can be enabled from any public repo owned by a personal profile or organization.
Consider hosting instructional documentation, software user guides, reports, project websites, etc.
Automate workflows with GitHub Actions
What is GitHub Actions?
“GitHub Actions (GHA) is a continuous integration [1] and continuous delivery [2] (CI/CD) platform that allows you to automate your build, test, and deployment pipeline.”
You can use GHA to automate pretty much anything (truly)! But some concrete examples:
- rebuild and redeploy a website whenever changes are pushed to main
[1] Continuous integration is a software practice that requires frequently committing code to a shared repository. [2] Continuous deployment is the practice of using automation to publish and deploy software updates. Learn more by exploring GitHub Docs.
Some definitions
Read more on GitHub Docs
(right) An example GitHub Actions workflow for building and deploying a Quarto website.
(left) The repository’s Actions tab, where you can monitor the status of your GitHub Actions workflows.
When should I use GitHub Actions?
Consider using GitHub Actions whenever you want to automate tasks like building, testing, and deploying code from your GitHub repository.
Setting up a GHA workflow from scratch can be a bit intimidating, so it’s great to make use of workflow templates, which can be used as-is or modified for your custom workflow. You can sometimes also find templates provided by other tools for automating specific tasks (e.g. Quarto provides templates for executing R or Python code and rendering output to GitHub Pages).
It can be helpful to read a bit more about the YAML syntax used in workflow files before diving into creating or modifying your own workflows.
BONUS: An example GHA workflow for automating Quarto website builds and deployments
I’m very much a newbie when it comes to GitHub Actions, but I’ve found Melissa Van Bussel’s video on Publishing a Quarto Project with GitHub Pages + GitHub Actions to be a really helpful place to start (and it inspired this demo, along with discussion with Camila Vargas Poulsen & Nick Lyon).
Let’s demonstrate with mysite (from week 0)
The next few slides walk through setting up a GitHub Action that automates the building and deployment of a basic Quarto website, which may contain R code (e.g. rendered as part of a blog post). Up until now, I’ve been manually building mysite locally, then pushing the rendered files (in the docs/ folder) to GitHub, where GitHub Pages deploys them.
Rather than building a workflow from scratch, we’ll use a workflow template provided in the Quarto documentation.
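We’ll follow these general steps:
1. Set up a virtual environment for the project
2. Create a gh-pages branch, where our rendered website files will be stored
3. Add a GHA workflow to the repository
4. Reconfigure GitHub Pages to deploy from the gh-pages branch
NOTE: GHA workflows will (likely) take longer than local builds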
Whenever an event triggers a GHA workflow, GitHub spins up a virtual machine (i.e. a runner) where our defined jobs are executed. You can think of a runner as a brand new (mostly blank slate) computer. We’ll make use of a GitHub-hosted runner, though you can host your own runners.
You must provide all necessary pieces of software to this runner (e.g. R, R packages, Quarto, repo code, etc.). You do so in your workflow file.
As a result, an automated GHA workflow will take more time to complete than if you were to build your website locally, then push all rendered files (in a docs/ folder) to GitHub for GitHub Pages to deploy. This is, in part, because you already have all necessary pieces of software for rendering your website installed on your local machine.
1. Set up a virtual environment
We’ll want to set up a virtual environment for our project to ensure that our code is reproducible across different machines (including both our local development environment and the GitHub-hosted runner, where our workflow will be executed). Here, we’ll do so using the {renv} package.
Steps:
- Install {renv} (if necessary)
- Run renv::init() to initialize renv in our existing Quarto project
- Type Y when it asks if you want to proceed
- Install {yaml}, if prompted
(Re)Familiarizing yourself with the {renv} workflow is helpful here!
2. Create a gh-pages branch
The gh-pages branch is a special branch that you can use to store your built website (i.e. only your website’s rendered files, not the source files, e.g. any .qmd files). We’ll eventually configure GitHub Pages to deploy our website from this gh-pages branch.
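The steps on this slide create the branch through the GitHub web interface. As a command-line alternative, here is a minimal sketch that creates an empty gh-pages branch locally, along the lines of what the Quarto documentation suggests. The function name create_gh_pages_branch is hypothetical, and the sketch assumes a clean working tree and a remote named origin.

```shell
# Sketch: create an empty gh-pages branch, then return to the starting branch.
# create_gh_pages_branch is a hypothetical helper; run it from inside the repo.
create_gh_pages_branch() {
  # Remember where we started so we can come back afterwards.
  start_branch="$(git rev-parse --abbrev-ref HEAD)" || return 1

  # Stop if tracked files have uncommitted changes: the reset below would discard them.
  if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
    echo "Commit or stash your changes first." >&2
    return 1
  fi

  # Start a branch with no history, then empty its index and working tree.
  git checkout --orphan gh-pages || return 1
  git reset --hard || return 1

  # Give the branch a first (empty) commit so it exists, then switch back.
  git commit --allow-empty -m "Initialize gh-pages branch" || return 1
  git checkout "$start_branch"
}

# Usage (assumes the remote is named origin):
#   create_gh_pages_branch
#   git push origin gh-pages
```

Because the branch starts empty, it holds only what the workflow later publishes to it, which matches the idea of keeping rendered files separate from source files.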
Steps:
- On GitHub, create a new branch named gh-pages > click the green Create new branch button
3. Add a GHA workflow to the repository
Rather than building a workflow from scratch, we can use the one conveniently provided in the Quarto documentation!
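For orientation, here is a rough sketch of what this part of the setup produces: a .github/workflows/publish.yml file holding the workflow. The YAML inside is written from memory of the Quarto documentation's renv-based template, so treat the action versions and the R version as assumptions and copy the current template from the Quarto docs for real use. The function name write_publish_workflow is hypothetical.

```shell
# Sketch: create .github/workflows/publish.yml inside a project directory.
# Usage: write_publish_workflow [project_dir]   (defaults to the current directory)
write_publish_workflow() {
  project_dir="${1:-.}"
  mkdir -p "$project_dir/.github/workflows" || return 1

  # The quoted 'EOF' stops the shell from expanding ${{ ... }} in the YAML.
  cat > "$project_dir/.github/workflows/publish.yml" <<'EOF'
# Assumed contents: check against the current template in the Quarto docs.
on:
  workflow_dispatch:
  push:
    branches: main

name: Quarto Publish

jobs:
  build-deploy:
    runs-on: ubuntu-latest        # GitHub-hosted runner
    permissions:
      contents: write             # lets the job push to gh-pages
    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2

      - name: Install R
        uses: r-lib/actions/setup-r@v2
        with:
          r-version: 'release'    # assumption: pin to your project's R version if needed

      - name: Install R Dependencies
        uses: r-lib/actions/setup-renv@v2   # restores packages from renv.lock
        with:
          cache-version: 1

      - name: Render and Publish
        uses: quarto-dev/quarto-actions/publish@v2
        with:
          target: gh-pages
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
EOF
}

# Usage, from the root of the Quarto project:
#   write_publish_workflow
```

Each step installs one piece of software on the runner, which is why the workflow takes longer than a local build.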
Steps:
- Create a folder called .github in your root directory (you can use the Terminal or the New Folder button)
- Inside .github/ create another folder called workflows
- Inside workflows/ add a file named publish.yml (you can use the Terminal or New File > Text file button)
- Paste the workflow provided in the Quarto documentation into publish.yml
- Remove output-dir: docs from _quarto.yml
- Delete the docs/ folder (we’ll no longer be rendering to / deploying from this directory)
4. Reconfigure GitHub Pages
After pushing your updated files to GitHub, you will probably receive an automated email with the Subject line: [yourUserName/repoName] Run failed: pages build and deployment – this is because GitHub Pages is currently looking to redeploy your site from the docs/ folder (we’ve just removed this) on the main branch.
Our final step is to reconfigure our GitHub Pages to now serve our rendered website from the gh-pages branch.
Steps:
- In your repo’s Settings > Pages, change the branch from main to gh-pages and the folder from /docs to /(root) > click Save > check out the Actions tab while your site redeploys (remember, this will take a bit longer than you’re used to!)
Try out your GHA!
Test it out!
Your Action will also be triggered if you merge a pull request into main.
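Before pushing a change to try the workflow, a quick local check of the pieces set up in steps 1 through 4 can catch an omission early. This is only a sketch: check_gha_setup is a hypothetical helper, and it looks only at the files named in these slides (renv.lock, publish.yml, _quarto.yml, and the docs/ folder).

```shell
# Sketch: check that the pieces from steps 1-4 are in place before pushing.
# Run from the root of the Quarto project. Prints one line per problem found
# and returns non-zero if anything looks off.
check_gha_setup() {
  problems=0

  # Step 1: renv::init() writes a lockfile that the runner restores packages from.
  if [ ! -f renv.lock ]; then
    echo "missing renv.lock (run renv::init() in the project)"
    problems=$((problems + 1))
  fi

  # Step 3: GitHub only picks up workflow files inside .github/workflows/.
  if [ ! -f .github/workflows/publish.yml ]; then
    echo "missing .github/workflows/publish.yml"
    problems=$((problems + 1))
  fi

  # Step 3: the site should no longer render to docs/.
  if [ -f _quarto.yml ] && grep -q 'output-dir:[[:space:]]*docs' _quarto.yml; then
    echo "_quarto.yml still sets output-dir: docs"
    problems=$((problems + 1))
  fi
  if [ -d docs ]; then
    echo "docs/ folder still exists"
    problems=$((problems + 1))
  fi

  [ "$problems" -eq 0 ]
}

# Usage:
#   check_gha_setup && git push
```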
Take a Break